If a measurement is repeated, the value obtained is likely to differ each time. The smaller this variation, the higher the repeatability. It also matters whether the measurement is repeated by the same person or by another person. **Intra-observer variation** is the variation that occurs when the same person repeats the measurements, whilst **inter-observer variation** is the variation that occurs when a different person repeats the measurements. We would normally expect the inter-observer variation to be larger than the intra-observer variation.

The American Society for Testing and Materials (ASTM) has defined repeatability and reproducibility [1].

**Repeatability**

Repeatability is precision determined from multiple tests done under repeatability conditions: the tests are conducted by the same operator, with the same equipment, in the same laboratory and within a short period of time, so that neither the equipment nor the environment is likely to change significantly.

**Reproducibility**

Reproducibility is precision determined from multiple tests done under reproducibility conditions: the tests are conducted in different laboratories, by different operators and under different environmental conditions.

When an accepted reference value is known, the bias (the systematic difference between the test results and the reference value) can also be estimated.

The **precision limits** of repeatability and reproducibility can be calculated from the sample standard deviation of the test results. The number of tests should be at least 30, so that the sample standard deviation is a reasonable estimate of the population standard deviation. The repeatability precision limit (r) and the reproducibility precision limit (R) are useful for comparing test results within and between laboratories: each is the value below which the absolute difference between two test results is expected to lie with a probability of about 95%. They are calculated by multiplying the repeatability standard deviation (sr) or the reproducibility standard deviation (sR), respectively, by 2.8. The factor 2.8 is 1.96 (95% of a Normal population lies within 1.96 standard deviations of the mean) multiplied by √2 (the difference between two independent test results has a standard deviation √2 times that of a single result): 1.96 × √2 ≈ 2.77, rounded to 2.8.
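The role of √2 in the factor can be checked with a quick simulation. This sketch is not part of the ASTM standard: it draws pairs of independent Normal test results with an arbitrary standard deviation of 1 (the seed and the number of pairs are also arbitrary choices) and checks that their difference has a standard deviation of about √2, and that about 95% of the absolute differences fall below 1.96 × √2 ≈ 2.77 standard deviations.

```r
# Simulation sketch: why the limit factor is 1.96 * sqrt(2)
set.seed(1)                    # arbitrary seed, for reproducibility
sigma <- 1                     # SD of a single test result (arbitrary units)
n <- 1e5                       # number of simulated pairs of results
x1 <- rnorm(n, mean = 0, sd = sigma)
x2 <- rnorm(n, mean = 0, sd = sigma)
d <- x1 - x2                   # difference between two independent results

k <- qnorm(0.975) * sqrt(2)    # exact factor, about 2.77 (rounded to 2.8)
k
sd(d) / sigma                  # close to sqrt(2), about 1.41
mean(abs(d) <= k * sigma)      # close to 0.95
```

Because sigma is set to 1, the limit k * sigma is simply the factor itself; with any other sigma the proportion stays close to 0.95.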

For example, consider the measurements of femoral heads in the heads.rda file. The data frame is called heads and has five variables: number; reference, the accepted reference value (measurement of the femoral head with callipers at the time of excision during total hip replacement); and three radiological measurements, m1, m2 and m3, performed before surgery. Measurements m1 and m2 were made under repeatability conditions and measurement m3 under reproducibility conditions.

Load the data file in JGR / R and check the data frame:

```r
load("heads.rda")  # creates the data frame 'heads' in the workspace
heads
```

```
   number reference m1 m2 m3
1       1        52 54 55 53
2       2        50 50 49 56
3       3        50 51 47 52
4       4        52 53 53 53
5       5        52 50 51 52
6       6        48 49 48 51
7       7        55 53 56 56
8       8        55 52 55 59
9       9        53 54 54 53
10     10        48 47 49 52
11     11        50 51 48 54
12     12        48 47 47 53
13     13        50 49 51 50
14     14        49 50 49 55
15     15        52 51 51 52
16     16        51 53 52 50
17     17        51 50 49 53
18     18        51 52 51 56
19     19        50 51 52 53
20     20        54 56 55 56
21     21        49 48 48 53
22     22        50 51 51 55
23     23        53 52 52 55
24     24        48 48 50 52
25     25        52 52 53 55
26     26        51 51 51 56
27     27        55 53 55 59
28     28        50 53 49 52
29     29        54 52 55 57
30     30        50 52 49 59
```
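If the heads.rda file is not at hand, the same data frame can be typed in directly from the listing above. This is a convenience sketch: the values are copied from the printed data, and the column names match those used in the rest of this section.

```r
# Rebuild the 'heads' data frame by hand (values copied from the listing above)
heads <- data.frame(
  number    = 1:30,
  # calliper measurement at excision (accepted reference value, mm)
  reference = c(52, 50, 50, 52, 52, 48, 55, 55, 53, 48, 50, 48, 50, 49, 52,
                51, 51, 51, 50, 54, 49, 50, 53, 48, 52, 51, 55, 50, 54, 50),
  # radiological measurements under repeatability conditions (mm)
  m1        = c(54, 50, 51, 53, 50, 49, 53, 52, 54, 47, 51, 47, 49, 50, 51,
                53, 50, 52, 51, 56, 48, 51, 52, 48, 52, 51, 53, 53, 52, 52),
  m2        = c(55, 49, 47, 53, 51, 48, 56, 55, 54, 49, 48, 47, 51, 49, 51,
                52, 49, 51, 52, 55, 48, 51, 52, 50, 53, 51, 55, 49, 55, 49),
  # radiological measurement under reproducibility conditions (mm)
  m3        = c(53, 56, 52, 53, 52, 51, 56, 59, 53, 52, 54, 53, 50, 55, 52,
                50, 53, 56, 53, 56, 53, 55, 55, 52, 55, 56, 59, 52, 57, 59)
)
str(heads)  # 30 obs. of 5 variables
```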

As there is an accepted reference value (the calliper measurements), first calculate the biases of the measurements:

```r
# Bias = radiological measurement minus calliper reference value
bias1 <- heads$m1 - heads$reference
bias2 <- heads$m2 - heads$reference
bias3 <- heads$m3 - heads$reference
```

Are the biases Normally distributed? This can be checked with the Shapiro-Wilk test for Normality:

```r
shapiro.test(bias1)
shapiro.test(bias2)
shapiro.test(bias3)
```

```
        Shapiro-Wilk normality test

data:  bias1
W = 0.9422, p-value = 0.1041

        Shapiro-Wilk normality test

data:  bias2
W = 0.9421, p-value = 0.1039

        Shapiro-Wilk normality test

data:  bias3
W = 0.9612, p-value = 0.3315
```

None of the three tests is significant (all p > 0.05), so it is reasonable to assume a Normal distribution as a model for the biases.
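When there are several methods to check, the three tests can be run in one step. This sketch re-enters the data from the listing so that it runs on its own, computes each bias as measurement minus reference, and collects the Shapiro-Wilk p-values with sapply(). The names `biases` and `sw_p` are only illustrative.

```r
# Data re-entered from the listing so that this sketch runs on its own
heads <- data.frame(
  reference = c(52, 50, 50, 52, 52, 48, 55, 55, 53, 48, 50, 48, 50, 49, 52,
                51, 51, 51, 50, 54, 49, 50, 53, 48, 52, 51, 55, 50, 54, 50),
  m1 = c(54, 50, 51, 53, 50, 49, 53, 52, 54, 47, 51, 47, 49, 50, 51,
         53, 50, 52, 51, 56, 48, 51, 52, 48, 52, 51, 53, 53, 52, 52),
  m2 = c(55, 49, 47, 53, 51, 48, 56, 55, 54, 49, 48, 47, 51, 49, 51,
         52, 49, 51, 52, 55, 48, 51, 52, 50, 53, 51, 55, 49, 55, 49),
  m3 = c(53, 56, 52, 53, 52, 51, 56, 59, 53, 52, 54, 53, 50, 55, 52,
         50, 53, 56, 53, 56, 53, 55, 55, 52, 55, 56, 59, 52, 57, 59)
)

# Bias = radiological measurement minus calliper reference, one column per method
biases <- heads[, c("m1", "m2", "m3")] - heads$reference

# Shapiro-Wilk p-value for each column of biases
sw_p <- sapply(biases, function(b) shapiro.test(b)$p.value)
round(sw_p, 4)
```

Subtracting a vector from a data frame subtracts it from every column, so each column of `biases` corresponds to one of bias1, bias2 and bias3.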

Next, check for bias with a one-sample, two-sided t-test of the null hypothesis that the mean bias is zero:

```r
t.test(bias1)
t.test(bias2)
t.test(bias3)
```

```
        One Sample t-test

data:  bias1
t = 0.2423, df = 29, p-value = 0.8103
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.4960831  0.6294164
sample estimates:
 mean of x
0.06666667

        One Sample t-test

data:  bias2
t = 0.273, df = 29, p-value = 0.7868
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.4327081  0.5660415
sample estimates:
 mean of x
0.06666667

        One Sample t-test

data:  bias3
t = 7.2677, df = 29, p-value = 5.285e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.131801 3.801532
sample estimates:
mean of x
 2.966667
```

The biases of m1 (bias1) and m2 (bias2), measured under repeatability conditions, are not significantly different from zero (p = 0.81 and p = 0.79 respectively), so there is no evidence of bias in the repeatability measurements. However, the bias of the measurements made under reproducibility conditions (bias3) is significantly different from zero (p = 5.3 × 10⁻⁸). The reproducibility measurements are therefore biased by:

```r
mean(bias3)
```

```
[1] 2.966667
```

or 3 mm.
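The three t-tests can likewise be summarised in a single table. In this sketch the data are re-entered from the listing so that it runs on its own, and the names `biases` and `bias_table` are only illustrative. Each row holds the mean bias, its 95% confidence interval and the two-sided p-value returned by t.test().

```r
# Data re-entered from the listing so that this sketch runs on its own
heads <- data.frame(
  reference = c(52, 50, 50, 52, 52, 48, 55, 55, 53, 48, 50, 48, 50, 49, 52,
                51, 51, 51, 50, 54, 49, 50, 53, 48, 52, 51, 55, 50, 54, 50),
  m1 = c(54, 50, 51, 53, 50, 49, 53, 52, 54, 47, 51, 47, 49, 50, 51,
         53, 50, 52, 51, 56, 48, 51, 52, 48, 52, 51, 53, 53, 52, 52),
  m2 = c(55, 49, 47, 53, 51, 48, 56, 55, 54, 49, 48, 47, 51, 49, 51,
         52, 49, 51, 52, 55, 48, 51, 52, 50, 53, 51, 55, 49, 55, 49),
  m3 = c(53, 56, 52, 53, 52, 51, 56, 59, 53, 52, 54, 53, 50, 55, 52,
         50, 53, 56, 53, 56, 53, 55, 55, 52, 55, 56, 59, 52, 57, 59)
)
biases <- heads[, c("m1", "m2", "m3")] - heads$reference

# One-sample, two-sided t-test of H0: mean bias = 0, one row per method
bias_table <- t(sapply(biases, function(b) {
  tt <- t.test(b)  # defaults: mu = 0, two-sided, 95% confidence interval
  c(mean_bias = unname(tt$estimate),
    lower95   = tt$conf.int[1],
    upper95   = tt$conf.int[2],
    p_value   = tt$p.value)
}))
round(bias_table, 4)
```

Only the m3 row has a confidence interval that excludes zero, matching the separate tests above.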

To express the **repeatability** according to the ASTM standard [1]:

Subtract measurement m2 from m1 and call this `rep`:

```r
rep <- heads$m1 - heads$m2
rep
```

```
 [1] -1 1 4 0 -1 1 -3 -3 0 -2 3 0 -2 1 0 1 1 1 -1 1 0 0 0 -2 -1 0 -2 4 -3 3
```

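Each difference combines the random error of two measurements, so its standard deviation is √2 times the standard deviation of a single measurement. The repeatability standard deviation (sr) is therefore the standard deviation of the differences divided by √2:

```r
# SD of the differences, scaled down by sqrt(2) to the SD of a single result
sr <- sd(rep) / sqrt(2)
sr
```

```
[1] 1.339068
```

and the repeatability (r) is:

```r
# r = 1.96 * sqrt(2) * sr, i.e. about 2.8 * sr
repeatability <- qnorm(0.975) * sqrt(2) * sr
repeatability
```

```
[1] 3.711639
```

or 3.7 mm. Two measurements of the same femoral head by the same observer are therefore expected to differ by less than about 3.7 mm in 95% of cases.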

To express the **reproducibility** according to the ASTM standard [1]:

Subtract measurement m3 from m1 and call this `repro`:

```r
repro <- heads$m1 - heads$m3
repro
```

```
 [1] 1 -6 -1 0 -2 -2 -3 -7 1 -5 -3 -6 -1 -5 -1 3 -3 -4 -2 0 -5 -4 -3 -4 -3 -5 -6 1 -5 -7
```

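As for repeatability, the standard deviation of the differences is divided by √2 to give the reproducibility standard deviation (sR):

```r
# SD of the differences, scaled down by sqrt(2) to the SD of a single result
sR <- sd(repro) / sqrt(2)
sR
```

```
[1] 1.850908
```

and the reproducibility (R) is:

```r
# R = 1.96 * sqrt(2) * sR, i.e. about 2.8 * sR
reproducibility <- qnorm(0.975) * sqrt(2) * sR
reproducibility
```

```
[1] 5.130362
```

or 5.1 mm. Note that sd() ignores the average difference between m1 and m3, so the 3 mm bias found above is reported separately and is not part of sR.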


1. ASTM E177: Standard practice for use of the terms precision and bias in ASTM test methods. Subcommittee E11.20 on Test Method Evaluation and Quality Control. ASTM International; 2013.
