Repeatability and Reproducibility

If a measurement is repeated, the measured value is likely to differ each time. If the variation is small, the repeatability is high. It matters whether the measurement is repeated by the same person or by another person. Intra-observer variation is the variation that occurs when the same person repeats the measurements, whilst inter-observer variation is the variation that occurs when a different person repeats them. We would normally expect the inter-observer variation to be larger than the intra-observer variation.

The American Society for Testing and Materials (ASTM) has defined repeatability and reproducibility [1].

Repeatability

Repeatability is precision determined from multiple tests done under repeatability conditions: the test is conducted by the same operator, using the same equipment and laboratory within a short period of time so that neither the equipment nor the environment is likely to change significantly.

Reproducibility

Reproducibility is precision determined from multiple tests done under reproducibility conditions: the test is conducted in different laboratories with different operators and different environmental conditions.

When an accepted reference value is known, the bias (the difference between the measured value and the reference value) can also be expressed.

The precision limits of the repeatability and reproducibility can be calculated from the sample standard deviation of the test results. The number of tests should be at least 30, so that the sample standard deviation is a reasonable estimate of the population standard deviation. The repeatability precision limit (r) and the reproducibility precision limit (R) are useful for comparing test results within and between laboratories. They are calculated by multiplying the repeatability standard deviation (sr) and the reproducibility standard deviation (sR), respectively, by 2.8. The factor 2.8 is derived from 1.96 (95% of a Normally distributed population lies within 1.96 standard deviations of the mean) times the square root of 2.
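The factor can be checked numerically. The following is a minimal sketch; the variable names z and k are chosen here for illustration only:

```r
# 97.5th percentile of the standard Normal distribution:
# 95% of the population lies within +/- z standard deviations of the mean
z <- qnorm(0.975)

# Multiplier applied to the standard deviation to obtain the precision limit
k <- z * sqrt(2)

round(z, 2)   # 1.96
round(k, 2)   # 2.77, usually quoted as 2.8
```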

For example, consider the measurements of femoral heads in the heads.rda file. The data frame is called heads and there are five variables: number, the accepted reference value (measurement of the femoral head using callipers at the time of excision at total hip replacement) and three radiological measurements performed before the surgery. Measurements m1 and m2 were done under repeatability conditions and measurement m3 was done under reproducibility conditions.

Load the data file in JGR / R and check the data frame:

heads
   number reference m1 m2 m3
1       1        52 54 55 53
2       2        50 50 49 56
3       3        50 51 47 52
4       4        52 53 53 53
5       5        52 50 51 52
6       6        48 49 48 51
7       7        55 53 56 56
8       8        55 52 55 59
9       9        53 54 54 53
10     10        48 47 49 52
11     11        50 51 48 54
12     12        48 47 47 53
13     13        50 49 51 50
14     14        49 50 49 55
15     15        52 51 51 52
16     16        51 53 52 50
17     17        51 50 49 53
18     18        51 52 51 56
19     19        50 51 52 53
20     20        54 56 55 56
21     21        49 48 48 53
22     22        50 51 51 55
23     23        53 52 52 55
24     24        48 48 50 52
25     25        52 52 53 55
26     26        51 51 51 56
27     27        55 53 55 59
28     28        50 53 49 52
29     29        54 52 55 57
30     30        50 52 49 59
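
If the heads.rda file is not at hand, the data frame can be rebuilt from the listing above. The sketch below simply transcribes the printed values; with the file available, it would normally be loaded with load("heads.rda") instead (the file location is an assumption).

```r
# Rebuild the 'heads' data frame from the printed listing (values in mm)
heads <- data.frame(
  number    = 1:30,
  reference = c(52, 50, 50, 52, 52, 48, 55, 55, 53, 48,
                50, 48, 50, 49, 52, 51, 51, 51, 50, 54,
                49, 50, 53, 48, 52, 51, 55, 50, 54, 50),
  m1        = c(54, 50, 51, 53, 50, 49, 53, 52, 54, 47,
                51, 47, 49, 50, 51, 53, 50, 52, 51, 56,
                48, 51, 52, 48, 52, 51, 53, 53, 52, 52),
  m2        = c(55, 49, 47, 53, 51, 48, 56, 55, 54, 49,
                48, 47, 51, 49, 51, 52, 49, 51, 52, 55,
                48, 51, 52, 50, 53, 51, 55, 49, 55, 49),
  m3        = c(53, 56, 52, 53, 52, 51, 56, 59, 53, 52,
                54, 53, 50, 55, 52, 50, 53, 56, 53, 56,
                53, 55, 55, 52, 55, 56, 59, 52, 57, 59)
)

# Quick structural check: 30 rows, 5 numeric variables
str(heads)
```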

As there is an accepted reference value (the calliper measurements), first calculate the biases of the measurements:

bias1<-heads$m1-heads$reference
bias2<-heads$m2-heads$reference
bias3<-heads$m3-heads$reference

Are the biases Normally distributed? This can be checked with the Shapiro-Wilk test for Normality:

shapiro.test(bias1)

    Shapiro-Wilk normality test

data:  bias1
W = 0.9422, p-value = 0.1041

shapiro.test(bias2)

    Shapiro-Wilk normality test

data:  bias2
W = 0.9421, p-value = 0.1039

shapiro.test(bias3)

    Shapiro-Wilk normality test

data:  bias3
W = 0.9612, p-value = 0.3315

All three tests are non-significant, so there is no evidence against Normality. It is therefore reasonable to assume a Normal distribution as a model for the data.

Next, check whether there is bias by performing a t-test (one-sample, two-sided):

t.test(bias1)

    One Sample t-test

data:  bias1
t = 0.2423, df = 29, p-value = 0.8103
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.4960831  0.6294164
sample estimates:
 mean of x
0.06666667

t.test(bias2)

    One Sample t-test

data:  bias2
t = 0.273, df = 29, p-value = 0.7868
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.4327081  0.5660415
sample estimates:
 mean of x
0.06666667

t.test(bias3)

    One Sample t-test

data:  bias3
t = 7.2677, df = 29, p-value = 5.285e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 2.131801 3.801532
sample estimates:
mean of x
 2.966667

The biases of m1 (bias1) and m2 (bias2), performed under repeatability conditions, are not significantly different from zero (p=0.81 and p=0.79 respectively). So the repeatability measurements are unbiased. However, the bias in the measurements performed under reproducibility conditions is significantly different from zero (p=0.00000005). So, the reproducibility measurements are biased by:

mean(bias3)
[1] 2.966667

or 3 mm.
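
To see what the one-sample t-test does with these numbers, the test statistic for bias3 can be recomputed by hand. This is a sketch for illustration: the vector is m3 minus the reference value, transcribed from the data listing, and the names se, t_manual and p_manual are introduced here.

```r
# Bias of m3 (m3 minus reference), transcribed from the data listing
bias3 <- c(1, 6, 2, 1, 0, 3, 1, 4, 0, 4,
           4, 5, 0, 6, 0, -1, 2, 5, 3, 2,
           4, 5, 2, 4, 3, 5, 4, 2, 3, 9)

n  <- length(bias3)
se <- sd(bias3) / sqrt(n)                        # standard error of the mean bias
t_manual <- mean(bias3) / se                     # test statistic for H0: mean bias = 0
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

# Both should agree with the t.test() output shown above
all.equal(t_manual, unname(t.test(bias3)$statistic))
all.equal(p_manual, t.test(bias3)$p.value)
```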

To express the repeatability according to the ASTM standard [1]:

Subtract measurement m2 from m1 and call this ‘rep’:

rep<-heads$m1-heads$m2
rep
 [1] -1  1  4  0 -1  1 -3 -3  0 -2  3  0 -2  1  0  1  1  1 -1  1  0  0  0 -2 -1  0 -2  4 -3  3

So, the repeatability standard deviation (sr) is:

sr<-sd(rep)
sr
[1] 1.893728

and the repeatability (r) is:

repeatability<-qnorm(0.975)*sqrt(2)*sr
repeatability
[1] 5.249051

or 5.2 mm.

To express the reproducibility according to the ASTM standard [1]:

Subtract measurement m3 from m1 and call this ‘repro’:

repro<-heads$m1-heads$m3
repro
 [1]  1 -6 -1  0 -2 -2 -3 -7  1 -5 -3 -6 -1 -5 -1  3 -3 -4 -2  0 -5 -4 -3 -4 -3 -5 -6  1 -5 -7

So, the reproducibility standard deviation (sR) is:

sR<-sd(repro)
sR
[1] 2.61758

and the reproducibility (R) is:

reproducibility<-qnorm(0.975)*sqrt(2)*sR
reproducibility
[1] 7.255428

or 7.3 mm.
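
The two calculations above follow the same pattern, so they can be wrapped in a small helper. This is only a sketch: the function name precision_limit and its arguments are hypothetical (they are not part of the ASTM standard or of any R package), and the two vectors are the differences printed above.

```r
# Hypothetical helper: precision limit from paired differences,
# following the calculation used in this section
precision_limit <- function(differences, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # 1.96 for a 95% level
  z * sqrt(2) * sd(differences)
}

# Differences transcribed from the output above
rep_diff   <- c(-1, 1, 4, 0, -1, 1, -3, -3, 0, -2,
                3, 0, -2, 1, 0, 1, 1, 1, -1, 1,
                0, 0, 0, -2, -1, 0, -2, 4, -3, 3)   # m1 - m2
repro_diff <- c(1, -6, -1, 0, -2, -2, -3, -7, 1, -5,
                -3, -6, -1, -5, -1, 3, -3, -4, -2, 0,
                -5, -4, -3, -4, -3, -5, -6, 1, -5, -7)  # m1 - m3

r <- precision_limit(rep_diff)     # repeatability
R <- precision_limit(repro_diff)   # reproducibility
round(c(r = r, R = R), 1)          # 5.2 and 7.3 mm, as above
```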

1. ASTM: Standard practice for use of the terms precision and bias in ASTM test methods. In: E177 ed: Subcommittee E1120 on test method evaluation and quality control. ASTM International; 2013.