The kappa test is used to measure agreement between two or more observers for categorical items. A kappa of 1 indicates perfect agreement, a kappa of 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance. Substantial agreement corresponds to a kappa between 0.6 and 0.8, and almost perfect agreement to a kappa higher than 0.8 [1].
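For reference, kappa is calculated from the observed proportion of agreement, Po, and the proportion of agreement expected by chance alone, Pe (this is the standard definition, not spelled out in the original text):

kappa = (Po - Pe) / (1 - Pe)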
The xray.rda dataset is used to show how to perform a kappa test in JGR/R. The data frame is called xray and contains two variables, Observer1 and Observer2, holding the two observers' categorical interpretations of radiographs (OA or RA).
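The kappa2() function used below is provided by the irr package, which has to be installed and loaded first; the dataset itself must also be loaded into the workspace. A minimal sketch, assuming xray.rda sits in the working directory:

install.packages("irr")  # only needed once
library(irr)             # provides kappa2()
load("xray.rda")         # creates the xray data frame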
To show the data frame:
xray
   Observer1 Observer2
 1        RA        RA
 2        OA        OA
 3        OA        OA
 4        RA        OA
 5        OA        OA
 6        RA        RA
 7        OA        OA
 8        OA        OA
 9        OA        OA
10        OA        RA
11        OA        OA
12        OA        OA
13        OA        OA
14        OA        OA
15        OA        OA
16        OA        OA
17        RA        RA
18        RA        RA
19        OA        OA
20        OA        OA
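Before running the test, the agreement pattern can be summarised with a cross-tabulation (an extra step, not part of the original walkthrough):

table(xray$Observer1, xray$Observer2)

Leaving aside R's exact print layout, the counts are:

   OA RA
OA 14  1
RA  1  4

with Observer1 in the rows and Observer2 in the columns: both observers read OA in 14 radiographs and RA in 4, with one disagreement in each direction.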
It can be seen that the observers agree with each other most of the time. Performing a kappa test is straightforward:
kappa2(xray)
 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 20
   Raters = 2
    Kappa = 0.733

        z = 3.28
  p-value = 0.00104
The kappa is 0.733, confirming substantial agreement between the two observers. The p-value tests the null hypothesis that the kappa is zero (no agreement beyond chance) and is highly significant.
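As a check (this calculation is not part of the original output), the reported value can be reproduced by hand from the cross-tabulation above using the kappa formula:

po <- (14 + 4) / 20              # observed agreement: 18 of the 20 pairs match
pe <- 0.75 * 0.75 + 0.25 * 0.25  # chance agreement: both observers rate OA in 15/20 and RA in 5/20 cases
(po - pe) / (1 - pe)             # kappa
[1] 0.7333333

which agrees with the value reported by kappa2().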