Kappa Test

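The kappa test is used to measure agreement between two or more observers for categorical items; the Cohen's kappa used here applies to two observers. A kappa of 1 indicates perfect agreement, a kappa of 0 indicates agreement no better than expected by chance, and a negative kappa indicates agreement worse than chance. A kappa between 0.61 and 0.80 is conventionally interpreted as substantial agreement and a kappa above 0.80 as almost perfect agreement¹.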
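As a brief sketch of how the statistic is defined for two observers, kappa compares the observed proportion of agreement with the proportion expected by chance. The symbols p_o, p_e, p_{1k} and p_{2k} are notation introduced here for illustration and do not appear in the original text:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k} p_{1k}\, p_{2k}
```

Here p_o is the proportion of items on which the two observers agree, and p_{1k} and p_{2k} are the proportions of items that observer 1 and observer 2 assign to category k. When p_o equals p_e the numerator vanishes and kappa is 0; when p_o is 1, kappa is 1.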

The xray.rda dataset is used to show how to perform a kappa test in R. The data frame is called xray and contains two factor variables, Observer1 and Observer2, which hold each observer's interpretation of the same 20 radiographs (OA or RA).

To load and show the data frame (assuming xray.rda is in the working directory):

load("xray.rda")
xray
   Observer1 Observer2
1         RA        RA
2         OA        OA
3         OA        OA
4         RA        OA
5         OA        OA
6         RA        RA
7         OA        OA
8         OA        OA
9         OA        OA
10        OA        RA
11        OA        OA
12        OA        OA
13        OA        OA
14        OA        OA
15        OA        OA
16        OA        OA
17        RA        RA
18        RA        RA
19        OA        OA
20        OA        OA
str(xray)
'data.frame':	20 obs. of  2 variables:
 $ Observer1: Factor w/ 2 levels "OA","RA": 2 1 1 2 1 2 1 1 1 1 ...
 $ Observer2: Factor w/ 2 levels "OA","RA": 2 1 1 1 1 2 1 1 1 2 ...
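
A cross-tabulation makes the pattern of agreement easier to read than the raw listing. The following is a minimal sketch in base R; because xray.rda may not be at hand, the data frame is rebuilt by hand from the listing above, which is an assumption standing in for loading the file. The names obs1, obs2, agreement and n_agree are illustrative only.

```r
# Rebuild the data frame from the printed listing (stand-in for loading xray.rda)
obs1 <- c("RA", "OA", "OA", "RA", "OA", "RA", "OA", "OA", "OA", "OA",
          "OA", "OA", "OA", "OA", "OA", "OA", "RA", "RA", "OA", "OA")
obs2 <- c("RA", "OA", "OA", "OA", "OA", "RA", "OA", "OA", "OA", "RA",
          "OA", "OA", "OA", "OA", "OA", "OA", "RA", "RA", "OA", "OA")
xray <- data.frame(Observer1 = factor(obs1, levels = c("OA", "RA")),
                   Observer2 = factor(obs2, levels = c("OA", "RA")))

# Cross-tabulate the ratings: rows are Observer1, columns are Observer2
agreement <- table(xray$Observer1, xray$Observer2,
                   dnn = c("Observer1", "Observer2"))
print(agreement)

# The diagonal cells count the radiographs on which both observers agree
n_agree <- sum(diag(agreement))
print(n_agree / sum(agreement))  # observed proportion of agreement
```

The diagonal of the table holds the agreements (OA/OA and RA/RA) and the off-diagonal cells hold the disagreements.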

The observers agree on 18 of the 20 radiographs; rows 4 and 10 are the only disagreements. The irr package² must be installed (for example with install.packages("irr")) and loaded. To perform a kappa test:

library(irr)
kappa2(xray)
 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 20 
   Raters = 2 
    Kappa = 0.733 

        z = 3.28 
  p-value = 0.00104

The kappa is 0.733, indicating substantial agreement between the two observers. The p-value (0.001) tests the null hypothesis that kappa is zero, that is, that agreement is no better than expected by chance; this hypothesis is rejected.
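
As a cross-check on the reported value, the unweighted kappa can be recomputed by hand in base R. This is only a sketch: the 2 x 2 table is typed in from counts taken from the data listing above (14 OA/OA, 4 RA/RA and one disagreement in each direction), and the names tab, p_o, p_e and kappa_hand are illustrative, not part of the irr package.

```r
# Agreement table counted from the listing: rows = Observer1, columns = Observer2
tab <- matrix(c(14, 1,    # column Observer2 = OA: Observer1 OA, RA
                1,  4),   # column Observer2 = RA: Observer1 OA, RA
              nrow = 2,
              dimnames = list(Observer1 = c("OA", "RA"),
                              Observer2 = c("OA", "RA")))

n   <- sum(tab)                                # number of radiographs
p_o <- sum(diag(tab)) / n                      # observed agreement
p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance

# Kappa: agreement beyond chance, relative to the maximum possible beyond chance
kappa_hand <- (p_o - p_e) / (1 - p_e)
print(round(kappa_hand, 3))
```

With these counts the observed agreement is 0.9 and the chance agreement is 0.625, so the result rounds to 0.733, matching the kappa2 output.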

References

1. Viera and Garrett, "Understanding interobserver agreement: the kappa statistic", 2005-05-01

2. Gamer et al., "irr: Various Coefficients of Interrater Reliability and Agreement", 2025-08-21