
Supplemental Materials

Interrater Agreement Statistics With Skewed Data: Evaluation of Alternatives to Cohen’s Kappa

by S. Xu & M. F. Lorber, 2014, Journal of Consulting and Clinical Psychology

http://dx.doi.org/10.1037/a0037489

An anonymous reviewer suggested analyses of the little-known rescaled κ statistic (κ′). Briefly, κ′ is an adjustment to Cohen’s κ that takes into account “the level of matched agreement contingent on a particular set of marginal distributions” (Karelitz & Budescu, 2013, p. 923). We did not emphasize this statistic in the main article, where the focus was on statistics that have been offered as alternatives to κ on the grounds that they are less sensitive to behavior base rates; κ′ is not in this category. It was originally described by Cohen (1960), who viewed it as applicable to only a limited and uncommon set of research settings. Although their rationale is beyond the scope of the present investigation, Karelitz and Budescu (2013) have recently offered a counterpoint to Cohen’s argument and emphasized the utility of κ′.
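As a concrete illustration of the rescaling, the following Python sketch computes Cohen’s κ, the maximum κ attainable under the observed marginal distributions (κmax), and the rescaled statistic, assuming (as one reading of Cohen, 1960, and Karelitz & Budescu, 2013) that κ′ corresponds to κ/κmax. The function name and example table are ours and purely illustrative, not taken from the main article.

```python
import numpy as np

def rescaled_kappa(table):
    """Cohen's kappa and a rescaled kappa (kappa / kappa_max) for a square
    agreement table (rows = rater A, columns = rater B)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                            # joint proportions
    row, col = p.sum(axis=1), p.sum(axis=0)    # marginal proportions

    p_o = np.trace(p)                          # observed agreement
    p_e = np.dot(row, col)                     # chance-expected agreement
    kappa = (p_o - p_e) / (1 - p_e)

    # Highest agreement attainable with these marginals held fixed
    p_max = np.minimum(row, col).sum()
    kappa_max = (p_max - p_e) / (1 - p_e)

    return kappa, kappa / kappa_max

# Example: a skewed base rate (behavior mostly coded absent) with some rater bias
table = [[5, 10],
         [5, 80]]
k, k_prime = rescaled_kappa(table)
print(f"kappa = {k:.3f}, rescaled kappa (kappa') = {k_prime:.3f}")
```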

In exploratory analyses, we evaluated κ′ using the same methodology described in the main article for the other statistics. The results are summarized in Figure S1. Relative to the criteria we stated in the main article, κ′ performed poorly. In both the unbiased and the biased rater conditions, it was highly sensitive to the base rate. It was also sensitive to rater bias: at a given level of observed agreement and agreed-on base rate, κ′ offered less protection against chance agreement in the biased rater condition. Finally, with 90% interrater agreement and a biased rater, protection against chance agreement approached nil as base rates approached .50.
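To make the base rate and bias comparisons above concrete, the sketch below sweeps the agreed-on base rate at a fixed 90% observed agreement, splitting disagreements either evenly (unbiased raters) or unevenly (a biased rater). This is a rough illustration under our own simplifying definitions of agreed-on base rate and bias, not the simulation design used in the main article.

```python
import numpy as np

def kappa_prime(p):
    """Rescaled kappa for a table of joint proportions; algebraically
    equal to kappa / kappa_max = (p_o - p_e) / (p_max - p_e)."""
    p = np.asarray(p, dtype=float)
    row, col = p.sum(axis=1), p.sum(axis=0)
    p_o, p_e = np.trace(p), np.dot(row, col)
    p_max = np.minimum(row, col).sum()
    return (p_o - p_e) / (p_max - p_e)

agreement = 0.90                               # fixed observed agreement
for base_rate in (0.05, 0.10, 0.25, 0.50):
    p11 = agreement * base_rate                # both raters code "present"
    p00 = agreement - p11                      # both raters code "absent"
    d = 1 - agreement                          # total disagreement

    unbiased = [[p11, d / 2], [d / 2, p00]]    # disagreements split evenly
    biased = [[p11, 0.8 * d], [0.2 * d, p00]]  # one rater over-reports

    print(f"base rate {base_rate:.2f}: kappa' "
          f"unbiased = {kappa_prime(unbiased):.3f}, "
          f"biased = {kappa_prime(biased):.3f}")
```

Under these assumed table constructions, κ′ in the biased condition rises toward the raw 90% agreement level as the agreed-on base rate approaches .50, echoing the pattern summarized in Figure S1.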

Figure S1. The performance of κ′ as a function of behavior or diagnosis agreed-on base rate, rater bias, and observed interrater agreement.

References

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46. doi:10.1177/001316446002000104

Karelitz, T. M., & Budescu, D. V. (2013). The effect of the raters' marginal distributions on their matched agreement: A rescaling framework for interpreting kappa. Multivariate Behavioral Research, 48, 923–952. doi:10.1080/00273171.2013.830064