Journal of Applied Statistics | 2021

Methods of assessing categorical agreement between correlated screening tests in clinical studies

Abstract


Advances in breast imaging and other screening tests have prompted studies to evaluate and compare the consistency between experts' ratings of existing and new screening tests. In clinical settings, medical experts make subjective assessments of screening test results such as mammograms. Consistency between experts' ratings is evaluated by measures of inter-rater agreement or association. However, conventional measures, such as Cohen's and Fleiss' kappas, cannot be applied or may perform poorly when studies involve many experts, unbalanced data, or dependencies between experts' ratings. Here we assess the performance of existing approaches, including recently developed summary measures, for assessing the agreement between experts' binary and ordinal ratings when patients undergo two screening procedures. Methods to assess consistency between repeated measurements by the same experts are also described. We present applications to three large-scale clinical screening studies. Properties of these agreement measures are illustrated via simulation studies. Generally, a model-based approach provides several advantages over alternative methods, including the ability to flexibly accommodate various measurement scales (i.e. binary or ordinal), large numbers of experts and patients, and sparse data, as well as robustness to the prevalence of the underlying disease.

Volume 48
Pages 1861 - 1881
DOI 10.1080/02664763.2020.1777394
Language English
Journal Journal of Applied Statistics
