Many cancer diagnostic tests require a medical expert to classify a patient using an ordered categorical scale. Because the expert must interpret imperfect diagnostic test results, these classifications involve elements of subjectivity and estimation, and experts' classifications often differ, sometimes severely, even in common diagnostic procedures such as mammography and in the classification of breast density, an important predictor of breast cancer. These discrepancies have motivated many large-scale studies that examine levels of agreement between experts in common diagnostic settings and investigate whether factors such as rater experience affect the consistency of ratings made by different experts. However, few statistical methods currently exist for assessing agreement in large-scale studies such as these. Our overall goals are twofold: (1) to develop novel and flexible statistical methods and agreement measures for assessing reliability in large-scale studies involving two or more medical experts using one or more diagnostic tests with ordered categorical scales, and (2) to use these methods to assess reliability in recently conducted large-scale breast cancer and breast density studies and to examine the impact of factors, such as rater experience and the patient's prior history, that can play important roles in reliability in these population-based settings. Because screening mammography is widely used in the community, conclusions drawn from our analyses of large-scale agreement studies in diagnostic testing will have significant and far-reaching implications for breast cancer screening and diagnosis in the community. The proposed methods provide a novel and comprehensive approach to examining agreement in large-scale studies, with a focus on assessing and comparing agreement between experts who classify subjects according to ordered categorical scales in diagnostic tests.
The methods developed will be made freely available and will be easy to implement using standard statistical software. Our analyses of large-scale cancer agreement studies using the proposed methods will provide new insights into radiologists' interpretive performance in screening.

Public Health Relevance

Because screening mammography is widely used in the community, both the development of our methods for studying agreement and the conclusions drawn from our analyses of large-scale agreement studies in diagnostic testing will have a significant and far-reaching impact on public health in breast cancer screening and diagnosis.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Epidemiology of Cancer Study Section (EPIC)
Program Officer
Lewis, Denise
Boston University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Lin, Xiaoyan; Chen, Hua; Edwards, Don et al. (2018) Modeling rater diagnostic skills in binary classification processes. Stat Med 37:557-571
Nelson, Kerrie P; Edwards, Don (2018) A measure of association for ordered categorical data in population-based studies. Stat Methods Med Res 27:812-831
Mitani, Aya A; Freer, Phoebe E; Nelson, Kerrie P (2017) Summary measures of agreement and association between many raters' ordinal classifications. Ann Epidemiol 27:677-685.e4
Nelson, Kerrie P; Mitani, Aya A; Edwards, Don (2017) Assessing the influence of rater and subject characteristics on measures of agreement for ordinal ratings. Stat Med 36:3181-3199
Nelson, Kerrie P; Edwards, Don (2015) Measures of agreement between many raters for ordinal classifications. Stat Med 34:3116-32