Current clinical practice in screening relies on trained experts' subjective interpretation of patients' test results, such as mammograms. Substantial variability is often reported between radiologists' visual classifications of breast images, which affects the accuracy and consistency of common screening tests, including mammography. Factors related to patients, raters, and the technology itself may influence experts' ratings of breast cancer and of breast density, an important predictor of breast cancer. However, studying accuracy and consistency between radiologists' ratings in large-scale longitudinal cancer screening studies is challenging because the classifications are ordinal and many experts each contribute ratings. Newly emerging processes, including automated 3-D procedures, offer exciting potential for estimating breast density in routine clinical settings. Currently, very few statistical approaches and summary measures exist to model the consistency and accuracy of several radiologists' ordinal ratings. Further, few methods can investigate how patient and radiologist characteristics, the use of automated procedures, and differences between technologies influence accuracy and consistency. Our goals are to develop new statistical methods, based upon generalized linear mixed models and latent variable models, to study accuracy and consistency amongst many experts in large-scale screening studies. Our approach can flexibly accommodate many experts' ratings together with other factors, so that their influence on consistency and accuracy can be examined. We will derive novel model-based summary measures of agreement and accuracy, and we will apply our new statistical methods to recent large-scale breast imaging studies.
A key strength of our proposed research is that it provides medical researchers with a flexible modeling approach and novel summary measures that use all the data simultaneously, so that conclusions about consistency can be drawn for typical experts and patients in the underlying populations, greatly increasing efficiency and power. Studying how patient and rater characteristics affect the levels of consistency and accuracy of raters' classifications will translate into improvements in the training of radiologists and in the practice of interpreting mammograms, and ultimately into a more effective breast screening procedure.
Our novel statistical methods will provide valuable insight into improving the accuracy and reproducibility of cancer screening tests. We will compare newly emerging processes for classifying patients' mammographic breast images with the current clinical practice of radiologists' visual interpretation. Because screening mammography is widely used in the community, conclusions drawn from our analyses of large-scale breast imaging studies will have a significant and far-reaching impact on public health in breast cancer screening and diagnosis.