Assessment of patients in common diagnostic medical procedures, such as the diagnosis of breast cancer from a mammogram, is often based on the expert opinion of physicians, where strong agreement between experienced raters suggests an accurate diagnostic procedure. Substantial variability is commonly observed between raters in these subjective classifications. This concern has prompted researchers to develop statistical methods to assess the reliability of diagnostic procedures. Current methods for assessing inter-rater agreement, including Cohen's kappa, are prone to bias, usually model the raters and items as fixed effects and thus preclude inference about the general rating process, and do not easily accommodate multiple raters, dichotomous outcomes, or unbalanced data. The primary focus of the proposed research is to measure agreement between raters accurately, in a flexible and realistic manner, to yield inference about a general underlying medical diagnostic process, and to identify important factors that influence the rating process. This information can consequently be used in the training of physicians and other biomedical professionals to improve their diagnostic skills. Data from a number of studies measuring agreement between qualified physicians in the diagnosis of cancers and other diseases, including mammograms for diagnosing breast cancer and the Gleason grading scale for assessing prostate cancer, will be analyzed using the proposed methodology. The models can incorporate multiple raters and items, as well as dichotomous outcomes (presence/absence of disease). Interpretation of inter-rater agreement in these studies will be emphasized. An important feature of the proposed research is the development of an overall measure of agreement that is easily interpretable by biomedical professionals and avoids flaws observed in the use of Cohen's kappa statistic. Extensive simulation studies will be carried out to assess the performance of the statistical methods. User-friendly software to fit the proposed models and compute the measure of agreement will be developed and made publicly available.
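As background for the kappa flaws alluded to above, the following minimal sketch (illustrative only, not the proposed methodology) computes Cohen's kappa for two raters and shows its well-known sensitivity to disease prevalence: the two hypothetical contingency tables below have identical observed agreement (90%) yet very different kappa values. All counts here are invented for illustration.

```python
def cohens_kappa(table):
    """Cohen's kappa from a k x k contingency table, where table[i][j] is the
    number of items rated category i by rater 1 and category j by rater 2.
    kappa = (p_o - p_e) / (1 - p_e), with p_o the observed agreement and
    p_e the agreement expected by chance from the raters' marginal rates."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row = [sum(table[i]) / n for i in range(k)]           # rater 1 marginals
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals
    p_e = sum(row[i] * col[i] for i in range(k))          # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2x2 tables (disease present/absent), both with 90% observed agreement:
balanced = [[45, 5], [5, 45]]   # prevalence ~50%  -> kappa = 0.80
skewed   = [[85, 5], [5, 5]]    # prevalence ~90%  -> kappa ~ 0.44

print(cohens_kappa(balanced))   # 0.8
print(cohens_kappa(skewed))     # ~0.444
```

This prevalence dependence is one reason an overall agreement measure that is stable and interpretable across study populations, as proposed here, is of practical value.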
Nelson, Kerrie P; Edwards, Don (2010) Improving the reliability of diagnostic tests in population-based agreement studies. Stat Med 29:617-26.