Great progress has been made in the last two decades in our understanding of the use of observer performance studies in the evaluation of diagnostic system performance, both in absolute and in relative terms. Analytical approaches that account for variability of cases, readers, and modes are rapidly gaining credibility; hence, their use is becoming common not only in scientific investigations but also in industry demonstrations of system utility as a condition for regulatory approvals. ROC-type methodology for system evaluations and comparisons has become an extremely versatile tool that can address a wide range of scientific and clinical questions in the laboratory environment. The more important question of interest in all of these studies is not the ability to generalize to cases, readers, abnormalities, and modalities under the study (laboratory) conditions, but rather to enable valid inferences on the potential impact of different technologies or practices on the actual clinical environment. Although perhaps intuitive, to date there is no conclusive evidence for the latter; the very limited experimental data we have in this regard suggest the contrary. The primary goal being pursued in this project is to determine and compare the performance levels of observers recommending recall leading to the detection of breast cancers in the clinical environment with their performance in recommending recall and detecting breast cancers in the laboratory. This will be done by ascertaining and verifying performance levels of participants retrospectively from QA and clinical records, and by performing a two-mode observer performance study: one mode simulating the clinical environment (using BI-RADS ratings) and the other an ROC-type study (using confidence ratings).
Readers will review and interpret both cases that they had previously diagnosed prospectively in the clinic and cases diagnosed by others. The comparison we are attempting is at the very core of our ability (or inability) to validly generalize laboratory observer performance data to the general clinical environment.

National Institutes of Health (NIH)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Research Project (R01)
Study Section
Biomedical Imaging Technology Study Section (BMIT)
Program Officer
Peng, Grace
University of Pittsburgh
Schools of Medicine
United States
Bandos, Andriy I; Rockette, Howard E; Gur, David (2010) Use of likelihood ratios for comparisons of binary diagnostic tests: underlying ROC curves. Med Phys 37:5821-30
Gur, David; Bandos, Andriy I; Rockette, Howard E et al. (2010) Is an ROC-type response truly always better than a binary response in observer performance studies? Acad Radiol 17:639-45
Gur, David (2009) Imaging technology and practice assessments: what next? Acad Radiol 16:638-40
Bandos, Andriy I; Rockette, Howard E; Song, Tao et al. (2009) Area under the free-response ROC curve (FROC) and a related summary index. Biometrics 65:247-56
Gur, David; Bandos, Andriy I; Klym, Amy H et al. (2008) Agreement of the order of overall performance levels under different reading paradigms. Acad Radiol 15:1567-73
Gur, David; Bandos, Andriy I; King, Jill L et al. (2008) Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 35:4404-9
Gur, David; Bandos, Andriy I; Cohen, Cathy S et al. (2008) The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations. Radiology 249:47-53
Gur, David; Bandos, Andriy I; Rockette, Howard E (2008) Comparing areas under receiver operating characteristic curves: potential impact of the "last" experimentally measured operating point. Radiology 247:12-5
Song, Tao; Bandos, Andriy I; Rockette, Howard E et al. (2008) On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 35:1547-58
Gur, David (2008) Imaging technology and practice assessment studies: importance of the baseline or reference performance level. Radiology 247:8-11

Showing the most recent 10 out of 12 publications