Great progress has been made over the last two decades in our understanding of the use of observer performance studies to evaluate diagnostic system performance, both in absolute and in relative terms. Analytical approaches that account for variability across cases, readers, and modes are rapidly gaining credibility; hence, their use is becoming common not only in scientific investigations but also in industry demonstrations of system utility as a condition for regulatory approval. ROC-type methodology for system evaluations and comparisons has become an extremely versatile tool that can address a wide range of scientific and clinical questions in the laboratory environment. The more important question of interest in all of these studies is not the ability to generalize to cases, readers, abnormalities, and modalities under the study (laboratory) conditions, but rather the ability to make valid inferences about the potential impact of different technologies or practices on the actual clinical environment. Although this may seem intuitive, to date there is no conclusive evidence for the latter; the very limited experimental data available in this regard suggest the contrary. The primary goal of this project is to determine and compare the performance levels of observers recommending recall and detecting breast cancers in the clinical environment with their performance on the same tasks in the laboratory. This will be done by ascertaining and verifying participants' performance levels retrospectively from QA and clinical records, and by performing a two-mode observer performance study: one mode simulating the clinical environment (using BI-RADS ratings) and the other an ROC-type study (using confidence ratings). Readers will review and interpret both cases that they previously diagnosed prospectively in the clinic and cases diagnosed by others. The comparison we are attempting lies at the very core of our ability (or inability) to generalize laboratory observer performance data to the general clinical environment in a valid manner.
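As a rough illustration of how the two reading modes yield different performance summaries, the sketch below (using hypothetical data and helper names, not the project's actual analysis code) computes a single sensitivity/specificity operating point from binary recall decisions, as in the clinical-simulation mode, and an empirical area under the ROC curve (the Mann-Whitney statistic) from confidence ratings, as in the ROC-type mode.

def empirical_auc(ratings_pos, ratings_neg):
    """Empirical AUC: probability that a randomly chosen positive case is
    rated higher than a randomly chosen negative case, ties counted as 1/2."""
    wins = 0.0
    for p in ratings_pos:
        for n in ratings_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(ratings_pos) * len(ratings_neg))

def recall_operating_point(recalls_pos, recalls_neg):
    """Sensitivity and specificity from binary recall decisions (True = recall)."""
    sensitivity = sum(recalls_pos) / len(recalls_pos)
    specificity = 1 - sum(recalls_neg) / len(recalls_neg)
    return sensitivity, specificity

if __name__ == "__main__":
    # Hypothetical reader data: 0-100 confidence ratings for cancer and
    # cancer-free cases, with binary recall decisions at a threshold of 60.
    ratings_cancer = [85, 70, 90, 60, 75]
    ratings_normal = [20, 55, 30, 65, 10, 40]
    recall_cancer = [r >= 60 for r in ratings_cancer]
    recall_normal = [r >= 60 for r in ratings_normal]

    auc = empirical_auc(ratings_cancer, ratings_normal)
    sens, spec = recall_operating_point(recall_cancer, recall_normal)
    print(f"ROC mode: empirical AUC = {auc:.3f}")
    print(f"Binary mode: sensitivity = {sens:.2f}, specificity = {spec:.2f}")

The binary mode pins each reader to one operating point, whereas the rating mode traces out a full curve; comparing the two is exactly the kind of question addressed in the publications listed below.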

Agency: National Institutes of Health (NIH)
Institute: National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Type: Research Project (R01)
Project #: 5R01EB003503-04
Application #: 7261927
Study Section: Biomedical Imaging Technology Study Section (BMIT)
Program Officer: Cohen, Zohara
Project Start: 2004-09-30
Project End: 2010-08-31
Budget Start: 2007-09-01
Budget End: 2010-08-31
Support Year: 4
Fiscal Year: 2007
Total Cost: $316,811
Indirect Cost:
Name: University of Pittsburgh
Department: Radiation-Diagnostic/Oncology
Type: Schools of Medicine
DUNS #: 004514360
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
Bandos, Andriy I; Rockette, Howard E; Gur, David (2010) Use of likelihood ratios for comparisons of binary diagnostic tests: underlying ROC curves. Med Phys 37:5821-30
Gur, David; Bandos, Andriy I; Rockette, Howard E et al. (2010) Is an ROC-type response truly always better than a binary response in observer performance studies? Acad Radiol 17:639-45
Gur, David (2009) Imaging technology and practice assessments: what next? Acad Radiol 16:638-40
Bandos, Andriy I; Rockette, Howard E; Song, Tao et al. (2009) Area under the free-response ROC curve (FROC) and a related summary index. Biometrics 65:247-56
Gur, David; Bandos, Andriy I; Cohen, Cathy S et al. (2008) The "laboratory" effect: comparing radiologists' performance and variability during prospective clinical and laboratory mammography interpretations. Radiology 249:47-53
Gur, David; Bandos, Andriy I; Rockette, Howard E (2008) Comparing areas under receiver operating characteristic curves: potential impact of the "last" experimentally measured operating point. Radiology 247:12-5
Song, Tao; Bandos, Andriy I; Rockette, Howard E et al. (2008) On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 35:1547-58
Gur, David (2008) Imaging technology and practice assessment studies: importance of the baseline or reference performance level. Radiology 247:8-11
Gur, David; Bandos, Andriy I; Klym, Amy H et al. (2008) Agreement of the order of overall performance levels under different reading paradigms. Acad Radiol 15:1567-73
Gur, David; Bandos, Andriy I; King, Jill L et al. (2008) Binary and multi-category ratings in a laboratory observer performance study: a comparison. Med Phys 35:4404-9

Showing the most recent 10 out of 12 publications