Retrospective observer performance studies have become the most commonly used approach for inferring expected performance in clinical practice. Prospective assessments, in which the observer becomes an integral part of the diagnostic system, are expensive, difficult to perform, and time consuming, and they are generally performed only after a new technology and/or practice has been approved and has been in use for a while by a large number of users. The general belief is that findings from retrospective studies can be used to infer, at least on a relative scale, what should occur in actual clinical practice. Unfortunately, previous studies have shown that observers behave significantly differently in retrospective studies than during interpretations of actual clinical examinations that affect patient care. One of the important issues related to these inferences is the training of observers. While there are indications that, in the laboratory, disease prevalence has little, if any, effect on performance, the effect of training and, just as important, the possible effect of disease prevalence in the training set have not been investigated. This may be of utmost importance, because not only the overall performance but also the operating point matters for clinical applications when new technologies or practices are implemented. If a new system or practice is indeed better (i.e., performs along a higher curve), the tradeoff between actually increasing sensitivity and/or specificity (or a combination of both) may largely depend on how observers are specifically trained prior to a study (e.g., a particular emphasis on sensitivity or specificity during training).
The primary hypothesis to be tested in this study is that disease prevalence in training sets could, and likely would, affect the actual operating points of observers, but that observers will largely operate along a performance curve (ROC or FROC type) that is determined by (inherent to) the imaging system or practice rather than by the specific training set. Therefore, a clinical practice could also be significantly affected by a specific emphasis during training. We propose to investigate this issue by performing a unique observer study in which substantially different disease prevalence levels will be presented in different training sets, while all observers are tested on the same case set. If our primary hypothesis is confirmed, then training with a specific emphasis could be used to optimize the intended clinical practice by emphasizing the desired parameter (i.e., sensitivity, specificity, or a combination of both). Thus, the proposed investigation will be extremely important for the acceptance of observer studies in the assessment and approval process of new technologies and practices. The importance of this study is demonstrated directly by an actual case in which approval of a new technology was delayed by more than two years because of a clinically undesirable shift in operating points.
Training is an extremely important aspect of observer performance studies, which are frequently used in performance assessments and/or approval processes of new technologies and clinical practices in medicine in general and in medical imaging in particular. However, the type of training set and the specific emphasis during training could affect observers' decisions in a manner that results in 'clinically undesirable' shifts in the operating points of observers, even when a new or improved technology or practice is actually significantly better than another. We propose to test the hypothesis that disease prevalence in the training set could indeed significantly affect the decision operating points of observers, while the underlying performance curve along which they operate is primarily determined by the imaging system being evaluated and not by the disease prevalence in the training set. In the unlikely event that our results contradict our hypothesis, we will characterize the type and magnitude of the effects of prevalence-in-training on diagnostic performance, thereby helping investigators design more clinically relevant studies. However, if our expectations are confirmed, training emphasis could be used to optimize the desired performance parameter for the clinical practice in question without concern about changing the performance curve.
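
The distinction between an operating point and the performance curve it lies on can be illustrated with a minimal sketch. It assumes an equal-variance binormal ROC model, which is our assumption for illustration and not a model specified in this proposal. The separation value `D_PRIME`, the two decision thresholds, and the labels attached to them are hypothetical, and the direction in which training prevalence would move a threshold is also only assumed here. Under these assumptions, changing the threshold moves the (false-positive fraction, true-positive fraction) pair while the separation recovered from each pair stays the same, meaning both points lie on one curve.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def operating_point(d_prime, threshold):
    """Return (FPF, TPF) under an equal-variance binormal model.

    Ratings for non-diseased cases ~ N(0, 1); diseased cases ~ N(d_prime, 1).
    A case is called positive when its rating exceeds `threshold`.
    """
    fpf = 1.0 - STD_NORMAL.cdf(threshold)            # 1 - specificity
    tpf = 1.0 - STD_NORMAL.cdf(threshold - d_prime)  # sensitivity
    return fpf, tpf


def recovered_d_prime(fpf, tpf):
    """Separation implied by one operating point: z(TPF) - z(FPF)."""
    return STD_NORMAL.inv_cdf(tpf) - STD_NORMAL.inv_cdf(fpf)


# Hypothetical values, chosen only for illustration.
D_PRIME = 1.5  # separation attributed to the imaging system
THRESHOLDS = {
    "stricter threshold (hypothetical training A)": 1.2,
    "laxer threshold (hypothetical training B)": 0.3,
}

for label, c in THRESHOLDS.items():
    fpf, tpf = operating_point(D_PRIME, c)
    print(f"{label}: sensitivity={tpf:.3f}, specificity={1 - fpf:.3f}, "
          f"implied d'={recovered_d_prime(fpf, tpf):.3f}")
```

In this sketch the two thresholds trade sensitivity against specificity, yet both imply the same separation, which is the pattern the primary hypothesis predicts if training prevalence affects only the operating point.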