This project develops a general methodology for generating and evaluating confidence intervals/bands for Receiver Operating Characteristic (ROC) curves, a common tool for comparing classification models. Two or more ROC curves are typically compared in one of three ways: by simple visual inspection without any assessment of confidence, by focusing on one particular point of the ROC curve and generating confidence intervals around that point, or by comparing the areas under the curves. Little work has examined the soundness of ROC confidence intervals or their use in comparing entire curves.
This project consists of six main activities: (1) surveying existing techniques for generating confidence intervals/bands for ROC curves, (2) creating new evaluation metrics that are more appropriate for confidence bands, (3) developing a general framework for generating and evaluating intervals/bands, (4) developing new techniques for creating and optimizing bands, (5) creating a suite of benchmark problems, and (6) evaluating the intervals/bands through large-scale empirical studies to analyze and characterize their expected containment.
The work will produce a new suite of benchmark problems for ROC confidence bands, as well as a general framework for evaluating those bands, enabling future researchers to perform similar studies in a well-understood setting. More importantly, the knowledge and techniques that this project produces will inform the many fields, such as medicine, that regularly use ROC analysis. Finally, this project will produce open-source toolkits that make ROC studies much easier to conduct.