Just as there are many markers of oxidative stress, the rapid growth of biotechnology means that researchers increasingly must choose which screening or diagnostic test to use in their research. My work with ROC curves aims to provide evidence-based approaches for making these choices. Across the range of test cut-off points, the ROC curve plots the proportion of abnormal subjects correctly classified (sensitivity) against the proportion of normal subjects incorrectly classified (1 - specificity). This graphical display facilitates the selection of an optimal threshold and makes it easy to compare the discriminating ability of different tests. Increasingly, ROC curves are used in population-based settings rather than in settings where individuals have been pre-screened to some degree. However, ROC curve methods were not developed to account for common problems such as missing data, measurement error, linear combinations of markers, confounding, referral bias, and limits of detection (LODs), among other challenges. We have proposed estimators of the mean of a K-sample U-statistic, of which the area under the ROC curve (AUC) is a special case, for settings in which outcome data are missing for some sampled units and auxiliary variables are available for the entire sample. The proposed estimators exploit the information in the auxiliary variables without requiring assumptions about the joint distribution of the auxiliaries and the outcomes. Their properties are derived from general results on efficient semiparametric estimation of the mean of a K-sample U-statistic when outcomes are missing at random, auxiliary variables are observed, and missingness probabilities are known. Random measurement error can attenuate a biomarker's ability to discriminate between diseased and non-diseased populations. We present an approach for estimating the Youden index, its associated optimal cut-point, and the AUC for a normally distributed biomarker that corrects for normally distributed random measurement error.
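To make the measurement-error correction concrete, here is a minimal Python sketch of the binormal case. It assumes that the observed marker equals the true marker plus independent N(0, error_var) error with the same error variance in both groups, so that each observed variance is the true variance plus error_var and the group means are unaffected. It also assumes the error variance is already known (for example, from a replicate or reliability substudy). The function names and the numeric inputs are hypothetical illustrations, not the project's code or TBARS data. The optimal cut-point is the root of the equation obtained by equating the two normal densities, which is where the derivative of the Youden index vanishes.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def corrected_sd(observed_sd, error_var):
    """Remove additive N(0, error_var) error: Var(observed) = Var(true) + error_var."""
    true_var = observed_sd ** 2 - error_var
    if true_var <= 0:
        raise ValueError("error variance must be smaller than the observed variance")
    return math.sqrt(true_var)


def binormal_auc(mu_n, sd_n, mu_d, sd_d):
    """AUC = Phi((mu_d - mu_n) / sqrt(sd_n^2 + sd_d^2)) for a binormal marker."""
    return STD_NORMAL.cdf((mu_d - mu_n) / math.sqrt(sd_n ** 2 + sd_d ** 2))


def optimal_cutpoint(mu_n, sd_n, mu_d, sd_d):
    """Cut-point maximizing J(c) = Se(c) + Sp(c) - 1, assuming mu_d > mu_n."""
    if math.isclose(sd_n, sd_d):
        return (mu_n + mu_d) / 2.0  # equal variances: midpoint of the two means
    var_n, var_d = sd_n ** 2, sd_d ** 2
    # Root (the maximizing one) of the quadratic from equating the two normal densities.
    root = math.sqrt((mu_d - mu_n) ** 2 + (var_n - var_d) * math.log(var_n / var_d))
    return (var_n * mu_d - var_d * mu_n - sd_n * sd_d * root) / (var_n - var_d)


def youden_index(mu_n, sd_n, mu_d, sd_d):
    """Return (J, c*): the Youden index and its optimal cut-point."""
    c = optimal_cutpoint(mu_n, sd_n, mu_d, sd_d)
    specificity = STD_NORMAL.cdf((c - mu_n) / sd_n)        # P(X_n <= c)
    sensitivity = 1.0 - STD_NORMAL.cdf((c - mu_d) / sd_d)  # P(X_d > c)
    return sensitivity + specificity - 1.0, c


# Hypothetical inputs (not TBARS data): observed SD sqrt(2) in both groups,
# of which an assumed error variance of 1 is measurement error.
mu_n, mu_d, obs_sd, err_var = 0.0, 2.0, math.sqrt(2.0), 1.0
true_sd = corrected_sd(obs_sd, err_var)

naive_auc = binormal_auc(mu_n, obs_sd, mu_d, obs_sd)
corr_auc = binormal_auc(mu_n, true_sd, mu_d, true_sd)
naive_j, naive_c = youden_index(mu_n, obs_sd, mu_d, obs_sd)
corr_j, corr_c = youden_index(mu_n, true_sd, mu_d, true_sd)
print(f"AUC: naive {naive_auc:.3f}, corrected {corr_auc:.3f}")
print(f"Youden J: naive {naive_j:.3f}, corrected {corr_j:.3f} at c = {corr_c:.3f}")
```

With these toy inputs, removing the error variance raises both the AUC and the Youden index. This is the attenuation described above, run in reverse. Confidence intervals for the corrected quantities (for example, via the delta method) are not included in this sketch.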
We also developed confidence intervals for these corrected estimates using the delta method and assessed their coverage probabilities by simulation across a variety of scenarios. Applying these techniques to the biomarker thiobarbituric acid reactive substances (TBARS), a measure of oxidative stress that has been proposed as a discriminating marker for infertility, yields a 50% increase in diagnostic effectiveness at the optimal cut-point. This result suggests that biomarkers once naively considered ineffective may prove to be useful diagnostic tools. Because multiple markers are often available, we considered combining them to improve diagnostic accuracy. The AUC-maximizing linear combinations derived by Su and Liu (1993) may have unsatisfactorily low sensitivity over certain ranges of desired specificity. We therefore considered maximizing sensitivity over a range of specificity and presented alternative linear combinations that achieve higher sensitivity over a range of high (or low) specificity. Additionally, we evaluated covariate effects on this linear combination, assuming that the markers, or a transformation of them, follow a multivariate normal distribution. We estimated the covariate-adjusted ROC curve of this linear combination of markers and derived approximate confidence intervals for the corresponding AUC. Another frequently encountered problem in studies that evaluate new diagnostic tests is that not all patients undergo disease verification, because the verification test is expensive, invasive, or both. In fact, the decision to refer a patient for verification often depends on the result of the new test and on other predictors of disease status. When AUC estimation for a diagnostic test is based only on patients with verified disease status, the usual estimators are biased. We developed estimators that adjust for this bias.
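For the multiple-marker problem described above, the following Python sketch computes the AUC-maximizing linear combination of Su and Liu (1993) for two markers. Under multivariate normality, that combination has coefficients proportional to (Σ_D + Σ_N)^{-1}(μ_D - μ_N), and it attains an AUC of Φ(sqrt(δ'(Σ_D + Σ_N)^{-1}δ)), where δ = μ_D - μ_N. This sketch assumes the group means and covariance matrices are known rather than estimated, and the parameter values are hypothetical. It shows only the AUC-maximizing baseline. It does not implement the alternative combinations that maximize sensitivity over a specificity range, and it does not implement the covariate adjustment.

```python
import math
from statistics import NormalDist


def combination_auc(a, mu_d, mu_n, cov_d, cov_n):
    """AUC of the score a'X when X is bivariate normal in each group:
    Phi(a'(mu_d - mu_n) / sqrt(a'(cov_d + cov_n)a))."""
    mean_diff = sum(a[i] * (mu_d[i] - mu_n[i]) for i in range(2))
    var = sum(a[i] * a[j] * (cov_d[i][j] + cov_n[i][j])
              for i in range(2) for j in range(2))
    return NormalDist().cdf(mean_diff / math.sqrt(var))


def su_liu_coefficients(mu_d, mu_n, cov_d, cov_n):
    """AUC-maximizing coefficients a = (cov_d + cov_n)^{-1}(mu_d - mu_n) for 2 markers."""
    delta = [mu_d[0] - mu_n[0], mu_d[1] - mu_n[1]]
    s = [[cov_d[i][j] + cov_n[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if s[0][0] <= 0 or det <= 0:
        raise ValueError("cov_d + cov_n must be positive definite")
    # Explicit 2x2 inverse of s applied to delta.
    return [(s[1][1] * delta[0] - s[0][1] * delta[1]) / det,
            (-s[1][0] * delta[0] + s[0][0] * delta[1]) / det]


# Hypothetical parameters for two correlated markers (not study data).
mu_n, mu_d = [0.0, 0.0], [1.0, 0.5]
cov = [[1.0, 0.3], [0.3, 1.0]]
a = su_liu_coefficients(mu_d, mu_n, cov, cov)
for label, coef in [("marker 1", [1.0, 0.0]), ("marker 2", [0.0, 1.0]), ("Su-Liu", a)]:
    print(f"{label}: AUC = {combination_auc(coef, mu_d, mu_n, cov, cov):.3f}")
```

Because the AUC of a'X depends only on the direction of a, any positive multiple of these coefficients gives the same ROC curve. By the Cauchy-Schwarz inequality, no other direction yields a larger AUC. It may still yield higher sensitivity in part of the specificity range, which is the gap the alternative combinations address.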
When information on disease status is missing, it is necessary to model either the missing data or the process leading to the missingness in order to obtain well-behaved estimators of the AUC. We have described a doubly robust estimator that is unbiased when either the model for disease status or the model for missingness is correct. This estimator does not require EM-type iterations and is easy to compute with standard software. It accommodates both discrete and continuous markers and allows for the possibility that selection for verification is non-ignorable. In addition, the doubly robust estimator offers more protection against model misspecification than other currently available methods. Applying the methods described above, we have shown that TBARS has discriminating ability above and beyond chance. This work has yielded 23 publications in peer-reviewed journals, including Biometrika and the Journal of the American Statistical Association.
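The doubly robust idea can be illustrated with a simplified Python sketch. This is a swapped-in, simpler construction, not the estimator described above, and it assumes that verification is missing at random given the test result and covariates, so it does not handle non-ignorable selection. Each subject receives an augmented inverse-probability-weighted pseudo-outcome, V·D/π - (V/π - 1)·ρ. Its conditional mean equals P(D = 1 | T, A) if either the verification probabilities π or the disease probabilities ρ are correctly modeled. These pseudo-outcomes then weight the pairs of the Mann-Whitney form of the AUC. The function name `dr_auc` and its arguments are hypothetical. The fitted π and ρ are assumed to come from models the user supplies, such as logistic regressions.

```python
def dr_auc(scores, verified, disease, pi, rho):
    """Doubly robust-style AUC under verification bias (simplified MAR sketch).

    scores   : test results T_i
    verified : V_i in {0, 1}
    disease  : D_i in {0, 1} for verified subjects (may be None when V_i == 0)
    pi       : fitted verification probabilities P(V=1 | T, A), user-supplied
    rho      : fitted disease probabilities P(D=1 | T, A), user-supplied
    """
    # Augmented IPW pseudo-outcome for each subject; D_i is only read when V_i == 1.
    d_hat = []
    for v, d, p, r in zip(verified, disease, pi, rho):
        ipw = d / p if v else 0.0
        d_hat.append(ipw - (v / p - 1.0) * r)
    num = den = 0.0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = d_hat[i] * (1.0 - d_hat[j])  # weight for "i diseased, j non-diseased"
            if scores[i] > scores[j]:
                num += w
            elif scores[i] == scores[j]:
                num += 0.5 * w  # ties count one half, as in the Mann-Whitney statistic
            den += w
    if den == 0:
        raise ValueError("pseudo-outcomes give zero total pair weight")
    return num / den


# Hypothetical toy data: five subjects, two of them unverified (disease = None).
scores = [0.9, 0.4, 0.3, 0.2, 0.6]
verified = [1, 0, 1, 0, 1]
disease = [1, None, 0, None, 0]
pi = [0.8, 0.5, 0.6, 0.5, 0.7]   # assumed fitted P(V = 1 | T, A)
rho = [0.9, 0.6, 0.2, 0.1, 0.3]  # assumed fitted P(D = 1 | T, A)
print(f"doubly robust AUC estimate: {dr_auc(scores, verified, disease, pi, rho):.3f}")
```

Two sanity checks follow from the construction. If every subject is verified with π = 1, the pseudo-outcomes equal the observed disease indicators and the estimate reduces to the usual empirical AUC. If ρ reproduces the true disease status, the same holds regardless of π. The double loop is O(n²), which is acceptable for a sketch but slow for large samples.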
