Continuing technological advancements allow researchers and clinicians to measure an increasingly vast diversity of physiological, molecular and genetic markers, rapidly increasing our understanding of disease processes. The wide range of newly available markers holds great potential for the personalization of medical care through accurate prediction of clinical outcomes. Traditional statistical methods for using a patient's marker values to make personalized predictions are derived under the strong assumption that the true model relating markers to clinical responses can be identified, at least with large enough samples. In practice, however, it is difficult if not impossible even to identify a class of models containing the truth. To make meaningful predictions for individual patients, it is therefore important to modify the standard methods and develop new statistical procedures for model estimation and evaluation when the model may not be correctly specified. When possible we will modify standard methods, but we will also address some fundamental issues in statistical inference that require the development of new procedures. Specifically, we seek to develop procedures for predicting future observations and for evaluating and comparing prediction rules. The key contribution of the proposed procedures will be the production of valid inferences even when the fitted models are incorrect. We will focus on the following three aims.
In Aim 1 we will develop robust procedures for evaluating and comparing the accuracy of prediction rules constructed under various working models for continuous, binary and censored event time outcomes.
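As a rough illustration of what comparing prediction rules under possibly incorrect working models can look like, the sketch below uses K-fold cross-validation to estimate the mean absolute prediction error of two working models for a continuous outcome. Everything in it is an assumption made for illustration and is not taken from the proposal: the simulated quadratic data, the two working models (`fit_mean`, `fit_line`), the choice of absolute error as the accuracy measure, and the helper names.

```python
import random

def fit_mean(xs, ys):
    """Hypothetical working model A: ignore the marker, predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical working model B: least-squares straight line in the marker."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def cv_abs_error(xs, ys, fit, k=5, seed=0):
    """K-fold cross-validated mean absolute prediction error of a fitting rule."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        rule = fit([xs[i] for i in train], [ys[i] for i in train])
        # Accuracy is always measured on observations not used for fitting.
        total += sum(abs(ys[i] - rule(xs[i])) for i in fold)
    return total / len(xs)

def simulate(n=400, seed=1):
    """Simulated data: the truth is quadratic, so neither working model is correct."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 3) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys

xs, ys = simulate()
# The same seed gives both rules the same folds, so the comparison is paired.
err_mean = cv_abs_error(xs, ys, fit_mean)
err_line = cv_abs_error(xs, ys, fit_line)
print(f"mean-only rule: {err_mean:.3f}, linear rule: {err_line:.3f}")
```

The accuracy measure here is defined for the fitted rule itself, so it remains a meaningful quantity to estimate and compare even though both working models are misspecified.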
In Aim 2 we will develop procedures that generate optimal robust prediction intervals for future observations without assuming that the model is correct. Implementing such inference procedures often requires approximating the sampling distribution of estimated model parameters and accuracy measures. This can be rather challenging in certain settings when the model is not assumed to contain the truth.
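One well-known way to obtain prediction intervals whose coverage does not depend on the working model being correct is split conformal prediction. The sketch below is only an illustration of that general idea, not the procedure proposed here. The simulated data, the straight-line working model, the 90% target level and all function names are assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """Hypothetical working model: least-squares straight line in the marker."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: my + slope * (x - mx)

def split_conformal(xs, ys, alpha=0.1, seed=0):
    """Return a function x -> (lower, upper) with target coverage 1 - alpha."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, calib = idx[:half], idx[half:]
    # Fit the working model on one half of the data only.
    rule = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Absolute residuals on the other half calibrate the interval width.
    scores = sorted(abs(ys[i] - rule(xs[i])) for i in calib)
    m = len(scores)
    rank = math.ceil((m + 1) * (1 - alpha))
    q = scores[rank - 1] if rank <= m else float("inf")
    return lambda x: (rule(x) - q, rule(x) + q)

def simulate(n, seed):
    """Simulated data: the truth is quadratic, so the linear model is misspecified."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 3) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 1) for x in xs]
    return xs, ys

xs, ys = simulate(2000, seed=1)
interval = split_conformal(xs, ys, alpha=0.1)

# Empirical coverage on fresh observations from the same distribution.
new_xs, new_ys = simulate(4000, seed=2)
hits = 0
for x, y in zip(new_xs, new_ys):
    lower, upper = interval(x)
    hits += lower <= y <= upper
coverage = hits / len(new_xs)
print(f"empirical coverage: {coverage:.3f}")
```

The coverage guarantee of this construction is marginal (averaged over markers) and relies on the observations being exchangeable. The intervals have constant width, so they are not optimal in the sense sought in this aim.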
In Aim 3, we will develop numerically efficient resampling methods to facilitate inference under possibly incorrect working models. The proposed methodological research will be guided by a wide variety of real datasets from the Multi-Ethnic Study of Atherosclerosis and cancer clinical trials sponsored by the Eastern Cooperative Oncology Group, to which we have access.
Our aims will require, in most cases, the development of large-sample distribution theory, simulation studies of small-sample behavior and applications to real data. The developed methods will use existing statistical software packages whenever possible and be fully implemented otherwise.

Public Health Relevance

In medical research, it is often of interest to explore the effect of various factors, such as patient characteristics or environmental exposures, on clinical outcomes. For example, an important step in discovering new diagnostic biomarkers is to quantify the ability of the biomarkers to predict disease risk. It is therefore important to develop statistical procedures for constructing and evaluating empirical prediction models for clinical outcomes. The project will systematically investigate the consequences of model misspecification and how to make clinical decisions based on empirical statistical models. The proposed methodological research will be guided by a wide variety of real problems of clinical interest.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Wolz, Michael
Stanford University
Schools of Medicine
United States
Tian, Lu; Fu, Haoda; Ruberg, Stephen J et al. (2017) Efficiency of two sample tests via the restricted mean survival time for analyzing event time observations. Biometrics :
Zheng, Yu; Cai, Tianxi (2017) Augmented estimation for t-year survival with censored regression models. Biometrics 73:1169-1178
Chen, Shuai; Tian, Lu; Cai, Tianxi et al. (2017) A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics 73:1199-1209
Zhou, Qian M; Dai, Wei; Zheng, Yingye et al. (2017) Robust Dynamic Risk Prediction with Longitudinal Studies. Stat Theory Relat Fields 1:159-170
Payne, Rebecca; Yang, Ming; Zheng, Yingye et al. (2016) Robust risk prediction with biomarkers under two-phase stratified cohort design. Biometrics 72:1037-1045
Payne, Rebecca; Neykov, Matey; Jensen, Majken Karoline et al. (2016) Kernel machine testing for risk prediction with stratified case cohort studies. Biometrics 72:372-81
Li, Junlong; Zhao, Lihui; Tian, Lu et al. (2016) A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics 72:877-87
Zhao, Lihui; Claggett, Brian; Tian, Lu et al. (2016) On the restricted mean survival time curve in survival analysis. Biometrics 72:215-21
Shen, Yuanyuan; Cai, Tianxi (2016) Identifying predictive markers for personalized treatment selection. Biometrics 72:1017-1025
Claggett, Brian; Tian, Lu; Castagno, Davide et al. (2015) Treatment selections using risk-benefit profiles based on data from comparative randomized clinical trials with multiple endpoints. Biostatistics 16:60-72

Showing the most recent 10 out of 38 publications