Continuing technological advances allow researchers and clinicians to measure an increasingly diverse range of physiological, molecular, and genetic markers, rapidly deepening our understanding of disease processes. These newly available markers hold great potential for personalizing medical care through accurate prediction of clinical outcomes. Traditional statistical methods for making personalized predictions from a patient's marker values are derived under the strong assumption that the true model relating markers to clinical responses can be identified, at least with sufficiently large samples. In practice, however, it is difficult, if not impossible, even to identify a class of models that contains the truth. To make meaningful predictions for individual patients, it is therefore important to develop procedures for model estimation and evaluation that remain valid when the model may be misspecified. Where possible we will modify standard methods, but some fundamental issues in statistical inference require the development of new procedures. Specifically, we seek to develop procedures for predicting future observations and for evaluating and comparing prediction rules. The key contribution of the proposed procedures will be valid inference even when the fitted models are incorrect. We will focus on the following three aims.
In Aim 1 we will develop robust procedures for evaluating and comparing the accuracy of prediction rules constructed under various working models for continuous, binary and censored event time outcomes.
In Aim 2 we will develop procedures that generate optimal, robust prediction intervals for future observations without assuming that the model is correct.
In Aim 3 we will develop numerically efficient resampling methods to facilitate inference under possibly incorrect working models. Implementing such robust inference procedures often requires approximating the sampling distributions of estimated model parameters and accuracy measures, which can be challenging when the model is not assumed to contain the truth.
The proposed methodological research will be guided by a wide variety of real datasets to which we have access, from the Multi-Ethnic Study of Atherosclerosis (MESA) and from cancer clinical trials sponsored by the Eastern Cooperative Oncology Group (ECOG). In most cases, the aims will require developing large-sample distribution theory, studying small-sample behavior through simulation, and applying the methods to real data. The methods will be implemented with existing statistical software packages whenever possible and in newly developed software otherwise.

Public Health Relevance

In medical research, it is often of interest to explore the effects of various factors, such as patient characteristics or environmental exposures, on clinical outcomes. For example, an important step in discovering new diagnostic biomarkers is to quantify their ability to predict disease risk. It is therefore important to develop statistical procedures for constructing and evaluating empirical prediction models for clinical outcomes. The project will systematically study the consequences of model misspecification and how to make clinical decisions based on empirical statistical models. The proposed methodological research will be guided by a wide variety of real problems of clinical interest.

Agency: National Institutes of Health (NIH)
Institute: National Heart, Lung, and Blood Institute (NHLBI)
Type: Research Project (R01)
Project #: 5R01HL089778-03
Application #: 7924620
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Wolz, Michael
Project Start: 2008-09-01
Project End: 2012-07-31
Budget Start: 2010-08-01
Budget End: 2011-07-31
Support Year: 3
Fiscal Year: 2010
Total Cost: $233,593
Indirect Cost:

Name: Stanford University
Department: Miscellaneous
Type: Schools of Medicine
DUNS #: 009214214
City: Stanford
State: CA
Country: United States
Zip Code: 94305
Publications (the 10 most recent of 44)

Tian, Lu; Fu, Haoda; Ruberg, Stephen J et al. (2018) Efficiency of two sample tests via the restricted mean survival time for analyzing event time observations. Biometrics 74:694-702
Sinnott, Jennifer A; Cai, Tianxi (2018) Pathway aggregation for survival prediction via multiple kernel learning. Stat Med 37:2501-2515
Yu, Sheng; Ma, Yumeng; Gronsbell, Jessica et al. (2018) Enabling phenotypic big data with PheNorm. J Am Med Inform Assoc 25:54-60
Dai, Wei; Yang, Ming; Wang, Chaolong et al. (2017) Sequence robust association test for familial data. Biometrics 73:876-884
Kim, Dae Hyun; Uno, Hajime; Wei, Lee-Jen (2017) Restricted Mean Survival Time as a Measure to Interpret Clinical Trial Results. JAMA Cardiol 2:1179-1180
Chen, Shuai; Tian, Lu; Cai, Tianxi et al. (2017) A general statistical framework for subgroup identification and comparative treatment scoring. Biometrics 73:1199-1209
Michael, H; Tian, L (2017) Discussion of "A risk-based measure of time-varying prognostic discrimination for survival models," by C. Jason Liang and Patrick J. Heagerty. Biometrics 73:735-738
Zhou, Qian M; Dai, Wei; Zheng, Yingye et al. (2017) Robust Dynamic Risk Prediction with Longitudinal Studies. Stat Theory Relat Fields 1:159-170
Zheng, Yu; Cai, Tianxi (2017) Augmented estimation for t-year survival with censored regression models. Biometrics 73:1169-1178
Sinnott, Jennifer A; Cai, Tianxi (2016) Inference for survival prediction under the regularized Cox model. Biostatistics 17:692-707