A reliable and precise prognosis is fundamental to successful disease management and treatment selection. More aggressive intervention can be given to patients at high risk of early disease onset, while patients who are unlikely to respond to one treatment should be considered for alternative options. With the rapid advancement of technology, a wide range of biological and genomic markers have emerged as potential tools for improving the prediction of disease and treatment outcomes, and these markers may lead to personalized, tailored medicine. New technologies such as DNA sequencing and microarrays are generating detailed data of exponentially increasing dimensionality and complexity. These data present unprecedented opportunities, as well as great challenges, for accurately predicting clinical outcomes. To take full advantage of such data, this proposal aims to develop statistical approaches for efficiently constructing and evaluating prognostic tools for disease risk assessment and treatment selection. Specifically, in Aim 1, we will develop accurate risk prediction models that incorporate complex interactive effects via a kernel machine regression framework. We will also provide nonparametric procedures for assessing the predictive performance of the resulting models.
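The kernel machine idea in Aim 1 can be illustrated with a least-squares kernel machine (kernel ridge regression) using a Gaussian kernel, which lets a risk score pick up nonlinear and interactive marker effects without specifying them in advance. Predictive performance is then summarized with the nonparametric Mann-Whitney estimate of the AUC (C-statistic). This is a minimal pure-Python sketch under those assumptions, not the proposal's actual estimator (which may, for example, use a logistic or survival kernel machine); the function names and the tuning values `lam` and `rho` are hypothetical.

```python
import math


def gaussian_kernel(a, b, rho):
    """Gaussian kernel exp(-||a - b||^2 / rho); encodes nonlinear and interactive effects."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / rho)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_kernel_machine(X, y, lam=0.1, rho=1.0):
    """Least-squares kernel machine: solve (K + lam * I) alpha = y - mean(y)."""
    n = len(y)
    ybar = sum(y) / n
    K = [[gaussian_kernel(X[i], X[j], rho) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [yi - ybar for yi in y])

    def predict(x):
        # Risk score: mean outcome plus a kernel-weighted sum over training subjects.
        return ybar + sum(a * gaussian_kernel(x, xi, rho) for a, xi in zip(alpha, X))

    return predict


def auc(scores, labels):
    """Nonparametric AUC: P(score of a case > score of a control), ties counted as 1/2."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    total = 0.0
    for s1 in cases:
        for s0 in controls:
            total += 1.0 if s1 > s0 else 0.5 if s1 == s0 else 0.0
    return total / (len(cases) * len(controls))
```

For example, with the four points `[0, 0], [0, 1], [1, 0], [1, 1]` and outcomes `[0, 1, 1, 0]` (a pure interaction), the fitted scores separate cases from controls with an apparent AUC of 1.0, which no additive linear score can achieve. Apparent AUC on the training data is optimistic, so cross-validation or resampling would be needed for an honest assessment.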
In Aim 2, we propose inference procedures for absolute risks and for the prediction performance of new markers using data from two-phase studies.
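In a two-phase design, the outcome and inexpensive covariates are available for the whole phase-1 cohort, while new markers are measured only on a phase-2 subsample drawn with known, stratum-specific probabilities. A standard way to recover cohort-level quantities is inverse probability weighting (IPW), where each phase-2 subject counts as 1/π of the cohort. The sketch below applies IPW to the AUC and to an absolute risk above a marker threshold; it assumes the selection probabilities are known and is illustrative only, not the inference procedure proposed in Aim 2. The function names are hypothetical.

```python
def ipw_auc(scores, labels, probs):
    """IPW AUC over phase-2 subjects; probs[i] is subject i's known sampling probability."""
    num = den = 0.0
    for s1, l1, p1 in zip(scores, labels, probs):
        if l1 != 1:
            continue
        for s0, l0, p0 in zip(scores, labels, probs):
            if l0 != 0:
                continue
            w = 1.0 / (p1 * p0)  # each case-control pair stands for 1/(p1*p0) cohort pairs
            den += w
            num += w * (1.0 if s1 > s0 else 0.5 if s1 == s0 else 0.0)
    return num / den


def ipw_absolute_risk(scores, labels, probs, threshold):
    """IPW estimate of the cohort risk P(D = 1 | marker score > threshold)."""
    num = sum(l / p for s, l, p in zip(scores, labels, probs) if s > threshold)
    den = sum(1.0 / p for s, l, p in zip(scores, labels, probs) if s > threshold)
    return num / den
```

If all cases are sampled (π = 1) and controls with π = 0.25, every case-control pair gets the same weight, so the AUC is unchanged by weighting, but absolute risk is not: with scores `[3.0, 2.0]` for cases and `[1.0, 2.5]` for controls, the naive risk above 1.5 is 2/3, whereas the IPW estimate is 1/3.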
In Aim 3, we will develop systematic procedures for identifying subgroups of patients who may or may not benefit from a new treatment, using patient-level baseline marker information.
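One common two-step scheme for this kind of subgroup identification in a randomized trial is to (1) fit a working outcome model in each arm and score each patient by the predicted difference between treatment and control, then (2) estimate the treatment effect among patients whose score exceeds a cutoff, which randomization keeps unbiased within any subgroup defined by baseline data. The sketch below assumes a continuous outcome (higher is better), a single baseline marker and per-arm linear working models; the names are hypothetical and this is not presented as the procedure of Aim 3. Building and evaluating the score on the same data is optimistic, so data splitting or cross-validation would be used in practice.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def benefit_score(marker, outcome, treated):
    """Per-arm working models; score(x) = predicted outcome on treatment minus on control."""
    arm1 = [(m, o) for m, o, t in zip(marker, outcome, treated) if t]
    arm0 = [(m, o) for m, o, t in zip(marker, outcome, treated) if not t]
    a1, b1 = fit_line([m for m, _ in arm1], [o for _, o in arm1])
    a0, b0 = fit_line([m for m, _ in arm0], [o for _, o in arm0])
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


def subgroup_effect(marker, outcome, treated, score, c=0.0):
    """Mean outcome difference (treated minus control) among patients with score(marker) > c."""
    y1 = [o for m, o, t in zip(marker, outcome, treated) if t and score(m) > c]
    y0 = [o for m, o, t in zip(marker, outcome, treated) if not t and score(m) > c]
    return sum(y1) / len(y1) - sum(y0) / len(y0)
```

In a toy trial where the control outcome is 1 for everyone and the treated outcome is `marker - 1` for markers 0 to 5, the overall effect is 0.5, but the score `marker - 2` singles out patients with marker above 2, among whom the estimated effect is 2.0.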
In Aim 4, we will focus on high-dimensional regression and develop regularized resampling methods to construct confidence intervals and hypothesis testing procedures for regression coefficients and for the prediction performance of the estimated models. To increase the practical impact of our research, in addition to creating software for public use, we will apply the proposed procedures to predict individual risk of developing (i) rheumatoid arthritis among women, using the Nurses' Health Study (NHS); (ii) cardiovascular disease (CVD) among diabetic patients, using the NHS and the Health Professionals Follow-up Study; (iii) AIDS-defining events among HIV-infected patients, using a large immunogenetic study; and (iv) coronary heart disease (CHD) or stroke, using the Women's Health Initiative (WHI) study. We also plan to develop algorithms to identify cases of various autoimmune diseases using electronic medical record (EMR) data from two large hospitals in Boston. The identified cases will be used in subsequent genetic case-control studies of the corresponding diseases. Such algorithms will allow EMR clinical data to be used directly for discovery research. In addition, we will develop treatment selection strategies for HIV-infected patients using randomized AIDS Clinical Trials Group (ACTG) trials, and for dietary intervention to prevent CVD using WHI clinical trials. Incorporating genetic profiles and modifiable risk factors, along with biologic markers, into risk models is likely to improve the prediction of clinical outcomes and ultimately lead to personalized medicine.
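Perturbation resampling is one general way to approximate the sampling distribution of a penalized estimator: the model is refit many times with i.i.d. positive random weights on the observations (here Exp(1)), and the spread of the refitted coefficients is used for inference. The sketch below shows only these mechanics for a weighted lasso fit by coordinate descent, with naive percentile intervals. It is an assumption-laden illustration, not the regularized resampling method of Aim 4; in fact, naive percentile intervals for the plain lasso can be unreliable for coefficients near zero, which is the kind of problem such methods are designed to address. The function names, `lam`, and the weight distribution are hypothetical choices.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator: sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def weighted_lasso(X, y, w, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n) sum_i w_i (y_i - x_i'b)^2 + lam * sum_j |b_j|.

    No intercept is fitted, so X and y are assumed to be (roughly) centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, starting from beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            denom = sum(w[i] * col[i] ** 2 for i in range(n)) / n
            if denom == 0.0:
                continue
            # Weighted inner product of x_j with the partial residual (beta_j added back).
            rho = sum(w[i] * col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / denom - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def perturbation_ci(X, y, lam, B=200, level=0.95, seed=0):
    """Percentile intervals from B weighted-lasso refits with i.i.d. Exp(1) weights."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = []
    for _ in range(B):
        w = [rng.expovariate(1.0) for _ in range(n)]
        draws.append(weighted_lasso(X, y, w, lam))
    tail = (1.0 - level) / 2.0
    cis = []
    for j in range(p):
        vals = sorted(d[j] for d in draws)
        cis.append((vals[int(tail * (B - 1))], vals[int((1.0 - tail) * (B - 1))]))
    return cis
```

Exp(1) weights have mean 1 and variance 1, so each refit mimics a random reweighting of the sample, much like a smooth bootstrap; the same loop could resample a prediction-accuracy summary instead of the coefficients.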

Public Health Relevance

This research proposal addresses the pressing need for advanced statistical tools that meet the challenges of developing prediction models for disease risk and treatment benefit. By providing statistical tools that enable clinical investigators to develop personalized disease management strategies effectively, this proposal will join prior and ongoing research efforts toward efficient and cost-effective personalized medicine.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Lyster, Peter
Harvard University
Biostatistics & Other Math Sci
Schools of Public Health
United States
Payne, Rebecca; Neykov, Matey; Jensen, Majken Karoline et al. (2016) Kernel machine testing for risk prediction with stratified case cohort studies. Biometrics 72:372-81
Zhao, Lihui; Claggett, Brian; Tian, Lu et al. (2016) On the restricted mean survival time curve in survival analysis. Biometrics 72:215-21
Payne, Rebecca; Yang, Ming; Zheng, Yingye et al. (2016) Robust risk prediction with biomarkers under two-phase stratified cohort design. Biometrics 72:1037-1045
Li, Junlong; Zhao, Lihui; Tian, Lu et al. (2016) A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics 72:877-87
Maziarz, Marlena; Heagerty, Patrick; Cai, Tianxi et al. (2016) On longitudinal prediction with time-to-event outcome: Comparison of modeling options. Biometrics
Zhou, Qian M; Zheng, Yingye; Chibnik, Lori B et al. (2015) Assessing incremental value of biomarkers with multi-phase nested case-control studies. Biometrics 71:1139-49
Shen, Yuanyuan; Cai, Tianxi; Chen, Yu et al. (2015) Retrospective likelihood-based methods for analyzing case-cohort genetic association studies. Biometrics 71:960-8
Minnier, Jessica; Yuan, Ming; Liu, Jun S et al. (2015) Risk Classification with an Adaptive Naive Bayes Kernel Machine Model. J Am Stat Assoc 110:393-404
Uno, Hajime; Tian, Lu; Claggett, Brian et al. (2015) A versatile test for equality of two survival functions based on weighted differences of Kaplan-Meier curves. Stat Med 34:3680-95
Claggett, Brian; Tian, Lu; Castagno, Davide et al. (2015) Treatment selections using risk-benefit profiles based on data from comparative randomized clinical trials with multiple endpoints. Biostatistics 16:60-72

Showing the most recent 10 out of 50 publications