Combining information from multiple studies remains a cost-effective approach in biomedical research. In the traditional statistical literature, the associated analytic method is known as meta-analysis. However, the statistical tools for meta-analysis were developed under rather restrictive settings. Developing the next generation of meta-analysis methods raises many new challenges, ranging from increasing the robustness of traditional meta-analysis to strengthening the protection of data privacy when sharing patient-level information. In this proposal, we aim to address several important analytic issues that arise from combining multiple studies. We expect the planned methodological development to provide a general framework for effective information pooling from various sources. We also aim to facilitate the development of new regulatory pathways for integrating real-world evidence into the drug development process. The proposal contains three specific aims.
In Specific Aim 1, we plan to develop valid and general inferential procedures for random-effects meta-analysis that allow the number of studies to be small or the study-specific treatment effect estimator to be irregular, settings in which statistical inference based on traditional random-effects models fails.
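For context, the traditional random-effects analysis referred to in Aim 1 is often carried out with the DerSimonian-Laird moment estimator. The sketch below illustrates that standard procedure only; it is not the method proposed here, and the function name and the input numbers are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Return (pooled_estimate, standard_error, tau2) under the
    DerSimonian-Laird random-effects model. Requires at least two studies."""
    k = len(estimates)
    w = [1.0 / v for v in variances]          # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)        # between-study variance, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Made-up example: two studies with very different effect estimates.
pooled, se, tau2 = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(pooled, se, tau2)
```

The standard error returned here treats the estimated between-study variance as if it were known. With only a handful of studies that variance is poorly estimated, which is one reason the usual normal-approximation interval can be unreliable when the number of studies is small.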
In Specific Aim 2, we plan to develop robust and efficient procedures for estimating treatment effects by synthesizing information from real-world evidence data and randomized clinical trials. Their broad patient populations and detailed patient information make large databases, such as electronic medical records, a valuable source for precision medicine research. Effectively extracting the rich information in real-world evidence data has thus become a pressing need. In this aim, we propose to develop an adaptive causal inferential procedure based on multiple studies to correct biases from various sources under relaxed assumptions.
In Specific Aim 3, we propose to develop optimal estimation and prediction procedures based on data from multiple sources in the presence of data privacy concerns and between-study heterogeneity. The first part of the aim concerns a divide-and-conquer strategy that bypasses the need for patient-level data, alleviating privacy concerns in data sharing. The second part of the aim concerns a set of statistical learning methods for predicting patients' future outcomes and selecting the optimal treatment while accounting for between-study heterogeneity, when patient-level data can be shared.
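The divide-and-conquer idea in the first part of Aim 3 can be illustrated with the simplest case, ordinary least squares with one covariate: each site computes a few summary statistics locally, and only those summaries are pooled. This is a minimal sketch under that assumption, not the procedure proposed here; the function names and data are hypothetical, and the sketch ignores between-study heterogeneity.

```python
def site_summary(x, y):
    """Computed locally at each site; only these five numbers leave the site:
    n, sum(x), sum(y), sum(x^2), sum(x*y)."""
    return (len(x), sum(x), sum(y),
            sum(xi * xi for xi in x),
            sum(xi * yi for xi, yi in zip(x, y)))

def pooled_ols(summaries):
    """Least-squares intercept and slope recovered from aggregated site summaries.
    The result equals the fit on the stacked patient-level data."""
    n, sx, sy, sxx, sxy = (sum(s[i] for s in summaries) for i in range(5))
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Made-up data from two sites; no patient-level records are exchanged.
site_a = site_summary([0, 1, 2], [1, 3, 5])
site_b = site_summary([3, 4], [7, 9])
print(pooled_ols([site_a, site_b]))
```

For linear least squares the summaries are sufficient, so nothing is lost by not sharing individual records. For nonlinear or regularized models the aggregation is generally only approximate.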

Public Health Relevance

Pooling information from multiple sources, including real-world evidence data, is a highly cost-effective approach in biomedical research. The traditional statistical tool, meta-analysis, was developed under rather restrictive settings, and we propose to develop novel methodology for the next generation of meta-analysis in the era of big data. We plan to apply the new methods to precision medicine, including individualized diagnosis and treatment recommendation.

National Institutes of Health (NIH)
National Heart, Lung, and Blood Institute (NHLBI)
Research Project (R01)
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Wolz, Michael
Stanford University
Internal Medicine/Medicine
Schools of Medicine
United States