This proposal develops novel statistical and machine learning methods for distributed analysis of big data in biomedical studies and precision medicine, and for selecting small groups of molecules associated with biological and clinical outcomes from high-throughput data such as microarray, proteomic, and next-generation sequencing data, especially in autism and Alzheimer's disease research. It focuses on developing efficient distributed statistical methods for big data computing, storage, and communication, and for analyzing distributed health data collected at different locations that are hard to aggregate in meta-analysis because of privacy and ownership concerns. It develops computationally and statistically efficient methods and valid statistical tools for exploring heterogeneity of big data in precision medicine; for studying associations of genomic and genetic information with clinical and biological outcomes; for feature selection and model building in the presence of errors-in-variables, endogeneity, and heavy-tailed error distributions; and for predicting clinical outcomes and understanding molecular mechanisms. It introduces more robust and powerful statistical tests for selecting significant genes, SNPs, and proteins in the presence of dependent data, valid control of the false discovery rate for dependent test statistics, and evaluation of treatment effects on a group of molecules. The strengths and weaknesses of each proposed method will be critically analyzed via theoretical investigations and simulation studies. Related software will be developed for free dissemination. Data sets from ongoing autism research, Alzheimer's disease research, and other biomedical studies will be analyzed using the newly developed methods, and the results will be further biologically confirmed and investigated.
The research findings will have a strong impact on statistical analysis of high-throughput big data for biomedical research and on understanding heterogeneity for precision medicine and the molecular mechanisms of autism, Alzheimer's disease, and other diseases.

Public Health Relevance

This proposal develops novel statistical machine learning methods and bioinformatic tools for finding genes, proteins, and SNPs that are associated with clinical outcomes and for discovering heterogeneity for precision medicine. Data sets from ongoing autism research, Alzheimer's disease research, and other biomedical studies will be critically analyzed using the newly developed statistical methods, and the results will be further biologically confirmed and investigated. The research findings will have a strong impact on developing therapeutic targets and on understanding heterogeneity for precision medicine and the molecular mechanisms of autism, Alzheimer's disease, and other diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM072611-15
Application #
9900790
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Brazhnik, Paul
Project Start
2006-02-01
Project End
2022-01-31
Budget Start
2020-02-01
Budget End
2021-01-31
Support Year
15
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Princeton University
Department
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
002484665
City
Princeton
State
NJ
Country
United States
Zip Code
08543
Fan, Jianqing; Liu, Han; Wang, Weichen (2018) Large Covariance Estimation Through Elliptical Factor Models. Ann Stat 46:1383-1414
Chen, Zhao; Fan, Jianqing; Li, Runze (2018) Error Variance Estimation in Ultrahigh-Dimensional Additive Models. J Am Stat Assoc 113:315-327
Li, Quefeng; Cheng, Guang; Fan, Jianqing et al. (2018) Embracing the Blessing of Dimensionality in Factor Models. J Am Stat Assoc 113:380-389
Fan, Jianqing; Shao, Qi-Man; Zhou, Wen-Xin (2018) Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications. Ann Stat 46:989-1017
Battey, Heather; Fan, Jianqing; Liu, Han et al. (2018) Distributed Testing and Estimation Under Sparse High Dimensional Models. Ann Stat 46:1352-1382
Zhou, Wen-Xin; Bose, Koushiki; Fan, Jianqing et al. (2018) A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing. Ann Stat 46:1904-1931
Fan, Jianqing; Liu, Han; Sun, Qiang et al. (2018) I-LAMM for Sparse Learning: Simultaneous Control of Algorithmic Complexity and Statistical Error. Ann Stat 46:814-841
Avella-Medina, Marco; Battey, Heather S; Fan, Jianqing et al. (2018) Robust estimation of high-dimensional covariance and precision matrices. Biometrika 105:271-284
Fan, Jianqing; Xue, Lingzhou; Yao, Jiawei (2017) Sufficient Forecasting Using Factor Models. J Econom 201:292-306
Fan, Jianqing; Li, Quefeng; Wang, Yuyan (2017) Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J R Stat Soc Series B Stat Methodol 79:247-265

Showing the most recent 10 out of 77 publications