This proposal develops novel statistical methods for selecting a small group of molecules from high-throughput biomedical data, such as microarray, proteomic, and next-generation sequencing data, especially in autism and brain tumor studies. It focuses on developing efficient methods and valid statistical tools for controlling the false discovery rate and testing treatment effects on a group of molecules; for feature selection and model building in the presence of errors-in-variables, endogeneity, and heavy-tailed error distributions; and for predicting clinical outcomes and understanding molecular mechanisms. It develops semiparametric and nonparametric models to reduce modeling biases and to augment features. It advances the estimation of large covariance matrices for understanding genetic networks, statistical model building, and inference. It introduces multivariate independence screening and conditional independence screening techniques to reduce false negatives and false positives in variable screening, and develops computable and optimal penalized likelihood methods for an array of statistical models. The strengths and weaknesses of each proposed method will be critically analyzed via theoretical investigations and simulation studies. Related software will be developed. Data sets from ongoing autism, brain tumor, and other biomedical studies will be analyzed using the newly developed methods, and the results will be further confirmed and investigated biologically. The research findings will have a strong impact on the statistical analysis of high-throughput data for biomedical research and on understanding the molecular mechanisms of autism, brain tumors, and other diseases.

Public Health Relevance

This proposal develops novel statistical and bioinformatic tools for finding genes, proteins, and SNPs that are associated with clinical outcomes. Data sets from ongoing autism, brain tumor, and other biomedical studies will be critically analyzed using the newly developed statistical and bioinformatic methods, and the results will be further confirmed and investigated biologically. The research findings will have a strong impact on understanding the molecular mechanisms of autism, brain tumors, and other diseases and on developing therapeutic targets.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 2R01GM072611-09
Application #: 8627273
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brazhnik, Paul
Project Start: 2004-12-01
Project End: 2018-01-31
Budget Start: 2014-04-01
Budget End: 2015-01-31
Support Year: 9
Fiscal Year: 2014
Total Cost: $308,918
Indirect Cost: $105,338

Name: Princeton University
Department: None
Type: Schools of Engineering
DUNS #: 002484665
City: Princeton
State: NJ
Country: United States
Zip Code: 08544

Zhu, Hongtu; Fan, Jianqing; Kong, Linglong (2014) Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities. J Am Stat Assoc 109:1084-1098
Fan, Jianqing; Han, Fang; Liu, Han (2014) Challenges of Big Data Analysis. Natl Sci Rev 1:293-314
Ke, By Tracy; Jin, Jiashun; Fan, Jianqing (2014) Covariance Assisted Screening and Estimation. Ann Stat 42:2202-2242
Fan, Jianqing; Ma, Yunbei; Dai, Wei (2014) Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models. J Am Stat Assoc 109:1270-1284
Fan, Jianqing; Liao, Yuan; Mincheva, Martina (2013) Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 75:
Fan, Jianqing; Liu, Han (2013) Statistical analysis of big data on pharmacogenomics. Adv Drug Deliv Rev 65:987-1000
Fan, Jianqing; Guo, Shaojun; Hao, Ning (2012) Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J R Stat Soc Series B Stat Methodol 74:37-65
Bradic, Jelena; Fan, Jianqing; Wang, Weiwei (2011) Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection. J R Stat Soc Series B Stat Methodol 73:325-349
Fan, Jianqing; Lv, Jinchi (2011) Non-Concave Penalized Likelihood with NP-Dimensionality. IEEE Trans Inf Theory 57:5467-5484
Zhang, Chunming; Fan, Jianqing; Yu, Tao (2011) Multiple Testing via FDR for Large Scale Imaging Data. Ann Stat 39:613-642

Showing the most recent 10 out of 28 publications