This proposal develops novel statistical methods to select a small group of molecules from high-throughput data, such as microarray and proteomic data from cancer research. The main challenge is the ultrahigh dimensionality inherent in these studies, particularly when gene-gene interactions are introduced. Ultrahigh dimensionality has a large impact on statistical computation, methodological development, and theoretical study. The challenge will be addressed with the proposed novel independence screening methods, which also address computational demand and stability, as well as the accumulation of stochastic errors in ultrahigh-dimensional statistical inference. An iterative independence screening method is introduced to find hidden signature genes that are marginally unimportant but jointly extremely important to the clinical outcomes. It also makes it possible to eliminate redundant molecules that are highly associated with clinical outcomes marginally but only weakly associated jointly. With the number of features surely reduced to a manageable level, penalized pseudo-likelihood methods will be introduced to further select relevant genes. In addition, methods for finding synergetic groups of molecules are introduced. The idea of independence screening and its iterated version will be applied to various statistical problems arising in the analysis of high-throughput data, ranging from ultrahigh-dimensional regression and classification to the analysis of survival time, estimation of gene-wide variance, and normalization of microarrays. The efficacy of the proposed methods will be evaluated via asymptotic theory and simulation studies. Data sets from on-going biomedical studies on cancer such as breast cancer, multiple myeloma, neuroblastoma, lung tumor, and liver carcinogen will be critically analyzed using the newly developed statistical and bioinformatic tools.
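
As a rough illustration of the screening idea described above, the following Python sketch ranks features by absolute marginal correlation with the outcome and keeps the top few, then repeats the screen on residuals so that features missed in the first pass get another chance. It is a minimal sketch under stated assumptions, not the proposal's actual procedure: the function names `marginal_screen` and `iterative_screen`, the parameters `d` and `rounds`, and the use of ordinary least squares for the refit step (in place of the penalized likelihood methods the proposal mentions) are all hypothetical choices made for illustration.

```python
import numpy as np


def marginal_screen(X, y, d):
    """Indices of the d columns of X with the largest absolute
    marginal (Pearson) correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scale = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scale[scale == 0] = np.inf  # constant columns (or constant y) score 0
    scores = np.abs(Xc.T @ yc) / scale
    return np.argsort(scores)[::-1][:d]


def iterative_screen(X, y, d, rounds):
    """Assumed iterative variant: screen, refit on everything selected so far,
    then screen the remaining features against the residuals."""
    n, p = X.shape
    selected = np.array([], dtype=int)
    resid = y - y.mean()
    for _ in range(rounds):
        mask = np.ones(p, dtype=bool)
        mask[selected] = False
        remaining = np.flatnonzero(mask)
        if remaining.size == 0:
            break
        picked = remaining[marginal_screen(X[:, remaining], resid, d)]
        selected = np.concatenate([selected, picked])
        # Least-squares refit (an assumption; a penalized fit could be used instead).
        A = np.column_stack([np.ones(n), X[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
    return selected


# Simulated example: 500 features, only the first two carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.standard_normal(200)
top = marginal_screen(X, y, d=10)
kept = iterative_screen(X, y, d=3, rounds=2)
```

Here `top` holds the ten features retained by a single screening pass and `kept` holds the six retained over two rounds. The total number kept (`d * rounds`) should stay below the sample size so that the least-squares refit remains well defined.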

Public Health Relevance

Statistical Methods for Ultrahigh-dimensional Biomedical Data
PI: Jianqing Fan

This proposal develops novel statistical and bioinformatic tools for finding genes and proteins that are associated with clinical outcomes. Data sets from on-going biomedical studies on cancer such as breast cancer, multiple myeloma, neuroblastoma, lung tumor, and liver carcinogen will be critically analyzed using the newly developed statistical and bioinformatic tools. The research findings will have a strong impact on understanding the molecular mechanisms of cancer and on developing therapeutic targets.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM072611-08
Application #
8423354
Study Section
Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer
Marcus, Stephen
Project Start
2006-02-01
Project End
2015-01-31
Budget Start
2013-02-01
Budget End
2015-01-31
Support Year
8
Fiscal Year
2013
Total Cost
$254,324
Indirect Cost
$87,138
Name
Princeton University
Department
Miscellaneous
Type
Schools of Arts and Sciences
DUNS #
002484665
City
Princeton
State
NJ
Country
United States
Zip Code
08544
Zhu, Hongtu; Fan, Jianqing; Kong, Linglong (2014) Spatially Varying Coefficient Model for Neuroimaging Data with Jump Discontinuities. J Am Stat Assoc 109:1084-1098
Fan, Jianqing; Han, Fang; Liu, Han (2014) Challenges of Big Data Analysis. Natl Sci Rev 1:293-314
Ke, Tracy; Jin, Jiashun; Fan, Jianqing (2014) Covariance Assisted Screening and Estimation. Ann Stat 42:2202-2242
Fan, Jianqing; Ma, Yunbei; Dai, Wei (2014) Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models. J Am Stat Assoc 109:1270-1284
Fan, Jianqing; Liao, Yuan; Mincheva, Martina (2013) Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 75:
Fan, Jianqing; Liu, Han (2013) Statistical analysis of big data on pharmacogenomics. Adv Drug Deliv Rev 65:987-1000
Fan, Jianqing; Guo, Shaojun; Hao, Ning (2012) Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J R Stat Soc Series B Stat Methodol 74:37-65
Bradic, Jelena; Fan, Jianqing; Wang, Weiwei (2011) Penalized Composite Quasi-Likelihood for Ultrahigh-Dimensional Variable Selection. J R Stat Soc Series B Stat Methodol 73:325-349
Fan, Jianqing; Lv, Jinchi (2011) Non-Concave Penalized Likelihood with NP-Dimensionality. IEEE Trans Inf Theory 57:5467-5484
Zhang, Chunming; Fan, Jianqing; Yu, Tao (2011) Multiple Testing via FDR for Large Scale Imaging Data. Ann Stat 39:613-642

Showing the most recent 10 out of 28 publications