The objectives of this project are to develop novel statistical methods and software packages for cancer classification and survival analysis using high-dimensional gene expression data and clinical measurements. For studying the layered complexity of cancer, gene expression profiling using microarrays, combined with statistical analysis, offers a powerful tool because it allows analysis of gene expression patterns on a genome-wide scale. As expression profiling technologies mature and ever more massive data sets are generated, identifying statistically and biologically significant patterns in high-dimensional, noisy data is becoming a major challenge. There is an urgent need for statistical methods that can handle such high-dimensional problems in linking clinical outcomes to gene expression profiles, for better diagnosis of disease status, better prescription of treatment, and better survival prediction in cancer. In this project, we will use the basic principles of cancer genetics to guide our efforts in developing novel methods and models for analyzing data from cancer expression profiling studies.
The specific aims of this project are to: (1) Develop bridge-penalized methods for variable selection and estimation in high-dimensional models that have important applications in gene expression profiling studies of cancer. (2) Develop group-bridge penalized methods for incorporating biological pathways into the analysis of gene expression data. (3) Develop bridge and group-bridge penalized methods for tumor classification and biomarker selection using gene expression data, with emphasis on regularized logistic regression and semi-parametric ROC classification methods for improved sensitivity and specificity, while adjusting for other clinical covariates and risk factors. (4) Develop bridge and group-bridge penalized methods for correlating survival with gene expression data, while adjusting for other clinical measurements and risk factors. Important models to consider include the linear and partially linear Cox proportional hazards models and the linear and partially linear accelerated failure time models. (5) Implement the proposed methods in well-documented R packages and C programs; evaluate these methods through extensive computer simulation studies and publicly available data sets; and apply them to two expression profiling studies of lymphoma and head and neck cancers. The development of the proposed statistical methods that can deal with high-dimensional problems in estimating the relationship between cancer clinical outcomes and genomic data will contribute to a better understanding of the genetic basis of cancer, better diagnoses, and better survival prediction, which, in turn, can potentially have an important impact on public health.
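As a rough illustration of the penalties named in aims (1) and (2), the sketch below evaluates a bridge penalty and a group-bridge penalty for a coefficient vector. The abstract does not give formulas, so the forms used here are assumptions taken from the general literature on bridge estimation: the bridge penalty is taken as lam * sum_j |beta_j|^gamma with 0 < gamma < 1, and the group-bridge penalty as lam * sum_k c_k * (L1 norm of group k)^gamma with the weight c_k = (group size)^(1 - gamma). The function names, the default gamma = 0.5, and the example groups are hypothetical and are not part of the project's R or C software.

```python
def bridge_penalty(beta, lam, gamma=0.5):
    """Assumed bridge penalty: lam * sum_j |beta_j| ** gamma, with 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return lam * sum(abs(b) ** gamma for b in beta)


def group_bridge_penalty(beta, groups, lam, gamma=0.5):
    """Assumed group-bridge penalty: lam * sum_k c_k * (sum of |beta_j| in group k) ** gamma.

    `groups` is a list of index lists (e.g. genes in one pathway); groups may overlap.
    The weight c_k = len(group) ** (1 - gamma) is an assumed size adjustment.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    for idx in groups:
        c_k = len(idx) ** (1.0 - gamma)            # group-size weight (assumption)
        l1_norm = sum(abs(beta[j]) for j in idx)   # L1 norm of the group's coefficients
        total += c_k * l1_norm ** gamma
    return lam * total


if __name__ == "__main__":
    # Hypothetical coefficients for four genes, split into two pathways.
    beta = [4.0, 0.0, 1.0, 0.0]
    pathways = [[0, 1], [2, 3]]
    print(bridge_penalty(beta, lam=1.0))
    print(group_bridge_penalty(beta, pathways, lam=1.0))
```

Because the outer power gamma is below 1, the group-level term can shrink an entire pathway to zero, while the inner L1 norm still allows individual genes within a retained pathway to be dropped; that two-level behavior is the usual motivation for a group-bridge form.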

Public Health Relevance

The development of the proposed statistical methods that can deal with high-dimensional problems in estimating the relationship between cancer clinical outcomes and genomic data will contribute to a better understanding of the genetic basis of cancer, better diagnoses, and better survival prediction, which, in turn, can potentially have an important impact on public health.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project (R01)
Project #
5R01CA120988-03
Application #
7744698
Study Section
Special Emphasis Panel (ZRG1-HOP-E (03))
Program Officer
Rasooly, Avraham
Project Start
2008-01-02
Project End
2011-12-31
Budget Start
2010-01-01
Budget End
2010-12-31
Support Year
3
Fiscal Year
2010
Total Cost
$287,408
Indirect Cost
Name
University of Iowa
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
062761671
City
Iowa City
State
IA
Country
United States
Zip Code
52242
Breheny, Patrick; Huang, Jian (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25:173-187
Jiang, Dingfeng; Huang, Jian (2014) Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models. Stat Comput 24:871-883
Liu, Jin; Huang, Jian; Ma, Shuangge (2014) Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization. Scand Stat Theory Appl 41:87-103
Jiang, Dingfeng; Huang, Jian; Zhang, Ying (2013) The cross-validated AUC for MCP-logistic regression with high-dimensional data. Stat Methods Med Res 22:505-18
Liu, Jin; Wang, Kai; Ma, Shuangge et al. (2013) Accounting for linkage disequilibrium in genome-wide association studies: A penalized regression method. Stat Interface 6:99-115
Huang, Jian; Sun, Tingni; Ying, Zhiliang et al. (2013) Oracle inequalities for the lasso in the Cox model. Ann Stat 41:1142-1165
Liu, Jin; Huang, Jian; Ma, Shuangge et al. (2013) Incorporating group correlations in genome-wide association studies using smoothed group Lasso. Biostatistics 14:205-19
Huang, Jian; Wei, Fengrong; Ma, Shuangge (2012) Semiparametric Regression Pursuit. Stat Sin 22:1403-1426
Shen, Shihao; Park, Juw Won; Huang, Jian et al. (2012) MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Res 40:e61
Huang, Jian; Zhang, Cun-Hui (2012) Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications. J Mach Learn Res 13:1839-1864

Showing the most recent 10 out of 37 publications