The objectives of this project are to develop novel statistical methods and computer packages for cancer classification and survival analysis using high-dimensional gene expression data and clinical measurements. For studying the layered complexity of cancer, gene expression profiling using microarrays, combined with statistical analysis, offers a powerful tool because it allows analysis of gene expression patterns on a genome-wide scale. As expression profiling technologies mature and ever larger data sets are generated, identifying statistically and biologically significant patterns in high-dimensional and noisy data is becoming a major challenge. There is an urgent need for statistical methods that can handle such high-dimensional problems in linking clinical outcomes to gene expression profiles, for better diagnosis of disease status, better prescription of treatment, and better survival prediction in cancer. In this project, we will use the basic principles of cancer genetics to guide our efforts in developing novel methods and models for analyzing data from cancer expression profiling studies.
The specific aims of this project are to: (1) Develop a bridge penalized method for variable selection and estimation in high-dimensional models that have important applications in gene expression profiling studies of cancer. (2) Develop a group-bridge penalized method for incorporating biological pathways into the analysis of gene expression data. (3) Develop bridge and group-bridge penalized methods for tumor classification and biomarker selection using gene expression data, with emphasis on regularized logistic regression and semi-parametric ROC classification methods for improved sensitivity and specificity, while adjusting for other clinical covariates and risk factors. (4) Develop bridge and group-bridge penalized methods for correlating survival with gene expression data, while adjusting for other clinical measurements and risk factors. Important models to consider include the linear and partially linear Cox proportional hazards models and the linear and partially linear accelerated failure time models. (5) Implement the proposed methods in well-documented R packages and C programs; evaluate these methods by extensive computer simulation studies and by use of publicly available data sets; and apply them to two expression profiling studies of lymphoma and head and neck cancers. The development of the proposed statistical methods that can deal with high-dimensional problems in estimating the relationship between cancer clinical outcomes and genomic data will contribute to a better understanding of the genetic basis of cancer, better diagnoses, and better survival prediction, which, in turn, can potentially have an important impact on public health.
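For readers unfamiliar with the penalties named in aims (1) and (2), the following is a minimal sketch of how they are commonly written in the literature, not the project's own implementation. It assumes the bridge penalty takes the form lam * sum_j |beta_j|^gamma and the group-bridge penalty the form lam * sum_k c_k * (sum of |beta_j| over group k)^gamma. The function names, the argument names, and the default group weights c_k = 1 are hypothetical choices made for illustration only.

```python
from typing import Optional, Sequence


def bridge_penalty(beta: Sequence[float], lam: float, gamma: float) -> float:
    """Bridge penalty: lam * sum_j |beta_j|**gamma (assumed standard form)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return lam * sum(abs(b) ** gamma for b in beta)


def group_bridge_penalty(
    beta: Sequence[float],
    groups: Sequence[Sequence[int]],
    lam: float,
    gamma: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Group-bridge penalty: lam * sum_k c_k * (L1 norm of group k)**gamma.

    `groups` lists the coefficient indices belonging to each group
    (for example, the genes in one pathway). `weights` holds the c_k;
    defaulting them to 1 is an illustrative assumption.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if weights is None:
        weights = [1.0] * len(groups)
    if len(weights) != len(groups):
        raise ValueError("need exactly one weight per group")
    total = 0.0
    for c_k, idx in zip(weights, groups):
        l1_norm = sum(abs(beta[j]) for j in idx)  # L1 norm within the group
        total += c_k * l1_norm ** gamma
    return lam * total
```

With gamma between 0 and 1, both penalties are concave in the coefficient magnitudes, which is what allows small coefficients (or whole groups of them) to be shrunk exactly to zero. In a penalized fit, one of these terms would be added to a loss such as the residual sum of squares or a negative log-likelihood.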
Breheny, Patrick; Huang, Jian (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25:173-187
Jiang, Dingfeng; Huang, Jian (2014) Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models. Stat Comput 24:871-883
Liu, Jin; Huang, Jian; Ma, Shuangge (2014) Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization. Scand Stat Theory Appl 41:87-103
Jiang, Dingfeng; Huang, Jian; Zhang, Ying (2013) The cross-validated AUC for MCP-logistic regression with high-dimensional data. Stat Methods Med Res 22:505-18
Liu, Jin; Wang, Kai; Ma, Shuangge et al. (2013) Accounting for linkage disequilibrium in genome-wide association studies: A penalized regression method. Stat Interface 6:99-115
Huang, Jian; Sun, Tingni; Ying, Zhiliang et al. (2013) Oracle inequalities for the Lasso in the Cox model. Ann Stat 41:1142-1165
Liu, Jin; Huang, Jian; Ma, Shuangge et al. (2013) Incorporating group correlations in genome-wide association studies using smoothed group Lasso. Biostatistics 14:205-19
Ma, Shuangge; Dai, Ying; Huang, Jian et al. (2012) Identification of Breast Cancer Prognosis Markers via Integrative Analysis. Comput Stat Data Anal 56:2718-2728
Huang, Yuan; Huang, Jian; Shia, Ben-Chang et al. (2012) Identification of cancer genomic markers via integrative sparse boosting. Biostatistics 13:509-22
Huang, Jian; Breheny, Patrick; Ma, Shuangge (2012) A Selective Review of Group Selection in High-Dimensional Models. Stat Sci 27:
Showing the most recent 10 out of 37 publications