The objectives of this project are to develop novel statistical methods and computer packages for cancer classification and survival analysis using high-dimensional gene expression data and clinical measurements. For studying the layered complexity of cancer, gene expression profiling using microarrays, combined with statistical analysis, offers a powerful tool because it allows analysis of gene expression patterns on a genome-wide scale. As expression profiling technologies mature and increasingly massive data sets are generated, identifying statistically and biologically significant patterns in high-dimensional, noisy data is becoming a major challenge. There is an urgent need for statistical methods that can handle such high-dimensional problems in linking clinical outcomes to gene expression profiles, for better diagnosis of disease status, better prescription of treatment, and better survival prediction in cancer. In this project, we will use the basic principles of cancer genetics to guide our efforts in developing novel methods and models for analyzing data from cancer expression profiling studies.
The specific aims of this project are to: (1) Develop a bridge penalized method for variable selection and estimation in high-dimensional models that have important applications in gene expression profiling studies of cancer. (2) Develop a group-bridge penalized method for incorporating biological pathways into the analysis of gene expression data. (3) Develop bridge and group-bridge penalized methods for tumor classification and biomarker selection using gene expression data, with emphasis on regularized logistic regression and semi-parametric ROC classification methods for improved sensitivity and specificity, while adjusting for other clinical covariates and risk factors. (4) Develop bridge and group-bridge penalized methods for correlating survival with gene expression data, while adjusting for other clinical measurements and risk factors. Important models to consider include the linear and partially linear Cox proportional hazards models and the linear and partially linear accelerated failure time models. (5) Implement the proposed methods in well-documented R packages and C programs; evaluate these methods through extensive computer simulation studies and publicly available data sets; and apply them to two expression profiling studies of lymphoma and head and neck cancers. The development of statistical methods that can handle high-dimensional problems in estimating the relationship between cancer clinical outcomes and genomic data will contribute to a better understanding of the genetic basis of cancer, better diagnoses, and better survival prediction, which in turn can potentially have an important impact on public health.
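For readers unfamiliar with the penalties named in aims (1) and (2), the following is a minimal illustrative sketch, not the project's implementation. It assumes the commonly used forms: a bridge penalty of lam * sum_j |beta_j|^gamma with 0 < gamma < 1, and a group-bridge penalty of lam * sum_k c_k * (sum of |beta_j| over group k)^gamma. The function names, the default gamma = 0.5, and the group weight c_k = (group size)^(1 - gamma) are assumptions chosen for illustration; the abstract does not specify them.

```python
def bridge_penalty(beta, lam, gamma=0.5):
    """Bridge penalty: lam * sum_j |beta_j|**gamma (assumed form, 0 < gamma < 1)."""
    return lam * sum(abs(b) ** gamma for b in beta)


def group_bridge_penalty(beta, groups, lam, gamma=0.5):
    """Group-bridge penalty over index groups (e.g. genes sharing a pathway).

    Assumed form: lam * sum_k c_k * (L1 norm of group k)**gamma,
    with the illustrative weight c_k = |group k|**(1 - gamma).
    """
    total = 0.0
    for idx in groups:
        l1_norm = sum(abs(beta[j]) for j in idx)   # L1 norm within the group
        weight = len(idx) ** (1.0 - gamma)         # hypothetical size adjustment
        total += weight * l1_norm ** gamma
    return lam * total


# Hypothetical coefficients for three genes; genes 0 and 2 share a pathway.
beta = [4.0, 0.0, -9.0]
pathways = [[0, 2], [1]]
print(bridge_penalty(beta, lam=1.0))
print(group_bridge_penalty(beta, pathways, lam=1.0))
```

Because gamma is below 1, the penalty is concave in |beta_j|, which is what lets it shrink small coefficients exactly to zero; the group version applies that concavity to each pathway's combined L1 norm, so an entire pathway can be dropped while individual genes within a retained pathway can still be set to zero.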