An important goal in cancer research is to identify genomic biomarkers that can be used to better understand the genetic basis of cancers and to construct models that predict cancer occurrence and progression. Many studies have used microarrays to identify genes with altered expression levels in various cancer tissues. Meta-analysis makes it possible to (1) effectively combine experiments conducted with different microarray platforms and/or other experimental setups; (2) obtain more reliable and consistent gene identification results across studies, as well as more satisfactory predictions; and (3) identify genes that are commonly activated in different types of cancer. The proposed study is the first to investigate novel regularized methods for microarray meta-analysis in which cancer clinical outcomes are measured along with gene expression in multiple independent experiments. The proposed approaches can (1) effectively combine data from different platforms and experimental setups; (2) carry out biomarker selection and predictive model building simultaneously and efficiently; and (3) identify influential genes that are important across different experiments, while allowing for experiment-specific predictive models.
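To make points (2) and (3) concrete, the sketch below shows one way a threshold-gradient-directed update can select genes and build models at the same time while sharing the selection across experiments. It is a hypothetical, simplified illustration, not the proposed MTGDR method: it assumes every experiment measures the same genes, uses a squared-error loss as a stand-in for the classification and survival losses, and assumes a gene-level rule in which a gene is updated in all experiments when the L2 norm of its gradients across experiments is at least `tau` times the largest such norm. The function name `meta_tgdr_sketch` and the parameters `tau`, `nu`, and `n_steps` are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def meta_tgdr_sketch(X: List[Matrix], y: List[Vector], tau: float = 0.9,
                     nu: float = 0.01, n_steps: int = 500) -> List[Vector]:
    """Return beta[m][j], the coefficient of gene j in experiment m.

    Hypothetical sketch: squared-error loss and an assumed gene-level
    threshold on the gradient norm pooled across experiments.
    """
    n_exp = len(X)
    p = len(X[0][0])  # assumes all experiments measure the same p genes
    beta = [[0.0] * p for _ in range(n_exp)]
    for _ in range(n_steps):
        # Negative gradient of (1 / (2 n_m)) * RSS in each experiment m.
        grad = []
        for m in range(n_exp):
            n = len(y[m])
            resid = [y[m][i] - sum(X[m][i][j] * beta[m][j] for j in range(p))
                     for i in range(n)]
            grad.append([sum(X[m][i][j] * resid[i] for i in range(n)) / n
                         for j in range(p)])
        # Gene-level score: L2 norm of gene j's gradients across experiments.
        score = [math.sqrt(sum(grad[m][j] ** 2 for m in range(n_exp)))
                 for j in range(p)]
        top = max(score)
        if top == 0.0:
            break  # no gene can reduce the loss any further
        for j in range(p):
            # Shared selection: a gene passing the threshold is updated in
            # every experiment, but with experiment-specific step sizes.
            if score[j] >= tau * top:
                for m in range(n_exp):
                    beta[m][j] += nu * grad[m][j]
    return beta
```

In this sketch `n_steps` plays the role of the regularization parameter (fewer steps keep more coefficients at zero), and smaller `tau` lets more genes enter per step. A selected gene ends up with one coefficient per experiment, which is how a shared gene set can coexist with experiment-specific predictive models.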
The specific aims of this study are to: (1) develop the MTGDR (Meta Threshold Gradient Directed Regularization) method for regularized microarray meta-analysis; (2) develop a penalized group-bridge method for regularized microarray meta-analysis; and (3) apply the proposed general methodologies to cancer classification and survival analysis with microarray data, and develop user-friendly R packages that implement the proposed approaches and make them publicly available. We will consider cancer microarray meta-analysis in which individual experiments can have categorical clinical outcomes or right-censored survival outcomes. Analyses of practical cancer studies and extensive simulations will be conducted to assess the performance of the proposed approaches and to compare them with alternatives. In this application, we emphasize not only the development of new general methodologies but also their computational implementation, applications, and empirical performance.
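For aim (2), a group-bridge penalty can be set up so that each gene forms a group made of its coefficients in all experiments. A minimal sketch, assuming the generic group-bridge form λ Σ_j (Σ_m |β_j^m|)^γ with 0 < γ < 1, is shown below; the estimator would then minimize the sum of the experiment-specific losses plus this penalty. The exact penalty, weights, and fitting algorithm of the proposed method are not given here, and the function name `group_bridge_penalty` is hypothetical.

```python
from typing import List


def group_bridge_penalty(beta: List[List[float]], lam: float,
                         gamma: float) -> float:
    """Assumed penalty lam * sum_j (sum_m |beta[m][j]|) ** gamma.

    beta[m][j] is the coefficient of gene j in experiment m; each gene j
    is one group spanning all experiments.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("bridge exponent gamma must lie in (0, 1)")
    n_exp, p = len(beta), len(beta[0])
    total = 0.0
    for j in range(p):
        # L1 norm of gene j's coefficients across experiments ...
        l1 = sum(abs(beta[m][j]) for m in range(n_exp))
        # ... raised to a concave power, so dropping a whole gene is cheap.
        total += l1 ** gamma
    return lam * total
```

With `lam=1.0` and `gamma=0.5`, putting total coefficient mass 4 on one gene costs 2.0, while splitting it as 2 and 2 over two genes costs about 2.83. The concave outer power therefore favors keeping few genes, while the inner L1 norm still lets a selected gene take different coefficients in different experiments.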