Cancer is a complex genetic disease that results from the accumulation of multiple genetic defects, including mutations and epigenetic changes. Advances in microarray techniques make it possible to profile gene expression in human tissues on a genome-wide scale, which in turn allows the discovery of genomic biomarkers with predictive power for cancer diagnosis and prognosis. Such discovery can lead to a better understanding of cancer genetics, more accurate prediction of tumor behavior, and more rational treatment selection. Effective biomarker selection is the key step connecting wet-lab studies with pharmacogenetic practice. The long-term goal is to provide more effective and reliable biomarker selection methods, make more efficient use of high-dimensional gene expression data, and eventually facilitate clinical practice using genomic measurements. In the present application, we will develop novel clustering penalized methods for biomarker selection in cancer studies with gene expression data. The proposed methods explicitly take into account the clustered nature of gene expression. They are able to identify a few important gene clusters, and a few important genes within those selected clusters, that influence cancer outcomes such as cancer status, response to treatment, and cancer survival. They are expected to provide more accurate gene selection and better prediction than existing methods.
The specific aims are as follows. [1] Propose novel clustering penalized methods for biomarker selection at both the cluster level and the within-cluster gene level. We will propose: (a) the Supervised Adaptive Group Lasso (SAGLasso); and (b) the Group Bridge Lasso (GBL). We will investigate computational algorithms, tuning parameter selection, evaluation of gene selection and prediction, and large-sample statistical properties. [2] Cancer classification analysis using the proposed clustering penalized approaches, where the outcome of interest is categorical cancer status or response to therapy. [3] Cancer survival analysis using the proposed clustering penalized approaches, where the outcome is censored survival time. [4] Extensive numerical studies using various cancer gene expression data sets. The approaches developed in Aims 1-3 will be used to analyze ongoing studies as well as publicly available cancer microarray data. We will compare the gene selection results and prediction performance of the proposed approaches with those of existing methods. The proposed study will be the first to establish a rigorous statistical framework that explicitly accounts for the clustered nature of gene expression in cancer biomarker selection. The proposed methods are expected to outperform existing ones in terms of gene selection and prediction performance. We will also investigate cancer classification and survival models in great detail and develop efficient algorithms and portable R/S-Plus packages, which will make the proposed methods easily accessible for routine biomedical data analysis.
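As orientation for the cluster-level selection described in Aim 1, the following is a minimal sketch of the shrinkage step used by a standard group lasso penalty, under which a whole gene cluster is either kept (and shrunk) or set to zero together. It is not the SAGLasso or GBL procedure, whose details are not given in this abstract. The function name, the example coefficients, the cluster assignment, and the sqrt(cluster size) weighting are all illustrative assumptions.

```python
import math

def group_soft_threshold(beta, groups, lam):
    """Proximal step for the penalty lam * sum_g sqrt(p_g) * ||beta_g||_2.

    beta:   list of coefficients, one per gene (hypothetical values)
    groups: list of index lists, one per gene cluster (hypothetical)
    lam:    nonnegative penalty level
    """
    out = list(beta)
    for idx in groups:
        # Euclidean norm of the coefficients in this cluster.
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        # Common weighting by sqrt(cluster size); an assumption here.
        thresh = lam * math.sqrt(len(idx))
        # Shrink the whole cluster; it becomes exactly zero when norm <= thresh.
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * beta[j]
    return out

# Two clusters of two genes each: one with strong signal, one with weak signal.
beta = [3.0, 4.0, 0.1, -0.1]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(beta, groups, lam=1.0)
```

In this toy input the first cluster is retained with shrunken coefficients, while the weak second cluster is removed as a unit. Selecting individual genes within a retained cluster, as the aims describe, requires an additional within-cluster penalty that this sketch does not include.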

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Small Research Grants (R03)
Project #
5R03LM009754-02
Application #
7897804
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-08-01
Project End
2012-07-31
Budget Start
2010-08-01
Budget End
2012-07-31
Support Year
2
Fiscal Year
2010
Total Cost
$78,429
Indirect Cost
Name
Yale University
Department
Public Health & Prev Medicine
Type
Schools of Medicine
DUNS #
043207562
City
New Haven
State
CT
Country
United States
Zip Code
06520
Ma, Shuangge; Dai, Ying (2011) Principal component analysis based methods in bioinformatics studies. Brief Bioinform 12:714-22
Huang, Jian; Ma, Shuangge; Li, Hongzhe et al. (2011) The Sparse Laplacian Shrinkage Estimator for High-Dimensional Regression. Ann Stat 39:2021-2046
Ma, Shuangge; Kosorok, Michael R; Huang, Jian et al. (2011) Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis. BMC Med Genomics 4:5
Ma, Shuangge; Kosorok, Michael R (2010) Detection of gene pathways with predictive power for breast cancer prognosis. BMC Bioinformatics 11:1
Ma, Shuangge; Zhang, Yawei; Huang, Jian et al. (2010) Identification of non-Hodgkin's lymphoma prognosis signatures using the CTGDR method. Bioinformatics 26:15-21
Ma, Shuangge; Huang, Jian; Shi, Mingyu et al. (2010) Semiparametric prognosis models in genomic studies. Brief Bioinform 11:385-93
Han, Xuesong; Li, Yang; Huang, Jian et al. (2010) Identification of predictive pathways for non-Hodgkin lymphoma prognosis. Cancer Inform 9:281-92
Song, Xiao; Ma, Shuangge (2010) Penalized variable selection with U-estimates. J Nonparametr Stat 22:499-515
Ma, Shuangge; Shi, Mingyu; Li, Yang et al. (2010) Incorporating gene co-expression network in identification of cancer prognosis markers. BMC Bioinformatics 11:271
Ma, Shuangge; Huang, Jian; Moran, Meena S (2009) Identification of genes associated with multiple cancers via integrative analysis. BMC Genomics 10:535