In cancer research, profiling studies have been conducted extensively, measuring genome-wide gene expression levels, DNA modifications, epigenetic regulation, and post-transcriptional regulation. Many of these studies are "one-dimensional" and restricted to a single type of genomic measurement. More recently, "multi-dimensional" studies have become increasingly popular. In such studies, the same samples are profiled on multiple layers of genomic activity. A representative example is The Cancer Genome Atlas (TCGA). Multi-dimensional studies offer a unique opportunity to describe the etiology and prognosis of cancer more comprehensively. In the literature, much effort has been devoted to modeling the interconnections among different types of regulation. In contrast, relatively few studies have conducted integrated analysis that models the associations between multiple types of genomic measurements and cancer outcomes. The existing integrated analysis methods also have serious limitations, which may lead to suboptimal or even biased results.

Our goal is to describe cancer etiology and prognosis more effectively by analyzing multi-dimensional genomic data. Motivated by the limitations of existing studies, our first objective is to develop novel statistical methods that effectively integrate multi-dimensional genomic measurements and establish their associations with cancer outcomes. This objective differs significantly from those of published studies. The proposed methods will have significant advantages: they will assume different biological working models, allowing a direct comparison of these models; they will be applicable to a large number of datasets; they will accommodate the joint effects of a large number of markers; and they will adopt efficient statistical techniques. The second objective is to apply these methods to analyze TCGA data on multiple types of cancers.
The specific aims are as follows. (Aim 1) Develop novel statistical methods that integrate multiple types of genomic measurements for cancer outcomes. Three different methods will be developed under different data-generating models.
(Aim 2) Develop user-friendly software and a project website, and analyze TCGA data on multiple types of cancers, particularly cancers of the breast, ovary, and prostate, as well as lymphoma. These data include measurements of gene expression, copy number variation, methylation, microRNA, and other genomic features.

As sequencing costs fall rapidly, profiling multi-dimensional genomic characteristics of samples will soon become routine practice. This study will deliver a new analysis strategy and a set of novel statistical methods. These methods will integrate multiple types of genomic measurements for cancer outcomes and complement existing methods. The analysis of TCGA data will provide valuable insights into multiple cancers and serve as a prototype for future applications.

Public Health Relevance

Novel statistical methods will be developed to analyze cancer studies with multiple types of genomic measurements. Three methods will be developed to associate genomic measurements with cancer outcomes under different model assumptions. TCGA data on multiple types of cancers will be analyzed.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Small Research Grants (R03)
Study Section: Special Emphasis Panel (ZCA1-SRLB-D (O1))
Program Officer: Verma, Mukesh
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Chai, Hao; Shi, Xingjie; Zhang, Qingzhao et al. (2017) Analysis of cancer gene expression data with an assisted robust marker identification approach. Genet Epidemiol 41:779-789
Jiang, Yu; Shi, Xingjie; Zhao, Qing et al. (2016) Integrated analysis of multidimensional omics data on cutaneous melanoma prognosis. Genomics 107:223-30
Zhu, Ruoqing; Zhao, Qing; Zhao, Hongyu et al. (2016) Integrating multidimensional omics data for cancer outcome. Biostatistics 17:605-18
Shi, Xingjie; Zhao, Qing; Huang, Jian et al. (2015) Deciphering the associations between gene expression and copy number alteration using a sparse double Laplacian shrinkage approach. Bioinformatics 31:3977-83
Zhao, Qing; Shi, Xingjie; Huang, Jian et al. (2015) Integrative Analysis of "-Omics" Data Using Penalty Functions. Wiley Interdiscip Rev Comput Stat 7:99-108
Zhao, Qing; Shi, Xingjie; Xie, Yang et al. (2015) Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform 16:291-303
Shi, Xingjie; Yi, Huangdi; Ma, Shuangge (2015) Measures for the degree of overlap of gene signatures and applications to TCGA. Brief Bioinform 16:735-44
Wu, Cen; Ma, Shuangge (2015) A selective review of robust variable selection with applications in bioinformatics. Brief Bioinform 16:873-83
Wu, Cen; Cui, Yuehua; Ma, Shuangge (2014) Integrative analysis of gene-environment interactions under a multi-response partially linear varying coefficient model. Stat Med 33:4988-98
Zhu, Ruoqing; Zhao, Hongyu; Ma, Shuangge (2014) Identifying gene-environment and gene-gene interactions using a progressive penalization approach. Genet Epidemiol 38:353-68

Showing the most recent 10 out of 11 publications