For a large number of cancer types, gene expression (GE) profiling studies have been extensively conducted. Analyses of the resulting data have led to a better understanding of cancer biology, effective markers for drug development, and clinically useful prediction models. With cancer GE data, network-based analysis, which takes a systems perspective and more effectively accounts for the interconnections among genes, has led to important findings beyond individual-gene-based and pathway-based analyses. Because the analysis is conducted at a higher functional level, such findings are usually more stable and more reproducible. Despite tremendous effort, GE data analysis results are still often unsatisfactory because of "a lack of information" caused by low signal-to-noise ratios and high data dimensionality. In recent cancer research, a prominent trend is to conduct multidimensional studies, which collect data on GEs as well as other types of omics measurements on the same subjects. GE levels are regulated by copy number variations (CNVs), microRNAs, DNA methylation, and other factors, so these regulators contain information on GEs. In individual-gene-based analysis, our group and others have shown that effectively extracting information from regulators can assist the analysis of GE data. Building on these studies, we will develop a novel ANGEA (Assisted Network-based Gene Expression Analysis) framework and a set of innovative methods. This study will be among the first to more effectively conduct network-based GE data analysis by "borrowing information" from regulators. It consists of three tightly integrated aims.
(Aim 1) Develop novel assisted methods for identifying gene network modules and hubs. Going beyond existing studies, we will construct a more comprehensive network composed of both GEs and their regulators. Novel regularization methods will be developed for constructing the network Laplacian and identifying modules and hubs.
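To make the terms "network Laplacian", "modules", and "hubs" in Aim 1 concrete, the following is a minimal Python sketch under simplifying assumptions. It is not the proposed ANGEA method: in place of the novel regularization methods, it builds the combined GE-regulator network by hard-thresholding absolute Pearson correlations (the threshold of 0.5 is an arbitrary assumption), treats connected components as modules, and scores hubs by weighted degree. All function names and the toy data are hypothetical.

```python
import math

# Hypothetical toy data: each node maps to measurements on the same 5 subjects.
# GE_* are gene expressions; CNV_A stands in for a regulator (copy number).
features = {
    "GE_A": [1, 2, 3, 4, 5],
    "GE_B": [2, 4, 6, 8, 10],
    "CNV_A": [1, 2, 3, 4, 6],
    "GE_C": [5, 1, 4, 2, 3],
}


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def build_adjacency(features, threshold):
    """Pool GE and regulator nodes; weight an edge by |corr| if it reaches the threshold.

    Hard thresholding is an assumed stand-in for a regularized network estimate.
    """
    names = list(features)
    p = len(names)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            r = abs(pearson(features[names[i]], features[names[j]]))
            if r >= threshold:
                adj[i][j] = adj[j][i] = r
    return names, adj


def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A, with D the diagonal of weighted degrees."""
    p = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(p)]
            for i in range(p)]


def modules(names, adj):
    """Connected components of the thresholded network (a crude stand-in for modules)."""
    p = len(names)
    seen = [False] * p
    comps = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(names[i])
            for j in range(p):
                if adj[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        comps.append(comp)
    return comps


def hubs(names, adj, top_k):
    """Return the top_k nodes ranked by weighted degree (hub score)."""
    deg = {name: sum(row) for name, row in zip(names, adj)}
    return sorted(deg, key=deg.get, reverse=True)[:top_k]


names, adj = build_adjacency(features, threshold=0.5)
L = laplacian(adj)
print(modules(names, adj))  # [['GE_A', 'CNV_A', 'GE_B'], ['GE_C']]
print(hubs(names, adj, top_k=2))
```

In this toy example the regulator CNV_A falls into the same module as GE_A and GE_B, which is the kind of GE-regulator grouping a combined network is meant to expose. For nonnegative edge weights, the number of zero eigenvalues of L equals the number of connected components, which is one reason the Laplacian is a natural object for module identification.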
(Aim 2) Develop an assisted method for building GE models of cancer outcomes and phenotypes. Going significantly beyond existing studies, we will develop a novel method that directly incorporates regulators into GE modeling and explicitly borrows information from them in estimation and marker selection.
(Aim 3) Analyze data on multiple cancer types. Data will be collected from our own studies and from public resources. Drawing on our unique expertise, we will first analyze data on cancers of the skin, lung, and lymph nodes. Data on other cancer types will also be analyzed. The analysis results will undergo extensive statistical and bioinformatics evaluation, and we will conduct extensive comparisons with alternative methods. We will deliver a novel analysis framework and a set of competitive methods. Although developed for GE data, these methods will also be applicable to the analysis of other types of data. With an equal emphasis on data analysis, this study will advance research and clinical practice for multiple cancer types.

Public Health Relevance

For cancer gene expression (GE) studies that have also collected data on GE regulators, we will develop a novel ANGEA (Assisted Network-based Gene Expression Analysis) framework and a set of innovative methods. Data recently collected on melanoma, lung cancer, non-Hodgkin lymphoma, and other cancer types will be analyzed, and these analyses are expected to have significant practical impact.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Small Research Grants (R03)
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Li, Jerry
Yale University
Public Health & Prev Medicine
Schools of Medicine
New Haven
United States
Teran Hidalgo, Sebastian J; Ma, Shuangge (2018) Clustering multilayer omics data using MuNCut. BMC Genomics 19:198
Teran Hidalgo, Sebastian J; Zhu, Tingyu; Wu, Mengyun et al. (2018) Overlapping clustering of gene expression data using penalized weighted normalized cut. Genet Epidemiol 42:796-811