Our understanding of cancer etiology has increasingly motivated the need for investigations that are simultaneously pan-omics and pan-cancer. Pan-omics defines the integration of data from multiple high-dimensional platforms that capture different molecular components, which is needed because changes to cellular function that affect cancer development can occur at the level of the genome, transcriptome, translatome, proteome, or epigenome. Pan-cancer defines the integrative analysis of data from multiple tissue-of-origin or histological cancer types, which is needed because clinically relevant molecular alterations are often shared across cancer types. Recent statistical advances have facilitated pan-omics analyses (i.e., vertical integration) of a single cancer type and pan-cancer analyses (i.e., horizontal integration) of a single platform. Analyses that are simultaneously pan-omics and pan-cancer have tremendous potential to completely and powerfully characterize molecular heterogeneity in cancer, but require new statistical approaches for the principled bi-dimensional integration of such data. We will develop a framework for bi-dimensional integration that allows for features that are shared across multiple omics platforms and cancer types, as well as features that are unique to a particular platform or cancer type. This framework extends Joint and Individual Variation Explained (JIVE) and related methods for unidimensional (e.g., vertical or horizontal) integration, to allow for dimension reduction, visualization, and molecular characterization of pan-omics pan-cancer data. Our motivating application is The Cancer Genome Atlas (TCGA) project, which is the most comprehensive and well-curated study of the cancer genome with data for 6 different omics platforms from 11,000 patients representing 33 different cancer tumor types.
We will use our novel methodology to characterize molecular cancer heterogeneity from the entire TCGA database, and we will use these results to develop a comprehensive model for patient survival to refine traditional pathological diagnoses. Our team is qualified to undertake this challenging and impactful project, with expertise in statistical data integration (Dr. Lock), large-scale computing and bioinformatics (Dr. Myers), and cancer genomics and TCGA (Dr. Hoadley). Our methods will be useful for other (e.g., non-cancer) studies involving genomics data, and we will implement them in free, open-source and easily accessible software to facilitate their use by other researchers and practitioners.

Public Health Relevance

To understand the clinical heterogeneity of cancer it is first necessary to characterize the molecular heterogeneity of cancer. We will develop a framework for the complete characterization of molecular heterogeneity across different cellular components and different types of cancer, which will enhance our scientific understanding of cancer etiology and facilitate more powerful models for important clinical outcomes such as patient survival.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21CA231214-02
Application #: 9765282
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Li, Jerry
Project Start: 2018-09-01
Project End: 2020-08-31
Budget Start: 2019-09-01
Budget End: 2020-08-31
Support Year: 2
Fiscal Year: 2019
Total Cost:
Indirect Cost:

Name: University of Minnesota Twin Cities
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 555917996
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455