The overall goal of this project is to develop novel statistical methods for integrative analysis of genomic data in cancer research. We propose to develop analytical tools that can integrate data from multiple genomic platforms and incorporate external omic information from publicly available databases. These tools will be applicable both to etiological studies geared toward causal discovery and to clinical and translational studies geared toward predictive modeling. Advances in high-throughput molecular technologies have enabled large-scale omic projects (e.g., ENCODE, The Cancer Genome Atlas, Epigenome Roadmap) to generate vast amounts of information on the structure, function, and regulation of the genome. In addition to these publicly available data, individual studies are increasingly generating multiplatform genomic profiles (e.g., genotypes, gene expression, methylation, copy number variation, miRNA) to elucidate the complex mechanisms of cancer development and progression and to investigate determinants and predictors of health and clinical outcomes. Integration across these multiple genomic `dimensions' and incorporation of the available external information can increase the ability to discover causal relationships (e.g., cancer-SNP associations), enhance prediction and prognosis modeling (e.g., of cancer aggressiveness), and provide insights into biological mechanisms. We propose two analytic approaches aimed at addressing the challenges to effective integration across multiplatform genomic data and incorporation of external information from omic projects. The first approach (Aim 1) is a Bayesian regression and feature selection method that integrates prior omic information flexibly, allowing the data to `speak for itself' in determining which pieces of external information are relevant for the problem at hand. 
The method works with individual-level data and also with meta-analytic summaries, making it well suited for analyzing data from large multi-study consortia. The second approach (Aim 2) is a regularized regression and feature selection method for integrating multiplatform genomic features measured on the same set of individuals. The method is designed to scale to the very large numbers of features typical of genomewide platforms, to account for the different properties of each genomic data type, and to incorporate relevant external information to increase efficiency. Both approaches can be applied for causal discovery and for developing predictive and prognostic models. We will apply our methods to search for novel risk variants in the CORECT consortium of genomewide association studies, and to construct a prognostic model of colorectal cancer (CRC) recurrence based on genomewide expression and methylation data in the ColoCare consortium cohort of CRC patients. This work will provide new tools for analyzing high-dimensional multiplatform genomic data that can take advantage of available external information.

Public Health Relevance

Cancer results from a complex series of alterations of the structure, function, and regulation of the genome. Integration of information across these multiple genomic `dimensions' can provide insights into the development and progression of cancer and accelerate the discovery of novel biomarkers for prediction and prognosis. The goal of this project is to develop novel statistical methods for integrating multiple levels of genomic information to elucidate the complex mechanisms of cancer development and progression and to investigate the determinants and predictors of cancer clinical outcomes. We will apply these methods to two studies that have characterized germline and somatic variation in tumors, one of colorectal cancer patients followed for clinical outcomes, and one large consortium of colorectal cancer association studies.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Program Projects (P01)
Project #: 5P01CA196569-05
Application #: 9991771
Study Section: Special Emphasis Panel (ZCA1)
Project Start:
Project End:
Budget Start: 2020-07-01
Budget End: 2021-06-30
Support Year: 5
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: University of Southern California
Department:
Type:
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089
Ryser, Marc D; Min, Byung-Hoon; Siegmund, Kimberly D et al. (2018) Spatial mutation patterns as markers of early colorectal tumor cell mobility. Proc Natl Acad Sci U S A 115:5774-5779
Liu, Jie; Liang, Gangning; Siegmund, Kimberly D et al. (2018) Data integration by multi-tuning parameter elastic net regression. BMC Bioinformatics 19:369
Moss, Lilit C; Gauderman, William J; Lewinger, Juan Pablo et al. (2018) Using Bayes model averaging to leverage both gene main effects and G × E interactions to identify genomic regions in genome-wide association studies. Genet Epidemiol :
McAllister, Kimberly; Mechanic, Leah E; Amos, Christopher et al. (2017) Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 186:753-761
Raskin, Leon; Guo, Yan; Du, Liping et al. (2017) Targeted sequencing of established and candidate colorectal cancer genes in the Colon Cancer Family Registry Cohort. Oncotarget 8:93450-93463
Ritchie, Marylyn D; Davis, Joe R; Aschard, Hugues et al. (2017) Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. Am J Epidemiol 186:771-777
Patel, Chirag J; Kerr, Jacqueline; Thomas, Duncan C et al. (2017) Opportunities and Challenges for Environmental Exposure Assessment in Population-Based Studies. Cancer Epidemiol Biomarkers Prev 26:1370-1380
Thomas, Paul D (2017) The Gene Ontology and the Meaning of Biological Function. Methods Mol Biol 1446:15-24
Manrai, Arjun K; Cui, Yuxia; Bushel, Pierre R et al. (2017) Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health. Annu Rev Public Health 38:279-294
Marconett, Crystal N; Zhou, Beiyun; Sunohara, Mitsuhiro et al. (2017) Cross-Species Transcriptome Profiling Identifies New Alveolar Epithelial Type I Cell-Specific Genes. Am J Respir Cell Mol Biol 56:310-321

Showing the most recent 10 out of 28 publications