The overall goal of this project is to develop novel statistical methods for integrative analysis of genomic data in cancer research. We propose to develop analytical tools that can integrate data from multiple genomic platforms and incorporate external omic information from publicly available databases. These tools will be applicable both to etiological studies geared toward causal discovery and to clinical and translational studies geared toward predictive modeling. Advances in high-throughput molecular technologies have enabled large-scale omic projects (e.g., ENCODE, The Cancer Genome Atlas, Epigenome Roadmap) to generate vast amounts of information on the structure, function, and regulation of the genome. In addition to these publicly available data, individual studies are increasingly generating multiplatform genomic profiles (e.g., genotypes, gene expression, methylation, copy number variation, miRNA) to elucidate the complex mechanisms of cancer development and progression, and to investigate determinants and predictors of health and clinical outcomes. Integration across these multiple genomic `dimensions' and incorporation of the available external information can increase the ability to discover causal relationships (e.g., cancer-SNP associations), enhance prediction and prognosis modeling (e.g., cancer aggressiveness), and provide insights into biological mechanisms. We propose two analytic approaches aimed at addressing the challenges to effective integration across multiplatform genomic data and incorporation of external information from omic projects. The first approach (Aim 1) is a Bayesian regression and feature selection method that can integrate prior omic information in a very flexible manner, allowing the data to `speak for itself' in determining which pieces of external information are relevant for the problem at hand. 
The method works with individual-level data and also with meta-analytic summaries, making it well suited for analyzing data from large multi-study consortia. The second approach (Aim 2) is a regularized regression and feature selection method for integrating multiplatform genomic features measured on the same set of individuals. The method is designed to scale to the very large numbers of features typical of genomewide platforms, to account for the different properties of each genomic data type, and to incorporate relevant external information to increase efficiency. Both approaches can be applied for causal discovery and for developing predictive and prognostic models. We will apply our methods to search for novel risk variants in the CORECT consortium of genomewide association studies, and to construct a prognostic model of colorectal cancer (CRC) recurrence based on genomewide expression and methylation data in the ColoCare consortium cohort of CRC patients. This work will provide new tools for analyzing high-dimensional multiplatform genomic data that can take advantage of available external information.
Cancer results from a complex series of alterations of the structure, function, and regulation of the genome. Integration of information across these multiple genomic `dimensions' can provide insights into the development and progression of cancer and accelerate the discovery of novel biomarkers for prediction and prognosis. The goal of this project is to develop novel statistical methods for integrating multiple levels of genomic information to elucidate the complex mechanisms of cancer development and progression and to investigate the determinants and predictors of cancer clinical outcomes. We will apply these methods to two studies that have characterized germline and somatic variation in tumors: one of colorectal cancer patients followed for clinical outcomes, and one a large consortium of colorectal cancer association studies.