Although genome-wide association studies (GWAS) have been extremely successful in identifying numerous risk loci for complex traits and diseases, the causal mechanism linking genetic variation to disease risk remains unknown at the vast majority of these loci. This hinders the development of novel drug targets, personalized treatments, and accurate prediction of high-risk individuals. In the quest to address this gap, post-GWAS studies are experiencing a "big data" revolution driven by the exponentially decreasing costs of high-throughput genomic assays. Multiple layers of data (genetic variation, transcriptome levels, epigenetic modifications, localization of tissue-specific regulatory sites, etc.) are routinely collected in increasingly large cohorts of individuals. This raises the need for new computational and statistical methods that can integrate various types of data (genetic, epigenetic, transcriptomic) to understand the causal mechanism of disease at GWAS risk loci. Here we propose to develop new methods and techniques and to apply them to gain insights into the genetic basis of common diseases and traits. Importantly, we aim to circumvent genomic privacy issues (which often prohibit access to large-scale GWAS data) by proposing techniques that operate directly at the summary statistic level (e.g., variant effect sizes). We will apply existing and newly developed methods to GWAS summary data sets for over 30 traits and diseases spanning more than 1,000,000 phenotype measurements, together with a catalogue of over 7,000 biochemical and evolutionary genetic metrics of functionality, as well as over 10,000 individuals for whom genetic variation, gene expression, and disease status have been measured.

Public Health Relevance

Genetic studies of common diseases are experiencing a "big data" revolution driven by the exponentially decreasing costs of high-throughput genomic assays. Multiple layers of data (genetic variation, gene expression levels, localization of tissue-specific regulatory sites, etc.) are routinely collected in increasingly large cohorts of individuals, raising the need for new computational and statistical methods. In this proposal we will develop new techniques and apply them to large-scale empirical data to gain insights into the genetic basis of common diseases and traits.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG009120-02
Application #
9442816
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Ramos, Erin
Project Start
2017-03-01
Project End
2022-02-28
Budget Start
2018-03-01
Budget End
2019-02-28
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of California Los Angeles
Department
Pathology
Type
Schools of Medicine
DUNS #
092530369
City
Los Angeles
State
CA
Country
United States
Zip Code
90095
Brown, Robert; Kichaev, Gleb; Mancuso, Nicholas et al. (2017) Enhanced methods to detect haplotypic effects on gene expression. Bioinformatics 33:2307-2313
Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé et al. (2017) Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. Am J Hum Genet 100:473-487
Shi, Huwenbo; Mancuso, Nicholas; Spendlove, Sarah et al. (2017) Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. Am J Hum Genet 101:737-751