Over the past decade, thousands of genome-wide association studies (GWAS) have been performed, greatly improving our understanding of the genetic origins of complex diseases. A large number of variants have been associated with individual traits, but a complete understanding of complex disease remains elusive, due in large part to two unsolved challenges. First, a majority of associated variants are noncoding and distant from the nearest gene, complicating their interpretation. Second, the observed heritability of many complex diseases far exceeds the portion that can be explained by GWAS-discovered variants, largely because of the combined effects of rare variants not probed by current techniques and common variants falling below the significance thresholds of existing GWAS methods. As whole-genome sequencing and rare variant discovery become more prevalent, frameworks for functionally annotating rare variants and associating them with disease-associated driver genes and pathways will become increasingly important. A wealth of public epigenetic data exists, including collections of chromatin modification profiles and 3D structure data from various Common Fund sources as well as external consortia. In combination with whole-genome sequencing data, these datasets offer great potential to further our understanding of disease across the spectrum from Mendelian to complex. As members of the ENCODE Project, we have developed the Registry of candidate cis-Regulatory Elements (cCREs), a collection of nearly a million candidate enhancers, promoters, and insulators in the human genome with activity profiles in more than 800 human cell types. In parallel, we collaborated with Prof. Xihong Lin on the development of the variant-Set Test for Association using Annotation infoRmation (STAAR), a framework for performing rare-variant association tests using functional annotations and a dynamic weighting scheme.
Here, we aim to extend the Registry of cCREs to include gene regulatory networks, including gene-enhancer links, 3D chromatin neighborhoods, co-expressed gene networks, and biochemical pathways, drawing on data from the Common Fund, including GTEx and the 4DNucleome Project, and other public sources (Aim 1). We then aim to extend GWAS and the STAAR methodology to incorporate these higher-order features to identify novel gene regulatory network associations with disease-associated rare variants (Aim 2). In this study, we will focus on three human congenital disorders: cleft lip/palate (CLP), congenital diaphragmatic hernia (CDH), and ventricular septal defect (VSD), as extensive whole-genome sequencing data for these disorders are available from the Gabriella Miller Kids First Pediatric Research Consortium. We will validate our results using the Knockout Mouse Phenotyping Program (KOMP2). In summary, we will discover new disease-gene associations, produce a framework broadly applicable to existing and future whole-genome sequencing datasets, and improve the utility and accessibility of four select Common Fund datasets (GTEx, 4DNucleome, KOMP2, and Kids First).

Public Health Relevance

Genome-wide association studies are increasingly moving toward rare variant discovery and whole-genome sequencing in order to discover sources of missing heritability and identify novel genetic contributors to complex diseases. As these datasets are produced, computational frameworks for functionally annotating rare variants will be critical for deriving meaningful biological findings from them. We propose to build upon our existing computational methods and resources, integrating the wealth of Common Fund and public genomic and epigenomic data to develop a comprehensive, novel framework for linking rare variants with gene regulatory networks. We will pilot this framework by analyzing datasets for three congenital human disorders from the Kids First collection, aiming to discover novel gene regulatory associations with these disorders and further our understanding of their genetic origins.

Agency: National Institutes of Health (NIH)
Institute: Office of The Director, National Institutes of Health (OD)
Type: Small Research Grants (R03)
Project #: 1R03OD030608-01
Application #: 10112018
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Kinsinger, Christopher
Project Start: 2020-09-18
Project End: 2021-08-31
Budget Start: 2020-09-18
Budget End: 2021-08-31
Support Year: 1
Fiscal Year: 2020
Total Cost:
Indirect Cost:
Name: University of Massachusetts Medical School Worcester
Department: Biostatistics & Other Math Sci
Type: Schools of Medicine
DUNS #: 603847393
City: Worcester
State: MA
Country: United States
Zip Code: 01655