Despite decades of research, much of the genetic heritability of human disease has not been mapped to susceptibility loci, and many gene-phenotype effects do not fit the patterns of heterogeneity required for well-powered analysis by either GWAS or family-based methods. Some genetic factors that contribute to disease lie on a detectable, shared haplotypic background, yet reach appreciable population frequencies because their effects on disease risk are modest. In such cases, analyses that exploit patterns of segmental sharing among distant relatives, such as identity-by-descent (IBD) mapping, are optimal for disease-gene discovery. Compared with GWAS, this approach accommodates lower causal-allele frequencies and greater allelic heterogeneity; compared with linkage analysis, it accommodates lower penetrance, more modest effect sizes, and greater genetic heterogeneity. In addition, large repositories of shared segments make it possible to identify people who carry haplotypes known to harbor rare risk variants, enabling efficient use of targeted sequencing to evaluate the effects of those variants. Building on tools developed by our group and others, we propose the following aims to leverage genetic relatedness estimation and shared segments in big-data environments:

1) Create a resource of shared segments in two large DNA biobanks. We will use an efficient, highly scalable software architecture to automate relatedness analyses of genetic data, including deep, accurate relationship estimation and pedigree-aware shared-segment detection across heterogeneous genetic data types. We will apply existing and novel approaches in BioVU and BioME, two large electronic health record (EHR)-linked DNA databanks, to create shared-segment repositories for use by the scientific community. Our analytic framework will improve scalability and support a variety of standard output formats for integration with downstream analyses.

2) Perform IBD mapping phenome-wide. Shared segments offer an opportunity to recover the power needed to detect a tranche of disease-causing variants that contribute to the missing heritability of traits. We will also establish, phenome-wide, the effects of genetic dysregulation of genes located in regions significantly enriched for shared segments.

3) Demonstrate the utility of shared segments for identifying likely carriers of causal variants in cancer predisposition genes. We will identify individuals in BioVU and BioME who are likely to harbor pathogenic variants in known cancer predisposition genes by matching IBD segments shared between biorepository participants and cancer cases sequenced at MD Anderson (N > 10,000). We will then genotype these loci in the identified individuals and use the full EHR to directly assess the clinical significance of the variants.
Each aim represents an innovative approach to data use in large EHR-linked DNA databanks and to the creation of shared resources that will fuel future research. Collectively, our aims map a path toward efficient, affordable discovery of novel disease genes using shared segments.

Public Health Relevance

Genomic segments shared due to relatedness represent an untapped resource for mapping disease genes and for identifying people likely to carry rare mutations. We have developed some of the most popular and powerful tools for accurate relatedness detection. We propose to build on our own tools and those of others to create large-scale shared-segment repositories from electronic health record-linked DNA databanks and to identify genes that affect disease risk phenome-wide.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM133169-02
Application #: 10021033
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
Project Start: 2019-09-19
Project End: 2023-07-31
Budget Start: 2020-08-01
Budget End: 2021-07-31
Support Year: 2
Fiscal Year: 2020
Organization: Vanderbilt University Medical Center
DUNS #: 079917897
City: Nashville
State: TN
Country: United States
Zip Code: 37232