Relationship inference in large genetic data

Chen, Wei-Min

Abstract

Technological advances in high-throughput sequencing and custom genotyping arrays are making genetic studies larger than ever. The number of studies generating whole genome sequencing (WGS) data has increased substantially over the past few years, and the NHLBI alone is expected to generate WGS data on more than 30,000 individuals in the next year. Ongoing collections of exome sequence data are now approaching 200,000 subjects, and future NIH-funded and privately funded projects will soon generate WGS data with similar sample sizes. There is a great need to make full use of this large amount of newly generated data, including better ways to identify and use relatedness information. Because of its computational efficiency, our relationship inference tool, KING, has been the main software used to infer relationships in large genetic studies over the past few years. Given the challenges and great opportunities presented by high-throughput genotyping, exome, and whole genome sequence data, an even faster, more reliable, and more powerful relationship inference procedure and tool is urgently needed. Such a tool would make it possible to use relatedness information in rare variant association analysis beyond what currently available approaches allow. We propose to develop robust and computationally efficient algorithms to infer close and distant family relationships in large datasets of thousands to hundreds of thousands of individuals. The fast algorithm will allow identification of close relationships in datasets of more than 100,000 individuals, and the algorithms proposed specifically for rare variant data from WGS will allow us to infer relationships more reliably in the presence of inbreeding, population structure (including population admixture), and/or sample contamination, and to infer higher-degree relationships.
Further, we plan to develop an integrated toolset built on our fast relationship inference algorithms, including methods for pedigree reconstruction, quality control (QC), and family-based association analysis. Preliminary analyses show that our algorithm can identify all close relationships in the 1000 Genomes data in 12 seconds. We also successfully inferred an extended pedigree connected only by distant (2nd- and 3rd-degree) relationships, representing an aunt, her niece, and her first cousin. Our proposed methods will be implemented in freely distributed software (KING), allowing other investigators to apply them directly to their own sequencing data and other high-throughput array data. We expect the relationship inference methods developed here to play an important role in the quality control and analysis of large genetic and genomic datasets in the coming years.
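The abstract does not spell out KING's estimators. For orientation only, the sketch below implements the previously published KING-robust kinship estimator for individuals from a single, homogeneous population, kinship = (N_Aa,Aa - 2 N_AA,aa) / (N_Aa(i) + N_Aa(j)). Here N_Aa,Aa counts markers at which both individuals are heterozygous, N_AA,aa counts markers with opposite homozygous genotypes, and N_Aa(i) and N_Aa(j) are each individual's heterozygote counts. The estimate is then mapped to a relationship degree using the cut-offs 2^-1.5, 2^-2.5, 2^-3.5, and 2^-4.5 (about 0.354, 0.177, 0.088, and 0.044). The 0/1/2 genotype coding, the -1 missing code, and the names king_robust_kinship and infer_degree are illustrative assumptions. This is not KING's source code, and it omits the new methods proposed here (speed-ups for more than 100,000 samples and adjustments for admixture, inbreeding, and contamination).

```python
from typing import Sequence

# Assumption: genotypes are minor-allele counts 0/1/2 and -1 marks a missing call.
MISSING = -1

# Lower kinship bounds for each degree: 2^-1.5, 2^-2.5, 2^-3.5, 2^-4.5.
DEGREE_BOUNDS = [
    (2 ** -1.5, "duplicate/MZ twin"),
    (2 ** -2.5, "1st-degree"),
    (2 ** -3.5, "2nd-degree"),
    (2 ** -4.5, "3rd-degree"),
]


def king_robust_kinship(gi: Sequence[int], gj: Sequence[int]) -> float:
    """Homogeneous-population KING-robust kinship for one pair (illustrative sketch).

    Only markers called in both individuals contribute to any count.
    """
    if len(gi) != len(gj):
        raise ValueError("genotype vectors must have the same length")
    het_het = 0  # N_{Aa,Aa}: both individuals heterozygous
    opp_hom = 0  # N_{AA,aa}: one individual 0, the other 2
    het_i = 0    # heterozygotes in individual i
    het_j = 0    # heterozygotes in individual j
    for a, b in zip(gi, gj):
        if a == MISSING or b == MISSING:
            continue  # skip markers missing in either individual
        het_i += int(a == 1)
        het_j += int(b == 1)
        if a == 1 and b == 1:
            het_het += 1
        elif {a, b} == {0, 2}:
            opp_hom += 1
    if het_i + het_j == 0:
        raise ValueError("no heterozygous genotypes; kinship is undefined")
    return (het_het - 2 * opp_hom) / (het_i + het_j)


def infer_degree(phi: float) -> str:
    """Map a kinship estimate to the closest matching relationship degree."""
    for lower, label in DEGREE_BOUNDS:
        if phi > lower:
            return label
    return "unrelated (beyond 3rd-degree)"
```

Under these cut-offs, an aunt-niece pair (expected kinship 1/8 = 0.125) falls in the 2nd-degree bin and first cousins (expected kinship 1/16 = 0.0625) fall in the 3rd-degree bin, matching the degrees quoted for the pedigree above. A plain Python loop is used for clarity; it compares one pair at a time and would be far too slow for all pairs among 100,000 individuals.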

Public Health Relevance

We propose to develop statistical methods and computationally efficient tools to infer cryptic family relationships, establish extended pedigrees/lineages, and increase power and resolution for mapping common and rare variants that contribute to human diseases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG008965-03
Application #: 9492397
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2016-07-21
Project End: 2021-05-31
Budget Start: 2018-06-01
Budget End: 2019-05-31
Support Year: 3
Fiscal Year: 2018

Institution

Name: University of Virginia
Department: Public Health & Prev Medicine
Type: Schools of Medicine
DUNS #: 065391526

City: Charlottesville
State: VA
Country: United States
Zip Code: 22904

Related projects

NIH 2020 R01 HG | Relationship inference in large genetic data | Chen, Wei-Min / University of Virginia
NIH 2019 R01 HG | Relationship inference in large genetic data | Chen, Wei-Min / University of Virginia
NIH 2018 R01 HG | Relationship inference in large genetic data | Chen, Wei-Min / University of Virginia
NIH 2017 R01 HG | Relationship inference in large genetic data | Chen, Wei-Min / University of Virginia | $395,000
NIH 2016 R01 HG | Relationship inference in large genetic data | Chen, Wei-Min / University of Virginia | $395,000