Scalable methods for identity by descent

Zhi, Degui; Zhang, Shaojie

Abstract

In the next few years, large genotyped cohorts will become available (e.g., TOPMed, UK Biobank, All of Us, Million Veteran Program). As sample sizes approach 0.1%-1% of the total population size, such samples will contain many distant relatives and extensive identity-by-descent (IBD) information. This information will enable more sophisticated and powerful genetic analyses beyond single-variant analyses. However, current informatics methods are not efficient enough to handle genotype data at that scale. We will develop new genome informatics methods for biobank-scale genotyped cohorts. We have developed an efficient tool, RaPID, the first computationally feasible method for inferring IBD segments among individuals in a biobank-scale cohort. We demonstrated that RaPID achieves running time linear in the sample size and is over 100 times faster than existing methods. At the same time, RaPID detects a greater number of IBD segments, with higher accuracy and sharper segment boundaries, than existing methods. In this application, we propose to develop (1) the RaPID+ method for pairwise IBD detection, which can tolerate and correct phasing errors, offers a principled way of tuning parameters, and works with genotype data across sequencing and array platforms; (2) the RaPID-diploid method for detecting IBD2 segments; (3) the RaPID-multiway method for identifying IBD clusters; and (4) the RaPID-ancestry method for local ancestry inference across subcontinental populations. The methods will be rigorously tested in simulations using realistic population demographic models, as well as on real data from large cohorts. All methods will be implemented as free software for academic use. This project will advance genetic research by developing efficient informatics tools that reveal detailed genetic relationships in very large genotyped cohorts.

Public Health Relevance

The aim of the project is to develop and evaluate accurate and efficient methods and tools for detecting identity-by-descent (IBD) and local ancestry information in large genotyped cohorts, resources of increasing importance in the era of precision medicine. If successful, this project will advance genetic research by offering researchers efficient informatics tools that can reveal detailed genetic relationships in very large genotyped cohorts.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG010086-01
Application #: 9501260
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brooks, Lisa

Project Start: 2018-06-01
Project End: 2022-03-31
Budget Start: 2018-06-01
Budget End: 2019-03-31
Support Year: 1
Fiscal Year: 2018

Institution

Name: University of Texas Health Science Center Houston
Type: Sch Allied Health Professions
DUNS #: 800771594

City: Houston
State: TX
Country: United States
Zip Code: 77030

Related projects


NIH 2020 R01 HG	Scalable methods for identity by descent Zhi, Degui; Zhang, Shaojie / University of Texas Health Science Center Houston
NIH 2019 R01 HG	Scalable methods for identity by descent Zhi, Degui; Zhang, Shaojie / University of Texas Health Science Center Houston
NIH 2018 R01 HG	Scalable methods for identity by descent Zhi, Degui; Zhang, Shaojie / University of Texas Health Science Center Houston
