Fast and accurate phasing using the positional Burrows-Wheeler transform (PBWT)

Price, Alkes

Abstract

Phasing, defined as the estimation of haplotypes from diploid genotype data, is a fundamental problem in medical and population genetics. Phasing is a key preprocessing step for genotype imputation algorithms employed in genome-wide association studies of diseases and complex traits, and is also important for mapping molecular QTL using allele-specific reads, detecting clonal mosaicism, inferring population structure, and detecting natural selection. Considerable resources have been invested in developing accurate phasing algorithms, but unsolved challenges remain, including: (i) incorporating large reference panels, such as the Haplotype Reference Consortium, to improve phasing accuracy (reference-based phasing), and (ii) phasing extremely large cohorts using within-cohort data (cohort-based phasing). Here, we propose an exploratory two-year research program in which we will develop methods and software for both reference-based phasing and cohort-based phasing, using a new data structure based on the Positional Burrows-Wheeler Transform (PBWT).
We aim to make fast and accurate phasing methods and software freely available to all researchers via public phasing servers. We will also explore the early, conceptual stages of developing PBWT-based methods for reference-based imputation. Our team has multiple strengths: our statistical and computational expertise; our track record of producing practical, publicly available software packages, widely used by the community, for a broad range of applications in statistical genetics; and our data-driven approach to methods research. We will guide our methods development using data from 500,000 samples from the UK Biobank, and will work closely with the Haplotype Reference Consortium (see letters of support).

Public Health Relevance

Statistical phasing, defined as the use of statistical methods to partition an individual's genome into its maternal and paternal components, is a problem of fundamental importance in medical genetics. Association studies, which link genetic variants to disease, use statistical phasing to produce a more complete and accurate catalog of the genetic variants that each study participant carries. In this proposal, we will develop new methods for statistical phasing in very large data sets; these methods will be faster and more accurate than previous methods, helping association studies to succeed.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG009513-01
Application #: 9293785
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Brooks, Lisa

Project Start: 2017-06-03
Project End: 2019-05-31
Budget Start: 2017-06-03
Budget End: 2018-05-31
Support Year: 1
Fiscal Year: 2017
Total Cost: $238,438
Indirect Cost: $88,438

Institution

Name: Harvard University
Department: Public Health & Prev Medicine
Type: Schools of Public Health
DUNS #: 149617367

City: Boston
State: MA
Country: United States
Zip Code: 02115

Related projects


NIH 2018 R21 HG	Fast and accurate phasing using the positional Burrows-Wheeler transform (PBWT) Price, Alkes L. / Harvard University
NIH 2017 R21 HG	Fast and accurate phasing using the positional Burrows-Wheeler transform (PBWT) Price, Alkes L. / Harvard University	$238,438

Publications

Loh, Po-Ru; Genovese, Giulio; Handsaker, Robert E et al. (2018) Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559:350-355
