Haplotype-based analysis methods for population genomics

Novembre, John

Abstract

This project will develop a series of computational tools that exploit the power of haplotype-based models for the analysis of population genomics data. The development of such tools is particularly important as advances in sequencing have now made it routine for sequence data to be gathered across full chromosomes. The multi-locus patterns of linkage disequilibrium that are present in haplotype data are informative about a range of important processes in population genetics. Leveraging the information in haplotypes is methodologically challenging, and for many specific problems the appropriate analysis tools do not yet exist. In response, our research will develop haplotype-based models in four major directions.

First, we will develop haplotype-based models to infer recombination rates using genetic data from admixed individuals. The key principle is that ancestry switch points in admixed individuals can be used to infer recent recombination events. Our work will produce a software package for inference of recombination rates based on genome-wide single-nucleotide polymorphism data, and a separate simulation package for generating data with which to test the method. A key innovation will be developing and testing a version of this approach that can handle multi-way (more than two source populations) admixtures.

Second, we will use haplotype-informed approaches to improve the power of complex trait mapping approaches based on the "evolve and resequence" paradigm. The improvement in power will come from using haplotype information embedded in the raw read data from pooled sequencing experiments. Again, we will develop both inference software and simulations to test the inference methods.

Third, we will investigate to what extent purifying selection has shaped haplotype diversity in human populations. The expectation is that segregating deleterious variants will show reduced haplotype diversity, much as adaptive variants do. This signature has largely been unexplored, and we will develop theoretical, empirical, and simulation-based approaches to establish whether this property exists and how it can be used to infer the strength of purifying selection in human population genetic data.

Finally, we will derive a novel form of the conditional sampling distribution (CSD) for a haplotype. The application of CSDs in population genetics has been very fruitful, even though the approach is in its infancy. We will develop an approach that leads to a more accurate CSD. The new CSD will also open the door to extensions for computing haplotype probabilities in models with non-equilibrium demography and/or population structure.

Throughout the project there will be an emphasis on software development for the broader population genomics community, and on overcoming computational and algorithmic challenges that arise commonly with haplotype-based models. These contributions are essential for moving population genetics forward into the genomic era.

Project Relevance

This project will contribute to the basic toolkit population geneticists use to extract information from large genomic datasets and will enhance research in a number of applied areas with practical relevance. In particular, we will develop tools that empower researchers to measure recombination, map complex traits, and understand the fitness consequences of human genetic variation. These areas are relevant to disease trait mapping, genetic disease etiology, and historical demography. Finally, we expect the algorithms developed will be useful, either directly or with minor adjustment, for closely related problems beyond those detailed in the project. As an example, our algorithms for haplotype frequency estimation in pooled sequences are closely related to problems for identifying the abundance of pathogenic strains in sequencing of blood DNA.

Public Health Relevance

The proposed research will develop a series of computational tools that exploit the power of explicit haplotype-based models for the analysis of population genomics data. The applications of these tools will empower efforts to (1) estimate recombination in admixed populations, (2) map the genetic basis of complex traits using the evolve and resequence paradigm, (3) quantify purifying selection in human populations, and (4) improve basic models of haplotype variation.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG007089-01
Application #: 8422889
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Brooks, Lisa

Project Start: 2013-01-01
Project End: 2013-02-28
Budget Start: 2013-01-01
Budget End: 2013-02-28
Support Year: 1
Fiscal Year: 2013
Total Cost: $40,217
Indirect Cost: $14,102

Institution

Name: University of California Los Angeles
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 092530369

City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Related projects


NIH 2017 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of Chicago
NIH 2016 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of Chicago	$284,400
NIH 2015 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of Chicago	$308,100
NIH 2014 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of Chicago	$284,400
NIH 2013 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of California Los Angeles	$40,217
NIH 2013 R01 HG	Haplotype-based analysis methods for population genomics	Novembre, John / University of Chicago	$252,016

Publications

Chiang, Charleston W K; Marcus, Joseph H; Sidore, Carlo et al. (2018) Genomic history of the Sardinian population. Nat Genet 50:1426-1434

Smith, Joel; Coop, Graham; Stephens, Matthew et al. (2018) Estimating Time to the Common Ancestor for a Beneficial Allele. Mol Biol Evol 35:1003-1017

Wong, Emily H M; Khrunin, Andrey; Nichols, Larissa et al. (2017) Reconstructing genetic history of Siberian and Northeastern European populations. Genome Res 27:1-14

van den Berg, Marten E; Warren, Helen R; Cabrera, Claudia P et al. (2017) Discovery of novel heart rate-associated loci using the Exome Chip. Hum Mol Genet 26:2346-2363

Novembre, John; Peter, Benjamin M (2016) Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 41:98-105

Peter, Benjamin M (2016) Admixture, Population Structure, and F-Statistics. Genetics 202:1485-501

Chiang, Charleston W K; Ralph, Peter; Novembre, John (2016) Conflation of Short Identity-by-Descent Segments Bias Their Inferred Length Distribution. G3 (Bethesda) 6:1287-96

Han, Eunjung; Sinsheimer, Janet S; Novembre, John (2015) Fast and accurate site frequency spectrum estimation from low coverage sequence data. Bioinformatics 31:720-7

Kessner, Darren; Novembre, John (2015) Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits. Genetics 199:991-1005

Sidore, Carlo; Busonero, Fabio; Maschio, Andrea et al. (2015) Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers. Nat Genet 47:1272-1281

Showing the most recent 10 out of 22 publications
