Population Genetic Inferences from Dense Genotype Data

Clark, Andrew; Bustamante, Carlos; Nielsen, Rasmus

Abstract

Technological innovations arising from the HapMap Project have dramatically increased the speed and accuracy of genotyping while greatly reducing cost. Public and private efforts are beginning to release an unprecedented volume of human genotype and DNA sequence data into the public domain. In order to allow the best inferences about human variation and past human evolution from these data, we propose a series of investigations that center around four aims.

First, we will develop novel statistical methods for population genetic inference from high-throughput DNA sequencing platforms. Pyrosequencing technology will generate assembled alignments that represent a sampling of sequence reads across individuals (multinomial) and across homologous chromosomes within an individual (binomial), producing a complex mixture. Inference of population genetic parameters from such data will demand novel statistical approaches, and we outline a set of plans to develop statistically rigorous methods.

Second, we will develop methods to reverse-engineer the ascertainment biases of SNPs on widely used genotyping panels so as to enable population genetic inference. SNPs on the high-throughput genotyping platforms of Affymetrix and Illumina were ascertained in diverse and often irretrievable ways. Statistically sound population genetic inference from these data requires an understanding of the nature of the ascertainment bias of these platforms. We will reverse-engineer the ascertainment using ENCODE and other dense resequencing data, and use these inferences to correct for ascertainment bias in high-density SNP platform data.

Third, we will develop novel methods for inference of natural selection from patterns of haplotype diversity within and among human populations and apply these approaches to publicly available data sets. Methods of inference of natural selection from SNP frequency and haplotype diversity continue to gain in power and specificity. Optimization of these methods demands correction for the effects of ascertainment, demography, and local variation in recombination, and for imputation of missing data and of haplotype phase. We will make use of Markov-Hidden Markov models for jointly estimating the magnitude, location, and age of selective sweeps.

Finally, we will develop novel approaches for predicting the functional consequences of nucleotide substitutions in putatively functional regions of the human genome. Whole-genome association tests will gain power and specificity from the use of prior inference of the likelihood that a SNP has a damaging effect on a gene's function. In addition, genome-wide association tests will be followed by extensive resequencing of candidate regions, where inferring the likelihood that each of the many rare variants is deleterious will also be useful. We propose methods that have advantages over existing approaches, making use of comparative genomic data, protein structure, cis-regulatory information, and patterns of segregating variation.

Project Narrative: This project will develop methods of statistical inference from human DNA resequencing and SNP genotype data that will allow accurate estimation of critical parameters that describe the structure of variation in human populations. These inferences can provide vital clues to identifying genes that are associated with risk of complex genetic disorders.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 3R01HG003229-05S1
Application #: 7921193
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Brooks, Lisa

Project Start: 2009-09-30
Project End: 2012-06-30
Budget Start: 2009-09-30
Budget End: 2012-06-30
Support Year: 5
Fiscal Year: 2009
Total Cost: $419,259
Indirect Cost

Institution

Name: Cornell University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 872612445

City: Ithaca
State: NY
Country: United States
Zip Code: 14850

Publications

Racimo, Fernando; Marnetto, Davide; Huerta-Sánchez, Emilia (2017) Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Mol Biol Evol 34:296-317

Racimo, Fernando; Gokhman, David; Fumagalli, Matteo et al. (2017) Archaic Adaptive Introgression in TBX15/WARS2. Mol Biol Evol 34:509-524

Poznik, G David; Xue, Yali; Mendez, Fernando L et al. (2016) Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat Genet 48:593-9

Slavney, Andrea; Arbiza, Leonardo; Clark, Andrew G et al. (2016) Strong Constraint on Human Genes Escaping X-Inactivation Is Modulated by their Expression Level and Breadth in Both Sexes. Mol Biol Evol 33:384-93

Henn, Brenna M; Botigué, Laura R; Peischl, Stephan et al. (2016) Distance from sub-Saharan Africa predicts mutational load in diverse human genomes. Proc Natl Acad Sci U S A 113:E440-9

Racimo, Fernando; Sankararaman, Sriram; Nielsen, Rasmus et al. (2015) Evidence for archaic adaptive introgression in humans. Nat Rev Genet 16:359-71

Hunter-Zinck, Haley; Clark, Andrew G (2015) Aberrant Time to Most Recent Common Ancestor as a Signature of Natural Selection. Mol Biol Evol 32:2784-97

Ma, Li; Keinan, Alon; Clark, Andrew G (2015) Biological knowledge-driven analysis of epistasis in human GWAS with application to lipid traits. Methods Mol Biol 1253:35-45

Rohlfs, Rori V; Aguiar, Vitor R C; Lohmueller, Kirk E et al. (2015) Fitting the Balding-Nichols model to forensic databases. Forensic Sci Int Genet 19:86-91

Henn, Brenna M; Botigué, Laura R; Bustamante, Carlos D et al. (2015) Estimating the mutation load in human genomes. Nat Rev Genet 16:333-43

Showing the most recent 10 out of 103 publications
