The shift in attention toward rare alleles and the concomitant need for DNA sequence data from large samples drawn from human populations have driven the need to accurately describe the patterns of human DNA sequence variation and to understand the forces that shape it. At the same time, SNP genotyping platforms are increasing in SNP density, and unprecedented sample sizes are accumulating from GWAS. In order to foster rigorous inferences about human variation and past human evolution from these data, we propose a series of investigations centered on four aims. First, we will develop novel statistical methods for population genetic inference from next-generation DNA sequence data. Starting from alignments of sequence reads from multiple individuals, we will develop methods of parameter estimation and hypothesis testing that integrate over likelihoods of genotypes conditional on the data. The optimal balance of sample size versus sequencing coverage will be analyzed for several distinct experimental problems. The methods will be thoroughly tested and applied to several resequencing data sets to which we have access. Second, we will develop and extend methods for ancestry inference from SNP and genome sequence data of admixed individuals and employ them to infer past demographic history, including migration. Our method of ancestry inference based on Principal Components Analysis will be extended to accommodate data uncertainty and ascertainment bias. We will model a range of admixture scenarios, from single-pulse to continuous influx, in order to determine whether genetic data allow more refined inference of the history of mixing of two ancestral populations. Third, we will develop methods for estimation of joint identity-by-descent (IBD) relationships across multiple individuals. Existing methods take discrete genotype calls as a starting point and do not accommodate platform-specific error.
There is a significant need for methods that infer shared IBD regions genome-wide across multiple individuals in large population samples. Through a combination of heuristic approaches and graph-theory-based computational algorithms, we will develop and test such methods. Finally, we will use IBD sharing inferred across individuals in a sample to estimate population genetic parameters in models of demography and selection. Just as demographic changes impact the site frequency spectrum of SNPs, so too will they impact the pattern of IBD sharing in a sample. Turning this problem around, we will develop approaches for inference of population genetic parameters, such as demography, rates of inbreeding, levels of purifying and positive selection, admixture, and migration, based only on the patterns of IBD sharing. These will be contrasted with approaches that use phased haplotype information for demographic inference.
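
To illustrate the general idea behind the first aim, integrating over genotype likelihoods instead of relying on hard genotype calls, the following minimal Python sketch estimates the alternate-allele frequency at a single biallelic site with an expectation-maximization (EM) loop. It is not the project's method: it is a textbook-style sketch that assumes Hardy-Weinberg proportions, unrelated diploid individuals, and per-individual likelihoods that are not all zero. The function name `estimate_allele_frequency`, its parameters, and the input layout are hypothetical choices made for this example.

```python
def estimate_allele_frequency(genotype_likelihoods, start=0.25, tol=1e-8, max_iter=500):
    """EM estimate of the alternate-allele frequency at one biallelic site.

    genotype_likelihoods: list of (l0, l1, l2) tuples, one per individual,
    where lg is P(reads | individual carries g copies of the alternate allele).
    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    if not genotype_likelihoods:
        raise ValueError("need at least one individual")
    f = start
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # E-step: weight each genotype by likelihood times its HWE prior.
            w0 = l0 * (1.0 - f) ** 2
            w1 = l1 * 2.0 * f * (1.0 - f)
            w2 = l2 * f ** 2
            total = w0 + w1 + w2
            # Posterior expected number of alternate alleles for this individual.
            expected_alt += (w1 + 2.0 * w2) / total
        # M-step: new frequency is the expected alternate count over 2n chromosomes.
        new_f = expected_alt / (2.0 * n)
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f
```

When every individual's genotype is known with certainty, the estimate reduces to simple allele counting; with low-coverage data, each individual instead contributes its posterior expected allele count, so uncertainty in the reads is carried into the estimate.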

Public Health Relevance

This project aims to understand the population-level forces at play on the human genome by analysis of genome-wide SNP data and next-generation sequences using newly developed statistical methods. Estimation of model parameters from alignments of next-generation sequence reads will be done so as to accommodate base-calling uncertainty, and segment-wise inference of ancestry in admixed genomes will be applied to understand past admixture history. Identity-by-descent methods will be pursued to allow the most reliable inferences about demography, natural selection and other population forces acting on human genetic variation.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG003229-09
Application #: 8511770
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Brooks, Lisa
Project Start: 2004-05-21
Project End: 2014-06-30
Budget Start: 2013-07-01
Budget End: 2014-06-30
Support Year: 9
Fiscal Year: 2013
Total Cost: $805,945
Indirect Cost: $83,658

Name: Cornell University
Department: Biochemistry
Type: Schools of Earth Sciences/Natur
DUNS #: 872612445
City: Ithaca
State: NY
Country: United States
Zip Code: 14850

Raghavan, Maanasa; Skoglund, Pontus; Graf, Kelly E et al. (2014) Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505:87-91
Bhaskar, Anand; Clark, Andrew G; Song, Yun S (2014) Distortion of genealogical properties when the sample is very large. Proc Natl Acad Sci U S A 111:2385-90
Liang, Mason; Nielsen, Rasmus (2014) The lengths of admixture tracts. Genetics 197:953-67
Kidd, Jeffrey M; Sharpton, Thomas J; Bobo, Dean et al. (2014) Exome capture from saliva produces high quality genomic and metagenomic data. BMC Genomics 15:262
Moreno-Estrada, Andrés; Gignoux, Christopher R; Fernández-López, Juan Carlos et al. (2014) Human genetics. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344:1280-5
Key, Felix M; Peter, Benjamin; Dennis, Megan Y et al. (2014) Selection on a variant associated with improved viral clearance drives local, adaptive pseudogenization of interferon lambda 4 (IFNL4). PLoS Genet 10:e1004681
Huerta-Sánchez, Emilia; Jin, Xin; Asan et al. (2014) Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512:194-7
Fumagalli, Matteo; Vieira, Filipe G; Linderoth, Tyler et al. (2014) ngsTools: methods for population genetics analyses from next-generation sequencing data. Bioinformatics 30:1486-7
Gazave, Elodie; Ma, Li; Chang, Diana et al. (2014) Neutral genomic regions refine models of recent rapid human population growth. Proc Natl Acad Sci U S A 111:757-62
Liu, Shiping; Lorenzen, Eline D; Fumagalli, Matteo et al. (2014) Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears. Cell 157:785-94

Showing the most recent 10 out of 59 publications