Methods for inference of complex demography and selection from genomic data

Song, Yun

Abstract

Recent advances in sequencing technology have fundamentally transformed population genetics. Using whole-genome DNA sequence data, population geneticists now hope to estimate jointly many parameters of interest in complex models of evolution involving multiple populations. However, many of the statistical methods available for population genetic analyses are not scalable to the whole-genome level. The main objective of the research proposed here is to develop a suite of mathematical, statistical, and computational methods that will allow researchers to more fully take advantage of the availability of genomic data. Likelihood-based approaches that take linkage information into account utilize more information in the data than do methods based on summary statistics and should therefore be more statistically efficient. However, they tend to require intensive computation, thus limiting their applicability.
Aim 1 will develop a new statistical approach to full-likelihood inference that can be applied at the genomic scale using many more sequences than previously possible. The distribution of segments of shared genetic similarity, i.e., segments of identity-by-descent or identity-by-state, contains important information about past demography and selection.
Aims 2 and 3 will derive new theoretical results concerning such information and apply them to develop new statistical methods to tackle challenging problems such as the estimation of admixture proportions and admixture times, and inference of admixed DNA tracts. Recently, there has been much interest in using allele frequency spectra to estimate parameters in complex demography models.
Aim 4 will develop efficient methods based on coalescent theory to compute the expected joint allele frequency spectra for more populations than could be previously considered. The use of the Wright-Fisher diffusion is ubiquitous in population genetics as a model for the forwards-in-time dynamics of the frequency of an allele in a large population. There are several population genetic applications in which it is natural to study the associated diffusion bridge.
Aim 5 will investigate methods for simulating diffusion bridges in the presence of selection and obtain analytic results on the distribution of important functionals of the bridge path.
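
As a small illustration of the allele frequency spectrum mentioned under Aim 4, the following Python sketch counts the sample frequency spectrum from a toy set of 0/1-coded sequences and compares it with the classical single-population expectation E[xi_i] = theta / i under the standard neutral coalescent with constant population size and infinite sites. This is not the method proposed in the grant, which concerns joint spectra for multiple populations under complex demography. The function names, the toy data, and the value of theta are all hypothetical choices made for illustration.

```python
def sample_frequency_spectrum(haplotypes):
    """Return [xi_1, ..., xi_{n-1}], where xi_i is the number of sites at
    which exactly i of the n sampled sequences carry the derived allele.

    haplotypes: list of n equal-length lists of 0/1
                (0 = ancestral allele, 1 = derived allele; assumed coding).
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for site in range(num_sites):
        derived = sum(h[site] for h in haplotypes)
        # Sites where all or none of the sequences are derived are not
        # segregating in the sample and are skipped.
        if 0 < derived < n:
            sfs[derived - 1] += 1
    return sfs


def expected_sfs_constant_size(n, theta):
    """Expected spectrum for a sample of size n under the standard neutral
    coalescent (constant size, infinite sites): E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


# Toy data (hypothetical): 4 sequences, 5 sites.
toy = [
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
observed = sample_frequency_spectrum(toy)       # [3, 1, 0]
expected = expected_sfs_constant_size(4, 6.0)   # [6.0, 3.0, 2.0]
print(observed, expected)
```

Departures of an observed spectrum from the theta / i shape are one signal that demographic inference methods exploit. For example, population growth tends to inflate the count of low-frequency variants relative to this baseline.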

Public Health Relevance

Understanding human genome variation is crucial for mapping diseases and for individualized genome-based intervention and treatment. This project will provide mathematical and statistical infrastructure that will allow researchers to gain a better understanding of the processes that have shaped human genomic variation. The fundamental mathematical framework developed here is also expected to spur the development of new methods for disease mapping and prediction.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM109454-01
Application #: 8639647
Study Section: Special Emphasis Panel (ZGM1-BBCB-5 (BM))
Program Officer: Eckstrand, Irene A

Project Start: 2013-09-01
Project End: 2017-06-30
Budget Start: 2013-09-01
Budget End: 2014-06-30
Support Year: 1
Fiscal Year: 2013
Total Cost: $308,590
Indirect Cost: $107,101

Institution

Name: University of California Berkeley
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2016 R01 GM	Methods for inference of complex demography and selection from genomic data Song, Yun S. / University of California Berkeley
NIH 2015 R01 GM	Methods for inference of complex demography and selection from genomic data Song, Yun S. / University of California Berkeley	$292,202
NIH 2014 R01 GM	Methods for inference of complex demography and selection from genomic data Song, Yun S. / University of California Berkeley
NIH 2013 R01 GM	Methods for inference of complex demography and selection from genomic data Song, Yun S. / University of California Berkeley	$308,590

Publications

Evans, Steven N; Lanoue, Daniel (2018) RECOVERING A TREE FROM THE LENGTHS OF SUBTREES SPANNED BY A RANDOMLY CHOSEN SEQUENCE OF LEAVES. Adv Appl Math 96:39-75

Rosen, Zvi; Bhaskar, Anand; Roch, Sebastien et al. (2018) Geometry of the Sample Frequency Spectrum and the Perils of Demographic Inference. Genetics 210:665-682

Choi, Hye Soo; Evans, Steven N (2017) DOOB-MARTIN COMPACTIFICATION OF A MARKOV CHAIN FOR GROWING RANDOM WORDS SEQUENTIALLY. Stoch Process Their Appl 127:2428-2445

Evans, Steven N; Molchanov, Ilya (2017) THE SEMIGROUP OF METRIC MEASURE SPACES AND ITS INFINITELY DIVISIBLE PROBABILITY MEASURES. Trans Am Math Soc 369:1797-1834

Kamm, John A; Terhorst, Jonathan; Song, Yun S (2017) Efficient computation of the joint sample frequency spectra for multiple populations. J Comput Graph Stat 26:182-194

Schraiber, Joshua G; Evans, Steven N; Slatkin, Montgomery (2016) Bayesian Inference of Natural Selection from Allele Frequency Time Series. Genetics 203:493-511

Steinrücken, Matthias; Jewett, Ethan M; Song, Yun S (2016) SpectralTDF: transition densities of diffusion processes with time-varying selection parameters, mutation rates and effective population sizes. Bioinformatics 32:795-7

Harris, Kelley; Nielsen, Rasmus (2016) The Genetic Cost of Neanderthal Introgression. Genetics 203:881-91

Harris, Kelley (2015) Evidence for recent, population-specific evolution of the human mutation rate. Proc Natl Acad Sci U S A 112:3439-44

Evans, Steven N; Hening, Alexandru; Schreiber, Sebastian J (2015) Protected polymorphisms and evolutionary stability of patch-selection strategies in stochastic environments. J Math Biol 71:325-59

Showing the most recent 10 out of 15 publications
