Recent advances in sequencing technology are enabling fast and cost-effective generation of sequence data. Soon, whole-genome sequencing will become a routine assay, opening up new opportunities for biomedical research and related fields. The overall objective of this proposal is to develop accurate, scalable computational methods for whole-genome variation study, both by developing a new theoretical framework for studying population genetics models and by improving the key mathematical component underlying various statistical tools for genome analysis. A common thread that runs through this proposal is recombination. Both crossover and gene conversion recombinations will be considered.
The specific aims of the proposed research are: (1) Develop and apply a new theoretical framework that complements the standard coalescent theory when the rate of recombination is moderate to large, thus providing a new set of useful analytic tools to the population genomics community. (2) Devise principled methods to derive accurate multi-locus conditional sampling distributions directly from the underlying population genetics model. Improve the accuracy of a wide range of statistical methods for genome analysis that utilize conditional sampling distributions. (3) Develop scalable computational methods for joint estimation of crossover rates, gene conversion rates, and mean conversion tract lengths from population SNP data. Incorporate realistic biological scenarios into a model with overlapping gene conversions and extend the model to handle ectopic (or non-allelic) gene conversions in multigene families. The above goals will be achieved not by engineering modifications based on intuition or simulations, but by applying and generalizing recent mathematical results that are rigorous and accurate. The new theoretical framework developed in this research will allow one to carry out analytic computation, which was considered to be intractable in the standard coalescent theory with recombination. Furthermore, mathematically justified approximations based on diffusion processes will be devised and portable software packages will be developed for population genomics analysis.

Public Health Relevance

The ongoing large-scale sequencing projects will provide a comprehensive view of genomic variation in populations, helping to unravel the genetic basis of human biology and disease risk. Recombination is a major biological mechanism responsible for generating genetic variation in a population, and has important implications for many computational problems in genome analysis, including disease-association mapping and detecting signatures of natural selection. The proposed research will help with analyzing and interpreting whole-genome variation data, by developing novel mathematical frameworks and statistical tools for studying population genetics models with recombination.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM094402-03
Application #: 8306868
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Eckstrand, Irene A
Project Start: 2010-09-01
Project End: 2015-08-31
Budget Start: 2012-09-01
Budget End: 2013-08-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $197,390
Indirect Cost: $62,255
Name: University of California Berkeley
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94704
Mallick, Swapan; Li, Heng; Lipson, Mark et al. (2016) The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538:201-206
Steinrücken, Matthias; Jewett, Ethan M; Song, Yun S (2016) SpectralTDF: transition densities of diffusion processes with time-varying selection parameters, mutation rates and effective population sizes. Bioinformatics 32:795-7
Sheehan, Sara; Song, Yun S (2016) Deep Learning for Population Genetic Inference. PLoS Comput Biol 12:e1004845
Jewett, Ethan M; Steinrücken, Matthias; Song, Yun S (2016) The Effects of Population Size Histories on Estimates of Selection Coefficients from Time-Series Genetic Data. Mol Biol Evol 33:3002-3027
Raghavan, Maanasa; Steinrücken, Matthias; Harris, Kelley et al. (2015) POPULATION GENETICS. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science 349:aab3884
Živković, Daniel; Steinrücken, Matthias; Song, Yun S et al. (2015) Transition Densities and Sample Frequency Spectra of Diffusion Processes with Selection and Variable Population Size. Genetics 200:601-17
Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S (2015) Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res 25:268-79
Jenkins, Paul A; Fearnhead, Paul; Song, Yun S (2015) Tractable Diffusion and Coalescent Processes for Weakly Correlated Loci. Electron J Probab 20:
Zou, James Y; Park, Danny S; Burchard, Esteban G et al. (2015) Genetic and socioeconomic study of mate choice in Latinos reveals novel assortment patterns. Proc Natl Acad Sci U S A 112:13621-6
Terhorst, Jonathan; Schlötterer, Christian; Song, Yun S (2015) Multi-locus analysis of genomic time series data from experimental evolution. PLoS Genet 11:e1005069

Showing the most recent 10 out of 29 publications