Mathematical Models and Statistical Methods for Genome Analysis

Song, Yun

Abstract

Recent advances in sequencing technology are enabling fast and cost-effective generation of sequence data. Soon, whole-genome sequencing will become a routine assay, opening up new opportunities for biomedical research and related fields. The overall objective of this proposal is to develop accurate, scalable computational methods for the study of whole-genome variation, both by developing a new theoretical framework for studying population genetics models and by improving the key mathematical component underlying various statistical tools for genome analysis. A common thread that runs through this proposal is recombination. Both crossover and gene conversion will be considered.
The specific aims of the proposed research are: (1) Develop and apply a new theoretical framework that complements the standard coalescent theory when the rate of recombination is moderate to large, thus providing a new set of useful analytic tools to the population genomics community. (2) Devise principled methods to derive accurate multi-locus conditional sampling distributions directly from the underlying population genetics model. Improve the accuracy of a wide range of statistical methods for genome analysis that utilize conditional sampling distributions. (3) Develop scalable computational methods for joint estimation of crossover rates, gene conversion rates, and mean conversion tract lengths from population SNP data. Incorporate realistic biological scenarios into a model with overlapping gene conversions and extend the model to handle ectopic (or non-allelic) gene conversions in multigene families. These goals will be achieved not by engineering modifications based on intuition or simulations, but by applying and generalizing recent mathematical results that are rigorous and accurate. The new theoretical framework developed in this research will allow one to carry out analytic computations that were considered intractable in the standard coalescent theory with recombination. Furthermore, mathematically justified approximations based on diffusion processes will be devised, and portable software packages will be developed for population genomics analysis.

Public Health Relevance

Ongoing large-scale sequencing projects will provide a comprehensive view of genomic variation in populations, helping to unravel the genetic basis of human biology and disease risk. Recombination is a major biological mechanism responsible for generating genetic variation in a population, and has important implications for many computational problems in genome analysis, including disease-association mapping and detecting signatures of natural selection. The proposed research will aid the analysis and interpretation of whole-genome variation data by developing novel mathematical frameworks and statistical tools for studying population genetics models with recombination.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM094402-03
Application #: 8306868
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Eckstrand, Irene A

Project Start: 2010-09-01
Project End: 2015-08-31
Budget Start: 2012-09-01
Budget End: 2013-08-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $197,390
Indirect Cost: $62,255

Institution

Name: University of California Berkeley
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2018 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2017 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2016 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2015 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley	$303,504
NIH 2014 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley
NIH 2013 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$190,481
NIH 2012 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$197,390
NIH 2011 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$197,390
NIH 2010 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$199,384

Publications

Spence, Jeffrey P; Steinrücken, Matthias; Terhorst, Jonathan et al. (2018) Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 53:70-76

Steinrücken, Matthias; Spence, Jeffrey P; Kamm, John A et al. (2018) Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans. Mol Ecol 27:3873-3888

Moreno-Mayar, J Víctor; Vinner, Lasse; de Barros Damgaard, Peter et al. (2018) Early human dispersals within the Americas. Science 362:

Moreno-Mayar, J Víctor; Potter, Ben A; Vinner, Lasse et al. (2018) Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature 553:203-207

Palamara, Pier Francesco; Terhorst, Jonathan; Song, Yun S et al. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50:1311-1317

Terhorst, Jonathan; Kamm, John A; Song, Yun S (2017) Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49:303-309

Miroshnikov, Alexey; Steinrücken, Matthias (2017) Computing the joint distribution of the total tree length across loci in populations with variable size. Theor Popul Biol 118:1-19

Crawford, Nicholas G; Kelly, Derek E; Hansen, Matthew E B et al. (2017) Loci associated with skin pigmentation identified in African populations. Science 358:

Luo, Shishi; Mattingly, Jonathan C (2017) Scaling limits of a model for selection at two scales. Nonlinearity 30:1682-1707

Luo, Shishi; Yu, Jane A; Song, Yun S (2016) Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads. PLoS Comput Biol 12:e1005117

Showing the most recent 10 out of 39 publications
