Mathematical Models and Statistical Methods for Large-Scale Population Genomics

Song, Yun

Abstract

Technological advances in DNA sequencing have dramatically increased the availability of genomic variation data over the past few years. This development offers a powerful window into understanding the genetic basis of human biology and disease risk. To facilitate achieving this goal, it is crucial to develop efficient analytical methods that will allow researchers to more fully utilize the information in genomic data and consider more complex models than previously possible. The central goal of this project is to tackle this important challenge by carrying out the following Specific Aims:
In Aim 1, we will develop efficient inference tools for whole-genome population genomic analysis by extending our ongoing work on coalescent hidden Markov models, and we will apply them to large-scale data. The methods we develop will enable researchers to analyze large samples under general demographic models involving multiple populations with population splits, migration, and admixture, as well as variable effective population sizes and temporal samples (ancient DNA).
Multi-locus full-likelihood computation is often prohibitive for population genetic models of high complexity. To address this problem, in Aim 2 we will develop a novel likelihood-free inference framework for population genomic analysis by applying deep learning, a highly active area of machine learning research. We will apply the method to various parameter estimation and classification problems in population genomics, particularly joint inference of selection and demography. In addition to carrying out technical research, we will develop a software package that will allow researchers in the population genomics community to use deep learning in their own research.
Time-series genetic variation data at the whole-genome scale are increasingly used to infer allele frequency changes over a time course. This development creates new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. In Aim 3, we will develop new statistical methods to take full advantage of this data source at both short and long evolutionary timescales. Specifically, we will develop and apply efficient statistical inference methods for analyzing time-series genomic variation data from experimental evolution and ancient DNA samples.
Open-source software will be developed for each specific aim. The novel methods developed in this project will help to analyze and interpret genetic variation data at the whole-genome scale.

Public Health Relevance

This project will develop several novel statistical methods for analyzing and interpreting human genetic variation data at the whole-genome scale. The computational tools stemming from this research will enable efficient and accurate inference under complex population genetic models, thereby broadly facilitating research efforts to understand the genetic basis of human biology and disease risk.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM094402-08
Application #: 9328097
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Janes, Daniel E

Project Start: 2010-09-01
Project End: 2019-08-31
Budget Start: 2017-09-01
Budget End: 2018-08-31
Support Year: 8
Fiscal Year: 2017

Institution

Name: University of California Berkeley
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 124726725

City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Related projects


NIH 2018 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2017 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2016 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley
NIH 2015 R01 GM	Mathematical Models and Statistical Methods for Large-Scale Population Genomics Song, Yun S. / University of California Berkeley	$303,504
NIH 2014 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley
NIH 2013 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$190,481
NIH 2012 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$197,390
NIH 2011 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$197,390
NIH 2010 R01 GM	Mathematical Models and Statistical Methods for Genome Analysis Song, Yun S. / University of California Berkeley	$199,384

Publications

Spence, Jeffrey P; Steinrücken, Matthias; Terhorst, Jonathan et al. (2018) Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 53:70-76

Steinrücken, Matthias; Spence, Jeffrey P; Kamm, John A et al. (2018) Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans. Mol Ecol 27:3873-3888

Moreno-Mayar, J Víctor; Vinner, Lasse; de Barros Damgaard, Peter et al. (2018) Early human dispersals within the Americas. Science 362:

Moreno-Mayar, J Víctor; Potter, Ben A; Vinner, Lasse et al. (2018) Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature 553:203-207

Palamara, Pier Francesco; Terhorst, Jonathan; Song, Yun S et al. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50:1311-1317

Terhorst, Jonathan; Kamm, John A; Song, Yun S (2017) Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49:303-309

Miroshnikov, Alexey; Steinrücken, Matthias (2017) Computing the joint distribution of the total tree length across loci in populations with variable size. Theor Popul Biol 118:1-19

Crawford, Nicholas G; Kelly, Derek E; Hansen, Matthew E B et al. (2017) Loci associated with skin pigmentation identified in African populations. Science 358:

Luo, Shishi; Mattingly, Jonathan C (2017) Scaling limits of a model for selection at two scales. Nonlinearity 30:1682-1707

Luo, Shishi; Yu, Jane A; Song, Yun S (2016) Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads. PLoS Comput Biol 12:e1005117

Showing the most recent 10 out of 39 publications
