Technological advances in DNA sequencing have dramatically increased the availability of genomic variation data over the past few years. This development offers a powerful window into the genetic basis of human biology and disease risk. To realize this potential, it is crucial to develop efficient analytical methods that allow researchers to utilize the information in genomic data more fully and to consider more complex models than previously possible. The central goal of this project is to tackle this important challenge by carrying out the following Specific Aims:
In Aim 1, we will develop efficient inference tools for whole-genome population genomic analysis by extending our ongoing work on coalescent hidden Markov models, and we will apply these tools to large-scale data. The methods we develop will enable researchers to analyze large samples under general demographic models involving multiple populations with population splits, migration, and admixture, as well as variable effective population sizes and temporal samples (ancient DNA).
Multi-locus full-likelihood computation is often prohibitive in population genetic models with high complexity. To address this problem, in Aim 2 we will develop a novel likelihood-free inference framework for population genomic analysis by applying deep learning, a highly active area of machine learning research. We will apply the method to various parameter estimation and classification problems in population genomics, particularly joint inference of selection and demography. In addition to carrying out technical research, we will develop a software package that will allow researchers in the population genomics community to use deep learning in their own research.
Time-series genetic variation data at the whole-genome scale are increasingly used to infer allele frequency changes over a time course. This development creates new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. In Aim 3, we will develop new statistical methods to take full advantage of this novel data source at both short and long evolutionary timescales. Specifically, we will develop and apply efficient statistical inference methods for analyzing time-series genomic variation data from experimental evolution and ancient DNA samples.
Open-source software will be developed for each specific aim. The novel methods developed in this project will help researchers analyze and interpret genetic variation data at the whole-genome scale.

Public Health Relevance

This project will develop several novel statistical methods for analyzing and interpreting human genetic variation data at the whole-genome scale. The computational tools stemming from this research will enable efficient and accurate inference under complex population genetic models, thereby broadly facilitating research efforts to understand the genetic basis of human biology and disease risk.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM094402-07
Application #: 9145232
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Janes, Daniel E
Project Start: 2010-09-01
Project End: 2019-08-31
Budget Start: 2016-09-01
Budget End: 2017-08-31
Support Year: 7
Fiscal Year: 2016

Name: University of California Berkeley
Department: Engineering (All Types)
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 124726725
City: Berkeley
State: CA
Country: United States
Zip Code: 94704

Spence, Jeffrey P; Steinrücken, Matthias; Terhorst, Jonathan et al. (2018) Inference of population history using coalescent HMMs: review and outlook. Curr Opin Genet Dev 53:70-76
Steinrücken, Matthias; Spence, Jeffrey P; Kamm, John A et al. (2018) Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans. Mol Ecol 27:3873-3888
Moreno-Mayar, J Víctor; Vinner, Lasse; de Barros Damgaard, Peter et al. (2018) Early human dispersals within the Americas. Science 362:
Moreno-Mayar, J Víctor; Potter, Ben A; Vinner, Lasse et al. (2018) Terminal Pleistocene Alaskan genome reveals first founding population of Native Americans. Nature 553:203-207
Palamara, Pier Francesco; Terhorst, Jonathan; Song, Yun S et al. (2018) High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability. Nat Genet 50:1311-1317
Terhorst, Jonathan; Kamm, John A; Song, Yun S (2017) Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49:303-309
Miroshnikov, Alexey; Steinrücken, Matthias (2017) Computing the joint distribution of the total tree length across loci in populations with variable size. Theor Popul Biol 118:1-19
Crawford, Nicholas G; Kelly, Derek E; Hansen, Matthew E B et al. (2017) Loci associated with skin pigmentation identified in African populations. Science 358:
Luo, Shishi; Mattingly, Jonathan C (2017) Scaling limits of a model for selection at two scales. Nonlinearity 30:1682-1707
Luo, Shishi; Yu, Jane A; Song, Yun S (2016) Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads. PLoS Comput Biol 12:e1005117

Showing the most recent 10 out of 39 publications