Population genetics for large-scale sequencing studies of diverse populations

Rosenberg, Noah; Zoellner, Sebastian

Abstract

Population-based studies identifying common genetic variants that affect complex human diseases have relied heavily on population-genetic principles in important tasks such as study design, quality control, and genotype imputation. As the emphasis of mapping studies has now shifted to investigating rare variants in next-generation sequencing projects, new opportunities exist for leveraging population genetics to maximize the return from these investigations. Because studies thus far have often focused on populations of European descent, it is critical that new methods provide tools to analyze data from a greater diversity of populations. This project builds on productive efforts in the first funding period, proposing methods that capitalize on the study of human population genetics to enhance the design, analysis, and interpretation of genome sequencing studies, and focusing on analysis of rare risk variants in diverse human populations. (1) We will devise methods for selecting subsamples of individuals for genome and exome sequencing, particularly in admixed and structured populations. Such subsamples will make it possible for researchers to maximize their potential for achieving statistical power to detect rare disease variants. (2) We will enhance variant-calling accuracy, particularly in low-coverage data and for challenging indels and copy-number variants, by including in the variant-calling pipeline evidence accumulated from closely related haplotypes in the population. This approach will be particularly beneficial in admixed and genetically diverse populations, in which haplotype variation is especially significant and selecting an informative haplotype subset to assist in variant-calling is of greatest value. (3) We will use population-genetic principles to improve sample quality control in sequencing studies. First, we address the common challenge of sample contamination, which adversely affects variant-calling and downstream analyses. We will produce a method to estimate the genotypes of the minor contributor of a mixed sample, thus enabling the population of origin of a contaminating signal to be identified. This identification further facilitates variant-calling and permits in silico deconvolution of mixed samples. Second, to enhance the sharing of samples in large projects, we will devise methods to uncover duplicate or related samples from non-overlapping marker sets. Our approach will reduce the risk of expending effort to obtain sequence that will not be fully utilized, and will also assist in making use of historical low-density data in understudied populations. (4) We will incorporate new advances in the study of human population growth and natural selection for evaluating rare-variant tests and identifying powerful testing strategies. Evaluations of current tools often ignore important population-genetic factors such as selection or accelerating growth; our methods will enhance models for analyzing rare-variant testing methods, tailoring them to populations of interest. Throughout the project, we will use multi-population genome sequence data from the TOPMed and InPSYght studies to test our approaches. To facilitate use of our methods, we will produce, test, and distribute new publicly available software programs.

Public Health Relevance

Population-based studies that assess large samples of unrelated cases and controls offer a powerful approach to identify risk variants for common complex diseases. However, many methods for addressing the current focus of these studies on rare risk variants and genome sequencing make limited use of informative models from population genetics, and they often do not consider complexities inherent to studies of populations of non-European origin. Our project will leverage models from population genetics to provide methods and software that will accelerate the discovery of genetic factors that increase disease risk, addressing challenges arising from consideration of rare genetic variation, large sample sizes, complex sequencing projects, and the effort to find disease variants in underrepresented populations.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG005855-07A1
Application #: 9380866
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Brooks, Lisa

Project Start: 2010-09-13
Project End: 2021-06-30
Budget Start: 2017-09-15
Budget End: 2018-06-30
Support Year: 7
Fiscal Year: 2017
Total Cost
Indirect Cost

Institution

Name: Stanford University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 009214214

City: Stanford
State: CA
Country: United States
Zip Code: 94304

Related projects


NIH 2020 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2019 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2018 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2017 R01 HG	Population genetics for large-scale sequencing studies of diverse populations Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2014 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University
NIH 2013 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$366,830
NIH 2012 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$384,360
NIH 2011 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / Stanford University	$463,811
NIH 2010 R01 HG	Advanced strategies for genotype imputation Rosenberg, Noah; Zoellner, Sebastian / University of Michigan Ann Arbor	$377,783

Publications

Aw, Alan J; Rosenberg, Noah A (2018) Bounding measures of genetic similarity and diversity using majorization. J Math Biol 77:711-737

Reppell, M; Zöllner, S (2018) An efficient algorithm for generating the internal branches of a Kingman coalescent. Theor Popul Biol 122:57-66

Kim, Jaehee; Edge, Michael D; Algee-Hewitt, Bridget F B et al. (2018) Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci. Cell 175:848-858.e6

Arbisser, Ilana M; Jewett, Ethan M; Rosenberg, Noah A (2018) On the joint distribution of tree height and tree length under the coalescent. Theor Popul Biol 122:46-56

Edge, Michael D; Algee-Hewitt, Bridget F B; Pemberton, Trevor J et al. (2017) Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets. Proc Natl Acad Sci U S A 114:5671-5676

Vattathil, Selina; Scheet, Paul (2016) Extensive Hidden Genomic Mosaicism Revealed in Normal Tissue. Am J Hum Genet 98:571-578

Kang, Jonathan T L; Goldberg, Amy; Edge, Michael D et al. (2016) Consanguinity Rates Predict Long Runs of Homozygosity in Jewish Populations. Hum Hered 82:87-102

Kang, Jonathan T L; Zhang, Peng; Zöllner, Sebastian et al. (2015) Choosing Subsamples for Sequencing Studies by Minimizing the Average Distance to the Closest Leaf. Genetics 201:499-511

Lo, Yancy; Kang, Hyun M; Nelson, Matthew R et al. (2015) Comparing variant calling algorithms for target-exon sequencing in a large sample. BMC Bioinformatics 16:75

Buzbas, Erkan O; Rosenberg, Noah A (2015) AABC: approximate approximate Bayesian computation for inference in population-genetic models. Theor Popul Biol 99:31-42

Showing the most recent 10 out of 43 publications
