Tools for Visualization of Geographic Structure in Population Genomic Data

Novembre, John

Abstract

Large sample sizes are increasingly common in genetics and genomics, particularly in human genetics, where samples of thousands of individuals or more are needed to detect variant associations with complex disease traits. A common feature of data from large samples is that the individuals within the study have varying levels of similarity with one another, which can become problematic for downstream analyses (e.g., by causing spurious associations) if not understood. Thus, uncovering population structure and dissecting it to understand its source is a common and important practice in large-scale studies. Here, we aim to solve challenges in visualizing population structure that regularly arise when researchers interact with large-scale population genomic data sets.
In Aim 1 we will develop a software tool for visualizing population structure using principal components analysis (PCA). This tool will streamline several steps that data scientists commonly reinvent as they analyze PCA outputs from genetic data. It will also make it clearer when PCA may be returning anomalous results.
In Aim 2 we will develop a tool for producing geographic allele frequency maps of publicly available or user-generated allele frequency data.
In Aim 3 we will develop a visualization approach for displaying geographic regions where populations show unexpectedly high or low levels of differentiation.
In Aim 4 we will integrate these pieces of software into a single suite and link them to externally generated data sources and existing genome browsers. By developing these tools, we will help remove the need for independent researchers to write redundant scripts, and so increase the pace of genomics research. Throughout the project we will pay special attention to developing user-friendly interactive data displays such as those generated by the Data-Driven Documents (D3) JavaScript visualization library. Where possible we will use simple yet flexible Python backends and provide complementary R libraries to facilitate customization and integration with existing analysis pipelines. While population genetic applications will motivate our work, the tools we are generating will be generally applicable to other forms of structured biomedical data.
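As a rough illustration of the kind of input a geographic allele frequency map (Aim 2) would consume, the sketch below computes per-population alternate-allele frequencies for a single biallelic variant. It is not the project's software: the function name `allele_frequencies`, the population labels, and the toy genotypes are all hypothetical, and diploid individuals coded as 0, 1, or 2 alternate-allele copies are assumed.

```python
from collections import defaultdict


def allele_frequencies(samples):
    """Return {population: alternate-allele frequency} for one biallelic variant.

    `samples` is an iterable of (population, genotype) pairs, where genotype
    is the number of alternate alleles carried by a diploid individual
    (0, 1, or 2). This coding is an assumption made for the sketch.
    """
    alt_count = defaultdict(int)    # alternate alleles observed per population
    total_count = defaultdict(int)  # total alleles observed per population
    for population, genotype in samples:
        if genotype not in (0, 1, 2):
            raise ValueError(f"unexpected genotype {genotype!r}")
        alt_count[population] += genotype
        total_count[population] += 2  # each diploid individual carries two alleles
    return {pop: alt_count[pop] / total_count[pop] for pop in total_count}


# Hypothetical toy data: (population label, alternate-allele count)
samples = [
    ("pop_A", 0), ("pop_A", 1),
    ("pop_B", 2), ("pop_B", 1), ("pop_B", 2),
    ("pop_C", 0),
]
freqs = allele_frequencies(samples)
for pop, freq in sorted(freqs.items()):
    print(f"{pop}: {freq:.3f}")
```

In a mapping tool, each population label would additionally carry coordinates so that these frequencies could be drawn at the sampling locations; that step is omitted here.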

Public Health Relevance

This project will provide tools for visualizing large-scale genetic data with population structure. While numerous advanced algorithms for summarizing population structure exist, the human interface to the outputs of these methods is lacking and has become a time sink during the analysis of large samples. In this project we will provide user-friendly tools that lower the barrier to understanding genetic variation datasets. In particular, we will develop tools for visualizing compressed representations of genetic variation (i.e., PCA results) and for showing how genetic diversity is distributed across geographic space in a sample.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01CA198933-01
Application #: 8876141
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Miller, David J

Project Start: 2015-06-01
Project End: 2018-05-31
Budget Start: 2015-06-01
Budget End: 2016-05-31
Support Year: 1
Fiscal Year: 2015
Total Cost
Indirect Cost

Institution

Name: University of Chicago
Department: Genetics
Type: Schools of Medicine
DUNS #: 005421136

City: Chicago
State: IL
Country: United States
Zip Code: 60637

Related projects


NIH 2017 U01 CA	Tools for Visualization of Geographic Structure in Population Genomic Data Novembre, John / University of Chicago	$409,700
NIH 2016 U01 CA	Tools for Visualization of Geographic Structure in Population Genomic Data Novembre, John / University of Chicago
NIH 2015 U01 CA	Tools for Visualization of Geographic Structure in Population Genomic Data Novembre, John / University of Chicago

Publications

Engelmann, Brett W; Hsiao, Chiaowen Joyce; Blischak, John D et al. (2018) A Methodological Assessment and Characterization of Genetically-Driven Variation in Three Human Phosphoproteomes. Sci Rep 8:12106

Dey, Kushal K; Hsiao, Chiaowen Joyce; Stephens, Matthew (2017) Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genet 13:e1006599

Marcus, Joseph H; Novembre, John (2017) Visualizing the geography of genetic variants. Bioinformatics 33:594-595

de Manuel, Marc; Kuhlwilm, Martin; Frandsen, Peter et al. (2016) Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science 354:477-481

Novembre, John; Peter, Benjamin M (2016) Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 41:98-105

Petkova, Desislava; Novembre, John; Stephens, Matthew (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94-100
