A major challenge of contemporary research in genetics and genomics is the vast quantity of data. Visualization tools and customized data portals help conquer this complexity and greatly aid researchers on the path from data to knowledge. An important source of structure in genomic data is geography. Understanding the geography of genetic variation is crucial for human genomics as well as for the study of other species that are deeply relevant to human health. It is especially important in precision medicine, which aims to develop effective treatments for individuals of all ancestries. Currently there is a well-documented bias in genome-wide association studies (GWAS) towards European-ancestry populations, though the relevance of this bias is unclear: some studies find that GWAS results are largely portable across populations, others suggest substantial errors will arise in applying GWAS results across populations, and yet others leverage population variation via trans-ethnic fine-mapping. Given the broad importance of population structure, multiple computational tools have been developed for revealing it, and some of them are among the most cited algorithms in computational biology. Nevertheless, few existing computational genomic methods grapple explicitly with geography. Here, we propose to develop and improve multiple tools that will empower researchers to visualize and interpret geographic patterns in genomic data. In the first, we will build on our "Geography of Genetic Variants" browser, a web-based tool for accessing and displaying information on the geographic distribution of genetic variants in humans. In the second, we will expand the functionality of our software EEMS (Estimating Effective Migration Surfaces), a visualization tool that builds maps revealing the genetic connectivity among populations.
In the third, we will develop a new variant-centric view for displaying patterns of population structure that has multiple applications. Overall, we expect to produce effective, important tools that will illuminate the relationships between genetic ancestry and geography. Throughout the project we will pay special attention to building user-friendly software and interactive data displays such as those generated by the Data-Driven Documents (D3) JavaScript visualization library.
We aim to use simple yet flexible Python backends and provide complementary R libraries to facilitate customization and integration with existing analysis pipelines. Finally, while population genetic applications motivate our work, the tools we are generating will be generally applicable to other forms of structured biomedical data.

Public Health Relevance

Geography plays an important role in structuring genetic variation in numerous species that are relevant to human health. Yet, computational methods for geneticists to analyze geographic structure are largely underdeveloped. In this project, we will develop multiple computational tools that empower researchers to derive insights from the geographic distributions of genetic variation.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM132383-01
Application #
9715755
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2019-04-01
Project End
2023-03-31
Budget Start
2019-04-01
Budget End
2020-03-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Chicago
Department
Genetics
Type
Schools of Medicine
DUNS #
005421136
City
Chicago
State
IL
Country
United States
Zip Code
60637