Software for exploring all forms of genetic variation in any species

Quinlan, Aaron

Abstract

Modern DNA sequencing technologies have revolutionized the design of experiments investigating the biology of the genome and the genetic basis of traits. Arguably the most powerful application of these technologies has been the creation of exquisitely detailed catalogs describing the landscape of genetic variation in multiple species. However, discovery of genetic variation is merely the beginning. Exploration and analysis of the resulting catalogs is required to catalyze new insights into the relationship between genotype and phenotype. This proposal is motivated by two fundamental limitations inhibiting discovery from genetic variation datasets. First, existing software for mining variation to understand disease and other traits does not scale to large datasets involving thousands of samples. Second, most existing tools are focused on human studies, which hinders the application of modern DNA sequencing to genetic studies of model organisms, livestock, and newly sequenced species. We propose to solve these challenges by building upon our GEMINI framework. Since 2012, we have maintained GEMINI as a powerful software framework for exploring genome variation. GEMINI's strength is that it integrates genetic variation with a diverse set of genome annotations into a database to facilitate variant prioritization. It allows researchers to conduct complex analyses with simple queries based on sample genotypes, phenotypes, inheritance patterns, and genome annotations. GEMINI has quickly become a popular tool for rare human disease research, leading to discoveries by multiple labs, including our own. Despite its power and popularity, GEMINI has three important limitations. First, it was not designed for studies involving genetic variation from more than a few hundred samples. Second, it focuses on single-nucleotide polymorphisms (SNPs) and insertions and deletions (INDELs); it is blind to structural and copy number variation. Finally, GEMINI can only analyze genetic variation datasets for the human genome; no other species or genome builds are supported. Therefore, this proposal seeks to provide geneticists studying any species with a powerful, flexible, and simple-to-use software system that is fast and scalable enough to support genetic research for many years to come. We will do this by achieving the following Specific Aims: (1) Develop a scalable, high-performance genotype and haplotype query engine to empower large-scale genome studies. (2) Devise new methods for genotyping, integrating, and prioritizing structural variation. (3) Enable scalable, flexible genome analysis in any species and genome build. In summary, by completing these aims, the proposed research will give geneticists studying any species a fast, scalable, and flexible system for exploring all forms of genetic variation.

Public Health Relevance

Arguably the most powerful application of modern DNA sequencing technologies has been the creation of exquisitely detailed catalogs that describe the landscape of genetic variation in multiple species. However, discovery of genetic variation is merely the beginning; exploration and analysis of the resulting catalogs is required to catalyze new insights into the relationship between genotype and phenotype. This proposal seeks to provide geneticists studying any species with a powerful, flexible, and simple-to-use software system that is fast and scalable enough to support genetic research for many years to come.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM124355-01
Application #: 9367199
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Ravichandran, Veerasamy

Project Start: 2017-08-01
Project End: 2021-07-31
Budget Start: 2017-08-01
Budget End: 2018-07-31
Support Year: 1
Fiscal Year: 2017

Institution

Name: University of Utah
Department: Genetics
Type: Schools of Medicine
DUNS #: 009095365

City: Salt Lake City
State: UT
Country: United States
Zip Code: 84112

Related projects


NIH 2020 R01 GM: Software for exploring all forms of genetic variation in any species (Quinlan, Aaron R. / University of Utah)
NIH 2019 R01 GM: Software for exploring all forms of genetic variation in any species (Quinlan, Aaron R. / University of Utah)
NIH 2018 R01 GM: Software for exploring all forms of genetic variation in any species (Quinlan, Aaron R. / University of Utah)
NIH 2017 R01 GM: Software for exploring all forms of genetic variation in any species (Quinlan, Aaron R. / University of Utah)

Publications

Belyeu, Jonathan R; Nicholas, Thomas J; Pedersen, Brent S et al. (2018) SV-plaudit: A cloud-based framework for manually curating thousands of structural variants. Gigascience 7:

Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya et al. (2018) GIGGLE: a search engine for large-scale integrated genome analysis. Nat Methods 15:123-126

Pedersen, Brent S; Quinlan, Aaron R (2018) Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34:867-868

Ostrander, Betsy E P; Butterfield, Russell J; Pedersen, Brent S et al. (2018) Whole-genome analysis for effective clinical diagnosis and gene discovery in early infantile epileptic encephalopathy. NPJ Genom Med 3:22

Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E et al. (2017) Indexcov: fast coverage quality control for whole-genome sequencing. Gigascience 6:1-6
