We developed Variant Association Tools (VAT), a software application for the association analysis of disease susceptibility genes using next-generation sequencing (NGS) data. Unlike many other applications that specialize in certain aspects of rare variant association analysis, VAT aims to provide an effective and comprehensive solution for association analysis in genetic epidemiology studies. Although VAT has been successfully applied to a number of large-scale whole-exome sequencing (WES) association studies, including the NHLBI Exome Sequencing Project, continued development is required to effectively analyze whole-genome sequencing (WGS) data from emerging large-scale genetic epidemiology studies. Such studies generate an unprecedented amount of data and pose methodological and bioinformatics challenges, including efficient storage, retrieval, and analysis of terabytes of genotype data; access to heterogeneous annotation resources; the choice of appropriate statistical gene-mapping methods; and application to real-world studies with proper controls for errors and biases. To address these challenges, we propose to extend VAT with new capacities and new analytical methods.
In Aim 1, we will remove a major bottleneck of VAT by implementing a highly efficient hybrid storage model for WGS samples.
In Aim 2, we will implement recently developed and emerging statistical methods for sequence-based variant association analysis, along with a programming interface that allows researchers to implement their own methods in VAT.
In Aim 3, we will provide user-friendly pipelines, online resources, tutorials, and web interfaces to facilitate the application of VAT to real-world studies.

Public Health Relevance

Variant Association Tools (VAT) is an analysis pipeline for quality control and data analysis of large-scale sequence and genotype data. The analyses VAT can perform will be extended to, e.g., family data, admixed populations, and gene pathways, and VAT will be modified to handle data from tens of thousands of genomes. VAT will be freely available and fully documented, with tutorials that allow researchers to use it easily.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG008972-01
Application #
9078292
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Brooks, Lisa
Project Start
2016-06-14
Project End
2020-03-31
Budget Start
2016-06-14
Budget End
2017-03-31
Support Year
1
Fiscal Year
2016
Total Cost
$398,125
Indirect Cost
$73,125
Name
Baylor College of Medicine
Department
Genetics
Type
Schools of Medicine
DUNS #
051113330
City
Houston
State
TX
Country
United States
Zip Code
77030