We developed Variant Association Tools (VAT), a software application for association analysis of disease susceptibility genes using next-generation sequencing (NGS) data. Unlike many other applications that specialize in certain aspects of rare variant association analysis, VAT aims to provide an effective and comprehensive solution for association analysis in genetic epidemiology studies. Although VAT has been successfully applied to a number of large-scale whole exome sequence (WES) association studies, including the NHLBI Exome Sequencing Project, continued development is required to effectively analyze whole genome sequence (WGS) data from emerging large-scale genetic epidemiology studies. Such studies generate unprecedented amounts of data and pose methodological and bioinformatics challenges, including efficient storage, retrieval, and analysis of terabytes of genotype data; access to heterogeneous annotation resources; choice of appropriate statistical gene mapping methods; and application to real-world studies with proper controls for errors and biases. To address these challenges, we propose to extend VAT with new capacity and new analytical methods.
Specifically, in aim 1, we will remove a major bottleneck of VAT by implementing a highly efficient hybrid storage model to store WGS samples.
In aim 2, we will implement recently developed and emerging statistical methods for sequence-based variant association analysis, with a programming interface that allows researchers to implement their own methods in VAT.
In aim 3, we will provide user-friendly pipelines, online resources, tutorials, and web interfaces to facilitate the application of VAT to real-world studies.

Public Health Relevance

Variant Association Tools (VAT) is an analysis pipeline for quality control and data analysis of large-scale sequence and genotype data. The analyses that VAT can perform will be extended to, for example, family data, admixed populations, and gene pathways, and VAT will be modified to handle data on tens of thousands of genomes. VAT will be freely available and fully documented, with tutorials that allow researchers to use it easily.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
7R01HG008972-04
Application #
9901082
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Sofia, Heidi J
Project Start
2016-06-14
Project End
2021-03-31
Budget Start
2019-04-17
Budget End
2021-03-31
Support Year
4
Fiscal Year
2019
Total Cost
Indirect Cost
Name
Columbia University (N.Y.)
Department
Neurology
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032