The Genome Aggregation Database (gnomAD) is a ubiquitous resource for basic research and clinical interpretation. The world's largest genetic variation resource, the gnomAD dataset is used in virtually all clinical genetic diagnostic pipelines worldwide, and the website has received over 20 million page views to date. Here we outline a proposal that will expand the gnomAD resource to millions of samples across diverse global populations. Our proposal will scale variant calling and quality control to match this sample size, integrate statistical tools and other genomic resources critical to clinical interpretation, and ensure that the data we aggregate will continue to be shared freely with the biomedical community. To accomplish this, we will apply a highly computationally efficient strategy to call all classes of variation (including SNVs, small indels, and the mutational spectrum of structural variants) across millions of sequenced samples enriched for underrepresented ancestry groups. We will deploy a cloud-based framework for the efficient storage and automated quality control of these very large and heterogeneous sequence data sets using the massively parallel Hail architecture. We will leverage the scale of gnomAD to provide increasingly high-resolution maps of the depletion of functional variation across regions of the genome (highlighting genomic regions where natural selection constrains DNA change) and provide statistical frameworks for quantitatively assessing whether the population frequency of a variant is consistent with pathogenicity, linking this information with evidence from the ClinVar resource. We will continue to share all of these data as rapidly and openly as possible with the biomedical community, long before publication.
We will support and expand the functionality of our widely accessed data browser and create scalable, publicly accessible datasets that integrate our variation data with clinical and functional genomic annotations. These datasets will be available through API frameworks to enable novel applications. We will also provide resources and training to improve the use of gnomAD resources by the clinical genetics and wider biomedical communities.
The Genome Aggregation Database (gnomAD) will provide public access to the largest and most diverse genome sequencing resource in the field. gnomAD is critical to helping doctors and scientists interpret the genomes of the human population and understand which changes in an individual's DNA may lead to disease and which have no effect.