Personal genome sequences from next-generation sequencing technologies are permeating clinical care via diagnostic testing. Clinical labs are under pressure to handle these data in unambiguous, reproducible ways and to provide interpretations that are comparable across testing sites, but they face many difficulties in doing so. Over the last several years, the 'variant file' has emerged as the common currency for the exchange and analysis of personal genome sequences for research and, now, for clinical purposes. These files describe every position in a personal genome that differs from the reference GenBank genome sequence. Given its widespread use, the variant file is a logical starting point for designing a format suitable for clinical applications. Currently, variant file styles diverge widely, there is no uniformity in the way complex variants are annotated, and genomics has not embraced the use of medical data standards. The diagnostic genomics community, aware of these issues, has mobilized a working group to provide recommendations and requirements that unify variant annotation in order to improve public health and clinical applications. Without clear standards for capturing variant sequence data, data sharing is hindered in several settings: across laboratories for quality assurance, with databases when making an interpretation, and with the patient record for future use. This proposal addresses the problems with polymorphic variant description by providing novel algorithms to define sequence variants and by developing file formats and software to communicate this information, using guidance from the clinical diagnostic community. The standardized format, VCFclin, and the co-developed software tools will end information loss and ambiguity as genomics data flow from sequencing machines, through variant calling and analysis pipelines, to interpretation and clinical use.
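To make the ambiguity concrete: a deletion inside a repeat can be written at several different positions in a variant file, and every form describes the same resulting sequence. The sketch below shows the widely used left-align-and-trim normalization, which maps such equivalent descriptions to a single canonical one. It is an illustration of the general problem only, not the novel algorithms proposed here and not part of VCFclin; the function name `normalize`, the toy reference string, and the restriction to a single REF/ALT pair with 1-based positions are all assumptions made for this example.

```python
from typing import Tuple


def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left-align and trim one REF/ALT pair (hypothetical helper, sketch only).

    ref_seq: toy reference sequence; pos: 1-based position of the REF allele.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    # The REF allele must match the reference at the stated position.
    if ref_seq[pos - 1 : pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")

    changed = True
    while changed:
        changed = False
        # Drop a rightmost base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pad both alleles with the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-pad at position 1 (not handled in this sketch)")
            base = ref_seq[pos - 2]
            ref, alt = base + ref, base + alt
            pos -= 1
            changed = True

    # Drop leftmost bases shared by both alleles while each keeps at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    reference = "GACACACT"  # toy sequence containing an ACACAC repeat
    # Two different descriptions of the same two-base deletion in the repeat.
    print(normalize(reference, 5, "CAC", "C"))
    print(normalize(reference, 2, "ACA", "A"))
```

Both calls describe the loss of one two-base unit from the repeat, and both normalize to the leftmost representation `(1, "GAC", "G")`. Exchanging only normalized records is one way to make two labs' descriptions of the same variant directly comparable.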

Public Health Relevance

Clinical sequencing labs are under pressure to handle sequence variant data in unambiguous, reproducible ways and to provide interpretations that are comparable across testing sites, but they face many difficulties in doing so. This proposal addresses these problems by providing novel algorithms to define sequence variants and by developing file formats and software to communicate this information, using guidance from the clinical diagnostic community. The file format VCFclin and the co-developed software tools will end information loss and ambiguity as genomics data flow from sequencing machines, through variant calling and analysis pipelines, to interpretation and clinical use.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
5R01HG008628-03
Application #
9293355
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Ramos, Erin
Project Start
2015-09-21
Project End
2019-06-30
Budget Start
2017-07-01
Budget End
2019-06-30
Support Year
3
Fiscal Year
2017
Name
University of Utah
Department
Miscellaneous
Type
Schools of Medicine
DUNS #
009095365
City
Salt Lake City
State
UT
Country
United States
Zip Code
84112
Publications
Eilbeck, Karen; Quinlan, Aaron; Yandell, Mark (2017) Settling the score: variant prioritization and Mendelian disease. Nat Rev Genet 18:599-612
Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J et al. (2017) Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings. J Mol Diagn 19:417-426