We propose to create the world's premier database of genetic variants relevant to clinical care, the Clinically Relevant Genetic Variants Resource (CRVR). We will provide transparent data synthesis and consensus opinion on the clinical utility of individual genetic variants across a spectrum of genetic lesions, including single-nucleotide changes, small insertions/deletions (indels), and structural variants. We will integrate with ClinVar, PharmGKB, and OMIM and draw upon NHGRI initiatives, including the Genome Sequencing and Analysis Centers, the Mendelian Disorders Sequencing Centers, and the Clinical Sequencing Exploratory Research (CSER) Centers. We will also work closely with the other CRVR sites and NHGRI-funded initiatives to improve the deposition of data from clinical laboratories. The database will be built through three Aims.
Aim 1 will engage and energize the clinical genomics community around CRVR efforts. We will partner in this activity with the other CRVR and U41 investigators, who will focus on engaging professional societies, clinical testing laboratories, and the broader clinical genomics community to ensure that the CRVR resource meets anticipated community needs. This effort includes assembling Disease-Specific and Mutation Type Working Groups (DSWGs and MTWGs), composed of expert clinical geneticists and molecular diagnosticians, to establish metrics for the initial classification of variants and to integrate guidelines from professional organizations.
Aim 2 will create the CRVR Core Database (CoreDB) through expert review of the existing literature, locus-specific databases, and data from NHGRI initiatives. Information will be aggregated using standard ontologies and methods for handling heterogeneous data. We will disseminate consensus findings on clinically relevant genetic variants and their clinical implications, together with supporting evidence and documentation of the consensus process, through a user-friendly web portal (vetted by the Genetic Counseling Working Group), web services for data mining, and consensus clinical guidelines for the appropriate clinical and research communities. Results will be organized by gene, variant, disease, pathway, and literature. Supporting evidence will also be curated and disseminated, and the resource will be updated continuously as new information accumulates.
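To make the aggregation step more concrete, the Python sketch below groups variant-level assertions from several sources under a normalized variant key and indexes them by gene and disease. Everything in it is hypothetical: the `VariantKey`, `Assertion`, and `CoreRecord` types, their fields, and the toy data are illustrative and are not the CoreDB schema. A strict-majority vote is used only to separate records whose sources agree outright from those that do not. Because consensus in this proposal comes from expert review, records without a strict majority are flagged as `"conflicting"` for that review rather than resolved automatically.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical normalized variant identity (genomic coordinates plus alleles).
@dataclass(frozen=True)
class VariantKey:
    chrom: str
    pos: int
    ref: str
    alt: str

# Hypothetical single interpretation submitted by one source.
@dataclass
class Assertion:
    key: VariantKey
    gene: str
    disease: str
    classification: str  # e.g. "pathogenic", "benign"
    source: str          # e.g. a literature citation or submitting laboratory
    evidence: str

# Hypothetical CoreDB record: all assertions about one variant.
@dataclass
class CoreRecord:
    key: VariantKey
    gene: str
    assertions: list = field(default_factory=list)

    def consensus(self) -> str:
        """Return the strict-majority classification, or flag for expert review."""
        counts = Counter(a.classification for a in self.assertions)
        label, n = counts.most_common(1)[0]
        return label if 2 * n > len(self.assertions) else "conflicting"

def build_core_db(assertions):
    """Group assertions by variant and build gene and disease indexes."""
    by_variant = {}
    by_gene = defaultdict(set)
    by_disease = defaultdict(set)
    for a in assertions:
        record = by_variant.setdefault(a.key, CoreRecord(a.key, a.gene))
        record.assertions.append(a)
        by_gene[a.gene].add(a.key)
        by_disease[a.disease].add(a.key)
    return by_variant, by_gene, by_disease

# Toy data with made-up coordinates, genes, and diseases.
v1 = VariantKey("1", 1000, "A", "G")
v2 = VariantKey("1", 2000, "C", "T")
db, genes, diseases = build_core_db([
    Assertion(v1, "GENE_A", "DISEASE_X", "pathogenic", "lab1", "segregation"),
    Assertion(v1, "GENE_A", "DISEASE_X", "pathogenic", "lab2", "functional assay"),
    Assertion(v1, "GENE_A", "DISEASE_X", "likely benign", "lab3", "population frequency"),
    Assertion(v2, "GENE_A", "DISEASE_X", "pathogenic", "lab1", "case report"),
    Assertion(v2, "GENE_A", "DISEASE_X", "benign", "lab2", "population frequency"),
])
print(db[v1].consensus())  # pathogenic
print(db[v2].consensus())  # conflicting
```

Keying records on normalized coordinates and alleles, rather than on each source's own identifiers, lets assertions from different sources about the same variant land in one record. The gene and disease indexes support browsing the results along the axes the Aim lists.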
Aim 3 will deploy machine-learning algorithms for the semi-automatic identification of putative Clinically Relevant Variants (CRVs). We will mine the clinical and epidemiological genetics literature and existing databases, including the ClinVar, OMIM, CSER, and Mendelian-center data aggregated in Aim 2, to identify putative clinically important variants. The Working Groups formed in Aim 1 will establish criteria and oversee the curators who vet these variants. We will develop and optimize disease- and gene-specific machine-learning algorithms to enable rapid classification of variants based on data that genetic testing services provide through ClinVar. To infer the global relevance of the CRVs discovered here, we will integrate into these machine-learning approaches population-genetic data inferred from at least 25 reference populations in the 1000 Genomes Project and other large-scale efforts.
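Aim 3 does not name a specific learning algorithm. As one hedged illustration of a gene-specific classifier that also uses population data, the Python sketch below trains a separate plain logistic regression per gene (a stand-in chosen for simplicity, not the proposal's method) on three hypothetical features: the maximum allele frequency across reference populations, a conservation score, and a predicted loss-of-function flag. Variants whose frequency in any population exceeds an assumed 5% cutoff are scored 0 before the model is consulted, reflecting the usual reasoning that a variant common in some population is unlikely to cause a rare, highly penetrant disorder. The field names, population labels, threshold, and toy training data are all assumptions.

```python
import math
from collections import defaultdict

def max_pop_af(pop_afs):
    """Highest allele frequency across reference populations (hypothetical dict: population -> AF)."""
    return max(pop_afs.values(), default=0.0)

def features(v):
    # Hypothetical feature vector: bias, max population AF, conservation score, LoF flag.
    return [1.0, max_pop_af(v["pop_afs"]), v["conservation"], 1.0 if v["lof"] else 0.0]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def train_per_gene(labeled):
    """Train one model per gene from variants labeled 1 (pathogenic) or 0 (benign)."""
    by_gene = defaultdict(list)
    for v in labeled:
        by_gene[v["gene"]].append(v)
    return {
        gene: train_logistic([features(v) for v in vs], [v["label"] for v in vs])
        for gene, vs in by_gene.items()
    }

def score(models, v, common_af=0.05):
    """Probability-like score that v is clinically relevant; common variants score 0."""
    if max_pop_af(v["pop_afs"]) > common_af:
        return 0.0
    return sigmoid(dot(models[v["gene"]], features(v)))

# Toy training data; genes, population labels, and values are made up.
train = [
    {"gene": "GENE_A", "pop_afs": {"POP1": 0.0001, "POP2": 0.0}, "conservation": 0.9, "lof": True, "label": 1},
    {"gene": "GENE_A", "pop_afs": {"POP1": 0.0002, "POP2": 0.0001}, "conservation": 0.8, "lof": True, "label": 1},
    {"gene": "GENE_A", "pop_afs": {"POP1": 0.02, "POP2": 0.01}, "conservation": 0.1, "lof": False, "label": 0},
    {"gene": "GENE_A", "pop_afs": {"POP1": 0.03, "POP2": 0.02}, "conservation": 0.2, "lof": False, "label": 0},
]
models = train_per_gene(train)

rare_lof = {"gene": "GENE_A", "pop_afs": {"POP1": 0.00015}, "conservation": 0.85, "lof": True}
tolerated = {"gene": "GENE_A", "pop_afs": {"POP1": 0.025}, "conservation": 0.15, "lof": False}
common = {"gene": "GENE_A", "pop_afs": {"POP1": 0.001, "POP2": 0.12}, "conservation": 0.9, "lof": True}
print(score(models, rare_lof), score(models, tolerated), score(models, common))
```

Training one model per gene mirrors the gene-specific framing of the Aim. In practice, genes with few labeled variants would need pooling across related genes or regularization, which this sketch omits.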

Public Health Relevance

We propose to create a unified, public, and freely available database of genetic alterations relevant to clinical care. Our ultimate goal is to empower clinicians, genetic counselors, and patients to make informed decisions based on DNA testing. Because much of the information required for such decisions is scattered among public and private databases, we propose combining the medical literature, expert summaries of millions of de-identified genetic tests, and results from current and past NIH-funded genetic studies into a single database.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZHG1-HGR-M (M2))
Program Officer
Ramos, Erin
Stanford University
Schools of Medicine
United States