Whole genome sequencing (WGS) has the potential to improve medical care, but much work remains to translate sequence data into meaningful clinical interpretations. WGS interpretation must address both newly observed genetic variants that are likely to be harmful and the more than 150,000 variants that the medical and scientific literature already reports as associated with disease. Many of these associations were discovered in small cohort and case studies, which makes it difficult to translate them into disease risks for asymptomatic individuals who carry these variants. Without accurate risk estimates for these associations, we may expose healthy patients to false positive findings, leading to needless diagnostic workups and screenings that will substantially increase medical costs and patient morbidity. Central to WGS interpretation is the development of a standardized methodology to filter likely benign results and to prioritize those variants that may be clinically significant and scientifically valid. While many of these previously identified variants are associated with Mendelian disorders that are individually rare (e.g., hypertrophic cardiomyopathy and neurofibromatosis), these disorders are collectively common, forming a long tail that confers disease risk for many individuals. Because each of these diseases is so rare, it is hard to envision a specialized interpretive approach that calculates risk for each disease separately, so we propose a systematic approach to assessing variant disease risk that is broadly applicable across many rare diseases. To meet this urgent need, we will develop a novel approach that estimates the penetrance of disease-associated variants using the prior probability of each disease and the population frequencies, in affected and unaffected individuals, of all known genetic variants for that disease.
This prior probability of disease is measured as the prevalence, or the proportion of individuals in a population affected with a disorder. Because the prevalence of a Mendelian disease reflects the combined penetrance and frequency of all of its associated genetic variation (as well as behavioral and environmental factors), we propose to estimate these penetrance values, for each disease, from the disease prevalence and the distribution of associated variation. If only one variant is associated with a disease, that variant's penetrance and population frequency together should closely track disease prevalence; if there are many disease-associated variants, each contributes less to the overall burden of disease, in proportion to its frequency in the population. We will use these penetrance estimates to establish genome-wide filtering cutoffs for likely benign variation and to prioritize observed WGS variation for review by clinical geneticists. We will then apply these estimates to filter and rank the variation observed in individual WGS datasets from an existing clinical trial, and compare the results with existing clinical genetics interpretations.
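As an illustration only, the relationship between prevalence, variant frequencies, and penetrance can be sketched with Bayes' rule: the probability of disease given a variant equals the share of all carriers who are affected. The sketch below is not the project's actual method. The `Variant` class, the function names, the cutoff, and all numeric values are hypothetical, and the calculation assumes that the variant frequencies among affected and unaffected individuals are known and that behavioral and environmental factors can be ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one disease-associated variant."""
    name: str
    freq_affected: float    # assumed P(variant | affected)
    freq_unaffected: float  # assumed P(variant | unaffected)


def penetrance(variant: Variant, prevalence: float) -> float:
    """Estimate P(disease | variant) with Bayes' rule.

    prevalence is the prior probability of disease in the population.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Fraction of the population that carries the variant and is affected.
    carriers_affected = variant.freq_affected * prevalence
    # Fraction of the population that carries the variant and is unaffected.
    carriers_unaffected = variant.freq_unaffected * (1.0 - prevalence)
    total_carriers = carriers_affected + carriers_unaffected
    if total_carriers == 0.0:
        return 0.0  # variant never observed; no evidence of risk
    return carriers_affected / total_carriers


def prioritize(variants, prevalence, cutoff):
    """Drop variants below a penetrance cutoff and rank the rest, highest first."""
    scored = [(v.name, penetrance(v, prevalence)) for v in variants]
    kept = [item for item in scored if item[1] >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up example values, chosen only to show the contrast between
# a variant enriched in affected individuals and a common one.
PREVALENCE = 0.002
VARIANTS = [
    Variant("common_variant", freq_affected=0.02, freq_unaffected=0.01),
    Variant("enriched_variant", freq_affected=0.01, freq_unaffected=0.00001),
]
RANKED = prioritize(VARIANTS, PREVALENCE, cutoff=0.05)
```

With these made-up inputs, the variant that is common among unaffected individuals receives a penetrance estimate well under one percent and falls below the cutoff, while the variant that is strongly enriched among affected individuals is retained for review.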
The interpretation of clinical testing relies on physician integration of new evidence that modifies the prior probability of disease, but for many rare disease associations there is little or no structured data describing the risk that they confer on individual patients with no symptoms of disease. Without reliable data on the underlying population risk for these rare diseases, and on the risk conferred by each disease-associated variant, it is not possible to computationally generate an accurate likelihood of disease in asymptomatic individuals. This limits the benefits of whole genome sequencing to patients and may expose them to dangerous false positive findings. In this project, we will integrate epidemiological and population genetics data to estimate the penetrance of each variant observed in an individual's genome sequence and use these data to prioritize which variants clinical geneticists should review most urgently.
Cassa, Christopher A; Jordan, Daniel M; Adzhubei, Ivan et al. (2018) A literature review at genome scale: improving clinical variant assessment. Genet Med 20:936-941
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810
Cassa, Christopher A; Smith, Stacy E; Docken, William et al. (2016) An argument for early genomic sequencing in atypical cases: a WISP3 variant leads to diagnosis of progressive pseudorheumatoid arthropathy of childhood. Rheumatology (Oxford) 55:586-9
Akle, Sebastian; Chun, Sung; Jordan, Daniel M et al. (2015) Mitigating false-positive associations in rare disease gene discovery. Hum Mutat 36:998-1003
Balick, Daniel J; Do, Ron; Cassa, Christopher A et al. (2015) Dominance of Deleterious Alleles Controls the Response to a Population Bottleneck. PLoS Genet 11:e1005436
Chopra, Sameer S; Leshchiner, Ignaty; Duzkale, Hatice et al. (2015) Inherited CHST11/MIR3922 deletion is associated with a novel recessive syndrome presenting with skeletal malformation and malignant lymphoproliferative disease. Mol Genet Genomic Med 3:413-23
Brownstein, Catherine A; Beggs, Alan H; Homer, Nils et al. (2014) An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol 15:R53
Cassa, Christopher A; Tong, Mark Y; Jordan, Daniel M (2013) Large numbers of genetic variants considered to be pathogenic are common in asymptomatic individuals. Hum Mutat 34:1216-20
Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D (2013) A novel, privacy-preserving cryptographic approach for sharing sequencing data. J Am Med Inform Assoc 20:69-76
Cassa, Christopher A; Chunara, Rumi; Mandl, Kenneth et al. (2013) Twitter as a sentinel in emergency situations: lessons from the Boston marathon explosions. PLoS Curr 5: