Loss-of-function (LoF) mutations in human protein-coding genes are known to play a major role in severe diseases such as cystic fibrosis and muscular dystrophy, and have recently been shown to influence the risk of complex diseases such as type 1 diabetes and Crohn's disease. As part of the 1000 Genomes Project, we recently conducted the largest systematic survey to date of human LoF variants. This survey confirmed the value of LoF variants for human disease studies and also identified key challenges for their detection and interpretation. We propose to overcome these challenges by constructing robust, accurate tools for the annotation, characterization and high-throughput genotyping of LoF variants. First, we will develop an integrated informatic pipeline (Annotation of LoF Transcripts, ALoFT) for the identification and filtering of all classes of LoF variant, including single nucleotide substitutions (SNPs), insertions and deletions. Second, we will exploit data from RNA sequencing experiments and disease mutation databases to create more accurate predictive models of the effects of genetic variants on gene expression and splicing, and of the probability that these variants cause disease. Finally, we will apply the ALoFT pipeline and these predictive models to over 30,000 human exomes and genomes sequenced as part of other NIH-funded projects. We will use the resulting annotation as the basis for dbLoF, a publicly accessible database of validated LoF variants. We will also use our functional annotation to develop a weighted association test and apply it to discover novel disease risk variants in these sequenced individuals.
In addition, we will use the catalogue of LoF variants identified in these samples to design a custom genotyping array. This array will permit rapid, cost-effective interrogation of the majority of common LoF variants in human cohorts, allowing the phenotypic effects of these variants to be assessed in separately funded association studies. This study will provide powerful tools for discovering and characterizing natural loss-of-function variants, and for exploring their potential association with human disease risk.

Public Health Relevance

Genetic variants that cause the complete loss of function (LoF) of human protein-coding genes are known to play a major role in severe human disease. However, putative LoF variant calls are also highly susceptible to sequencing and annotation artifacts. We propose to develop a suite of analytical tools for the accurate identification and filtering of LoF variants, guided and validated using RNA sequencing data and databases of known severe disease mutations. We will apply these tools to large-scale human genome sequence data. This will generate a high-quality catalogue of LoF variants in the human population and guide the design of a genotyping array for further studies that assess the effects of these variants on human phenotypes and disease risk.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM104371-03
Application #
8843011
Study Section
Special Emphasis Panel (ZGM1)
Program Officer
Krasnewich, Donna M
Project Start
2013-05-01
Project End
2016-04-30
Budget Start
2015-05-01
Budget End
2016-04-30
Support Year
3
Fiscal Year
2015
Name
Massachusetts General Hospital
DUNS #
073130411
City
Boston
State
MA
Country
United States
Publications
Balasubramanian, Suganthi; Fu, Yao; Pawashe, Mayur et al. (2017) Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 8:382
Saleheen, Danish; Natarajan, Pradeep; Armean, Irina M et al. (2017) Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature 544:235-239
Reddy, Hemakumar M; Cho, Kyung-Ah; Lek, Monkol et al. (2017) The sensitivity of exome sequencing in identifying pathogenic mutations for LGMD in the United States. J Hum Genet 62:243-252
Paludan-Müller, C; Ahlberg, G; Ghouse, J et al. (2017) Integration of 60,000 exomes and ACMG guidelines question the role of Catecholaminergic Polymorphic Ventricular Tachycardia-associated variants. Clin Genet 91:63-72
Kosmicki, Jack A; Samocha, Kaitlin E; Howrigan, Daniel P et al. (2017) Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat Genet 49:504-510
Kumánovics, Attila; Lee, Yu Nee; Close, Devin W et al. (2017) Estimated disease incidence of RAG1/2 mutations: A case report and querying the Exome Aggregation Consortium. J Allergy Clin Immunol 139:690-692.e3
Zhang, Xiaolei; Minikel, Eric V; O'Donnell-Luria, Anne H et al. (2017) ClinVar data parsing. Wellcome Open Res 2:33
Whiffin, Nicola; Minikel, Eric; Walsh, Roddy et al. (2017) Using high-resolution variant frequencies to empower clinical genome interpretation. Genet Med 19:1151-1158
Karczewski, Konrad J; Weisburd, Ben; Thomas, Brett et al. (2017) The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res 45:D840-D845
Cassa, Christopher A; Weghorn, Donate; Balick, Daniel J et al. (2017) Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat Genet 49:806-810

Showing the most recent 10 out of 24 publications