Every human carries at least a hundred loss-of-function (LoF) variants predicted to severely disrupt the function of protein-coding genes, including many in the homozygous state. These variants represent experiments of nature that can cast light on the function of currently uncharacterized human genes: indeed, much novel biology has already been learned from the involvement of rare LoF variants in severe Mendelian disease. These variants have also proved valuable in identifying potential therapeutic targets: for instance, LoF variants in PCSK9 have been causally linked to low LDL cholesterol levels, leading to the development of PCSK9 as a therapeutic target for cardiovascular disease. However, discovering LoF variants in the human population remains a significant challenge, for two reasons. First, these variants are enriched for annotation errors: current methods for predicting the function of genetic variants are insensitive to many important classes of LoF variant, such as splice-disrupting variants outside canonical splice sites, and none has been systematically validated against large-scale functional data sets to assess its accuracy in detecting LoF variants. Second, LoF variants typically have very low frequency, meaning that very large sample sizes will be required to systematically discover LoF variants in every possible gene. Alternatively, two distinct strategies, the use of bottlenecked populations and of populations with a high rate of consanguineous mating, are known to significantly enrich for discovery of homozygous rare LoF variants, effectively identifying human knockouts. To address the first challenge, we have developed an open-source tool, LOFTEE (Loss Of Function Transcript Effect Estimator), to annotate loss-of-function variants.
To characterize the landscape of homozygous LoF variants (knockouts) across humans, we will first enhance and validate LOFTEE using databases of known disease variants, and investigate the effect of LoF variants on splicing by intersecting exomes with matched RNA-Seq data from over 500 individuals from the GTEx consortium. We will then apply LOFTEE to a number of large datasets from collaborative efforts, including over 91,000 exomes aggregated from a variety of rare and complex disease consortia, deeply phenotyped samples from Finnish national biobanks, and over 1,000 individuals from the UK whose parents are related. Finally, we will aggregate LoF variants from these studies into a database, dbLoF, providing a resource for pharmaceutical development, transplant biology, and understanding of rare Mendelian diseases.
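
As a rough illustration of why populations with consanguineous mating enrich for homozygous rare variants, the standard population-genetics expectation for the homozygote frequency of an allele with frequency q, under inbreeding coefficient F, is q^2 + F*q*(1 - q). The sketch below evaluates that expression; the function names and the example values (q = 0.001, and F = 1/16 for offspring of first cousins) are illustrative assumptions and are not figures from this project.

```python
def expected_homozygote_frequency(q: float, f: float = 0.0) -> float:
    """Expected homozygote frequency for an allele of frequency q.

    f is the inbreeding coefficient; f = 0 reduces to the
    Hardy-Weinberg expectation q**2.
    """
    if not 0.0 <= q <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("q and f must lie in [0, 1]")
    return q * q + f * q * (1.0 - q)


def fold_enrichment(q: float, f: float) -> float:
    """Ratio of homozygote frequency with inbreeding to that without."""
    if q <= 0.0:
        raise ValueError("q must be positive to compute a ratio")
    return expected_homozygote_frequency(q, f) / expected_homozygote_frequency(q)


if __name__ == "__main__":
    q = 0.001        # illustrative rare-allele frequency (assumption)
    f = 1.0 / 16.0   # inbreeding coefficient for offspring of first cousins
    print(f"outbred homozygotes: {expected_homozygote_frequency(q):.3e}")
    print(f"inbred homozygotes:  {expected_homozygote_frequency(q, f):.3e}")
    print(f"fold enrichment:     {fold_enrichment(q, f):.1f}")
```

Under these assumed values the enrichment is roughly 63-fold, and it grows as q falls, which is consistent with the emphasis on rare variants above.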

Public Health Relevance

Many clinical trials fail due to an inability to accurately predict the biological impact of inhibiting human genes. We propose to identify naturally occurring 'knockouts' of human genes to facilitate this understanding, through the development of a pipeline for the accurate detection of loss-of-function variants and its application to sequencing data from tens of thousands of humans. In addition to aiding the diagnosis of rare genetic diseases, this project will use natural human variation to explore the biological function of many currently uncharacterized human genes.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
5F32GM115208-02
Application #
9077055
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Maas, Stefan
Project Start
2015-04-01
Project End
2017-03-31
Budget Start
2016-04-01
Budget End
2017-03-31
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Massachusetts General Hospital
Department
Type
DUNS #
073130411
City
Boston
State
MA
Country
United States
Zip Code
Karczewski, Konrad J; Snyder, Michael P (2018) Integrative omics for health and disease. Nat Rev Genet 19:299-310
Tukiainen, Taru; Villani, Alexandra-Chloé; Yen, Angela et al. (2017) Landscape of X chromosome inactivation across human tissues. Nature 550:244-248
Karczewski, Konrad J; Weisburd, Ben; Thomas, Brett et al. (2017) The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res 45:D840-D845
Cummings, Beryl B; Marshall, Jamie L; Tukiainen, Taru et al. (2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci Transl Med 9:
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V et al. (2016) Analysis of protein-coding genetic variation in 60,706 humans. Nature 536:285-91
Minikel, Eric Vallabh; Vallabh, Sonia M; Lek, Monkol et al. (2016) Quantifying prion disease penetrance using large population control cohorts. Sci Transl Med 8:322ra9