Interrogating regulatory variants by multiplexed genome editing

Gymrek, Melissa; Goren, Alon

Abstract

A major result from recent genome-wide association studies (GWAS) is that the majority of genetic variants driving common human diseases lie in regulatory, rather than protein-coding, regions. Massive efforts to map epigenomic features, such as the localization of histone modifications (HMs) and transcription factors (TFs), have paved the way toward understanding the regulatory genome. However, dissecting the impact of an individual non-coding variant remains an unsolved challenge. A variety of computational methods have been proposed, such as quantitative trait loci (QTL) studies and machine learning techniques. However, these methods still do not provide conclusive information about the causality of any specific non-coding mutation, and they lack gold-standard experimental results for evaluation. Several techniques are used to experimentally test the impact of individual regulatory variants. For example, massively parallel reporter assays (MPRA) synthesize thousands of oligonucleotides encoding mutated versions of putative regulatory elements, which are placed in plasmids upstream of reporter genes. However, a major limitation is that the tested sequences are outside of their endogenous chromosomal locus, and hence the results are not necessarily physiologically relevant. CRISPR enables targeted editing of genomic DNA. Although CRISPR is widely used, studies of individual point mutations have been primarily small in scale, usually limited to a handful of variants or to a single gene. The major throughput challenge in studying a specific variant using genome editing is in tying genotype to phenotype. Introducing individual mutations is inefficient, and thus the genotype or phenotype of interest must be enriched before assessing the impact of a mutation on a phenotype such as gene expression. Current enrichment methods either disrupt the physiological context or are low throughput.
Recent efforts overcame these challenges using pooled editing to analyze thousands of mutations simultaneously, but were limited to variants in protein-coding regions. This proposal aims to develop a novel technique that combines multiplexed genome editing of putative regulatory variants with chromatin immunoprecipitation sequencing (ChIP-seq) to simultaneously measure the impact of hundreds of non-coding variants on regulatory potential in their native genomic context. The key insight of the proposed approach is that mutations impacting epigenomic features can be measured both in genomic DNA and in phenotypic readouts such as ChIP-seq of TFs or HMs, avoiding the need for a selection step to connect genotypes with phenotypes. Aim 1 develops the pooled editing technique on a pilot set of previously validated regulatory variants. Aim 2 scales this approach to interrogate thousands of mutations at once. Aim 3 integrates experimental predictions with state-of-the-art machine learning methods to evaluate and optimize computational methods for regulatory variant effect prediction.
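
As an illustration of the key insight above, the following is a minimal sketch of how an edited allele's frequency in ChIP-seq reads might be compared with its frequency in genomic DNA from the same pool. The abstract does not specify a statistic; the log2 odds ratio, the pseudocount, the function name `allele_shift`, and the read counts are all assumptions made for illustration only.

```python
import math


def allele_shift(gdna_ref, gdna_alt, chip_ref, chip_alt, pseudocount=0.5):
    """Log2 odds ratio of the edited (alt) allele in ChIP-seq vs. genomic DNA.

    Hypothetical statistic, not taken from the proposal. A value near 0 means
    the alt allele is as common in the ChIP-seq reads as in the edited pool;
    a negative value means it is depleted from the bound fraction.
    The pseudocount avoids division by zero for low-coverage variants.
    """
    chip_odds = (chip_alt + pseudocount) / (chip_ref + pseudocount)
    gdna_odds = (gdna_alt + pseudocount) / (gdna_ref + pseudocount)
    return math.log2(chip_odds / gdna_odds)


# Made-up read counts: (gDNA ref, gDNA alt, ChIP ref, ChIP alt).
counts = {
    "variant_A": (900, 100, 950, 50),   # alt allele under-represented in ChIP
    "variant_B": (900, 100, 450, 50),   # alt allele at the same ratio in both
}

for name, (g_ref, g_alt, c_ref, c_alt) in counts.items():
    print(name, round(allele_shift(g_ref, g_alt, c_ref, c_alt), 2))
```

Because each variant is scored against its own frequency in the genomic DNA of the pool, low or uneven editing efficiency is normalized away rather than requiring a separate enrichment step. A real analysis would also need a significance test and replicate handling, which this sketch omits.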

Public Health Relevance

Recent studies have demonstrated that the majority of genetic changes in the population contributing to common human diseases, such as schizophrenia, heart disease, and diabetes, lie in regions of the genome that do not code for proteins, but rather regulate the expression of genes. Despite massive efforts to map regulatory regions across dozens of human cell types, it is still difficult to predict the effect of an individual non-coding mutation. This project develops a high-throughput genome editing technique to simultaneously measure the impact of hundreds of non-coding mutations on regulatory potential in their native genomic context, with the ultimate goal of interpreting genetic changes leading to human disease.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 1R21HG010070-01
Application #: 9509167
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Gilchrist, Daniel A

Project Start: 2018-08-09
Project End: 2020-07-31
Budget Start: 2018-08-09
Budget End: 2019-07-31
Support Year: 1
Fiscal Year: 2018

Institution

Name: University of California, San Diego
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 804355790

City: La Jolla
State: CA
Country: United States
Zip Code: 92093

Related projects


NIH 2019 R21 HG	Interrogating regulatory variants by multiplexed genome editing Gymrek, Melissa; Goren, Alon / University of California, San Diego
NIH 2018 R21 HG	Interrogating regulatory variants by multiplexed genome editing Gymrek, Melissa; Goren, Alon / University of California, San Diego
