Most disease- or trait-associated genetic variants lie within non-coding sequences of the genome, such as enhancers, promoters and insulators. These sequences regulate gene expression, play critical roles in determining disease severity and may serve as targets for novel rational therapeutic interventions. However, we still lack insight into the exact mechanisms by which non-coding sequences are translated into function and into the impact of genetic variation within them. CRISPR perturbations of non-coding elements offer unprecedented opportunities for assessing their function in a myriad of developmental and disease settings. We hypothesize that an improved understanding of non-coding sequences, gained by direct perturbation in their endogenous context and with a direct readout of the genetic perturbations, will offer new opportunities to therapeutically intervene in human disease. The long-term goal of our work is to overcome the limitations of current strategies for studying the non-coding genome. Our overall vision is to advance our understanding of the mechanisms by which non-coding sequences act on gene expression, at nucleotide resolution and at the single-cell level. The main objective of this proposal is a multi-scale discovery and dissection of regulatory elements by combining CRISPR targeted perturbations and single-cell assays, with two key goals: (1) uncover functional non-coding elements with unbiased approaches that generalize across cell types, and elucidate their regulatory grammar and mechanisms of action on gene expression; (2) study how endogenous or perturbation-induced mutations in non-coding sequences are reflected in gene expression programs at the single-cell level. To pilot our conceptual framework, we will study the non-coding regulatory elements within the BCL11A gene, a master regulator of the hemoglobin switch and a therapeutic target for sickle cell disease (SCD) and β-thalassemia. In fact, several clinical trials (e.g. NCT03432364) are underway that aim to disrupt regulatory sequences at BCL11A as a therapy for these β-hemoglobin disorders. BCL11A is one of the most well-studied loci identified by GWAS and is a system in which we have extensive experience and expertise. However, our approaches will also be generally applicable to other loci linked to traits or diseases. At the end of this project we will provide a general framework and user-friendly computational tools for studying the function and structure of non-coding regulatory sequences, generalizable to different perturbation screens, regulatory regions, cell types and phenotypes. Importantly, all the computational tools developed in this proposal will be shared with NHGRI-funded consortia with similar goals, such as ENCODE, and with the broader scientific community. We anticipate that the proposed research could have a positive translational impact, providing the foundation for strategies that use non-coding sequence perturbations with direct therapeutic potential for human disease.

Public Health Relevance

Non-coding sequence variation affects gene expression and, ultimately, development and disease. This application provides a general framework for multi-scale dissection of non-coding regions that links single-nucleotide changes to transcriptional changes at the single-cell level. This work will generate insights and testable hypotheses for the study of causal genetic variants underlying complex traits and human diseases.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Unknown (R35)
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Pazin, Michael J
Massachusetts General Hospital
United States