The ENCODE project has generated comprehensive maps of cis-regulatory elements (CREs) controlling the transcription of genes within the human genome. These maps have been crucial in our efforts to understand sequence variants linked to human traits and disease, as the majority of these variants are non-coding regulatory changes rather than amino acid substitutions. However, even though we know the locations of thousands of CREs, our understanding of how they operate is derived from a relatively small set of well-described examples. Therefore, we plan to directly characterize the function of ENCODE CREs at a genome-wide scale in multiple cell types. This will move the field of functional genomics from a simple map of regulatory elements towards a deep understanding of the fundamental rules governing regulatory logic, down to base-pair resolution. Achieving this will dramatically expand ENCODE's utility by strengthening our ability to interpret the effects of natural human variation on gene regulation. We propose to directly measure the regulatory activity of over 3% of the genome, pursuing loci highlighted as important by ENCODE and other functional data. We will first apply computational methods to identify the most biologically informative CREs, representing a diversity of regulatory logic and architecture, and will use machine learning techniques to prioritize functional variants relevant to common and rare human diseases, traits, and adaptation for characterization. From these, we will select 200,000 CREs and 300,000 variants, representing 100 Mb of genomic sequence, and characterize them using the massively parallel reporter assay (MPRA) to measure each element's regulatory activity. Then, to complement the MPRA data, we will characterize additional 1 Mb regions at 10 loci using CRISPR-based non-coding screens to build a comprehensive picture of these loci.
This strategy leverages the throughput and flexibility of MPRA while maintaining the connectivity of regulatory logic in the CRISPR-based screens, which perturb elements within their endogenous genomic context. Combining the two will help us judge the accuracy and completeness of ENCODE, while also providing data from both approaches to address a wide variety of research questions. These methods are difficult to apply at full scale to disease-relevant primary cells, so we will use the results of our MPRA and CRISPR screens to inform our models and better predict the fundamental rules of regulatory logic. We will then construct smaller, targeted libraries to test disease-specific variants in primary cells, using assays specific to each of three autoimmune diseases: type 1 diabetes, inflammatory bowel disease, and lupus. This approach will inform the research community about the rules governing the activity of the CREs mapped by the ENCODE project, and will simultaneously provide concrete information about the function of hundreds of thousands of sequence variants relevant to human traits, health, and disease.
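The scale figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the genome size of roughly 3.1 Gb is an assumption not stated in the text, the function and constant names are hypothetical, and the calculation assumes that the MPRA sequence and the CRISPR-screened regions do not overlap.

```python
# Figures taken from the text.
N_CRES = 200_000                 # CREs selected for MPRA
N_VARIANTS = 300_000             # variants selected for MPRA
MPRA_SEQUENCE_BP = 100_000_000   # "100 Mb of genomic sequence"
N_CRISPR_LOCI = 10               # loci covered by CRISPR non-coding screens
CRISPR_REGION_BP = 1_000_000     # 1 Mb region per locus

# Assumption (not from the text): approximate size of the human genome.
GENOME_BP = 3_100_000_000


def scale_summary():
    """Return back-of-the-envelope scale figures for the proposed assays."""
    n_elements = N_CRES + N_VARIANTS
    crispr_bp = N_CRISPR_LOCI * CRISPR_REGION_BP
    total_bp = MPRA_SEQUENCE_BP + crispr_bp  # assumes no overlap
    return {
        "n_elements": n_elements,
        # Implied average only; actual element lengths are not given.
        "mean_bp_per_element": MPRA_SEQUENCE_BP / n_elements,
        "crispr_bp": crispr_bp,
        "mpra_fraction": MPRA_SEQUENCE_BP / GENOME_BP,
        "total_fraction": total_bp / GENOME_BP,
    }


if __name__ == "__main__":
    for key, value in scale_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions, the MPRA sequence alone exceeds 3% of the genome, and the implied average is 200 bp per tested element; both numbers shift if the assumed genome size or the overlap assumption changes.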
In this proposal, we seek to extend the efforts of the ENCODE consortium and others who have made significant strides towards mapping the regulatory landscape of the human genome. We will apply large-scale functional characterization methods to directly test over 3% of the human genome for cis-regulatory activity. In doing so, we will create a resource that improves our ability to pinpoint regulatory elements in the genome, increases our understanding of how they function, and strengthens our ability to link genetic variation to human health and disease.