The overall aim of the ENCODE project is to comprehensively identify functional elements in the human genome. Current high-throughput technologies, such as RNA-Seq, ChIP-Seq, and DNase-Seq, exploit patterns of marks to infer the role of specific sequences, but generally fall short of functionally interrogating, and thereby validating, these predictions. To address this gap, we propose a novel paradigm for the massively parallel functional testing of candidate regulatory elements. In preliminary work, we have developed a system whereby sequence-based transcribed barcodes enable the extensive multiplexing of classic reporter assays, in vitro or in vivo. Here, we propose to adapt this approach for testing tens of thousands of human regulatory elements in single assays, and furthermore to shift these assays from an episomal to a chromosomal context.
Our specific aims are: (1) To develop high-throughput methods to clone, by capture or by synthesis, large numbers of candidate regulatory elements and to link them to transcribed, synthetic barcodes within complex populations of reporter vectors. (2) To test in parallel tens of thousands of candidate regulatory elements nominated by liver ChIP-Seq for in vitro and in vivo activity using HepG2 transfections and the hydrodynamic tail vein assay, with RNA-Seq of the synthetic barcodes serving as a single readout for the differential activity of distinct candidate regulatory elements. (3) To develop a similarly multiplexed lentiviral assay for regulatory element analysis that is chromosomally based and generically applicable to diverse cell and tissue types. We anticipate that these methods can be scaled for the efficient, in vivo functional testing of large numbers of candidate regulatory elements nominated by other technologies. Furthermore, our approach can easily be adopted by other researchers and used for many related goals, such as testing which regulatory elements work together, dissecting the fine-scale architecture of individual regulatory elements, and evaluating the performance of synthetic regulatory elements.
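To illustrate how sequencing of transcribed barcodes could serve as a single readout in Aim 2, the following is a minimal sketch and not the project's actual analysis pipeline. It assumes a known barcode-to-element map and, as an added assumption not stated above, normalizes RNA barcode counts to barcode counts from the input vector DNA so that elements represented unevenly in the library remain comparable. The function name `element_activity`, its parameters, and the pseudocount are hypothetical.

```python
from collections import Counter


def element_activity(barcode_to_element, rna_barcodes, dna_barcodes, pseudocount=1.0):
    """Return {element: RNA/DNA barcode count ratio}.

    barcode_to_element: dict mapping barcode sequence -> candidate element ID
    rna_barcodes: iterable of barcodes observed in RNA-Seq reads
    dna_barcodes: iterable of barcodes observed in input vector DNA reads
                  (assumed normalization step, not described in the source)
    pseudocount: added to both counts to avoid division by zero
    """
    rna = Counter(rna_barcodes)
    dna = Counter(dna_barcodes)

    rna_by_element = Counter()
    dna_by_element = Counter()
    # Sum counts over all barcodes linked to the same element;
    # reads whose barcode is absent from the map are ignored.
    for barcode, element in barcode_to_element.items():
        rna_by_element[element] += rna[barcode]
        dna_by_element[element] += dna[barcode]

    return {
        element: (rna_by_element[element] + pseudocount)
        / (dna_by_element[element] + pseudocount)
        for element in set(barcode_to_element.values())
    }


if __name__ == "__main__":
    # Toy data: two barcodes for element e1, one for e2.
    mapping = {"AAAA": "e1", "CCCC": "e1", "GGGG": "e2"}
    rna_reads = ["AAAA", "AAAA", "CCCC", "GGGG", "TTTT"]  # TTTT is unmapped
    dna_reads = ["AAAA", "CCCC", "GGGG", "GGGG"]
    print(element_activity(mapping, rna_reads, dna_reads, pseudocount=0.0))
```

Summing over several barcodes per element is one simple design choice; a real analysis would also need to handle sequencing errors in barcodes and replicate variability, which this sketch omits.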
As we enter an era of personalized medicine, a deep understanding of the human genome will be increasingly important to public health, contributing to the unraveling of the genetic basis of human disease and playing an increasing role in clinical diagnostics. Regulatory sequences in the human genome, that is, sequences that are functionally important but do not encode proteins, are clearly of fundamental importance but are nonetheless poorly understood. This project will develop novel technologies for the parallel validation of large numbers of candidate regulatory sequences, thereby furthering our understanding of their function.