Since the human genome was first sequenced a decade ago, researchers have made great strides in identifying the genomic locations of many kinds of functional elements, including the sequences that control gene regulation. Nevertheless, the primary focus to date has been on cataloging individual regulatory elements, without regard for their dynamic behavior or interactions. In this proposal, we outline an innovative approach for both identifying sequences critical for gene regulation and characterizing their dynamic interactions. Our proposal involves combining a powerful method for directly measuring the expression of genes, called PRO-seq, with an adaptation of DNase-seq, a method for identifying positions in the genome at which gene-regulating transcription factors are bound. We propose to apply these methods in a time course after stimulation of an inducible system to obtain dynamic, genome-wide information about both binding and expression, focusing in particular on stress responses induced by the small molecule celastrol in the immortalized K562 leukemia cell line. Because neither PRO-seq nor DNase-seq depends on antibodies to particular transcription factors, or on the technique of chromatin immunoprecipitation, we describe this approach as factor-general and ChIP-free. Our proposal has three main aims: (1) to identify and characterize transcription units using PRO-seq; (2) to identify and characterize the binding sites for many transcription factors using DNase-seq; and (3) to integrate these dynamic patterns of transcription and binding to reveal networks of interaction between regulatory sequences and transcription units. Each of these aims involves the development of new statistical models and computational methods. Our newly generated data, our predictions, and our software will all be made publicly available.
We propose to make use of powerful experimental technologies and computational methods to shed new light on the mechanisms of gene regulation in human cells. Gene regulation is a critical link between genotype and phenotype, and is implicated in many human diseases.