Sequence analysis of the human genome has identified approximately 25,000 protein-coding genes, but little is known about how most of these genes are regulated in different tissues and at different stages of development. A number of approaches attempt to identify gene regulatory elements on a genome-wide scale, but no proven method yet accomplishes this goal accurately. Rapid identification of regulatory elements will be possible only with the development of better experimental technologies. Mapping DNase hypersensitive (HS) sites has been the gold-standard method for locating promoters, enhancers, silencers, and insulators. While this method has proven invaluable for locating active regulatory elements of individual genes, its labor-intensive nature has limited its application to a small number of human genes. We have developed a novel protocol to generate a genome-wide library of gene regulatory sequences by cloning DNase HS sites. As a pilot, we generated a library of DNase HS sites from quiescent primary human CD4+ T cells and analyzed 5,600 of the resulting clones. Compared to sequences from randomly generated in silico libraries, sequences from these clones mapped more frequently to regions of the genome known to contain regulatory elements, such as regions upstream of genes, CpG islands, and sequences that align between mouse and human. Putative regulatory elements were validated by repeated recovery of the same sequence (clustering) and by real-time PCR. We estimate that distinguishing all valid DNase HS sites from background will require sequencing approximately 1 million clones. To generate this number of sequences affordably and rapidly, we will employ massively parallel signature sequencing (MPSS), a bead-based technology capable of generating 1 million sequence tags per run.
Preliminary data show that this technology is readily adaptable to our cloning protocol and can efficiently capture DNase HS sites. MPSS will be used to sequence DNase HS libraries from a number of different cell types, including human embryonic stem cells. Comparison with comparative genomics data will determine the degree to which human DNase HS sites are shared among different species. Functional characterization of a representative sample of DNase HS sites around loci of interest will identify how these regions of the genome are regulated. Characterizing different DNase HS site libraries will allow for a better understanding of the chromatin differences that delineate tissue specificity, housekeeping function, cell activation, pluripotency, and early cell differentiation.
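The clustering criterion mentioned above (repeated recovery of the same sequence) can be illustrated with a minimal sketch. It assumes each clone has already been mapped to a chromosome and a single coordinate. The 500 bp window, the two-clone threshold, the function names, and the example coordinates are illustrative assumptions, not values or methods taken from this project.

```python
from collections import defaultdict


def cluster_positions(hits, window=500):
    """Group mapped clone positions into clusters.

    `hits` is an iterable of (chromosome, position) pairs. A position joins
    the current cluster when it lies within `window` bp of the previous
    position on the same chromosome (window size is an assumed value).
    Returns a list of (chromosome, start, end, clone_count) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in hits:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= window:
                current.append(pos)
            else:
                clusters.append((chrom, current[0], current[-1], len(current)))
                current = [pos]
        clusters.append((chrom, current[0], current[-1], len(current)))
    return clusters


def recurrent_clusters(hits, window=500, min_clones=2):
    """Keep only clusters recovered at least `min_clones` times (assumed threshold)."""
    return [c for c in cluster_positions(hits, window) if c[3] >= min_clones]


# Made-up coordinates, for illustration only.
hits = [
    ("chr1", 1000), ("chr1", 1200), ("chr1", 9000),
    ("chr2", 500), ("chr2", 5000), ("chr2", 5100), ("chr2", 5400),
]

for chrom, start, end, count in recurrent_clusters(hits):
    print(chrom, start, end, count)
```

In this toy input, isolated positions are treated as possible background and dropped, while positions recovered repeatedly within the window are kept as candidate HS sites. A real analysis would also need to choose the window and threshold against the background rate expected from a random library.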