With the sequence of the human genome now complete, and the sequencing of many other model organism genomes moving rapidly forward, there is a pressing need to identify the sequences involved in regulating expression of the 20,000 to 25,000 human genes. These include promoters, enhancers, insulators, silencers, and other structures that signal the transcriptional apparatus to recognize (or not) a particular gene for expression. For more than 20 years, the gold standard method for identifying the genomic location of such regulatory signals has been DNase hypersensitivity analysis. Histones that normally coat the DNA strand are stripped off in regions of the genome that are sites of active binding of specific regulatory factors, rendering these regions accessible to dilute concentrations of DNase. While this method has produced valuable data for a few dozen genes, it is very laborious and has not previously been amenable to genome-wide application. We have devised a method to apply this approach to the entire genome of a particular tissue or cell line. Nuclei are exposed to DNase in the traditional fashion, but then the digested ends are polished, the DNA is cut with a restriction enzyme, and the specific fragments that have one blunt end (from DNase) and one sticky end (from the restriction enzyme) are captured and sequenced from the blunt end. This can be accomplished by direct sequencing or by using a modification of the SAGE approach. More recently we have adapted the bead-based method known as massively parallel signature sequencing (MPSS), which generates hundreds of thousands of 20-base-pair tag sequences from a single experiment. In experiments generating such tags from primary human CD4+ T-cells, the captured sequences were significantly enriched for segments that lie just upstream of, or in the first exon or intron of, known genes.
Further validation of the captured sequences by real-time PCR indicated that about 90% of the sequence tags that occur in clusters correspond to genuine DNase hypersensitivity sites. We have joined the ENCODE consortium and are now exploring the scale-up of this effort to generate a complete profile of regulatory sequences from multiple cell lines and tissues. Finally, we have developed a new method of identifying DNase hypersensitivity sites using hybridization to high-density oligonucleotide arrays from NimbleGen. The correlation of these analog data with the digital data from MPSS is highly instructive.
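
As an illustration of what it means for sequence tags to "occur in clusters", the following minimal Python sketch groups tags that have already been mapped to genomic coordinates into candidate sites. It is not the analysis pipeline used in this project: the function name `cluster_tags`, the 500 bp gap threshold, and the two-tag minimum are assumptions chosen only for the example.

```python
from collections import defaultdict


def cluster_tags(tags, max_gap=500, min_tags=2):
    """Group mapped tag positions into clusters of nearby tags.

    tags     -- iterable of (chromosome, position) pairs
    max_gap  -- hypothetical maximum distance (bp) between neighbouring
                tags for them to join the same cluster
    min_tags -- hypothetical minimum number of tags for a cluster to be kept

    Returns a list of (chromosome, start, end, tag_count) tuples,
    ordered by chromosome name and then by start position.
    """
    # Collect tag positions per chromosome.
    by_chrom = defaultdict(list)
    for chrom, pos in tags:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                # Close enough to the previous tag: extend the current cluster.
                count += 1
            else:
                # Gap too large: close the current cluster and start a new one.
                if count >= min_tags:
                    clusters.append((chrom, start, prev, count))
                start = pos
                count = 1
            prev = pos
        # Close the final cluster on this chromosome.
        if count >= min_tags:
            clusters.append((chrom, start, prev, count))
    return clusters


# Made-up coordinates, for illustration only.
example_tags = [
    ("chr1", 100), ("chr1", 350), ("chr1", 5000),
    ("chr2", 10), ("chr2", 20), ("chr2", 30),
]
example_clusters = cluster_tags(example_tags)
```

In this toy input the isolated tag at chr1:5000 is discarded as a singleton, while the neighbouring tags form one cluster on each chromosome. Real thresholds would need to be calibrated against validated sites, for example the real-time PCR results described above.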

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Intramural Research (Z01)
Project #: 1Z01HG200304-03
Application #: 7147974
Study Section: (GTB)
Project Start:
Project End:
Budget Start:
Budget End:
Support Year: 3
Fiscal Year: 2005
Total Cost:
Indirect Cost:
Name: Human Genome Research
Department:
Type:
DUNS #:
City:
State:
Country: United States
Zip Code:
Bernat, John A; Crawford, Gregory E; Ogurtsov, Aleksey Y et al. (2006) Distant conserved sequences flanking endothelial-specific promoters contain tissue-specific DNase-hypersensitive sites and over-represented motifs. Hum Mol Genet 15:2098-105
Crawford, Gregory E; Holt, Ingeborg E; Whittle, James et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123-31
Crawford, Gregory E; Davis, Sean; Scacheri, Peter C et al. (2006) DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods 3:503-9
Lewinski, Mary K; Yamashita, Masahiro; Emerman, Michael et al. (2006) Retroviral DNA integration: viral and cellular determinants of target-site selection. PLoS Pathog 2:e60
ENCODE Project Consortium (2004) The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306:636-40
Crawford, Gregory E; Holt, Ingeborg E; Mullikin, James C et al. (2004) Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc Natl Acad Sci U S A 101:992-7