This application describes a project that aims to answer the question: do some parts of the human genome function by virtue of their structure, and not directly by their nucleotide sequence? To address this question, a structural map of the 30 Mb of the ENCODE regions of human genomic DNA, at single-nucleotide resolution, has been produced, based on hydroxyl radical cleavage patterns of DNA. A freely available database, ORChID (OH Radical Cleavage Intensity Database), was constructed to house hydroxyl radical cleavage data. An algorithm was developed to predict the hydroxyl radical cleavage pattern of any DNA sequence with high accuracy. This algorithm was applied to the ENCODE regions of the human genome, and the resulting data were deposited in the UCSC Genome Browser. This structural dataset can be searched for patterns in genomic DNA that may be associated with biological function. The first objective of this project is to develop high-throughput methods for collecting hydroxyl radical cleavage data, to enhance the predictive power of the ORChID database. The second objective is to locate structural features in human genomic DNA that are under selective evolutionary pressure, but for which the exact nucleotide sequence is not under selection. Experimental data production and informatics pipelines, data verification and validation protocols, and plans for data deposition are detailed in the application.
The aim of this research is to understand how the local shape and structure of DNA, and not just the sequence of nucleotides, contribute to the storage and readout of the information that is contained in the human genome. This work will lead to a deeper understanding of how the human genome works, a necessary step in the next phase of genome-based medicine.