There are >50,000 active origins in the typical somatic genome. Owing to this complexity, only a few origins have been identified by molecular biological approaches, and then only after enormous cost and expenditure of effort. Many genes are replicated early in S-phase in cell types in which they are active but late when they are inactive, suggesting interplay between origin activity and transcription during development. Based on biochemical analyses of the handful of validated origins, there appear to be two types: 1) broad zones of inefficient initiation sites, and 2) highly preferred start sites. We propose that the genome is peppered at intervals of 1 kb or less with a hierarchy of potential initiation sites, a subset of which has evolved into true replicators. Usage of any given site is proposed to be regulated by local gene activity and chromatin architecture. The completion of high-quality human, murine, and rat genome sequences, as well as the advent of microarrays, provides a unique opportunity to perform global analyses of the distribution, structure, and regulation of replication origins in order to address this model. We have developed a novel gel-trapping strategy for isolating virtually pure origin-containing fragments, and have prepared pure libraries of origins from CHO and human cells.
Specific aims of the proposal are:
1) To test the proposal that active origins will be confined to transcriptionally active chromosomal domains but, within those domains, will lie in intergenic regions. A comprehensive human origin library (or the uncloned starting material) will be used to probe high-density microarrays from human chr 21 and 22 under saturating conditions. The distribution of active origins vis-a-vis active genes will be determined by probing the microarrays with cDNAs from the same cells. These studies will also provide a preliminary assortment of active origins into fixed sites versus zones.
2) To prepare high-resolution origin preparations to identify fixed initiation sites that could correspond to replicators. The gel-trapping procedure will be refined to allow isolation of smaller origin-containing fragments; alternatively, very short, origin-centered, nascent DNAs will be synthesized in vitro in early-S-phase nuclei. The resulting materials will be used as probes on the chr 21 and 22 arrays. The distribution of fixed origins vis-a-vis active genes will indicate whether they have evolved specifically in the neighborhoods of developmentally regulated genes or gene clusters.
3) To isolate early-, mid-, and late-firing origins and determine how their activation times relate to their genetic and epigenetic signatures. Origin libraries will be prepared from synchronized cells at selected time points by the standard bubble-trapping procedure and hybridized to the chr 21 and 22 arrays. Comparison to the results of Aims 1 and 2 will indicate differential effects of local transcription on the time of origin activation, as well as the apparent efficiency of origin utilization.
4) To use computational approaches to identify the most common sequence motifs among fixed origins, and determine whether their positions are conserved among humans, mice, and rats. Those origins characterized in Aims 2 and 3 that appear to correspond to single sites or circumscribed zones will be analyzed for common compositional bias, periodicity, fold-back potential, DNA unwinding elements, and conventional sequence motifs to uncover commonalities that might serve a replicator function. Their conservation among humans, mice, and rats will also be determined by comparisons among the genome databases using standard computational methodologies.
5) To test the hypothesis that active origins and genes reside in common chromatin domains with unique architectures. The distributions of modified histones and selected other proteins on chr 21 and 22 will be analyzed by the ChIP-on-chip approach, using a variety of antibodies to relevant proteins. By comparing these data to the distributions identified in Aim 1, we will define aspects of chromatin architecture that characterize active origins, origin clusters, and/or local genes.