The completion of the human genome sequence presents unprecedented opportunities and challenges to biologists. While access to the complete sequences of genes whose protein products are well known provides new opportunities for testing hypotheses, the availability of sequences of genes with no known function poses critical challenges. To develop a model for predicting the functions of uncharacterized genes using bioinformatics, we will search the human genome for new DNA repair genes and then confirm their identity by testing the putative repair proteins for the predicted biochemical activities. The R21 part of the proposal will use mainly homology-based methods to search for potential human DNA glycosylases. These methods will include the development of improved sequence profiles of glycosylase families and the use of these profiles, along with structural information, for threading analysis of the human proteome. The R33 part of the proposal will use non-homology-based methods, including identification of catalytic centers and an associative search for DNA glycosylases in other known DNA-modifying enzymes. Additionally, two other classes of DNA repair enzymes will be included in our search. This work will be a collaboration among three research groups: one with expertise in the development of bioinformatics tools, a second with extensive experience in the application of this software, and a third with expertise in the biochemistry of DNA repair enzymes. The first two groups will establish the necessary computer hardware, develop new software, and perform the analysis that will predict new DNA repair genes; the third group will set up the necessary biochemical tests for the putative repair enzymes, clone the corresponding cDNAs into Escherichia coli, and test them for activity.
This integrated prediction-validation approach should be superior to a purely bioinformatic or a purely biochemical approach and may serve as a paradigm for searching for biochemical functions in the genomes of all organisms.
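
As an illustration of the profile-based homology search mentioned above, the following is a minimal sketch of how a sequence profile might be built from aligned family members and then scanned against a candidate protein. It is not the software proposed here: the function names `build_profile` and `best_hit`, the uniform background frequency, the pseudocount value, and the toy four-residue alignment are all assumptions made for illustration, and the alignment is invented rather than taken from any real glycosylase family.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from
    equal-length, gap-free aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    # Assumption: uniform background frequency for every residue.
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence and return
    (best_score, start_index) of the highest-scoring window."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20-letter alphabet contribute 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if score > best[0]:
            best = (score, start)
    return best


# Hypothetical toy alignment; not a real glycosylase motif.
toy_alignment = ["HKDG", "HKDG", "HRDG", "HKEG"]
toy_profile = build_profile(toy_alignment)
score, start = best_hit(toy_profile, "AAAAHKDGAAAA")
print(f"best window starts at {start} with score {score:.2f}")
```

In practice a profile search of this kind would use gapped alignments, empirically derived background frequencies, and a statistical significance threshold; the sketch only shows the core scoring idea of ranking candidate windows by summed log-odds.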