The goal of this research is to lay the foundations for what we call Genomic Enzymology, the extension of our understanding of enzyme chemistry to include the structural context. To this end, we propose to investigate how nature re-engineers protein structures for new functions by studying enzyme superfamilies whose members have diverged to perform substantially different overall chemical reactions. We focus on enzymes because the relationships between protein structure and function can be identified more easily than in other classes of proteins: the individual steps of the chemical reactions can be mapped explicitly to specific elements of their associated structures. Specifically, we propose 1) to cluster the known enzyme sequences into their respective superfamilies at divergence distances that are not currently available; 2) to examine the reactions of the enzymes in these superfamilies to distinguish the fundamental steps of their mechanisms that can be associated with the common elements of the superfamily architecture; 3) to use this information to generate a correlated structure/function "fingerprint" for each superfamily that can be used to infer important overall and sub-group properties of the member enzymes; and 4) to organize the results into an internet-accessible database that can be used and further developed by the scientific community. For enzyme superfamilies that we have previously investigated, this work has led to correct predictions of function for unknown reading frames, identification of new functions for previously characterized enzymes, and insight into fundamental aspects of enzyme mechanism for proteins that had been only poorly characterized.
We expect that the systematic investigation of the universe of enzyme superfamilies proposed here will provide a conceptual framework for predicting enzyme function for the many unknown reading frames generated by the genome projects. Our proposed identification of sequence elements associated with specific sub-groups within an enzyme superfamily will be useful for refining the assignments of function generated by automated annotation of genomic data. Finally, by providing a better understanding of the functional opportunities and constraints nature has engineered into a particular superfamily scaffold, we expect that this research will provide information useful for re-engineering proteins in the laboratory and for rational drug design.