Laying the Foundation of Genomic Enzymology

Babbitt, Patricia

Abstract

Of the >12 million protein sequences now in public databases, only a tiny proportion have been experimentally characterized, so for the foreseeable future molecular function must be assigned almost exclusively by computational methods. Many enzymes can be classified as members of functionally diverse superfamilies, each containing a group of evolutionarily related proteins that descended from a common ancestor but diverged to catalyze many different chemical reactions, sometimes using highly dissimilar substrates. A large number of these superfamilies contain thousands of proteins, challenging our ability to manage data and information about them, or even to determine for which proteins experimental characterization could best be leveraged for functional annotation or mechanistic insight about their homologs. Additionally, because of the underlying structural similarities among superfamily members, computational prediction of their molecular functions is difficult and plagued by high levels of misannotation. The overall goals of this proposal are to contribute to our understanding of these structure-function relationships in order to improve computational annotation of many enzymes, inform the experimental design of mechanistic studies and enzyme engineering efforts, and achieve a more informed theory of how nature re-uses ancestral structural templates to evolve many new enzymatic reactions. We have three specific aims. 1) Using high-throughput methods to generate sequence and structure similarity networks, we propose to computationally characterize functionally diverse enzyme superfamilies on a large scale, mapping known molecular functions and other biological information to the network clusters to reveal functional trends in the context of sequence and structural similarity. The results will be disseminated as an online resource for "genomic enzymology," providing networks, alignments, and interactive tools so that users can query this information for their own studies.
We will validate our results for a few superfamilies through collaboration with experimental groups with deep expertise in each. Many of these superfamilies are important in human health and disease, including the caspases (human cancer/apoptosis), proteases in parasites (drug targets for orphan diseases), and strictosidine synthases (engineering enzymatic synthesis of new drug precursors). 2) We will deduce and compare network topologies across our target superfamilies to identify global patterns of functional divergence that reflect the underlying structure-function "strategies" nature has used to evolve divergent reactions. This will provide a new view of how function evolves across the enzyme universe. 3) In analogy to sequence and structural comparisons, we will compare overall reactions, mechanistic steps, substrates, and substrate sub-structures to deduce relationships among the characterized reactions in the superfamilies targeted in Aim 1. This orthogonal approach to distinguishing subgroups and families will enhance our ability to predict functional features of unknowns and provide clues about how new enzyme functions partition across each superfamily topology.

Public Health Relevance

Of the >12 million protein sequences available in public databases, the functions of only a tiny proportion have been experimentally determined, limiting the use of the genome projects for understanding human health and disease. We will use computational approaches to link enzyme sequences and structures with known enzyme reactions and substrates in an easy-to-use representation useful to many scientists for inferring functional properties and to help guide drug target selection.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM060595-12
Application #: 8463551
Study Section: Macromolecular Structure and Function E Study Section (MSFE)
Program Officer: Anderson, Vernon

Project Start: 2000-03-01
Project End: 2015-04-30
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 12
Fiscal Year: 2013
Total Cost: $330,947
Indirect Cost: $109,744

Institution

Name: University of California San Francisco
Department: Pharmacology
Type: Schools of Pharmacy
DUNS #: 094878337

City: San Francisco
State: CA
Country: United States
Zip Code: 94143

Publications

Holliday, Gemma L; Akiva, Eyal; Meng, Elaine C et al. (2018) Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a "Plug and Play" Domain. Methods Enzymol 606:1-71

Davidson, Rebecca; Baas, Bert-Jan; Akiva, Eyal et al. (2018) A global view of structure-function relationships in the tautomerase superfamily. J Biol Chem 293:2342-2357

Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B et al. (2017) An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 26:677-699

Finn, Robert D; Attwood, Teresa K; Babbitt, Patricia C et al. (2017) InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res 45:D190-D199

Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal et al. (2017) Evaluating Functional Annotations of Enzymes Using the Gene Ontology. Methods Mol Biol 1446:111-132

Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C et al. (2017) An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins. PLoS Comput Biol 13:e1005284

Holliday, Gemma L; Brown, Shoshana D; Akiva, Eyal et al. (2017) Biocuration in the structure-function linkage database: the anatomy of a superfamily. Database (Oxford) 2017:

Akiva, Eyal; Copp, Janine N; Tokuriki, Nobuhiko et al. (2017) Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily. Proc Natl Acad Sci U S A 114:E9549-E9558

LeVieux, Jake A; Baas, Bert-Jan; Kaoud, Tamer S et al. (2017) Kinetic and structural characterization of a cis-3-Chloroacrylic acid dehalogenase homologue in Pseudomonas sp. UW4: A potential step between subgroups in the tautomerase superfamily. Arch Biochem Biophys 636:50-56

Showing the most recent 10 out of 71 publications
