Overview of the computational pipeline. Figure 1 summarizes the modeling pipeline that we propose to apply to functional assignment (software and web resources associated with each step are shown in red and blue italics, respectively). We will be guided by the Superfamily/Genome Core in choosing which sequences to model. These will include, as a subset, all of the sequences produced by the Protein Core, as well as other enzymes found in operons with the target enzymes. Homology models will be created when structures are not available, including in cases where crystallization will be attempted by the Structure Core. Libraries of known metabolites, as well as "fragment" compounds, will be docked against the structures and models in their ground-state and, when possible, high-energy intermediate (HEI) forms. Predicted protein-ligand complexes will be refined and re-ranked after docking using a higher level of theory (all-atom force fields and implicit solvent), with the protein treated as flexible. The docking hit lists will be analyzed in an automated manner using cheminformatics methods that the Shoichet group previously developed for drug discovery applications. Finally, work is underway to merge the protein and ligand sampling modules by creating innovative hybrid methods.

The enhancements to the computational pipeline described below are motivated by 1) challenges that we have identified for the "new" superfamilies (GST, HAD, and IS), 2) our goal of extending the computational methods to apply to all enzyme superfamilies, and 3) our goal of automating the computational methods, such that they are ultimately usable by the community via web interfaces (Section 3). Some of the proposed enhancements build on preliminary tests that we have performed for the AH and EN superfamilies, supported by P01 GM071790. The focus here is on generalizing these approaches, so there is no overlap.
Other, somewhat more speculative, computational methods that are planned for development with the support of P01 GM071790, such as treatment of ligand entropy losses and automated prediction of protein and ligand protonation states, will also be added to the general pipeline if they are successful in initial tests on the AH and EN superfamilies.

An important step towards a general method: Docking "fragment-like" molecules to expand chemotype exploration. A fruitful choice made in our prior work was to restrict our docking calculations to ~10,000 known metabolites. If the enzyme targeted is involved in primary metabolism, as was Tm0936, this is an appropriate choice; but if it is not, the true substrate will be missed. Xenobiotics or secondary metabolites represent a particular challenge. It seems prudent, therefore, to expand the chemotypes represented in the database being screened in the initial docking calculation. To do so, we propose to screen a library of 130,000 "fragment-like" molecules. These molecules are small (<17 non-hydrogen, or heavy, atoms) and are thought to cover over 15 orders of magnitude more chemotypes than would a similar library of larger molecules [5, 6]. For this reason, they have become a focus of intense interest in inhibitor discovery [7, 8]. Because they are smaller, they should be intrinsically easier to dock, as we have found in docking for inhibitors. Finally, because they are commercially available, they will be straightforward to acquire and test. We will also convert the 130,000 fragments in the ZINC database [9] to HEI structures. With the support of P01 GM071790, we are preparing HEIs associated with 20 core reactions catalyzed by members of the AH superfamily [10]. Here, we will expand this approach to reactions catalyzed by the enzymes of the EN, GST, HAD, and IS superfamilies.
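The size criterion for a fragment-like molecule (<17 heavy atoms) can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function names, the three-compound example library, and the use of plain molecular formulas (rather than full structures from ZINC) are assumptions made purely for illustration, and the parser handles only simple formulas without parentheses or charges.

```python
import re

# Cutoff taken from the text: fragments have fewer than 17 heavy atoms.
MAX_HEAVY_ATOMS = 17

# Matches an element symbol followed by an optional count, e.g. "C14" or "S".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_count(formula: str) -> int:
    """Count non-hydrogen atoms in a simple molecular formula (hypothetical helper)."""
    total = 0
    for symbol, count in _ELEMENT.findall(formula):
        if symbol != "H":
            total += int(count) if count else 1
    return total


def is_fragment_like(formula: str) -> bool:
    """Apply the <17 heavy-atom size criterion only; no other filters are assumed."""
    return heavy_atom_count(formula) < MAX_HEAVY_ATOMS


# Illustrative library: names mapped to molecular formulas.
library = {
    "glycine": "C2H5NO2",
    "benzoic acid": "C7H6O2",
    "S-adenosyl homocysteine": "C14H20N6O5S",
}

fragments = [name for name, formula in library.items() if is_fragment_like(formula)]
print(fragments)
```

In this toy library, S-adenosyl homocysteine is excluded by the size cutoff, while the two smaller compounds are retained. A real library filter would operate on full chemical structures and would likely apply further criteria beyond size.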
The HEI fragments will then be docked against the benchmarking set of enzymes of known structure and function to see if they can recapitulate the substrate enrichments found with larger molecules. For example, when docking against Tm0936, will adenine, which at 11 heavy atoms is certainly a fragment, rank as well, compared to the fragment decoys, as does S-adenosyl homocysteine (SAH) against the metabolite decoys? Will it show the selectivity compared to guanosine and cytidine analogs observed with the larger metabolite docking? These questions will be definitively answered by retrospective calculations.

It is conceivable that this approach will not succeed. Wolfenden [11] and others have shown that when a substrate is deconstructed into fragments, its recognition by the enzyme can be severely compromised. It is easy to think of pathological cases where functional groups present in the larger molecules will be critical to recognition and specificity (one will not, for instance, be able to distinguish between adenine deaminase and adenosine deaminase using merely the adenine HEI as a docked probe). Conversely, one can imagine building the larger molecules back from the initial chemotypes emerging from the fragment screen: for example, if the adenine HEI ranks well, try larger variations in this restricted space. The orders-of-magnitude greater number of chemotypes represented among the fragments compared to the core metabolites, and the ability to actually acquire and test every one of them, make this approach worth exploring. It has the possibility of substantially increasing the reach and generality of structure-based substrate prediction.
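The retrospective question posed above (does a known substrate rank well against decoys?) can be sketched as a simple rank calculation over a docking hit list. The Python example below is a minimal illustration only: the scores are invented, the compound names are placeholders, and it assumes that a more negative score means a better-ranked pose. The function names are hypothetical and do not correspond to any docking program.

```python
from typing import Dict


def rank_of(scores: Dict[str, float], query: str) -> int:
    """1-based rank of `query`, assuming lower (more negative) scores are better."""
    query_score = scores[query]
    better = sum(1 for name, s in scores.items() if name != query and s < query_score)
    return better + 1


def top_percent(scores: Dict[str, float], query: str) -> float:
    """Position of `query` in the ranked list, as a percentage of the library size."""
    return 100.0 * rank_of(scores, query) / len(scores)


# Invented docking scores for one known substrate and four decoys.
scores = {
    "known_substrate": -42.0,
    "decoy_a": -30.5,
    "decoy_b": -45.1,
    "decoy_c": -12.0,
    "decoy_d": -28.3,
}

print(rank_of(scores, "known_substrate"))      # rank within the hit list
print(top_percent(scores, "known_substrate"))  # percentile position
```

Running the same calculation for a fragment against fragment decoys and for the parent metabolite against metabolite decoys would give two percentile positions that can be compared directly, which is the comparison the adenine versus SAH example describes.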

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Specialized Center--Cooperative Agreements (U54)
Project #: 5U54GM093342-04
Application #: 8489141
Study Section: Special Emphasis Panel (ZGM1-PPBC-3)
Project Start:
Project End:
Budget Start: 2013-05-01
Budget End: 2014-04-30
Support Year: 4
Fiscal Year: 2013
Total Cost: $957,731
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #: 041544081
City: Champaign
State: IL
Country: United States
Zip Code: 61820

Gizzi, Anthony S; Grove, Tyler L; Arnold, Jamie J et al. (2018) A naturally occurring antiviral ribonucleotide encoded by the human genome. Nature 558:610-614
Kenney, Grace E; Dassama, Laura M K; Pandelia, Maria-Eirini et al. (2018) The biosynthesis of methanobactin. Science 359:1411-1416
Park, Yun Ji; Kenney, Grace E; Schachner, Luis F et al. (2018) Repurposed HisC Aminotransferases Complete the Biosynthesis of Some Methanobactins. Biochemistry 57:3515-3523
Calhoun, Sara; Korczynska, Magdalena; Wichelecki, Daniel J et al. (2018) Prediction of enzymatic pathways by integrative pathway mapping. Elife 7:
Sheng, Xiang; Patskovsky, Yury; Vladimirova, Anna et al. (2018) Mechanism and Structure of γ-Resorcylate Decarboxylase. Biochemistry 57:3167-3175
Zallot, Rémi; Oberg, Nils O; Gerlt, John A (2018) 'Democratized' genomic enzymology web tools for functional assignment. Curr Opin Chem Biol 47:77-85
Barr, Ian; Stich, Troy A; Gizzi, Anthony S et al. (2018) X-ray and EPR Characterization of the Auxiliary Fe-S Clusters in the Radical SAM Enzyme PqqE. Biochemistry 57:1306-1315
Gerlt, John A (2017) Genomic Enzymology: Web Tools for Leveraging Protein Family Sequence-Function Space and Genome Context to Discover Novel Functions. Biochemistry 56:4293-4308
Koo, Byoung-Mo; Kritikos, George; Farelli, Jeremiah D et al. (2017) Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis. Cell Syst 4:291-305.e7
Holliday, Gemma L; Brown, Shoshana D; Akiva, Eyal et al. (2017) Biocuration in the structure-function linkage database: the anatomy of a superfamily. Database (Oxford) 2017:

Showing the most recent 10 out of 91 publications