The Protein Data Bank (PDB) contains more than 100,000 3D structures of proteins, many of which are directly relevant to human health and disease. Up to 10% of these structures contain carbohydrates as ligands or as post-translational modifications. While numerous tools exist to curate protein 3D structural data, no carbohydrate-specific tools have been adopted by the PDB as part of the validation checks performed upon coordinate deposition. This oversight has resulted in a large number of errors and inconsistencies in the annotation and structure of the carbohydrate structural data. Here we will work with the Worldwide PDB (wwPDB) to develop and implement tools to address these issues as part of a broader carbohydrate remediation initiative at the PDB. At present, two serious problems hinder the use of carbohydrate data stored in the PDB: 1) There is an unacceptably high proportion of errors in the deposited coordinates. 2) No convenient interface exists for searching for carbohydrate structures in the PDB. We will generate a software tool called "GlyProbity" for checking the accuracy and internal consistency of 3D structures of carbohydrates, and then apply this tool to the data remediation. In addition, GlyProbity will be provided as a stand-alone interface that may be used by crystallographers to validate carbohydrate structures prior to deposition in the PDB and by other researchers to validate structures obtained in any manner. Lastly, we will create a search interface, "GlyFinder", to be implemented at GLYCAM-Web, that will greatly simplify the task of locating relevant carbohydrate-containing structures. Taken together, these aims should significantly impact the development of glycomimetic therapeutics, as well as the generation of structure/function relationships in glycobiology, and will be essential for achieving interoperability with additional databases or data mining services in the future.

Public Health Relevance

Because of non-standard annotation and the absence of carbohydrate-specific structure validation, carbohydrates in structures deposited in the Protein Data Bank (PDB) frequently contain errors. This project will generate software for remediating carbohydrate structural and annotation errors in the PDB, and for simplifying the task of locating and retrieving carbohydrate-containing structures. The results will assist specialists and non-specialists working with carbohydrate structural data, and will facilitate future interoperability between the PDB and glyco-informatics websites.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Research Project--Cooperative Agreements (U01)
Project #
5U01CA221216-02
Application #
9528535
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Krueger, Karl E
Project Start
2017-08-01
Project End
2020-07-31
Budget Start
2018-08-01
Budget End
2019-07-31
Support Year
2
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of Georgia
Department
Type
Organized Research Units
DUNS #
004315578
City
Athens
State
GA
Country
United States
Zip Code
30602