With significance in biotechnology, biomedicine, and basic research, glycobiology's applications are widespread. Technological advancements in the field of glycobiology have expanded in parallel with the influx of an array of data within the glycosciences community. The broad range of experimental approaches, the disparate nature of available datasets, and the seemingly piecemeal strategies required to construct comprehensive interpretations create inherent barriers for glycoscience researchers seeking to utilize all available information. The mission of GlyGen has been to target and mitigate such challenges by developing procedures and a platform that integrates or builds upon glycoconjugate structure-function data from different resources. GlyGen, an NIH-funded international effort, captures and integrates over 90% of available glycoconjugate data, harmonizing and managing diverse outputs such as glycans, proteins, and genes integrated with genomics, pathway, and disease information. Since its inception, the GlyGen team has built a user-friendly platform complete with analytical tools and comprehensive, exportable data sets to ease the burden for researchers. Within the Swiss Institute of Bioinformatics (SIB), the Proteome Informatics Group (PIG) has worked extensively to develop the glycoinformatics resource GlyConnect, which focuses on the molecular characterization of protein glycosylation through an integrated, expertly curated platform, specializing in structure analysis and producing novel data sets, such as site-specific glycan data. Despite each resource's efforts to mitigate challenges, difficulties in amassing the amalgam of data required to fully examine microheterogeneity within glycobiology still persist.
By utilizing their distinct strengths, the proposed collaborative research between GlyGen and GlyConnect will focus on further integrating site-specific protein-glycan data to generate more comprehensive data sets, where increasing the data availability in GlyGen is expected to accelerate basic and translational research. Currently, the major resources for site-specific protein-glycan data are UniCarbKB and UniProtKB, though the amount of available data from these databases, or other similar resources, is not substantial. To address this limitation, GlyConnect and GlyGen will develop an advanced, scalable, and site-specific protein-glycan annotation pipeline. This pipeline will be constructed using existing data in GlyConnect, in addition to roughly 100 publications identified and prioritized through current literature mining efforts in GlyGen. Moreover, front-end and back-end software developments will be implemented on the GlyGen platform, allowing glycoscience researchers to submit site-specific glycan data through a validated submission system. The proposed research will create a standardized methodology for more efficient data submission efforts, expand on the available site-specific protein-glycan data for the glycobiology community, and facilitate data sharing amongst glycoscience researchers.

Public Health Relevance

The proposed research will focus on improving coverage of site-specific annotations in GlyGen and will result in the development of an efficient methodology for extracting and integrating site-specific glycoconjugate data into glycoinformatics resources. Along with the documented methodology (automated and manual rules), the extracted annotations will be easily accessible to the entire community via the GlyGen portal. The annotations will be linked back to the corresponding resources to increase and maintain the data flow across valuable bioinformatics databases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project--Cooperative Agreements (U01)
Project #
Application #
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Melillo, Amanda A
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
University of Georgia
Organized Research Units
United States
Zip Code