GlyGen Supplement: Develop automatic literature mining tool for extracting context specific glycan-protein data that will enhance the extent and quality of data in GlyGen

Tiemeyer, Michael; Mazumder, Raja

Abstract

With significance in biotechnology, biomedicine, and basic research, glycobiology's applications are widespread. Technological advancements in the field of glycobiology have expanded in parallel with the influx of an array of data within the glycosciences community. The broad range of experimental approaches, the disparate nature of available datasets, and the seemingly piecemeal strategies required to construct comprehensive interpretations create inherent barriers that keep glycoscience researchers from using all available information. The mission of GlyGen has been to target and mitigate such challenges by developing procedures and a platform that integrate or build upon glycoconjugate structure-function data from different resources. GlyGen, an NIH-funded international effort, captures and integrates over 90% of available glycoconjugate data, harmonizing and managing diverse outputs such as glycans, proteins, and genes integrated with genomics, pathway, and disease information. Since its inception, the GlyGen team has built a user-friendly platform complete with analytical tools and comprehensive, exportable data sets to ease the burden for researchers. Within the Swiss Institute of Bioinformatics (SIB), the Proteome Informatics Group (PIG) has worked extensively to develop the glycoinformatics resource GlyConnect, which focuses on the molecular characterization of protein glycosylation through an integrated, expertly curated platform, specializing in structure analysis and producing novel data sets, such as site-specific glycan data. Despite each resource's efforts to mitigate these challenges, difficulties persist in amassing the range of data required to fully examine microheterogeneity within glycobiology.
By drawing on their distinct strengths, the proposed collaborative research between GlyGen and GlyConnect will focus on further integrating site-specific protein-glycan data to generate more comprehensive data sets; increasing data availability in GlyGen is expected to accelerate basic and translational research. Currently, the major resources for site-specific protein-glycan data are UniCarbKB and UniProtKB, though the amount of available data from these databases, or other similar resources, is not substantial. To address this limitation, GlyConnect and GlyGen will develop an advanced, scalable, and site-specific protein-glycan annotation pipeline. This pipeline will be constructed using existing data in GlyConnect, in addition to roughly 100 publications identified and prioritized through current literature mining efforts in GlyGen. Moreover, front-end and back-end software developments will be implemented on the GlyGen platform, allowing glycoscience researchers to submit site-specific glycan data through a validated submission system. The proposed research will create a standardized methodology for more efficient data submission efforts, expand the available site-specific protein-glycan data for the glycobiology community, and facilitate data sharing among glycoscience researchers.

Public Health Relevance

The proposed research will focus on improving coverage of site-specific annotations in GlyGen and will result in the development of an efficient methodology for extracting and integrating site-specific glycoconjugate data into glycoinformatics resources. Along with the documented methodology (automated and manual rules), the extracted annotations will be easily accessible to the entire community via the GlyGen portal. The annotations will be linked back to the corresponding resources to increase and maintain the data flow across valuable bioinformatics databases.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project--Cooperative Agreements (U01)
Project #: 3U01GM125267-04S1
Application #: 10154002
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Melillo, Amanda A

Project Start: 2017-09-01
Project End: 2022-05-31
Budget Start: 2020-06-01
Budget End: 2021-05-31
Support Year: 4
Fiscal Year: 2020
Total Cost
Indirect Cost

Institution

Name: University of Georgia
Department
Type: Organized Research Units
DUNS #: 004315578

City: Athens
State: GA
Country: United States
Zip Code: 30602

Related projects


NIH 2020 U01 GM	Computational and Informatics Resources and Tools for Glycoscience Research Tiemeyer, Michael; Mazumder, Raja / University of Georgia
NIH 2020 U01 GM	GlyGen Supplement: Develop automatic literature mining tool for extracting context specific glycan-protein data that will enhance the extent and quality of data in GlyGen Tiemeyer, Michael; Mazumder, Raja / University of Georgia
NIH 2019 U01 GM	Computational and Informatics Resources and Tools for Glycoscience Research Tiemeyer, Michael; Mazumder, Raja / University of Georgia
NIH 2019 U01 GM	Computational and Informatics Resources and Tools for Glycoscience Research Tiemeyer, Michael; Mazumder, Raja / University of Georgia
NIH 2018 U01 GM	Computational and Informatics Resources and Tools for Glycoscience Research York, William S.; Mazumder, Raja / University of Georgia
NIH 2017 U01 GM	Computational and Informatics Resources and Tools for Glycoscience Research York, William S.; Mazumder, Raja / University of Georgia

Publications

(2018) Meeting Report of the International Life Science Integration Workshop 2018. Glycobiology 28:552-555
