This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. GlycO is a highly expressive ontology that embodies knowledge of glycan structure and of the relationships between the structure of glycans and their participation in biological processes. With GlycO, we aim to provide a general description of the glycobiology domain that consists of a robust schema and a large knowledge base. The schema conceptually defines classes (e.g., "the class containing all N-glycans") to which specific instances are assigned, and the knowledge base comprises instances (e.g., a specific glycan structure) and specific relationships between instances. The schema allows reasoning about the concepts by exploiting the Web Ontology Language OWL-DL (based on Description Logic) to place restrictions on relationships. This provides the basis for automated population of the knowledge base, a process in which new instances are added and classified. The information needed to populate the GlycO knowledge base can be automatically extracted from several partially overlapping sources, including the Kyoto Encyclopedia of Genes and Genomes (KEGG), the Glycosciences.de databases (SweetDB), and the Complex Carbohydrate Structural Database (CARBBANK). To avoid multiple entries of identical structures, transformation and disambiguation techniques are applied. The ultimate goal is to generate a large ontology that can be used for the annotation, retrieval, and processing of information regarding glycan structure-function relationships and for the discovery of the knowledge implicit in that information. In GlycO, structural information is modularized at the instance level.
That is, structures are composed of canonical building blocks that can be reused in different chemical contexts. Larger structures (e.g., "glycans") are composed of smaller canonical building blocks (carbohydrate_residues), which are, in turn, composed of even smaller canonical building blocks (carbohydrate_residue_atoms). The building blocks contain specific structural information, such as the absolute configuration, ring form, and anomeric configuration of a carbohydrate residue, and contextual information, such as the location of the residue in the glycan. For collections of glycans, such as N-glycans, that have significant overlap in their biosynthesis, each of the constituent canonical residues embodies contextual information that can be correlated to its interactions with biosynthetic enzymes and other biochemical entities. Thus, simply listing the residues that make up a glycan provides a description of the glycan structure and an implicit description of the biological processes (e.g., biosynthesis, catabolism, signal transduction) in which it participates. In GlycO, the links between canonical residues are expressed as full-fledged nodes. Such promotion of an edge to a node is often referred to as "reification" of a relationship. This schema allows the reified links to be specified in hierarchical terms. That is, the link between two canonical residues (e.g., the proximal core β-GlcpNAc residue and an Asn residue) explicitly embodies a more finely grained link between two atoms (i.e., the canonical carbohydrate_residue_C1_atom and the canonical amino_acid_residue_N4_atom). These links are themselves canonical objects that can be reused in a modular fashion. For example, the connection between a particular N-glycan and a given peptide corresponds to a higher-level link, which, in turn, embodies the canonical link between the canonical core β-GlcNAc residue and the canonical Asn residue.
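The hierarchy of reified links described above can be illustrated with a minimal Python sketch. This is not GlycO's OWL-DL schema: the `Link` class, its `embodies` field, the `expand` helper, and the link names are hypothetical and chosen only for illustration. Only the two atom names are taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """A reified link: an edge promoted to a first-class, reusable object.

    Hypothetical illustration only; GlycO itself expresses this in OWL-DL.
    """
    name: str
    source: str                        # component on one side of the link
    target: str                        # component on the other side
    embodies: Optional[Link] = None    # the finer-grained link it contains


def expand(link: Link) -> list[str]:
    """Return link names from the highest level down to the atom level."""
    names = []
    current: Optional[Link] = link
    while current is not None:
        names.append(current.name)
        current = current.embodies
    return names


# Atom-level link; the two atom names are the ones given in the text.
atom_link = Link(
    name="C1_to_N4",
    source="carbohydrate_residue_C1_atom",
    target="amino_acid_residue_N4_atom",
)

# Residue-level link that embodies the atom-level link (names are assumed).
residue_link = Link(
    name="core_GlcNAc_to_Asn",
    source="core_beta_GlcpNAc_residue",
    target="Asn_residue",
    embodies=atom_link,
)

# Higher-level link between a whole N-glycan and a peptide.
glycan_link = Link(
    name="N_glycan_to_peptide",
    source="N_glycan",
    target="peptide",
    embodies=residue_link,
)

print(expand(glycan_link))
```

Because each link is an immutable object, the same `residue_link` could be referenced by any number of glycan-peptide links, which mirrors the modular reuse of canonical links described in the text.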
Thus, the schema provides for the complete, unambiguous specification of a complex structure simply by listing its highest-level components and links. This not only provides enhanced computability and logical consistency, but also provides new classes of objects (links) that can be associated with particular processes. For example, the link described above is biosynthetically formed and hydrolyzed by specific enzymes (oligosaccharyltransferase and PNGase-F, respectively).
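As a rough sketch of how link objects might be associated with processes, the Python fragment below annotates a canonical link with the two enzymes named in the text and looks them up for a structure given only as a list of its links. The `LinkProcesses` type, the `LINK_PROCESSES` table, the `enzymes_for` function, and the link names are assumptions made for this example, not part of GlycO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkProcesses:
    """Enzymes associated with one canonical link (hypothetical type)."""
    formed_by: str
    hydrolyzed_by: str


# The enzyme names come from the text; the link name is an assumed label.
LINK_PROCESSES = {
    "core_GlcNAc_to_Asn": LinkProcesses(
        formed_by="oligosaccharyltransferase",
        hydrolyzed_by="PNGase-F",
    ),
}


def enzymes_for(structure_links):
    """Collect the enzymes that form or hydrolyze any annotated link."""
    found = set()
    for name in structure_links:
        processes = LINK_PROCESSES.get(name)
        if processes is not None:
            found.update({processes.formed_by, processes.hydrolyzed_by})
    return sorted(found)


# A structure specified only by its links; the second name is a
# placeholder for a link that carries no process annotation here.
glycopeptide_links = ["core_GlcNAc_to_Asn", "unannotated_example_link"]
print(enzymes_for(glycopeptide_links))
```

Keeping the annotations on the link rather than on either residue means that any structure listing that link inherits the associated enzymes without further bookkeeping.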