Our knowledge of human biology and disease has increased tremendously in the past 50 years, primarily due to advances at the molecular level. Perhaps the most familiar example is the ability to study genes via DNA sequencing. But equally important is the study of gene function through an understanding of the genes' protein products. The most general tool for studying proteins has become protein identification via mass spectrometry. A key to its widespread use was the development in the 1990s of software such as SEQUEST and Mascot for automatically identifying proteins from a mass spectrum. It is now recognized that protein function cannot be understood without also considering how proteins are modified. Perhaps the most important of all modifications is the addition of sugars, referred to as glycosylation. At least 50% of all proteins are glycosylated, and the role of such modifications in disease has been firmly established. But the study of glycosylation is currently limited to a small number of laboratories. One of the key bottlenecks is software: analogs of programs like SEQUEST and Mascot for identifying glycans from a mass spectrum. Such software does exist in research form. We propose to take one version, called Cartoonist, and improve its functionality and performance so that it is capable enough to be used in a very wide set of laboratories. We will do this by first working directly with a small but diverse set of research groups to make the software part of their everyday protocols. Then we will begin wider distribution through multiple channels, including by leveraging the existing NIH-funded Consortium for Functional Glycomics and Complex Carbohydrate Research Center. The key components of Cartoonist already exist: peak picking, software recalibration, automatic generation of cartoons, and scoring functions for judging how well a glycan (sugar) can explain the peaks in a spectrum.
Working together with a small number of labs, we will determine how to engineer and refine these components to create an easy-to-use and generally useful piece of software. Such software is an essential requirement for expanding the pool of researchers able to detect and measure glycosylation, and increasing the pool is in turn a requirement for a fuller understanding of glycosylation and its role in human disease.
Proteins are frequently modified by glycans (sugars), and these modifications are very important for human health. For example, numerous studies show that glycans have the potential to be markers for the screening of cancer and the monitoring of its treatment. The primary tool for detecting glycans is the mass spectrometer, but interpreting the output of these machines requires an expert, and even then is very tedious. We propose to build on research software that automatically "reads" glycan mass spectra, and develop it into a tool that can be used by any research laboratory, thus removing a key bottleneck to the development of therapies using glycans.