Proteins are the workhorses of cells, comprising much of the machinery of life. Chemical changes due to co- or post-translational modifications, or amino acid substitutions resulting from genetic variation, can alter protein function and have significant consequences for the functioning of a cell. Pinpointing chemical changes in proteins in an automated manner remains an elusive goal. Mass spectrometry (MS) based methodologies are promising for examining such alterations, since they are exquisitely sensitive to the resulting shifts in mass. There are two main approaches for examining proteins by MS: one measures the intact masses of proteins to detect shifts indicative of modifications (called top-down), and the other enzymatically digests proteins into short peptides, then analyzes their chemical structure by tandem mass spectrometry (called bottom-up). Each of the existing MS methods has limitations, such as incomplete protein coverage for bottom-up and the inability to uniquely identify modifications from top-down data; these drawbacks have motivated the development of hybrid combinations such as "top-down bottom-up" (TDBU) proteomics. Though these hybrid approaches are seeing a surge of interest, there is an acute lack of comprehensive, automated software for combining measurements from the distinct MS approaches; thus, studies to date have relied upon extensive manual analysis and/or ad hoc program scripts, inhibiting progress in the field.

We propose to address this issue by building on our two existing programs, PROCLAME for analyzing top-down data and GFS for analyzing bottom-up data, to develop integrated, open-source software that combines data from multiple MS methodologies to pinpoint posttranslational modifications and amino acid substitutions in proteins.
Our aims are: 1) to integrate multiple MS data sources for determining the type and location of modifications on proteins, by adding a Markov chain Monte Carlo (MCMC) based engine to PROCLAME; 2) to improve the ability to analyze bottom-up data by enhancing GFS for the automatic determination of posttranslational modifications; 3) to manage and integrate results from multiple MS measurements and search engines, by developing a database system and scripts to tie the programs together; and 4) to ensure program reliability and suitability through both alpha testing in-house and beta testing at external sites.

Health Relevance: Both amino acid substitutions and misregulation of enzymes that modify proteins play roles in human diseases such as cancer, diabetes, sickle cell anemia, and many others. This proposal is to build generalized software that can be used by a broad base of researchers to pinpoint the chemical changes and modifications to proteins that perturb regulatory networks in cells to cause disease, by integrating data from the latest proteomic technologies.