This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff.

With the multitude of MS instruments and data analysis software platforms available for mass spectrometry (MS) and proteomics, it has become difficult to manipulate and manage the resulting data sets. We are creating new software to process protein and peptide MS and MS/MS data, to locate and assign post-translational modifications (PTMs), and to help place the results in biological context. The BUPID program was developed in C under Linux and made accessible to the main program through a CGI-based web interface. The shell data conversion program was written with a user-friendly GUI that can also be operated in an unattended batch-processing mode. The program was tested on existing MALDI-TOF MS, MALDI-FT MS, and LC-MS/MS data sets obtained in-house. It allowed large volumes of data acquired on different instruments to be converted to the formats of several commercial and publicly available search engines. The converted files were then submitted to these search engines for protein identification, using search settings specified by the user. Our implementation of the mzXML format introduced by the Institute for Systems Biology provided the benefits of a common data format for combining results obtained on different MS platforms, comparing MS methodologies, and archiving data. We also added capabilities for interpreting top-down tandem mass spectra of proteins (BUPID Top-down) and for linking database assignments to protein function (STRAP).
These results were presented as posters at ASMS and other scientific meetings; the STRAP manuscript was published in early 2010. A further extension (STRAP-PTM) is now being developed to assign and map PTMs. The search algorithm Boston University Protein Identifier (BUPID) provides a robust and accurate statistical model for protein identification from MS data. The algorithm offers several important features:
1. By using the log-likelihood ratio as its scoring function, the algorithm can best distinguish correctly assigned peptides from incorrect assignments.
2. Matching peaks with a background-dependent threshold offers more flexibility and accuracy than a traditional fixed mass window.
3. The statistical model provides results similar to or better than those of conventional database search engines.
We use the log-likelihood ratio to calculate the probability that a protein is present in the sample. The model distinguishes two hypotheses: (1) H0, that a set of peaks in the spectrum is generated by the random background; and (2) HA, that the same set of peaks is generated by peptides from a specific protein. A peak is included in the set if the probability that it was produced by the protein is greater than the probability that it was produced by the random background. Final results are ranked by the E-value of their probability scores, computed using the sequence information of each protein. We have compared the performance of the BUPID server with that of several other public web-based database search engines. Recent efforts have focused on interpreting top-down tandem MS data from LTQ-Orbitrap MS and ICR-FTMS instruments. A manuscript that includes the use of BUPID Top-down was recently published in Int. J. Mass Spectrom. The STRAP paper has also been published, and a manuscript on STRAP-PTM is now under revision.
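The ranking step above states that results are ordered by E-value, but not how the null score distribution is obtained. The C sketch below assumes, purely for illustration, that scores of randomized or decoy proteins are available as an empirical null; this is a swapped-in technique, not necessarily the one BUPID uses. `ProteinHit`, `empirical_evalue`, `rank_hits` and the `db_size` parameter are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One candidate protein with its score and computed E-value (hypothetical). */
typedef struct {
    const char *accession;
    double score;
    double evalue;
} ProteinHit;

/*
 * Empirical E-value: the expected number of database entries that would
 * score at least `score` by chance. The tail probability is estimated from
 * null (decoy/randomized) scores as (k + 1) / (n_null + 1), a conservative
 * estimate that is never exactly zero, then scaled by the database size.
 */
double empirical_evalue(double score, const double *null_scores,
                        size_t n_null, size_t db_size)
{
    assert(n_null > 0 && null_scores != NULL);

    size_t at_least = 0;
    for (size_t i = 0; i < n_null; ++i)
        if (null_scores[i] >= score)
            ++at_least;

    return (double)db_size * (double)(at_least + 1) / (double)(n_null + 1);
}

/* qsort comparator: smaller (more significant) E-value first. */
static int by_evalue(const void *a, const void *b)
{
    double ea = ((const ProteinHit *)a)->evalue;
    double eb = ((const ProteinHit *)b)->evalue;
    return (ea > eb) - (ea < eb);
}

/* Fill in each hit's E-value, then sort hits from most to least significant. */
void rank_hits(ProteinHit *hits, size_t n_hits, const double *null_scores,
               size_t n_null, size_t db_size)
{
    for (size_t i = 0; i < n_hits; ++i)
        hits[i].evalue = empirical_evalue(hits[i].score, null_scores,
                                          n_null, db_size);
    qsort(hits, n_hits, sizeof *hits, by_evalue);
}
```

With nine null scores from 0.5 to 4.5 in steps of 0.5 and a database of 1000 entries, a protein scoring 4.2 (matched or exceeded by one null score) receives an E-value of 1000 × 2/10 = 200, while one scoring 1.2 (matched or exceeded by seven) receives 800, so the first is ranked ahead.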
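For the data-conversion step described earlier, one widely used search-engine input is Mascot Generic Format (MGF), a plain-text peak list in which each spectrum sits between `BEGIN IONS` and `END IONS`. The abstract does not name the formats the shell program produced, so MGF is used here only as an assumed example. The `PeakList` structure and `write_mgf` function are hypothetical; a real converter would also handle further MGF fields (such as RTINSECONDS) and ambiguous charge states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Minimal in-memory peak list (hypothetical structure, not BUPID's). */
typedef struct {
    const char *title;        /* must be non-NULL and a single line */
    double precursor_mz;      /* written as PEPMASS */
    int charge;               /* <= 0 means unknown: CHARGE line omitted */
    const double *mz;
    const double *intensity;
    size_t n_peaks;
} PeakList;

/*
 * Write one spectrum as an MGF block.
 * Returns the number of peaks written, or -1 if the title is invalid
 * or an output error occurs.
 */
int write_mgf(FILE *out, const PeakList *s)
{
    assert(out != NULL && s != NULL && s->title != NULL);

    /* MGF fields are line-oriented, so a newline in TITLE would corrupt it. */
    if (strchr(s->title, '\n') != NULL)
        return -1;

    if (fprintf(out, "BEGIN IONS\nTITLE=%s\nPEPMASS=%.6f\n",
                s->title, s->precursor_mz) < 0)
        return -1;
    if (s->charge > 0 && fprintf(out, "CHARGE=%d+\n", s->charge) < 0)
        return -1;

    /* One "m/z intensity" pair per line. */
    for (size_t i = 0; i < s->n_peaks; ++i)
        if (fprintf(out, "%.6f %.2f\n", s->mz[i], s->intensity[i]) < 0)
            return -1;

    if (fprintf(out, "END IONS\n") < 0)
        return -1;
    return (int)s->n_peaks;
}
```

Calling `write_mgf` once per spectrum on the same `FILE *` yields a multi-spectrum MGF file, which fits the unattended batch-processing mode described for the shell program.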