Robust Accurate Identification of peptides from tandem mass spectrometry data

Yu, Yi-Kuo

Abstract

Although peptide/protein identification has been studied intensively, achieving statistically accurate identification remains challenging. Many peptide identification methods perform database searches and assign E-values to peptide hits; however, the E-values reported by different methods do not agree with each other, and few of them, if any, agree with the textbook definition of the E-value (the expected number of chance hits scoring at least as well as the observed hit). This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In one of our earlier publications, we provided a way to combine search results democratically, that is, with equal weights. When the weights differ, an instability arises if some of the weights are nearly degenerate. In 2011, we devised a mathematical framework that completely eliminates this instability. In 2015, we designed a protein identification method that combines weighted P-values of evidence peptides. This method solves the long-standing problem of precise type-I error control in protein identification. In addition, it correctly reports the proportion of false discoveries, an indication of accurate type-II error control.

In the past year, we worked on a new peptide significance assignment method based on extreme value statistics. The motivation is to provide accurate peptide identification confidence for methods whose scoring functions cannot be expressed as a sum of independent contributions. The new method provides a generally applicable confidence assignment for any generic scoring function whose score distribution falls within the basin of attraction of the extreme value distributions. The results are very encouraging and were published in Bioinformatics this year.
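The instability with nearly degenerate weights can be made concrete with the classical closed form for combining independent P-values through the weighted product τ = ∏ p_i^(w_i). The Python sketch below assumes that weighted-product form; it is the textbook formula, not the stabilized 2011 framework, and the function name `weighted_fisher_pvalue` is hypothetical. Under the null, each −ln p_i is exponentially distributed with mean 1, so Z = −ln τ is hypoexponential and P(Z ≥ z) = Σ_i [∏_(j≠i) w_i/(w_i − w_j)] e^(−z/w_i).

```python
import math
from typing import Sequence


def weighted_fisher_pvalue(pvalues: Sequence[float], weights: Sequence[float]) -> float:
    """Classical closed-form P-value of tau = prod(p_i ** w_i) for distinct weights w_i > 0.

    Hypothetical helper for illustration. Under the null, Y_i = -ln(p_i) ~ Exp(1), so
    Z = sum(w_i * Y_i) is hypoexponential with survival function
    P(Z >= z) = sum_i prod_{j != i} w_i / (w_i - w_j) * exp(-z / w_i).
    """
    if len(pvalues) != len(weights):
        raise ValueError("pvalues and weights must have the same length")
    if len(set(weights)) != len(weights):
        raise ValueError("the closed form requires pairwise distinct weights")

    z = -sum(w * math.log(p) for p, w in zip(pvalues, weights))  # z = -ln(tau)
    total = 0.0
    for i, wi in enumerate(weights):
        coeff = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                # Grows like 1 / (w_i - w_j): nearly equal weights give huge,
                # nearly cancelling terms and lose floating-point precision.
                coeff *= wi / (wi - wj)
        total += coeff * math.exp(-z / wi)
    return total


if __name__ == "__main__":
    print(weighted_fisher_pvalue([0.1, 0.5], [1.0, 2.0]))  # ≈ 0.2912
```

Because the coefficients have denominators (w_i − w_j), the sum becomes a difference of very large, nearly cancelling terms as two weights approach each other, and exactly equal weights fall outside the formula altogether; this is the kind of numerical fragility a stabilized framework has to remove.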
We have made both the protein identification function and the extreme value based peptide statistics available through the RAId web service on our group website, www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid_dbs/index.html.

Last year, we finished the first phase of a large collaborative project on pathogen identification using mass spectrometry, involving scientists at NHLBI and the NIH Clinical Center. The fundamental idea is to represent each pathogen by its peptidome. If the statistical significance assigned to the peptides identified through mass spectrometry analysis is accurate, one can correctly rank species/genera according to the similarity between their peptidomes and the identified peptides. Again, the evidence peptides associated with a given species/genus must be weighted, because one peptide often maps to multiple species/genera. Our results were published this year in the Journal of the American Society for Mass Spectrometry. This year, we extended the pathogen project to the more challenging phase II: simultaneous identification of multiple pathogens. Our preliminary results are very encouraging, and we are compiling them for our next publication in this direction.

Because different diseases might share similar causes while similar diseases may actually have different origins, we believe it is important to characterize disease relationships from a different perspective: that of the protein-protein interaction network. With this in mind, we downloaded all diseases, along with their associated genes, from the Comparative Toxicogenomics Database (CTD) and analyzed disease-disease relations based on the similarity between their protein weight vectors. Each vector is obtained by using the disease genes as sources and sinks in an interaction network of proteins with documented interactions.
We recently surveyed mechanism-based disease similarity studies and wrote a mini-review on this topic. Our own results were published earlier in PLoS One. The web service DeCoaD, which lets users find diseases similar to an input disease based on interaction networks, is available at www.ncbi.nlm.nih.gov/CBBresearch/Yu/mn/DeCoaD/index.html; its implementation was recently published in BMC Research Notes.
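Once each disease has a protein weight vector, finding similar diseases reduces to comparing vectors and ranking the results. The Python sketch below illustrates only that final comparison step. It assumes cosine similarity as the measure (the measure used in the published work is not specified here), a sparse representation mapping protein identifiers to weights, and hypothetical names (`cosine_similarity`, `rank_similar_diseases`, `disease_A`, `P1`, ...); computing the weight vectors from the interaction network is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse representation: protein identifier -> non-negative weight.
WeightVector = Dict[str, float]


def cosine_similarity(u: WeightVector, v: WeightVector) -> float:
    """Cosine of the angle between two sparse protein-weight vectors."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)  # only shared proteins contribute
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_similar_diseases(
    query: WeightVector, diseases: Dict[str, WeightVector]
) -> List[Tuple[str, float]]:
    """Return (disease, similarity) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in diseases.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    diseases = {
        "disease_A": {"P1": 0.6, "P2": 0.3, "P3": 0.1},
        "disease_B": {"P4": 0.7, "P5": 0.3},
    }
    # disease_A ranks first; disease_B shares no proteins with the query and scores 0.0.
    print(rank_similar_diseases({"P1": 0.5, "P2": 0.5}, diseases))
```

Cosine similarity ignores the overall scale of each vector, which is convenient when diseases have very different numbers of associated genes; other vector similarity measures could be substituted in `cosine_similarity` without changing the ranking step.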

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Investigator-Initiated Intramural Research Projects (ZIA)
Project #: 1ZIALM092404-13
Application #: 9362447
Support Year: 13
Fiscal Year: 2016

Institution

Name: National Library of Medicine

Publications

Joyce, Brendan; Lee, Danny; Rubio, Alex et al. (2018) A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics. BMC Res Notes 11:182

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2018) Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry. J Am Soc Mass Spectrom 29:1721-1737

Alves, Gelio; Yu, Yi-Kuo (2016) Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution. Bioinformatics 32:2642-9

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y et al. (2016) Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance. J Am Soc Mass Spectrom 27:194-210

Hamaneh, Mehdi B; Yu, Yi-Kuo (2015) DeCoaD: determining correlations among diseases using protein interaction networks. BMC Res Notes 8:226

Hamaneh, Mehdi Bagheri; Haber, Jonah; Yu, Yi-Kuo (2015) Analytical solution and scaling of fluctuations in complex networks traversed by damped, interacting random walkers. Phys Rev E Stat Nonlin Soft Matter Phys 92:052803

Alves, Gelio; Yu, Yi-Kuo (2015) Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 31:699-706

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo (2014) Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy. J Am Soc Mass Spectrom 25:57-70

Hamaneh, Mehdi Bagheri; Yu, Yi-Kuo (2014) Relating diseases by integrating gene associations and information flow through protein interaction network. PLoS One 9:e110936

Alves, Gelio; Yu, Yi-Kuo (2014) Accuracy evaluation of the unified P-value from combining correlated P-values. PLoS One 9:e91225

Showing the most recent 10 out of 26 publications