Although intensively studied, the statistical accuracy of peptide/protein identification remains a challenge. Many peptide identification methods search a database and assign an E-value to each peptide hit. However, the E-values reported by different methods do not agree with one another, and few of them, if any, agree with the textbook definition of the E-value. This hinders combining search results from different methods, particularly when one wishes to combine methods with user-assigned weights. When prior knowledge is available, it is often desirable to weight search methods differently before combining their search results. In a 2008 publication we provided a way to combine search results democratically. When different weights are present, an instability arises if some of the weights are nearly degenerate. In 2011 we devised a mathematical framework that completely eliminates this possible instability. We have recently extended this idea to the case in which the methods being considered are correlated. As a first step in this direction, we assessed the accuracy of combined P-values obtained with various published methods; the results were recently published in PLoS One.

In the past year we also completed a fast and accurate algorithm for computing the molecular isotopic mass distribution from a given elemental composition. Whereas existing algorithms provide either a coarse or a fine spectrum only, our method provides both coarse and fine resolution of the spectrum. In addition, under several testing criteria, our method appears to be among the best performing. We have made a web service available on our group website (www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html). The method is also described in a paper recently published in the Journal of the American Society for Mass Spectrometry.
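To make the notion of a coarse isotopic distribution concrete, the following is a minimal Python sketch that aggregates isotopologues by the number of extra neutrons, using repeated polynomial multiplication of per-element isotope abundances. It is a textbook approach and is not the algorithm described above. The abundance table, the function names, and the `max_peaks` parameter are illustrative assumptions; the abundances are approximate and should be replaced by an authoritative table for real use.

```python
# Approximate natural isotope abundances (assumed values for illustration).
# Index k of each list is the abundance of the isotope with k extra neutrons
# relative to the lightest stable isotope of that element.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b):
    """Multiply two distributions viewed as polynomial coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def power(dist, n, max_len):
    """Distribution for n atoms of one element, by binary exponentiation.

    Truncating to max_len entries is safe: the low-order coefficients of a
    product depend only on the low-order coefficients of its factors.
    """
    result = [1.0]
    base = dist
    while n > 0:
        if n & 1:
            result = convolve(result, base)[:max_len]
        base = convolve(base, base)[:max_len]
        n >>= 1
    return result


def coarse_isotopic_distribution(composition, max_peaks=10):
    """Return probabilities of 0, 1, 2, ... extra neutrons for a molecule.

    composition maps element symbols to atom counts, e.g. {"C": 6, "H": 12}.
    """
    total = [1.0]
    for element, count in composition.items():
        per_element = power(ABUNDANCE[element], count, max_peaks)
        total = convolve(total, per_element)[:max_peaks]
    return total


if __name__ == "__main__":
    # Glucose, C6H12O6: print the first few coarse peaks.
    for k, p in enumerate(coarse_isotopic_distribution({"C": 6, "H": 12, "O": 6}, 5)):
        print(f"M+{k}: {p:.6f}")
```

This sketch reports only peak probabilities per nominal mass shift. A fine-resolution spectrum would additionally have to track the exact mass of each isotopologue, which is not attempted here.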
Since most protein association (complex) data are obtained from pull-down experiments in which mass spectrometry is used to identify co-pulled-down partner proteins, our efforts in proteomics also include the investigation of protein-protein interaction networks and the organization of association data. Our work on these problems has led to the development of a few tools suitable for network exploration and hypothesis formation, and to a new method of organizing protein associations into a directed acyclic graph, which alleviates the effects of non-specific binding, false positive associations, and false negatives. The first set of results was recently published in PLoS One.

Over the past year, one of our major efforts has been to understand the physical mechanisms of peptide fragmentation so as to predict the likelihood of observing a given peak. If successful, this could significantly improve peptide scoring by providing peptide-specific peak filtering. The results indicate that the dissociation energy can be a good indicator of the observability of certain peaks, at least for singly charged short peptides with non-polar side chains. Since most peptides retain more than one positive charge in MS experiments, we are currently extending our investigation to peptides carrying two positive charges. We have also worked on a protein identification method that combines weighted P-values of evidence peptides. This new method aims to solve the long-standing problem of precise type-I error control in protein identification. The results obtained so far are very encouraging and indicate that precise control of the type-I error is possible.
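As background on combining weighted P-values, here is a minimal Python sketch of two classical closed forms for independent P-values that are uniform under the null hypothesis. It is not the framework or the protein identification method described above, and the function names are illustrative assumptions. With equal weights, the product of n P-values gives the Fisher-type formula. With distinct positive weights w_i, the statistic tau = prod p_i^(w_i) has P(tau) = sum_i Lambda_i * tau^(1/w_i), where Lambda_i = prod_{j != i} w_i / (w_i - w_j). The denominators w_i - w_j show why nearly degenerate weights cause numerical instability.

```python
import math


def combine_equal_weights(p_values):
    """Combined P-value for the product of n independent uniform P-values.

    Equivalent to Fisher's method: Q * sum_{k<n} (-ln Q)^k / k!, Q = prod p_i.
    """
    q = math.prod(p_values)
    if q == 0.0:
        return 0.0
    x = -math.log(q)
    term, total = 1.0, 1.0
    for k in range(1, len(p_values)):
        term *= x / k          # builds x^k / k! incrementally
        total += term
    return q * total


def combine_distinct_weights(p_values, weights):
    """Combined P-value for tau = prod p_i^(w_i) with distinct positive weights.

    Assumes independent P-values in (0, 1] and pairwise distinct weights.
    The coefficients grow without bound as two weights approach each other,
    so this naive formula loses precision for nearly degenerate weights.
    """
    tau = math.prod(p ** w for p, w in zip(p_values, weights))
    total = 0.0
    for i, wi in enumerate(weights):
        coeff = 1.0
        for j, wj in enumerate(weights):
            if j != i:
                coeff *= wi / (wi - wj)   # diverges as wi -> wj
        total += coeff * tau ** (1.0 / wi)
    return total


if __name__ == "__main__":
    p = [0.1, 0.2]
    print(combine_equal_weights(p))
    print(combine_distinct_weights(p, [1.0, 2.0]))
```

In this naive form, weights that differ by a tiny amount produce large coefficients of opposite sign that nearly cancel, and exactly equal weights cause a division by zero. That is the kind of behavior a stabilized formulation has to avoid.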