The effectiveness of shotgun proteomics for cancer samples has been curtailed by two key challenges. First, cancer is often accompanied by deficits in apoptosis, cell division, and DNA repair, as well as by inflammatory responses; these changes lead to heterogeneity in proteins due to mutation and chemical modification. Peptides that differ from the reference sequences in databases are not identified by standard database search algorithms. Second, the degree of homology among proteins in human sequence databases introduces significant problems when identified peptides are assembled to produce protein identifications; a peptide may be an exact match to dozens of protein sequences, inflating the number of proteins reported by researchers. We propose an integrated set of algorithms designed to address these shortcomings. First, we will develop "sequence tagging" software to infer partial sequences from tandem mass spectra by repurposing research in database search algorithms. Second, we will create algorithms to reconcile partial peptide matches to these spectra in order to identify peptides that vary from reference sequences by mutations and modifications. Third, we will develop a modular framework for assembling these peptide identifications into proteins that will incorporate estimated false positive rates and multiple forms of peptides. The framework will use clustering techniques when applying parsimony rules to reduce the effects of database homology on reported protein lists. These open-source tools will be developed using standard file formats and be supported by code documentation to promote their widespread use.
Proteomics can potentially make powerful contributions to clinical diagnosis and research, but the bioinformatics that enable this technology have critical shortcomings that prevent its efficient translation from a research tool to a clinical tool.
We propose new systems to improve the application of proteomics to clinical samples by improving the identification of modified and mutant protein forms and by managing sets of related proteins.
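The parsimony idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the proposed framework: it assumes a simple greedy set-cover heuristic, ignores false positive rates and modified peptide forms, and uses hypothetical protein names (`P1` to `P4`), hypothetical peptide sequences, and hypothetical function names. Proteins supported by identical peptide sets are first clustered into indistinguishable groups, and groups are then selected until every identified peptide is explained.

```python
from collections import defaultdict


def group_indistinguishable(protein_to_peptides):
    """Cluster proteins whose identified peptide sets are identical."""
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    # Key each group by its sorted member names; value is the shared peptide set.
    return {tuple(sorted(members)): set(peps) for peps, members in groups.items()}


def parsimonious_groups(protein_to_peptides):
    """Greedy set cover (an assumed heuristic): repeatedly pick the group
    that explains the most still-unexplained peptides."""
    candidates = group_indistinguishable(protein_to_peptides)
    uncovered = set().union(*candidates.values()) if candidates else set()
    selected = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (alphabetical by group name).
        best = max(sorted(candidates), key=lambda g: len(candidates[g] & uncovered))
        selected.append(best)
        uncovered -= candidates[best]
    return selected


# Hypothetical peptide-to-protein evidence, for illustration only.
example = {
    "P1": {"AAK", "CCR", "DDK"},
    "P2": {"AAK", "CCR", "DDK"},  # same peptides as P1: indistinguishable
    "P3": {"AAK"},                # fully explained by the P1/P2 group
    "P4": {"EEK", "CCR"},         # needed to explain the peptide EEK
}
print(parsimonious_groups(example))
```

In this toy input, four database entries reduce to two reported groups: P1 and P2 are reported together, and P3 is dropped because its only peptide is already explained. Greedy selection is an approximation and does not always yield the smallest possible protein list.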