Genome Database Searching Software for Identifying Proteins Using Mass Spect

Burlingame, Alma

Abstract

The Human Genome Project is rapidly pouring a wealth of DNA sequence data into databases at the National Institutes of Health (NIH). Within this vast quantity of data lie the largely not-yet-understood "blueprints" that the individual cells in an organism use to build the array of proteins that serve as the molecular machines for executing the wide variety of biological processes necessary to sustain life. This ever-growing genome database serves as a fundamental resource in accelerating research using mass spectrometry for identification of proteins. The database is much like having the answers to the odd-numbered problems in the back of the book. The difficulty for scientists then becomes how to pose an odd-numbered question and then decipher the answer. Mass spectrometry (MS) techniques produce two types of information from a single sample in a matter of minutes. The first is peptide mass. A so-called "peptide-mass fingerprint" is obtained after using an enzyme to digest a target protein into a mixture of smaller pieces called peptides. The molecular mass of each peptide in the mixture is measured with a mass spectrometer. The resulting set of masses constitutes a "fingerprint." The second is peptide sequence. In a tandem MS experiment, individual peptides in an unseparated mixture can be selectively fragmented. Subsequent measurement of the fragment masses yields data in the form of a "peptide fragment-ion tag" and allows sequence to be nominally derived from the mass differences between adjacent fragments. Because of the complexity of the data produced from these types of experiments and the tremendous sample throughput potential from automation of MS instruments, we can develop software for manipulating the data into a form that allows us to pose the question: "Is the sequence of the protein we have just analyzed in the genome database?"
If so, we and our collaborators could then begin to evaluate what is already known about the protein and how it might be important in the particular disease being studied. On those increasingly rare occasions when the particular protein sequence is not in the database, our data could be used to initiate gene-cloning efforts. Moreover, even with weak MS spectra from very low quantities of material, the combination of partly ambiguous mass and partial sequence data that can be obtained is of high discriminating power. Hence, genome database searches could lead to unambiguous, high-confidence protein identifications because only a minuscule fraction of the enormous number of theoretically possible sequences can exist in the limited genome size of a living organism.
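
The peptide-mass fingerprint search described above can be illustrated with a minimal sketch. It is not the software proposed in this grant. It assumes trypsin as the digesting enzyme (cleavage after K or R, except before P), standard monoisotopic residue masses, a fixed mass tolerance, and a simple count of matching masses as the score. The function names, the tolerance value, and the two toy sequences are hypothetical and chosen only for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(seq):
    """Split a protein sequence after K or R, unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed, seq, tol=0.01):
    """Count observed masses within tol Da of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed
    )


def search(observed, database, tol=0.01):
    """Rank database entries (name -> sequence) by number of matched masses."""
    hits = [(name, count_matches(observed, seq, tol))
            for name, seq in database.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy database; these sequences are made up for illustration only.
toy_db = {
    "prot_a": "MKTAYIAKQRQISFVK",
    "prot_b": "GSHMLEDPRAAGKWTR",
}

# Simulate a measured fingerprint by digesting prot_a in silico.
fingerprint = [peptide_mass(p) for p in tryptic_digest(toy_db["prot_a"])]
ranking = search(fingerprint, toy_db)
```

A real search would also have to handle missed cleavages, chemical modifications, instrument mass error expressed in ppm, and a statistical score in place of a raw match count; the sketch omits all of these.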

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR001614-19
Application #: 6308821

Project Start: 2000-03-01
Project End: 2002-02-28
Budget Start: 1998-10-01
Budget End: 1999-09-30
Support Year: 19
Fiscal Year: 2000
Total Cost: $9,880

Institution

Name: University of California San Francisco
DUNS #: 073133571

City: San Francisco
State: CA
Country: United States
Zip Code: 94143
