This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.
Although the working draft of the human genome already contains more than 25% finished sequence, assigning genes with their correct exons and exon/intron boundaries remains a major challenge. If no cDNA sequences, or only partial ones, are available, gene prediction can be performed in silico, e.g., using GENSCAN and/or homology searches. However, these algorithms tend to overpredict nonexistent exons, especially if the parsing criteria are set to detect suboptimal exons so that real exons are less likely to be missed (http://genes.mit.edu/Suboptimal.html). We developed a new exon mapping strategy, using MS/MS (MS2) data generated from the protein gene products, that reliably reveals which exons are present in an expressed protein as well as valid exon/intron boundaries. We generate high-quality MS/MS data of proteolytic peptides derived from human proteins using ESI and MALDI ion trap MS. These MS/MS data are correlated with the available human genome sequence using a newly developed search algorithm called 'Sonar'. A single confident hit for a peptide provides evidence that the identified sequence lies within an exon. An appropriate region of the genome sequence surrounding the hit position is then used to generate a rough gene prediction. The predicted exons are assembled in alternative combinations, and the different assemblies are searched with all MS/MS data obtained for the same protein. Peptides that do not hit the genomic sequence but bridge predicted exons lead to unambiguous annotation of exon boundaries.
As an example of such an analysis, we used data from a single LC-MS/MS run of a 130 kDa band obtained from the human STAGA complex. Searching the entire publicly available human genome database (as of January 27, 2001), we identified 9 different exons with 3 distinct exon junctions that belong to the TAF2C1 gene (TATA box binding protein (TBP)-associated factor, RNA polymerase II, C1). The data spanned a region of ~100 kb of genomic sequence. In particular, we obtained 7 peptides within exons and 3 exon-bridging peptides. Note that the first and the last exons of TAF2C1 (15 exons) are putative and are interrupted by large gaps in the genomic sequence. We will provide a series of other examples and discuss the value of this strategy for annotating the human genome sequence, as well as for other applications such as defining alternatively spliced variants of proteins. 'Sonar' provides an effective new scoring algorithm plus a novel way of presenting meaningful MS/MS data, enabling us to screen proteins against genomic databases at high speed (600 msec/spectrum). We conclude that mass spectrometry used in this way can be of significant utility for annotating the human genome. A paper describing this work is in preparation.
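The two kinds of evidence described above (peptides that hit the genome directly, and peptides that only match once two predicted exons are joined) can be illustrated with a minimal Python sketch. It is not the 'Sonar' algorithm: it assumes the peptide sequences are already known rather than scored from spectra, searches only the forward strand, and uses made-up toy sequences. The names `translate`, `find_in_genome`, and `bridging_pairs` are hypothetical and chosen for illustration.

```python
from itertools import combinations

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 0; stops become '*', unknown codons 'X'."""
    return "".join(CODON.get(dna[k:k + 3], "X") for k in range(0, len(dna) - 2, 3))


def find_in_genome(genome, peptide):
    """Return (frame, nucleotide_start) for each forward-strand hit of peptide."""
    hits = []
    for frame in range(3):
        protein = translate(genome[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, frame + 3 * pos))
            pos = protein.find(peptide, pos + 1)
    return hits


def bridging_pairs(exons, peptide):
    """Return exon index pairs (i, j), i < j, whose joined sequence encodes
    the peptide across the junction between exon i and exon j."""
    pairs = []
    for i, j in combinations(range(len(exons)), 2):
        joined = exons[i] + exons[j]
        junction = len(exons[i])  # first nucleotide contributed by exon j
        for frame in range(3):
            protein = translate(joined[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if start < junction < end and (i, j) not in pairs:
                    pairs.append((i, j))
                pos = protein.find(peptide, pos + 1)
    return pairs


# Toy data (invented for illustration): two exons split mid-codon by an intron,
# plus a decoy "predicted exon" that is not part of the expressed protein.
exon1 = "ATGAAAACCGCCTATAT"
intron = "GTAAGTCCCCCCTTTTTTCAG"
exon2 = "TGCCAAACAGCGT"
decoy = "GGGTTTCCC"
genome = "CC" + exon1 + intron + exon2
exons = [exon1, exon2, decoy]
```

In this toy case, `find_in_genome(genome, "MKTAY")` locates a peptide inside the first exon, while the peptide `"YIAK"` has no genomic hit and is reported by `bridging_pairs(exons, "YIAK")` only for the exon pair that is actually spliced together. A realistic search would also cover the reverse strand and treat isobaric residues and scoring explicitly.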

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR000862-33
Application #: 7355049
Study Section: Special Emphasis Panel (ZRG1-BNP (01))
Project Start: 2006-03-01
Project End: 2007-02-28
Budget Start: 2006-03-01
Budget End: 2007-02-28
Support Year: 33
Fiscal Year: 2006
Total Cost: $1,233
Indirect Cost:

Name: Rockefeller University
Department: Miscellaneous
Type: Other Domestic Higher Education
DUNS #: 071037113
City: New York
State: NY
Country: United States
Zip Code: 10065

Manning, Lois R; Popowicz, Anthony M; Padovan, Julio C et al. (2017) Gel filtration of dilute human embryonic hemoglobins reveals basis for their increased oxygen binding. Anal Biochem 519:38-41
Boice, Michael; Salloum, Darin; Mourcin, Frederic et al. (2016) Loss of the HVEM Tumor Suppressor in Lymphoma and Restoration by Modified CAR-T Cells. Cell 167:405-418.e13
Chait, Brian T; Cadene, Martine; Olinares, Paul Dominic et al. (2016) Revealing Higher Order Protein Structure Using Mass Spectrometry. J Am Soc Mass Spectrom 27:952-65
Krutchinsky, Andrew N; Padovan, Júlio C; Cohen, Herbert et al. (2015) Maximizing ion transmission from atmospheric pressure into the vacuum of mass spectrometers with a novel electrospray interface. J Am Soc Mass Spectrom 26:649-58
Mast, Fred D; Rachubinski, Richard A; Aitchison, John D (2015) Signaling dynamics and peroxisomes. Curr Opin Cell Biol 35:131-6
Krutchinsky, Andrew N; Padovan, Júlio C; Cohen, Herbert et al. (2015) Optimizing electrospray interfaces using slowly diverging conical duct (ConDuct) electrodes. J Am Soc Mass Spectrom 26:659-67
Oricchio, Elisa; Papapetrou, Eirini P; Lafaille, Fabien et al. (2014) A cell engineering strategy to enhance the safety of stem cell therapies. Cell Rep 8:1677-1685
Zhong, Yu; Morris, Deanna H; Jin, Lin et al. (2014) Nrbf2 protein suppresses autophagy by modulating Atg14L protein-containing Beclin 1-Vps34 complex architecture and reducing intracellular phosphatidylinositol-3 phosphate levels. J Biol Chem 289:26021-37
Indiani, Chiara; O'Donnell, Mike (2013) A proposal: Source of single strand DNA that elicits the SOS response. Front Biosci (Landmark Ed) 18:312-23
Di Virgilio, Michela; Callen, Elsa; Yamane, Arito et al. (2013) Rif1 prevents resection of DNA breaks and promotes immunoglobulin class switching. Science 339:711-5
