The first human genome sequence was published in 2001, yet eight years later major questions remain: how many genes does the genome encode, and how many distinct functional products do those genes produce through phenomena such as alternative splicing? The National Human Genome Research Institute (NHGRI) has coordinated the Encyclopedia of DNA Elements (ENCODE) project to answer these questions by comprehensively classifying the functional elements in the human genome. The pilot phase of the project studied 1% of the genome in detail and revealed extensive transcription well beyond that predicted by classical gene models. The biological function of a significant portion of the discovered transcripts remains unclear. The ENCODE project is now scaling up to examine the whole human genome, and the results are likely to echo those of the pilot project: extensive transcription, a significant fraction of which has no known function. Proteomic technologies can be applied, in a process called proteogenomic mapping, to determine which of these many transcripts encode proteins. Peptides identified by mass spectrometry are matched back to the genome sequence to locate the regions that are actually translated. This approach has been used to reveal new genes, new alternative splice variants, new translation start sites, and upstream open reading frames (ORFs). Although substantial progress has been made in developing proteogenomic mapping technologies, a significant hurdle in applying proteogenomics to the ENCODE project is the lack of proteomic data sets coordinated with the ENCODE transcription mapping efforts. Here we propose to generate large-scale proteomic data sets directly from the same Tier 1 ENCODE cell lines studied by the transcription efforts, and to coordinate the results with the transcription mapping efforts to determine which of the pervasive transcripts are translated.
Our specific aims are to: 1) produce large-scale proteomic data sets from ENCODE cell lines using the most advanced mass spectrometry methods; 2) use our database technologies to store, manage, and make all project results accessible to the community; and 3) use our software pipeline to map the results to the latest human genome assemblies, producing a UCSC (University of California, Santa Cruz) Genome Browser track of the results. We believe the result will be a significant advance in knowledge about our genomes and the functional products they encode.
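The mapping step in aim 3 can be illustrated with a minimal sketch. It assumes that peptide sequences have already been identified from mass spectra, translates a genomic sequence in all six reading frames, locates each peptide, converts its position back to 0-based, half-open genome coordinates, and formats the hits as BED6 lines, a format the UCSC Genome Browser accepts for custom tracks. The function names (`translate_dna`, `map_peptide`, `to_bed`) and the toy sequence are hypothetical and are not the project's actual pipeline. Real proteogenomic search also has to handle spectrum scoring, peptides that span splice junctions, and isobaric residues such as leucine and isoleucine, none of which this sketch attempts.

```python
"""Hypothetical sketch: map identified peptides back to genome coordinates
via six-frame translation and emit BED6 lines for a genome browser track."""

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_dna(seq: str) -> str:
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def map_peptide(genome: str, peptide: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, strand) hits, 0-based and half-open."""
    genome = genome.upper()
    n = len(genome)
    span = 3 * len(peptide)  # nucleotides covered by the peptide
    hits = []
    reverse = genome.translate(COMPLEMENT)[::-1]  # reverse complement
    for strand, seq in (("+", genome), ("-", reverse)):
        for frame in range(3):
            protein = translate_dna(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos  # nucleotide offset within seq
                if strand == "+":
                    hits.append((start, start + span, "+"))
                else:
                    # Convert reverse-complement offsets to forward-strand coordinates.
                    hits.append((n - start - span, n - start, "-"))
                pos = protein.find(peptide, pos + 1)
    return sorted(hits)


def to_bed(chrom: str, name: str, hits: list[tuple[int, int, str]]) -> list[str]:
    """Format hits as BED6 lines: chrom, start, end, name, score, strand."""
    return [f"{chrom}\t{s}\t{e}\t{name}\t0\t{strand}" for s, e, strand in hits]


if __name__ == "__main__":
    toy_genome = "ATGGCTTGGTAA"  # hypothetical toy sequence
    for pep in ("MAW", "PS"):
        for line in to_bed("chrDemo", pep, map_peptide(toy_genome, pep)):
            print(line)
```

On the toy sequence, `MAW` is found on the forward strand in frame 0 and `PS` on the reverse strand. Reporting coordinates in forward-strand, half-open form keeps both strands consistent with BED conventions, so the lines can be loaded directly as a browser track.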

Public Health Relevance

The human genome is the blueprint for human life and human health, but we do not yet understand its language: the language of genes. The ENCODE project is deciphering that language systematically, and the goal of this proposal is to accelerate that effort by revealing which parts of the blueprint contain instructions to build proteins.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
High Impact Research and Research Infrastructure Programs (RC2)
Project #
5RC2HG005591-02
Application #
7940962
Study Section
Special Emphasis Panel (ZHG1-HGR-M (O1))
Program Officer
Good, Peter J
Project Start
2009-09-26
Project End
2011-06-30
Budget Start
2010-07-01
Budget End
2011-06-30
Support Year
2
Fiscal Year
2010
Total Cost
$800,000
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Microbiology/Immun/Virology
Type
Schools of Medicine
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599
Publications
Risk, Brian A; Edwards, Nathan J; Giddings, Morgan C (2013) A peptide-spectrum scoring system based on ion alignment, intensity, and pair probabilities. J Proteome Res 12:4240-7
Risk, Brian A; Spitzer, Wendy J; Giddings, Morgan C (2013) Peppy: proteogenomic search software. J Proteome Res 12:3019-25
Gunawardena, Harsha P; Feltcher, Meghan E; Wrobel, John A et al. (2013) Comparison of the membrane proteome of virulent Mycobacterium tuberculosis and the attenuated Mycobacterium bovis BCG vaccine strain by label-free quantitative proteomics. J Proteome Res 12:5463-74
Khatun, Jainab; Yu, Yanbao; Wrobel, John A et al. (2013) Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions. BMC Genomics 14:141
Djebali, Sarah; Davis, Carrie A; Merkel, Angelika et al. (2012) Landscape of transcription in human cells. Nature 489:101-8
ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74