Developing Proteogenomic Mapping for Human Genome Annotation

Giddings, Morgan

Abstract

Genome sequencing efforts are producing ever greater quantities of raw DNA sequence, but the annotation process for locating genetic elements and determining their function has not kept up. While many aspects of annotation are difficult, it is particularly challenging to determine which parts of a genome sequence encode proteins, and therefore how the processes leading to protein translation are regulated. Technologies for examining proteins are more limited than those for studying RNA transcription, and an extensive study of transcription by the Encyclopedia of DNA Elements (ENCODE) consortium revealed a picture of great complexity. The project uncovered many novel exons, alternative splice forms, and novel regulatory elements. These results indicate that nearly nine-tenths of human genes undergo alternative splicing, and that the average gene produces approximately six splice variants. Rather than solidifying knowledge regarding the location and function of genes, these results call into question whether we accurately know what constitutes a gene and how the products encoded by genes determine the function of cells. In particular, the results obscure which transcripts are selected for translation to protein, further complicating annotation efforts. To address that gap, our project will determine which transcripts encode proteins, and how these are affected in several tissue types and disease conditions. We will use large tandem mass spectrometry-based proteomic data sets, mapping the analyzed protein data directly to several available human genome sequences, along with sets of predicted transcripts produced by the N-SCAN and CONTRAST gene finders, to reveal which parts of transcripts are translated into proteins, and in which types of cells this translation occurs.
To accomplish this, our project has three specific aims: 1) to develop high-accuracy methods and software for mapping proteomic data from proteins analyzed by mass spectrometry directly to the genome locus encoding them; 2) to develop an analysis pipeline software system using a novel rule-based information management approach; and 3) to apply these developments to the high-throughput analysis of large proteomic data sets, identifying the transcripts that encode proteins in distinct tissue types and disease conditions, and placing the results in a publicly accessible track in the UCSC genome browser. We believe this project will yield significant knowledge about the location and timing of protein translation in cells, which will enable further investigation of how misregulation of the path from transcription to translation leads to human disease conditions.
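
As a rough illustration of what mapping peptides to a genome locus (aim 1) can involve, the sketch below translates a DNA sequence in all six reading frames and reports where a peptide string occurs. It is a minimal sketch under stated assumptions, not the project's software: the function names (`reverse_complement`, `translate`, `map_peptide`) are hypothetical, the peptide is assumed to be already identified rather than scored against spectra, and peptides that span splice junctions would not be found by this simple search.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Find a peptide in all six reading frames (hypothetical helper).

    Returns a list of (strand, start, end) tuples, where start and end are
    0-based, half-open coordinates on the forward strand.
    """
    hits = []
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, start, end))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "CCATGAAATGGTAAG"  # made-up sequence for illustration
    print(map_peptide(toy_genome, "MKW"))
```

Coordinates reported this way could in principle be written out in a browser track format, but the search strategy, scoring, and output used by the project itself are not described in this record.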

Public Health Relevance

Sequencing of the human genome is complete, but figuring out where genes are located, how they function, and how they cause or prevent human diseases like cancer has only just begun. Genes act as blueprints for RNA and proteins, the workhorses of the cell. We are developing technologies to address the key challenges of determining which genes specify the building of which proteins and how this process is orchestrated to ultimately unravel how disease processes occur.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 2R01HG003700-04
Application #: 7583730
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Good, Peter J

Project Start: 2005-09-16
Project End: 2012-03-31
Budget Start: 2009-04-10
Budget End: 2010-03-31
Support Year: 4
Fiscal Year: 2009
Total Cost: $450,000
Indirect Cost

Institution

Name: University of North Carolina Chapel Hill
Department: Microbiology/Immun/Virology
Type: Schools of Medicine
DUNS #: 608195277

City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599

Related projects


NIH 2011 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / Boise State University	$431,250
NIH 2011 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / Boise State University	$797,678
NIH 2010 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$435,435
NIH 2009 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$450,000
NIH 2007 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$402,029
NIH 2006 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$402,318
NIH 2005 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$400,000

Publications

Risk, Brian A; Edwards, Nathan J; Giddings, Morgan C (2013) A peptide-spectrum scoring system based on ion alignment, intensity, and pair probabilities. J Proteome Res 12:4240-7

Risk, Brian A; Spitzer, Wendy J; Giddings, Morgan C (2013) Peppy: proteogenomic search software. J Proteome Res 12:3019-25

Su, Hsun-Cheng; Khatun, Jainab; Kanavy, Dona M et al. (2013) Comparative genome analysis of ciprofloxacin-resistant Pseudomonas aeruginosa reveals genes within newly identified high variability regions associated with drug resistance development. Microb Drug Resist 19:428-36

Khatun, Jainab; Yu, Yanbao; Wrobel, John A et al. (2013) Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions. BMC Genomics 14:141

Djebali, Sarah; Davis, Carrie A; Merkel, Angelika et al. (2012) Landscape of transcription in human cells. Nature 489:101-8

ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74

Miller, Jameson; Parker, Miles; Bourret, Robert B et al. (2010) An agent-based model of signal transduction in bacterial chemotaxis. PLoS One 5:e9454

Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M et al. (2009) Ultra-Structure database design methodology for managing systems biology data and analyses. BMC Bioinformatics 10:254

Brent, Michael R (2008) Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat Rev Genet 9:62-73

Giddings, Morgan C (2008) On the process of becoming a great scientist. PLoS Comput Biol 4:e33

Showing the most recent 10 out of 15 publications
