Developing Software For Protein-Based Gene Finding

Giddings, Morgan

Abstract

The first human genome sequencing efforts are complete, which has opened the door to many new and challenging questions. Among these are the number and location of genes in the genome, both of which have proven surprisingly difficult to pinpoint. Even further from a definitive answer is the question of how many distinct functional RNA and protein products each gene produces through mechanisms such as alternative splicing. These unanswered questions impede a full understanding of the genome and how it functions in relation to human disease. We propose innovative software technology that has the potential to help overcome this obstacle, using mass spectrometry measurements of proteins to reveal the location and structure of the genes encoding those proteins within the genome. This technology can be applied to help answer several critical questions. For example, where are all the genes located in the genome? What are their exon-intron structures? How many distinct products do they encode?
We propose to modify and combine the already proven software programs TWINSCAN and GFS, which were developed by our labs for genomic and proteomic purposes, respectively, to address these new challenges in genome analysis. TWINSCAN is a highly accurate, automated gene finder, and GFS is a proteomics tool that matches mass spectrometry (MS) peptide data from enzymatically digested proteins directly to raw (even unfinished) genome sequence, identifying the coding loci for the proteins. Here, we propose a two-pronged approach to produce a novel, protein-based method for finding genes and determining their structure.
Our aims comprise the following: a) extending GFS for automated use with multi-exon genes and very large genomes, to facilitate discovery of novel genes and gene structures; b) modifying TWINSCAN to use peptide data from GFS to enhance its rapid, automated gene finding capabilities; c) combining the two programs into an automated protein-based gene finder; and d) validating the approach for gene finding using synthetic and experimental data sets.
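
To make the idea of matching MS peptide data directly to raw genome sequence more concrete, here is a minimal Python sketch. It is not the GFS algorithm: it only illustrates the general technique the abstract describes, under simplifying assumptions (the standard genetic code, trypsin as the digesting enzyme, no missed cleavages, no modifications, single-exon peptides only). All function names, the mass tolerance, and the minimum peptide length are hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons; a peptide also carries one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if aa in "KR" and (is_last or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def find_peptide_loci(dna, observed_masses, tol=0.01, min_len=5):
    """Return (strand, frame, nt_start, peptide) for each mass match.

    nt_start is the offset on the strand being read, not a genome coordinate.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            aa_pos = 0
            # Stop codons split each frame into open reading segments.
            for segment in translate(seq[frame:]).split("*"):
                for pep in tryptic_digest(segment):
                    known = all(aa in RESIDUE_MASS for aa in pep)
                    if len(pep) >= min_len and known:
                        mass = peptide_mass(pep)
                        if any(abs(mass - m) <= tol for m in observed_masses):
                            hits.append((strand, frame, frame + 3 * aa_pos, pep))
                    aa_pos += len(pep)
                aa_pos += 1  # account for the stop codon itself
    return hits


# Toy usage: a 24-nt sequence encoding the peptide MAGSLEK followed by a stop.
toy_dna = "ATGGCTGGTTCTCTGGAAAAATAA"
toy_hits = find_peptide_loci(toy_dna, [peptide_mass("MAGSLEK")])
```

Because every reading frame on both strands is searched, the sketch needs no prior gene annotation, which is the property that makes this kind of approach usable on unfinished sequence. A practical tool would also need to handle peptides that span exon boundaries, score matches statistically, and scale to very large genomes, which are the extensions the aims above describe.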

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG003700-01
Application #: 6959142
Study Section: Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer: Good, Peter J

Project Start: 2005-09-16
Project End: 2008-06-30
Budget Start: 2005-09-16
Budget End: 2006-06-30
Support Year: 1
Fiscal Year: 2005
Total Cost: $400,000
Indirect Cost

Institution

Name: University of North Carolina Chapel Hill
Department: Microbiology/Immun/Virology
Type: Schools of Medicine
DUNS #: 608195277

City: Chapel Hill
State: NC
Country: United States
Zip Code: 27599

Related projects


NIH 2011 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / Boise State University	$431,250
NIH 2011 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / Boise State University	$797,678
NIH 2010 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$435,435
NIH 2009 R01 HG	Developing Proteogenomic Mapping for Human Genome Annotation Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$450,000
NIH 2007 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$402,029
NIH 2006 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$402,318
NIH 2005 R01 HG	Developing Software For Protein-Based Gene Finding Giddings, Morgan Corinne / University of North Carolina Chapel Hill	$400,000

Publications

Khatun, Jainab; Yu, Yanbao; Wrobel, John A et al. (2013) Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions. BMC Genomics 14:141

Risk, Brian A; Edwards, Nathan J; Giddings, Morgan C (2013) A peptide-spectrum scoring system based on ion alignment, intensity, and pair probabilities. J Proteome Res 12:4240-7

Risk, Brian A; Spitzer, Wendy J; Giddings, Morgan C (2013) Peppy: proteogenomic search software. J Proteome Res 12:3019-25

Su, Hsun-Cheng; Khatun, Jainab; Kanavy, Dona M et al. (2013) Comparative genome analysis of ciprofloxacin-resistant Pseudomonas aeruginosa reveals genes within newly identified high variability regions associated with drug resistance development. Microb Drug Resist 19:428-36

Djebali, Sarah; Davis, Carrie A; Merkel, Angelika et al. (2012) Landscape of transcription in human cells. Nature 489:101-8

ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57-74

Miller, Jameson; Parker, Miles; Bourret, Robert B et al. (2010) An agent-based model of signal transduction in bacterial chemotaxis. PLoS One 5:e9454

Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M et al. (2009) Ultra-Structure database design methodology for managing systems biology data and analyses. BMC Bioinformatics 10:254

Brent, Michael R (2008) Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat Rev Genet 9:62-73

Giddings, Morgan C (2008) On the process of becoming a great scientist. PLoS Comput Biol 4:e33

Showing the most recent 10 out of 15 publications
