High Throughput Annotation of Genomic DNA Sequence

Overton, G.

Abstract

High-throughput genomic sequencing efforts must be accompanied by high-throughput, cost-effective sequence annotation if the value of the data is to be fully realized. Annotation encompasses the identification and archiving of putative biological signals, sequence characteristics, and features, including genes, and, wherever cost-effective, the further experimental characterization of those features. One might hope that annotation could be entirely computational, and thus inexpensive and rapid. However, computational predictions, especially of gene models, must ultimately be confirmed experimentally. This confirmation serves both as an additional, independent validation of the genomic sequence data and as a means of establishing the firm foundation needed to simplify and accelerate future biological research. The proposed work integrates computational and experimental approaches, creating a test bed and ultimately a production system for high-throughput, high-information-gain annotation. It is designed as an open system in which new computational and experimental components, and new scientific visualization tools, can easily be installed and maintained within the data management and analysis framework.
Experimental annotation will use streamlined, targeted versions of standard techniques. These include single-pass sequencing of cDNAs selected from EST hits to genomic DNA, RT-PCR across inter- and intra-exon regions of putative genes, and dot blots of the plasmid DNA used in genomic sequencing, hybridized against labeled mRNA. The three basic goals of experimental annotation are to (1) establish laboratory protocols, management structures, and automation techniques for high-throughput experimental annotation; (2) validate and refine computational annotation, especially for gene model finders such as GRAIL; and (3) extract high-information-gain data that extend the sequence similarity databases and computational gene finders, for example by concentrating single-pass cDNA sequencing efforts on ESTs from unknown gene classes.
In its initial phase, development of the system infrastructure will be tightly coupled to ongoing high-throughput sequencing at the University of Oklahoma, with the goal of transitioning the technology for deployment to the genomics research community at large.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG001539-01
Application #: 2026924
Study Section: Genome Study Section (GNM)

Project Start: 1997-02-12
Project End: 2000-01-31
Budget Start: 1997-02-12
Budget End: 1998-01-31
Support Year: 1
Fiscal Year: 1997

Institution

Name: University of Pennsylvania
Department: Genetics
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2003 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Stoeckert, Christian J. / University of Pennsylvania | $580,163
NIH 2002 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Stoeckert, Christian J. / University of Pennsylvania | $566,554
NIH 2001 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Stoeckert, Christian J. / University of Pennsylvania | $557,847
NIH 2000 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Stoeckert, Christian J. / University of Pennsylvania | $376,401
NIH 1999 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Overton, G. / University of Pennsylvania
NIH 1998 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Overton, G. / University of Pennsylvania
NIH 1998 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Overton, G. / University of Pennsylvania
NIH 1997 R01 HG | High Throughput Annotation of Genomic DNA Sequence | Overton, G. / University of Pennsylvania

Publications

Mazzarelli, Joan M; White, Peter; Gorski, Regina et al. (2006) Novel genes identified by manual annotation and microarray expression analysis in the pancreas. Genomics 88:752-61

Schug, Jonathan; Schuller, Winfried-Paul; Kappen, Claudia et al. (2005) Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol 6:R33

Ananko, E A; Podkolodny, N L; Stepanenko, I L et al. (2005) GeneNet in 2005. Nucleic Acids Res 33:D425-7

Jones, Andrew; Hunt, Ela; Wastling, Jonathan M et al. (2004) An object model and database for functional genomics. Bioinformatics 20:1583-90

Manduchi, E; Grant, G R; He, H et al. (2004) RAD and the RAD Study-Annotator: an approach to collection, organization and exchange of all relevant information for high-throughput gene expression studies. Bioinformatics 20:452-9

Levitsky, Victor G; Katokhin, Alexey V (2003) Recognition of eukaryotic promoters using a genetic algorithm based on iterative discriminant analysis. In Silico Biol 3:81-7

Grant, G R; Manduchi, E; Pizarro, A et al. (2003) Maintaining data integrity in microarray data management. Biotechnol Bioeng 84:795-800

Schug, Jonathan; Diskin, Sharon; Mazzarelli, Joan et al. (2002) Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res 12:648-55

Crabtree, J; Wiltshire, T; Brunk, B et al. (2001) High-resolution BAC-based map of the central portion of mouse chromosome 5. Genome Res 11:1746-57

Bailey Jr, L C; Searls, D B; Overton, G C (1998) Analysis of EST-driven gene annotation in human genomic sequence. Genome Res 8:362-76

Showing the most recent 10 out of 13 publications