Knowledge Based Biological Modeling Information System

Overton, G.

Abstract

(Taken from the application abstract): Gene expression databases will take on growing importance as the complexities of development and differentiation, and of normal versus diseased states, are studied in greater detail. Vast amounts of sequence data are now available for the study of gene expression, along with an anticipated surge in high-throughput data on differential gene expression. Much of the key information remains in the primary literature, inaccessible to computational analysis. The goal of the proposed project is to provide, in a single integrated system, the information management, analysis, and visualization tools for these data sources. Such a system requires the representation of gene expression encompassing spatial, temporal and quantitative dimensions; the collection and encoding of information from online resources and the primary literature; the integration of analysis methodologies tailored to the study of gene expression; and the availability of interfaces able to query and visualize the data in human-comprehensible form. The prototype system, EpoDB, focuses on erythropoiesis, but will generalize to the study of gene expression along any pathway of differentiation. This research will enhance and extend the existing information management technology through integration of a declarative constraint language into the representation language, development of an integrity constraint system to facilitate synchronization with external databases, and implementation of a query language and optimizer. Schemas and controlled vocabularies will be tailored to represent DNA and chromosomal features relating to gene regulation, temporal events describing expression levels during development and differentiation, and descriptions of gene control processes, pathways and networks. The foundation for EpoDB will be extracted from online resources (GenBank, TRANSFAC, MedLine, etc.), restructured and analyzed to remove errors.
Data relevant to gene expression during erythropoiesis will be entered from the literature by trained annotators. Improved versions of data entry editing tools will be developed to strengthen quality control, ease annotation, and allow annotation by external users through Web interfaces. EpoDB will also incorporate results of data analysis, such as transcriptional regulatory patterns discovered by statistical techniques, by pattern matching techniques, and by classification hierarchies of genes and patterns. EpoDB will be accessible through query interfaces and visualization tools built for the WWW using the evolving bioTk system. Data and the system tools will be distributed on a regular basis.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Research Project (R01)
Project #: 5R01RR004026-09
Application #: 2797084
Study Section: Biomedical Library and Informatics Review Committee (BLR)

Project Start: 1991-09-30
Project End: 2000-09-30
Budget Start: 1998-09-30
Budget End: 2000-09-30
Support Year: 9
Fiscal Year: 1998

Institution

Name: University of Pennsylvania
Department: Genetics
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 1998 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / University of Pennsylvania
NIH 1997 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / University of Pennsylvania
NIH 1996 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / University of Pennsylvania
NIH 1992 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / University of Pennsylvania
NIH 1991 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / University of Pennsylvania
NIH 1990 R01 RR	Knowledge Based Biological Modeling Information System Overton, G. / Unisys

Publications

Stoeckert, C; Pizarro, A; Manduchi, E et al. (2001) A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17:300-8

Manduchi, E; Grant, G R; McKenzie, S E et al. (2000) Generation of patterns from gene expression data by assigning confidence to differentially expressed genes. Bioinformatics 16:685-98

Kolchanov, N A; Podkolodnaya, O A; Ananko, E A et al. (2000) Transcription regulatory regions database (TRRD): its status in 2000. Nucleic Acids Res 28:298-301

Phillips, R L; Ernst, R E; Brunk, B et al. (2000) The genetic program of hematopoietic stem cells. Science 288:1635-40

Kolchanov, N A; Ponomarenko, M P; Frolov, A S et al. (1999) Integrated databases and computer systems for studying eukaryotic gene expression. Bioinformatics 15:669-86

Babenko, V N; Kosarev, P S; Vishnevsky, O V et al. (1999) Investigating extended regulatory regions of genomic DNA sequences. Bioinformatics 15:644-53

Stoeckert Jr, C J; Salas, F; Brunk, B et al. (1999) EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis. Nucleic Acids Res 27:200-3

Ponomarenko, M P; Ponomarenko, J V; Frolov, A S et al. (1999) Oligonucleotide frequency matrices addressed to recognizing functional DNA sites. Bioinformatics 15:631-43

Ponomarenko, M P; Ponomarenko, J V; Frolov, A S et al. (1999) Identification of sequence-dependent DNA features correlating to activity of DNA sites interacting with proteins. Bioinformatics 15:687-703

Ponomarenko, J V; Ponomarenko, M P; Frolov, A S et al. (1999) Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 15:654-68

Showing the most recent 10 out of 16 publications
