(Taken from the application abstract): Gene expression databases will take on growing importance as the complexities of development and differentiation, and of normal versus diseased states, are studied in greater detail. Vast amounts of sequence data are now available for the study of gene expression, and a surge in high-throughput data on differential gene expression is anticipated. Much of the key information remains in the primary literature, inaccessible to computational analysis. The goal of the proposed project is to provide, in a single integrated system, information management, analysis, and visualization tools that encompass these data sources. Such a system requires the representation of gene expression encompassing spatial, temporal and quantitative dimensions; the collection and encoding of information from online resources and the primary literature; the integration of analysis methodologies tailored to the study of gene expression; and the availability of interfaces able to query and visualize the data in human-comprehensible form. The prototype system, EpoDB, focuses on erythropoiesis, but will generalize to the study of gene expression along any pathway of differentiation. This research will enhance and extend the existing information management technology through integration of a declarative constraint language into the representation language, development of an integrity constraint system to facilitate synchronization with external databases, and implementation of a query language and optimizer. Schemas and controlled vocabularies will be tailored to represent DNA and chromosomal features relating to gene regulation, temporal events describing expression levels during development and differentiation, and descriptions of gene control processes, pathways and networks. The foundation for EpoDB will be extracted from online resources (GenBank, TRANSFAC, MedLine, etc.), then restructured and analyzed to remove errors.
Data relevant to gene expression during erythropoiesis will be entered from the literature by trained annotators. Improved versions of the data entry and editing tools will be developed to strengthen quality control, ease annotation, and allow annotation by external users through Web interfaces. EpoDB will also incorporate the results of data analysis, such as transcriptional regulatory patterns discovered by statistical techniques, by pattern-matching techniques, and by classification hierarchies of genes and patterns. EpoDB will be accessible through query interfaces and visualization tools built for the WWW using the evolving bioTk system. Data and the system tools will be distributed on a regular basis.

Agency
National Institutes of Health (NIH)
Institute
National Center for Research Resources (NCRR)
Type
Research Project (R01)
Project #
5R01RR004026-09
Application #
2797084
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Project Start
1991-09-30
Project End
2000-09-30
Budget Start
1998-09-30
Budget End
2000-09-30
Support Year
9
Fiscal Year
1998
Name
University of Pennsylvania
Department
Genetics
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Stoeckert, C; Pizarro, A; Manduchi, E et al. (2001) A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17:300-8
Manduchi, E; Grant, G R; McKenzie, S E et al. (2000) Generation of patterns from gene expression data by assigning confidence to differentially expressed genes. Bioinformatics 16:685-98
Kolchanov, N A; Podkolodnaya, O A; Ananko, E A et al. (2000) Transcription regulatory regions database (TRRD): its status in 2000. Nucleic Acids Res 28:298-301
Phillips, R L; Ernst, R E; Brunk, B et al. (2000) The genetic program of hematopoietic stem cells. Science 288:1635-40
Kolchanov, N A; Ponomarenko, M P; Frolov, A S et al. (1999) Integrated databases and computer systems for studying eukaryotic gene expression. Bioinformatics 15:669-86
Babenko, V N; Kosarev, P S; Vishnevsky, O V et al. (1999) Investigating extended regulatory regions of genomic DNA sequences. Bioinformatics 15:644-53
Stoeckert Jr, C J; Salas, F; Brunk, B et al. (1999) EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis. Nucleic Acids Res 27:200-3
Ponomarenko, M P; Ponomarenko, J V; Frolov, A S et al. (1999) Oligonucleotide frequency matrices addressed to recognizing functional DNA sites. Bioinformatics 15:631-43
Ponomarenko, M P; Ponomarenko, J V; Frolov, A S et al. (1999) Identification of sequence-dependent DNA features correlating to activity of DNA sites interacting with proteins. Bioinformatics 15:687-703
Ponomarenko, J V; Ponomarenko, M P; Frolov, A S et al. (1999) Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 15:654-68

Showing the most recent 10 out of 16 publications