(Taken from the application abstract): Gene expression databases will take on growing importance as the complexities of development and differentiation, and of normal versus diseased states, are studied in greater detail. Vast amounts of sequence data are now available for the study of gene expression, and a surge in high-throughput data on differential gene expression is anticipated. Much of the key information remains in the primary literature, inaccessible to computational analysis. The goal of the proposed project is to provide, in a single integrated system, the information management, analysis, and visualization tools for these data sources. Such a system requires the representation of gene expression encompassing spatial, temporal, and quantitative dimensions; the collection and encoding of information from online resources and the primary literature; the integration of analysis methodologies tailored to the study of gene expression; and the availability of interfaces able to query and visualize the data in human-comprehensible form. The prototype system, EpoDB, focuses on erythropoiesis, but will generalize to the study of gene expression along any pathway of differentiation. This research will enhance and extend the existing information management technology through integration of a declarative constraint language into the representation language, development of an integrity constraint system to facilitate synchronization with external databases, and implementation of a query language and optimizer. Schemas and controlled vocabularies will be tailored to represent DNA and chromosomal features relating to gene regulation, temporal events describing expression levels during development and differentiation, and descriptions of gene control processes, pathways, and networks. The foundation for EpoDB will be extracted from online resources (GenBank, TRANSFAC, MedLine, etc.), then restructured and analyzed to remove errors. Data relevant to gene expression during erythropoiesis will be entered from the literature by trained annotators. Improved versions of the data entry editing tools will be developed to strengthen quality control, ease annotation, and allow annotation by external users through Web interfaces. EpoDB will also incorporate the results of data analysis, such as transcriptional regulatory patterns discovered by statistical techniques, by pattern matching techniques, and by classification hierarchies of genes and patterns. EpoDB will be accessible through query interfaces and visualization tools built for the WWW using the evolving bioTk system. Data and the system tools will be distributed on a regular basis.