The purpose of this application is to implement an integrated computational environment around RegulonDB, a database on transcriptional regulation in E. coli. RegulonDB contains information gathered from the literature on regulatory elements and operon organization, their location in the genome, and experimental evidence, supported by more than 1000 original literature references indexed in Medline. The project would transform the database into a practical tool for the analysis of transcriptome and proteome experiments.
Aim 1 is to gather data on growth conditions and their associated signal metabolites, and to expand the graphic capabilities of the system.
Aim 2 is to implement tools for genomic regulatory analyses, such as sequence retrieval, pattern discovery and pattern search, and to couple them with the database, together with a syntactic recognizer that detects multiple potential regulatory elements within an upstream region.
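As an illustration of the pattern search step only, the following minimal sketch scans an upstream region for a consensus written with IUPAC degenerate codes. The function name `find_sites` and the toy sequence are assumptions made for this example; they are not part of RegulonDB or of the proposed tools, and the toy sequence does not respect realistic promoter spacing.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(upstream, consensus):
    """Return (start, matched_text) for every match, including overlapping ones.

    Hypothetical helper written for this sketch; positions are 0-based
    offsets into the upstream string.
    """
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets the scan report overlapping occurrences.
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(upstream.upper())]


# Toy upstream region, invented for illustration (not a real E. coli sequence).
upstream = "GGCTTGACATTTATAATCC"

minus35_hits = find_sites(upstream, "TTGACA")  # sigma70 -35 consensus hexamer
minus10_hits = find_sites(upstream, "TATAAT")  # sigma70 -10 consensus hexamer
```

A syntactic recognizer of the kind described in this aim would go further, for example by checking the spacing and relative order of several such hits; that logic is not shown here.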
Aim 3 centers on programs that would take as input a set of genes from a transcriptome experiment and generate graphical or tabular information about their operon organization, upstream regulatory sites, functional classes, and the regulators affected. All these tools would be integrated into a flexible navigation path in which the output of one query serves as the input to another.
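The chaining of queries can be sketched as follows, under the assumption that the gene-to-operon and operon-to-regulator relations are available as simple lookup tables. The two small tables are typed by hand for illustration and are not RegulonDB records; the function names and the unknown gene `yfgX` are hypothetical.

```python
from collections import defaultdict

# Hand-typed toy tables for illustration only (not extracted from RegulonDB).
GENE_TO_OPERON = {
    "lacZ": "lacZYA", "lacY": "lacZYA", "lacA": "lacZYA",
    "araB": "araBAD", "araA": "araBAD", "araD": "araBAD",
}
OPERON_TO_REGULATORS = {
    "lacZYA": ["LacI", "CRP"],
    "araBAD": ["AraC", "CRP"],
}


def operons_for_genes(genes):
    """Group input genes by operon; genes with no known operon go under None."""
    grouped = defaultdict(list)
    for gene in genes:
        grouped[GENE_TO_OPERON.get(gene)].append(gene)
    return dict(grouped)


def regulators_for_operons(operons):
    """Map each regulator to the sorted list of given operons it controls."""
    result = defaultdict(list)
    for operon in operons:
        for regulator in OPERON_TO_REGULATORS.get(operon, []):
            result[regulator].append(operon)
    return {regulator: sorted(ops) for regulator, ops in result.items()}


# A gene list as it might come from a transcriptome experiment
# ("yfgX" is a made-up name standing in for an unannotated gene).
genes = ["lacZ", "araA", "lacY", "yfgX"]

# First query: operon organization of the input genes.
by_operon = operons_for_genes(genes)

# Second query: its input is the output of the first one.
known_operons = [operon for operon in by_operon if operon is not None]
by_regulator = regulators_for_operons(known_operons)
```

Returning plain dictionaries keeps each step usable as both a tabular report and the input to a further query, which is the navigation style this aim describes.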
Aim 4 is to expand and apply a Bayesian clustering method designed to handle the heterogeneous types of information on gene regulation and metabolism. E. coli serves here as the model system; however, the approach and tools can in principle be applied to the study of other microbial organisms, making more efficient use of the massive amounts of knowledge currently available in order to better understand their biology and, potentially, the mechanisms by which they affect human health.