A computational biology collaboratory of computer scientists and biologists has been organized to develop a new set of integrated tools for manipulating genomic information. The group gathers every 5 months for 3-5 days at the NIH or at Argonne National Laboratory (ANL). Development of these informatics tools, and assembly and analysis of the integrated data, continue over the Internet between meetings. The initial objective of this research was to establish the minimal criteria necessary to describe genomic map data that may be logically manipulated. This collaborative effort has produced several prototype deductive database systems: first, the E. coli chromosome query system, which contains information provided by Kenn Rudd at NCBI for aligned DNA sequences, a high-resolution physical map, identified structural genes, and an aligned phage map; second, DCRT-integrated collective genetic and DNA sequence data for S. typhimurium; third, an integration of the genome information for S. pombe provided by Hans Lehrach of the Imperial Cancer Research Fund, London, U.K., including a genetic linkage map and Yeast Artificial Chromosome (YAC) and cosmid hybridization data. The aligned chromosome information for each of these prototypes may be viewed using a common graphical display program developed at ANL for the collaboratory. A new technology, the integrated Genome Database developed by our ANL colleagues, allows the integration of the collected genetic and physical data of multiple organisms. We have developed numerous tools to facilitate the rapid integration of genomic data into this system. The feature common to these prototype data representation systems is that each uses the logic programming language Prolog. We can rapidly develop complex queries of the integrated data that take advantage of the complicated inferred interrelationships. For example, in the E. coli system, finding the longest repeated sequences that lie on the same face of the DNA helix within any gene is a simple Prolog query. We have taken advantage of this advanced query capacity to begin analyzing the global organization of selected genomes. Analysis of the distribution of transcription factor binding sites relative to known promoters and genes has allowed us to begin defining local regulatory grammars for the genetic regulation of metabolic pathways. We are now using these systems to correlate the arrangement of different types of genetic information represented in each chromosome type.

Agency
National Institutes of Health (NIH)
Institute
Center for Information Technology (CIT)
Type
Intramural Research (Z01)
Project #
1Z01CT000228-02
Application #
3838539
Support Year
2
Fiscal Year
1992
Name
Center for Information Technology
Country
United States