SRI International and a group of collaborators propose to further develop the Escherichia coli EcoCyc database (DB). EcoCyc is and will continue to be freely and openly available, and is accessible to scientists through the Internet, as downloadable data files, and as a downloadable software application. Scientists from multiple disciplines make wide use of EcoCyc; it has been cited 3,428 times, and from 2015-2017, an average of 99,800 users per year visited the EcoCyc website. It serves as a general reference source on E. coli for experimental biologists, and is particularly useful for the analysis of functional-genomics experiments. The DB serves computational biologists who are undertaking global studies of E. coli; metabolic engineers who are developing new methods for chemicals production, including biofuels; and researchers and bioinformaticists who are using EcoCyc as the gold-standard dataset to develop new computational methods, including the prediction of operons, promoters, and protein functional linkages. Educators also use the DB.

We will update EcoCyc in an ongoing fashion to reflect new information about the genes, metabolic pathways, and regulatory interactions of these important model organisms. Information will be integrated from the biomedical literature and from large-scale experiments, such as data on gene essentiality, on nutrients supporting growth, and on protein interactions. We will continue a comprehensive and ongoing effort to refine steady-state metabolic network models of these organisms by validating model predictions against many conditions of growth and non-growth for wildtype and knock-out strains. The resulting models will have applications in anti-microbial drug discovery and metabolic engineering, and the model development process will lead to many improvements in the EcoCyc DB.

We will launch a new effort to curate the genes and proteins of E. coli strains other than the strain served by EcoCyc. Thousands of E. coli strains have been sequenced; yet in many cases, their genome annotations are of low quality. We will project curated gene annotations from EcoCyc and from other E. coli strains to orthologs of those genes in the BioCyc DBs for hundreds of other E. coli strains, thus significantly improving the annotation quality of other E. coli strains, in a cost-effective fashion. The project will also expand the Pathway Tools software used to query and analyze EcoCyc, such as adding a tool for projecting newly curated gene and protein annotations to orthologs in other E. coli strains.

Public Health Relevance

Escherichia coli is the most thoroughly studied bacterium on earth; therefore, a computer knowledge base that integrates experimental findings for this organism from thousands of scientific publications is a valuable and cost-effective resource for science and education. The comprehensive knowledge and computational tools available through EcoCyc accelerate the research of scientists who use this organism to develop biofuels, of scientists who study related pathogenic bacteria, and of scientists who work with the bacteria comprising the human microbiome.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM077678-29
Application #
9972948
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
1992-08-15
Project End
2023-06-30
Budget Start
2020-07-01
Budget End
2021-06-30
Support Year
29
Fiscal Year
2020
Total Cost
Indirect Cost
Name
SRI International
Department
Type
DUNS #
009232752
City
Menlo Park
State
CA
Country
United States
Zip Code
94025