Escherichia coli is the single most utilized cell in biology. It is the cornerstone of the biotechnology revolution, and the principal model organism for our understanding of bacterial molecular and cellular physiology. Despite its central importance, there is no single resource providing access to the growing body of E. coli genomics data. RegulonDB is already the primary portal for curated knowledge of E. coli gene regulation. We propose to expand RegulonDB to be a critically needed portal for E. coli genomic data in three key ways: (1) We will curate all available E. coli high-throughput genomic data sets, analyze these data sets in a consistent fashion, and integrate the data into RegulonDB with tools for access, query, visualization, and analysis. The data sets will include the first comprehensive map of the E. coli regulatory network based on ChIP-Seq and RNA-Seq, currently being generated by two of the PIs. (2) We will extend RegulonDB with comparative genomic data spanning the Enterobacteriaceae. This will include genomic data for representatives of all major Enterobacteriaceae genera. It will also include deep manual curation of the literature and genomic data sets for three key pathogens related to E. coli: Salmonella enterica, Klebsiella pneumoniae, and Yersinia pestis. In addition, we will perform the first comparative ChIP-Seq mapping of a set of conserved transcription factors (TFs) in these three species and E. coli. This will provide the first broad view of the evolution of regulation between E. coli and these pathogens based on the comprehensive mapping of binding sites of conserved TFs. It will also provide a framework for generating complete regulatory maps in these and other Enterobacteriaceae species. (3) We will provide an online server for bacterial ChIP-Seq analysis. The server will be specifically tailored to the unique characteristics and pitfalls of ChIP-Seq in bacteria.
The server will give a new generation of microbiologists access to the power of ChIP-Seq for the comprehensive mapping of transcription factor-DNA binding.

Public Health Relevance

Escherichia coli is the single most utilized cell in biology; it is the cornerstone of the biotechnology revolution, and the principal model organism for our understanding of bacterial molecular and cellular physiology. Despite its central importance, there is no single resource providing access to the growing body of E. coli genomics data. We propose to expand the RegulonDB database of gene regulation to be a critically needed portal for all E. coli genomics.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM131643-03
Application #
10110021
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2019-06-01
Project End
2023-02-28
Budget Start
2021-03-01
Budget End
2022-02-28
Support Year
3
Fiscal Year
2021
Total Cost
Indirect Cost
Name
Boston University
Department
Engineering (All Types)
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
049435266
City
Boston
State
MA
Country
United States
Zip Code
02215