The genomic sequences that regulate gene expression, namely transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs), are poorly represented in current genome annotations, and these features are not currently curated by any major database resource. This is a serious deficiency, as CRMs play critical roles in birth defects and chronic diseases as well as in normal development, phenotypic variation, and evolution. Knowledge of experimentally validated CRMs and TFBSs is vital for many areas of research, including the interpretation and validation of data generated by the ENCODE and modENCODE projects, by genome-wide association studies (GWAS), and by other large-scale genomics projects. To help fill this gap in CRM annotation, we have developed a unique resource: REDfly (Regulatory Element Database for Fly), a database of Drosophila transcriptional regulatory elements. REDfly includes experimentally verified CRMs and TFBSs along with their DNA sequences, their associated genes, and the expression patterns they direct. REDfly is the only resource that integrates all of the available cis-regulatory information for Drosophila from multiple sources into a comprehensive collection of regulatory elements. It is the most detailed existing platform for metazoan regulatory element annotation and is widely acknowledged as the premier resource in the regulatory genomics arena. The objectives of this proposal are to keep REDfly up to date with the rapidly increasing number of CRMs and TFBSs being identified, to broaden REDfly's coverage of important regulatory feature types and of the species for which it collects data, and to update the REDfly site to maximize its utility to the research community.

The work focuses on four areas. (1) Curation. REDfly will continue to be kept up to date as new studies are published. (2) User interface. More flexible search, display, and download capabilities will be implemented so that all data can be easily queried and retrieved. (3) Expansion to other insects. Data from other insect species, including important disease vectors such as the mosquitoes Anopheles gambiae and Aedes aegypti, are currently limited but growing rapidly. REDfly will create data structures to hold regulatory sequences from other insect species; these structures will also allow straightforward future expansion to any organism. (4) Tools. REDfly will develop built-in tools for manipulating REDfly-specific data, along with a series of workflows that use the Galaxy platform to integrate REDfly data with other genomic data, such as those from modENCODE and various model organism databases.

The proposed activities are significant because they will bring a valuable resource for multiple research communities up to date, increase its utility by adding needed features and tools, and expand its relevance by supporting additional organisms. REDfly has already demonstrated its unique value as a platform supporting hypothesis-driven empirical and computational research in multiple areas, and the updates and enhancements proposed here will greatly increase its impact.

Public Health Relevance

The proposed research is relevant to public health because gene regulatory sequences have been linked to birth defects and to both chronic and acute diseases. The REDfly database has proven valuable for advancing our understanding of regulatory sequences, serving as a significant source of raw data for analysis, hypothesis generation, assessment and validation, and empirical research across a wide spectrum of research areas, including molecular and developmental biology, genetics, genomics, and bioinformatics.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM114067-03
Application #: 9405574
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Sledjeski, Darren D
Project Start: 2016-02-05
Project End: 2019-12-31
Budget Start: 2018-01-01
Budget End: 2018-12-31
Support Year: 3
Fiscal Year: 2018
Name: State University of New York at Buffalo
Department: Biochemistry
Type: Schools of Medicine
DUNS #: 038633251
City: Amherst
State: NY
Country: United States
Zip Code: 14228
Halfon, Marc S (2018) Studying Transcriptional Enhancers: The Founder Fallacy, Validation Creep, and Other Biases. Trends Genet.
Rivera, John; Keränen, Soile V E; Gallo, Steven M et al. (2018) REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res.