A primary focus of genome biology is the study of gene regulation. Complex regulatory programs give rise to diverse tissues and cell types, and allow cells to respond to ever-changing molecular signals in their environments. The development of tissues from stem cells, the metastasis of cancer cells and the response of cells to chemicals in the environment are all directly tied to gene regulation. Gene regulation is performed in part by nuclear proteins called transcription factors, many of which recognize specific DNA sequences within and surrounding genes. The DNA sequences bound by transcription factors can be determined by high-throughput laboratory procedures. Alongside these experimental studies, computational algorithms for predicting the target sequences bound by these key proteins are improving. Unfortunately, the results of such experimental and computational studies are scattered haphazardly, often as static annotations tied to outdated versions of genome sequence assemblies. The lack of organized information about DNA-binding proteins slows research and diminishes the value of the exceptional work being performed in laboratories around the globe. An organized data repository coupled to software for gene analysis will enhance the success of studies across diverse fields of biomedical research. This proposal outlines the development of the PAZAR open-access repository for regulatory sequence and transcription factor annotation, as well as the development of accompanying computational services for the applied study of regulatory sequences. PAZAR, modeled on a shopping mall, allows an individual or research team to share data through a virtual boutique. These boutiques are managed independently but use a common infrastructure. The underlying data model is intended to enable experts to organize and share the gene regulation data generated in their research studies while keeping it up to date with the current reference genome.
This proposal outlines a series of extensions to the base system, allowing PAZAR to serve as the central open-source and open-access resource for investigators interested in gene regulation. The key objectives are:
- Expansion and extension of PAZAR software: Incorporate controlled vocabularies and transcription factor referencing catalogs, improve data submission procedures and expand middle-layer software for the handling of additional data classes
- Analysis software coupling to PAZAR: Link existing bioinformatics software to PAZAR for dynamic analyses of gene regulation
- Transcription Factor Encyclopedia (TFe): Continue development of the TFe collection of articles about transcription factors, which serves as a key means of recruiting expert laboratories to deposit data into PAZAR
The proposed work will provide the scientific community with an open-access and open-source resource for the dissemination and analysis of DNA regulatory sequence data. This bioinformatics resource is critically important for understanding when and where genes will be active in the body, a key problem in modern health research on topics ranging from stem cells to cancer. The PAZAR system captures and organizes data from laboratory research on transcriptional regulation, facilitating studies of gene regulation and genetic networks.