A primary focus of genome biology is the study of gene regulation. Complex regulatory programs give rise to diverse tissues and cell types and allow cells to respond to ever-changing molecular signals in their environment. The development of tissues from stem cells, the metastasis of cancer cells, and the response of cells to environmental chemicals are all directly tied to gene regulation. Gene regulation is performed in part by nuclear proteins called transcription factors, many of which recognize specific DNA sequences within and surrounding genes. The DNA sequences bound by transcription factors can be determined by high-throughput laboratory procedures, and computational algorithms for predicting the target sequences bound by these key proteins are also improving. Unfortunately, the results of such experimental and computational studies are dispersed haphazardly, often with static annotations to outdated versions of genome sequence assemblies. The lack of organized information about DNA-binding proteins slows research and diminishes the value of the exceptional work being performed in laboratories around the globe. An organized data repository coupled to software for gene analysis will enhance the success of studies across diverse fields of biomedical research.

This proposal outlines the development of PAZAR, an open-access repository for regulatory sequence and transcription factor annotation, together with accompanying computational services for the applied study of regulatory sequences. PAZAR is modeled on a shopping mall: an individual or research team shares its data through a virtual boutique. The boutiques are managed independently but use a common infrastructure. The underlying data model is intended to let experts organize and share the gene regulation data generated in their studies while keeping it up to date with the current reference genome.
The proposed work comprises a series of extensions to the base system that will allow PAZAR to serve as the central open-source and open-access resource for investigators interested in gene regulation. The key objectives are:

- Expansion and extension of PAZAR software: incorporate controlled vocabularies and transcription factor reference catalogs, improve data submission procedures, and expand the middle-layer software to handle additional data classes.
- Coupling of analysis software to PAZAR: link existing bioinformatics software to PAZAR for dynamic analyses of gene regulation.
- Transcription Factor Encyclopedia (TFe): continue development of the TFe collection of articles about transcription factors, which serves as a key means of recruiting expert laboratories to deposit data into PAZAR.

Public Health Relevance

The proposed work will provide the scientific community with an open-access and open-source resource for the dissemination and analysis of DNA regulatory sequence data. This bioinformatics resource is critically important for understanding when and where genes will be active in the body, a key problem in modern health research studies ranging from stem cells to cancer. The PAZAR system captures and organizes data from laboratory research on transcription regulation, facilitating studies of gene regulation and genetic networks.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM084875-03
Application #: 8247660
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
Project Start: 2010-04-15
Project End: 2014-03-31
Budget Start: 2012-04-01
Budget End: 2014-03-31
Support Year: 3
Fiscal Year: 2012
Total Cost: $332,522
Indirect Cost: $24,631
Organization: University of British Columbia
DUNS #: 251949962
City: Vancouver
State: BC
Country: Canada
Zip Code: V6 1-Z3

Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W (2018) Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 19:202
Shi, Wenqiang; Fornes, Oriol; Mathelier, Anthony et al. (2016) Evaluating the impact of single nucleotide variants on transcription factor binding. Nucleic Acids Res 44:10106-10116
Hickmott, Jack W; Chen, Chih-Yu; Arenillas, David J et al. (2016) PAX6 MiniPromoters drive restricted expression from rAAV in the adult mouse retina. Mol Ther Methods Clin Dev 3:16051
Chen, Chih-Yu; Chang, I-Shou; Hsiung, Chao A et al. (2014) On the identification of potential regulatory variants within genome wide association candidate SNP sets. BMC Med Genomics 7:34
FANTOM Consortium and the RIKEN PMI and CLST (DGT) (see original citation for additional authors) (2014) A promoter-level mammalian expression atlas. Nature 507:462-70
Worsley Hunt, Rebecca; Mathelier, Anthony; Del Peso, Luis et al. (2014) Improving analysis of transcription factor binding sites within ChIP-Seq data based on topological motif enrichment. BMC Genomics 15:472
Yang, Lin; Zhou, Tianyin; Dror, Iris et al. (2014) TFBSshape: a motif database for DNA shape features of transcription factor binding sites. Nucleic Acids Res 42:D148-55
Mathelier, Anthony; Zhao, Xiaobei; Zhang, Allen W et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42:D142-7
van Karnebeek, Clara D; Sly, William S; Ross, Colin J et al. (2014) Mitochondrial carbonic anhydrase VA deficiency resulting from CA5A alterations presents with hyperammonemia in early childhood. Am J Hum Genet 94:453-61
Worsley Hunt, Rebecca; Wasserman, Wyeth W (2014) Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets. Genome Biol 15:412

Showing the most recent 10 out of 21 publications