Genome-wide studies have demonstrated that trans-acting factors, including transcription factors, chromatin regulators and other chromatin-associated factors, are frequently mutated in cancer, reaffirming that aberrant gene regulation is a key mechanism in oncogenesis. How these trans-acting factors regulate transcription on a genome-wide basis is poorly understood, motivating an ever-increasing number of ChIP-seq and DNase-seq experiments to map genome-wide transcription factor binding (the cistrome) and chromatin status (the epigenome). Novel and significant biological insights have been gained by integrating ChIP-seq and DNase-seq data with other published ChIP-seq and DNase-seq data sets and with expression profiles. Most cancer biologists, however, lack the informatics expertise and infrastructure for this work, and find the computational analysis and integration of cistrome and epigenome data to be the major bottleneck of such studies. The objective of this proposal is to develop informatics technologies that improve the acquisition, analysis, integration and reuse of ChIP-seq and DNase-seq data, so that experimental cancer biologists can model transcriptional and epigenetic gene regulation in cancer research. Specifically, we propose to develop informatics technologies to address three critical aspects of epigenome and cistrome data analysis. First, we will implement software to automate data collection, processing and quality control, enabling diverse types of unpublished and public ChIP-seq and DNase-seq data to be analyzed and converted into statistics and formats that can be readily used for integrative analysis. Second, we will develop systems that allow gene expression data to be interpreted together with cistrome and epigenome data in order to elucidate regulatory mechanisms. Third, we will develop tools to quickly and accurately identify informative public datasets and to infer combinatorial rules of regulation and interactions.
Finally, we will develop the infrastructure and interface to host the algorithms and tools developed in the first three aims, and provide experimental cancer biologists with a flexible and intuitive user experience. We will design our software to interact easily with complementary software systems and databases. The software developed in this proposal will be freely available as open source, and we will work with our collaborators and users to improve its functions and user interface.

Public Health Relevance

Decades of research have shown that cancer is essentially a disease of aberrant gene regulation. Although there are powerful new genomic technologies to study gene regulation, the resulting high-throughput data create significant computational challenges for experimental cancer biologists. This project will develop comprehensive informatics technologies, including the algorithms, database, and computing infrastructure, to model gene regulation in mammalian systems. The technologies we propose will allow cancer biologists, without programming expertise or informatics resources, to conduct exploratory and integrated analyses, search and reuse other relevant public data, interpret results and generate hypotheses on the mechanism of gene regulation in different cancer systems. The research team has an excellent track record in both computational algorithm development and innovative cancer research, so the proposal is expected to generate a valuable resource to accelerate many cancer gene regulation studies.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project--Cooperative Agreements (U01)
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Li, Jerry
Dana-Farber Cancer Institute
United States
Xiao, Tengfei; Li, Wei; Wang, Xiaoqing et al. (2018) Estrogen-regulated feedback loop limits the efficacy of estrogen receptor-targeted breast cancer therapy. Proc Natl Acad Sci U S A 115:7869-7878
Jiang, Peng; Lee, Winston; Li, Xujuan et al. (2018) Genome-Scale Signatures of Gene Interaction from Compound Screens Predict Clinical Efficacy of Targeted Cancer Therapies. Cell Syst 6:343-354.e5
Li, Bo; Li, Taiwen; Wang, Binbin et al. (2017) Ultrasensitive detection of TCR hypervariable-region sequences in solid-tissue RNA-seq data. Nat Genet 49:482-483
Mei, Shenglin; Meyer, Clifford A; Zheng, Rongbin et al. (2017) Cistrome Cancer: A Web Resource for Integrative Gene Regulation Modeling in Cancer. Cancer Res 77:e19-e22
Li, Bo; Liu, Jun S; Liu, X Shirley (2017) Revisit linear regression-based deconvolution methods for tumor gene expression data. Genome Biol 18:127
Huang, Tianhao; Zhang, Peng; Li, Wang et al. (2017) G9A promotes tumor cell growth and invasion by silencing CASP1 in non-small-cell lung cancer cells. Cell Death Dis 8:e2726
Liu, X Shirley; Mardis, Elaine R (2017) Applications of Immunogenomics to Cancer. Cell 168:600-612
Li, Taiwen; Fan, Jingyu; Wang, Binbin et al. (2017) TIMER: A Web Server for Comprehensive Analysis of Tumor-Infiltrating Immune Cells. Cancer Res 77:e108-e110
Li, Bo; Li, Taiwen; Pignon, Jean-Christophe et al. (2016) Landscape of tumor-infiltrating T cell repertoire of human cancers. Nat Genet 48:725-732
Du, Zhou; Sun, Tong; Hacisuleyman, Ezgi et al. (2016) Integrative analyses reveal a long noncoding RNA-mediated sponge regulatory network in prostate cancer. Nat Commun 7:10982

Showing the most recent 10 out of 22 publications