Bioconductor is an ecosystem of more than 1,500 open-source software and data packages for the statistical analysis and comprehension of high-throughput genomic data. The cancer genomics research community uses it widely for statistical analysis and visualization. The ecosystem is built on core data classes and methods, shared by users and developers alike, that provide convenient representations and efficient operations for many kinds of high-throughput molecular data. Falling sequencing costs and new single-cell assays now allow the molecular biology of cancer to be studied at ever higher resolution. DNA sequence, epigenetic state, gene expression, protein abundance, and other features can be assayed together in a single specimen, even at the level of individual cells. These developments raise new challenges in data complexity, size, and interpretability.

The overarching goal of this project is to create and adapt core Bioconductor software infrastructure to meet these challenges through four aims. First, we develop infrastructure for the analysis of single-cell multi-omic experiments. Second, we apply FAIR (Findable, Accessible, Interoperable, Reusable) principles to improve somatic variant prioritization, defining a performant data architecture that harmonizes and integrates the large body of experimental and annotation data available through Bioconductor. Users of this system will be able to create provenance-rich, interoperable reports on the structural and functional contexts of somatic variants to aid prioritization. Third, we develop scalable infrastructure for the curation, distribution, maintenance, discoverability, and usability of cancer data resources both within and outside Bioconductor. Finally, we develop a user-training program and new outreach approaches to encourage adoption of advanced Bioconductor infrastructure by developers of both new cancer-related packages and existing packages critical to the cancer research community.

Public Health Relevance

Researchers collect diverse types of complex genetic information about factors that contribute to cancer. This proposal provides software and data resources to help researchers manage and analyze this information using advanced computational and statistical approaches.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Resource-Related Research Projects--Cooperative Agreements (U24)
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Chen, Huann-Sheng
Organization: Roswell Park Cancer Institute Corp
Country: United States

Publications
Ma, Siyuan; Ogino, Shuji; Parsana, Princy et al. (2018) Continuity of transcriptomes among colorectal cancer subtypes based on meta-analysis. Genome Biol 19:142
Chen, Gregory M; Kannan, Lavanya; Geistlinger, Ludwig et al. (2018) Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma. Clin Cancer Res 24:5037-5047
Waldron, Levi (2018) Data and Statistical Methods To Analyze the Human Microbiome. mSystems 3:
Pasolli, Edoardo; Schiffer, Lucas; Manghi, Paolo et al. (2017) Accessible, curated metagenomic data through ExperimentHub. Nat Methods 14:1023-1024
Quiroz-Zárate, Alejandro; Harshfield, Benjamin J; Hu, Rong et al. (2017) Expression Quantitative Trait loci (QTL) in tumor adjacent normal breast tissue and breast tumor tissue. PLoS One 12:e0170181
Ramos, Marcel; Schiffer, Lucas; Re, Angela et al. (2017) Software for the Integration of Multiomics Experiments in Bioconductor. Cancer Res 77:e39-e42
Myint, Leslie; Kleensang, Andre; Zhao, Liang et al. (2017) Joint Bounding of Peaks Across Samples Improves Differential Analysis in Mass Spectrometry-Based Metabolomics. Anal Chem 89:3517-3523
Fortin, Jean-Philippe; Triche Jr, Timothy J; Hansen, Kasper D (2017) Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics 33:558-560
Kannan, Lavanya; Ramos, Marcel; Re, Angela et al. (2016) Public data and open source tools for multi-assay genomic investigation of disease. Brief Bioinform 17:603-15
Spratt, Daniel E; Chan, Tiffany; Waldron, Levi et al. (2016) Racial/Ethnic Disparities in Genomic Sequencing. JAMA Oncol 2:1070-4

The list above shows the 10 most recent of the project's 12 publications.