Bioconductor is an ecosystem of more than 1,500 open-source software and data packages for the statistical analysis and comprehension of high-throughput genomic data. It is widely used by the cancer genomics research community for statistical analysis and visualization. This software ecosystem is supported by core data classes and methods, reused by both users and developers, that provide convenient representations and efficient operations for many kinds of high-throughput molecular data. Falling sequencing costs and single-cell assays enable increasingly high-resolution study of the molecular biology of cancer through combined assays of DNA sequence, epigenetics, gene expression, protein, and other aspects of a single specimen, even at the single-cell level. These developments present new challenges in the complexity, size, and interpretability of the data. The overarching goal of this project is to create and adapt core Bioconductor software infrastructure to meet these challenges, through the following aims. First, we develop infrastructure for the analysis of single-cell multi-omic experiments. Second, we implement FAIR principles for improved somatic variant prioritization by defining a performant data architecture that harmonizes and integrates the large amount of experimental and annotation data available through Bioconductor. Users of our system will be able to create provenance-rich, interoperable reports on the structural and functional contexts of somatic variants for use in prioritization. Third, we develop scalable infrastructure for the curation, distribution, maintenance, discoverability, and usability of cancer data resources both within and outside Bioconductor. Finally, we develop a program of user training and new outreach approaches to support adoption of advanced Bioconductor infrastructure by developers of new cancer-related packages and of existing packages critical to the cancer research community.
Researchers collect diverse types of complex genetic information about factors that contribute to cancer. This proposal provides software and data resources to help researchers manage and analyze this information using advanced computational and statistical approaches.
Ma, Siyuan; Ogino, Shuji; Parsana, Princy et al. (2018) Continuity of transcriptomes among colorectal cancer subtypes based on meta-analysis. Genome Biol 19:142
Chen, Gregory M; Kannan, Lavanya; Geistlinger, Ludwig et al. (2018) Consensus on Molecular Subtypes of High-Grade Serous Ovarian Carcinoma. Clin Cancer Res 24:5037-5047
Waldron, Levi (2018) Data and Statistical Methods To Analyze the Human Microbiome. mSystems 3:
Pasolli, Edoardo; Schiffer, Lucas; Manghi, Paolo et al. (2017) Accessible, curated metagenomic data through ExperimentHub. Nat Methods 14:1023-1024
Quiroz-Zárate, Alejandro; Harshfield, Benjamin J; Hu, Rong et al. (2017) Expression Quantitative Trait loci (QTL) in tumor adjacent normal breast tissue and breast tumor tissue. PLoS One 12:e0170181
Ramos, Marcel; Schiffer, Lucas; Re, Angela et al. (2017) Software for the Integration of Multiomics Experiments in Bioconductor. Cancer Res 77:e39-e42
Myint, Leslie; Kleensang, Andre; Zhao, Liang et al. (2017) Joint Bounding of Peaks Across Samples Improves Differential Analysis in Mass Spectrometry-Based Metabolomics. Anal Chem 89:3517-3523
Fortin, Jean-Philippe; Triche Jr, Timothy J; Hansen, Kasper D (2017) Preprocessing, normalization and integration of the Illumina HumanMethylationEPIC array with minfi. Bioinformatics 33:558-560
Kannan, Lavanya; Ramos, Marcel; Re, Angela et al. (2016) Public data and open source tools for multi-assay genomic investigation of disease. Brief Bioinform 17:603-15
Spratt, Daniel E; Chan, Tiffany; Waldron, Levi et al. (2016) Racial/Ethnic Disparities in Genomic Sequencing. JAMA Oncol 2:1070-4