The long-term objective of this project is to enable comprehension and analysis of high-throughput genomic data using advanced statistical and bioinformatics approaches. The vehicle for attaining this objective is the Bioconductor project, based on the R statistical programming language and encompassing an established repository of more than 400 software packages. Bioconductor software packages are used in many areas of health-related academic, government, pharmaceutical, and clinical pursuits. Typical uses include microarray or RNA-seq differential representation in case/control or more elaborate experimental designs; characterization of copy number, single nucleotide polymorphism, or other variants assayed with microarray or sequence data; assaying regulatory or epigenetic changes with array or sequence-based (e.g., ChIP-seq) technologies; high-throughput analysis of flow cytometric data; and other high-throughput (e.g., imaging) assays.
Specific aims of this proposal include: (1) enabling Bioconductor software distribution and use by a wide community of users and developers; (2) developing the computational analytic infrastructure needed to track rapidly emerging and computationally demanding data types; and (3) contributing new statistical methods for genome-scale biology, including clinical application of microarray and RNA-seq data, analysis of the genetics of gene expression, and approaches to comparative assessment of analytic methodologies. The overall approach involves production and maintenance of internet-based resources for software distribution and support, ensuring conformance of Bioconductor packages to high standards of software assurance, and significant documentation and training efforts to enable Bioconductor use and development by a broad community; focused efforts on software implementations for the representation and processing of large data sets and new data types, integrated solutions for efficient software deployment and re-use in third-party projects, and data structures and approaches that foster reproducible research; and statistical methods development and implementation as new Bioconductor packages. Software from the Bioconductor project will contribute to analysis and comprehension of genome-scale data on a daily basis in national and international settings. Bioconductor will serve as a conduit for the introduction of sophisticated methods to appropriately analyze existing and new high-throughput data types. Bioconductor provides an important platform on which doctoral and post-doctoral individuals are trained to engage high-throughput data using statistically appropriate methodologies.

Public Health Relevance

Modern health sciences researchers are gaining access to increasingly large and complicated data, for instance, entire genome sequences. Effective interpretation of this data is as essential to understanding fundamental biological principles as it is to informing personalized medicine. The Bioconductor project facilitates development and use of statistical software to enable researchers to arrive at this understanding.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Pillai, Ajay
Fred Hutchinson Cancer Research Center
United States
Pasolli, Edoardo; Schiffer, Lucas; Manghi, Paolo et al. (2017) Accessible, curated metagenomic data through ExperimentHub. Nat Methods 14:1023-1024
Ramos, Marcel; Schiffer, Lucas; Re, Angela et al. (2017) Software for the Integration of Multiomics Experiments in Bioconductor. Cancer Res 77:e39-e42
Waldron, Levi; Riester, Markus; Ramos, Marcel et al. (2016) The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles. J Natl Cancer Inst 108:
Carlson, Marc R J; Pagès, Hervé; Arora, Sonali et al. (2016) Genomic Annotation Resources in R/Bioconductor. Methods Mol Biol 1418:67-90
Kannan, Lavanya; Ramos, Marcel; Re, Angela et al. (2016) Public data and open source tools for multi-assay genomic investigation of disease. Brief Bioinform 17:603-15
Spratt, Daniel E; Chan, Tiffany; Waldron, Levi et al. (2016) Racial/Ethnic Disparities in Genomic Sequencing. JAMA Oncol 2:1070-4
Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert et al. (2015) Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12:115-21
Lawrence, Michael; Morgan, Martin (2014) Scalable Genomics with R and Bioconductor. Stat Sci 29:214-226
Obenchain, Valerie; Lawrence, Michael; Carey, Vincent et al. (2014) VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics 30:2076-8
Shioda, Toshi; Rosenthal, Noel F; Coser, Kathryn R et al. (2013) Expressomal approach for comprehensive analysis and visualization of ligand sensitivities of xenoestrogen responsive genes. Proc Natl Acad Sci U S A 110:16508-13

Showing the most recent 10 out of 12 publications