Biomedical research has rapidly become an informatics-intensive discipline. This has created challenges at many levels: the availability of computational infrastructure and expertise, the burden of keeping up with rapidly developing tools and best practices, communication difficulties between experimentalists and computational researchers, and difficulties ensuring reproducibility. Over the last six years we have developed an open-source software framework, Galaxy (http://usegalaxy.org), to address these issues. Galaxy provides an accessible analysis environment that allows experimentalists to use cutting-edge tools on large datasets, with automated tracking to ensure reproducibility. Galaxy also makes it easy for tool developers to quickly put their tools into experimentalists' hands. Galaxy has become an indispensable resource for the genomic research community: first, for the thousands of experimentalists using Galaxy's tools in their research (as evidenced in many publications), and beyond that, as the local analysis infrastructure adopted by many dozens of labs and institutes. Galaxy is flexible enough to be deployed on a variety of compute resources, which is particularly important as data production is increasingly decentralized. At Galaxy's core is a powerful, extensible framework that other important community resource projects are now integrating or building on. Thus Galaxy is ideally positioned to become a substrate for sharing and communicating analysis.

We propose to expand the Galaxy resource with novel approaches for accessible, transparent, and reproducible analysis in a decentralized world. Driven by biological projects, we will build best-practice workflows for several sequencing-based experiments. We will create innovative methods to automate packaging and deploying analysis tools. We will build and maintain the Galaxy Tool Shed, a hub for sharing tools, best-practice workflows, and analysis strategies. We will develop a novel approach for publishing analysis. We will create a framework for visual analytics leveraging existing Galaxy tools. Finally, we will build a complete solution for managing sequencing workflows, including sample tracking and instrument integration.

Public Health Relevance

The rapid proliferation of genomic approaches is revolutionizing the medical field by creating novel diagnostic applications. This project will make cutting-edge biomedical analysis tools available to every clinical researcher, fulfilling the translational promise of sequencing technologies.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG006620-04
Application #: 8832737
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O2))
Program Officer: Wellington, Christopher
Project Start: 2012-02-22
Project End: 2015-12-31
Budget Start: 2015-01-01
Budget End: 2015-12-31
Support Year: 4
Fiscal Year: 2015
Total Cost: $1,402,446
Indirect Cost: $215,908

Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953
City: University Park
State: PA
Country: United States
Zip Code: 16802

Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea et al. (2018) Community-Driven Data Analysis Training for Biology. Cell Syst 6:752-758.e1
Uritskiy, Gherman V; DiRuggiero, Jocelyne; Taylor, James (2018) MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6:158
Grüning, Björn; Chilton, John; Köster, Johannes et al. (2018) Practical Computational Reproducibility in the Life Sciences. Cell Syst 6:631-635
Nekrutenko, Anton; Team, Galaxy; Goecks, Jeremy et al. (2018) Biology Needs Evolutionary Software Tools: Let's Build Them Right. Mol Biol Evol 35:1372-1375
Afgan, Enis; Baker, Dannon; Batut, Bérénice et al. (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 46:W537-W544
Jung, Yoon Hee; Sauria, Michael E G; Lyu, Xiaowen et al. (2017) Chromatin States in Mouse Sperm Correlate with Embryonic and Adult Regulatory Landscapes. Cell Rep 18:1366-1382
Grüning, Björn A; Rasche, Eric; Rebolledo-Jaramillo, Boris et al. (2017) Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers. PLoS Comput Biol 13:e1005425
Børnich, Claus; Grytten, Ivar; Hovig, Eivind et al. (2016) Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics 32:1743-5
Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried et al. (2016) Streamlined analysis of duplex sequencing data with Du Novo. Genome Biol 17:180
Afgan, Enis; Baker, Dannon; van den Beek, Marius et al. (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44:W3-W10
