Biomedical research has rapidly been transformed into an informatics-intensive discipline. This has created challenges at many levels: the availability of computational infrastructure and expertise, the burden of keeping up with rapidly developing tools and best practices, communication difficulties between experimentalists and computational researchers, and difficulties ensuring reproducibility. Over the last six years we have developed an open-source software framework, Galaxy (http://usegalaxy.org), to address these issues. Galaxy provides an accessible analysis environment that allows experimentalists to use cutting-edge tools on large datasets, with automated tracking to ensure reproducibility. Galaxy also makes it easy for tool developers to quickly put their tools into experimentalists' hands. Galaxy has become an indispensable resource for the genomic research community: first, for the thousands of experimentalists using Galaxy's tools in their research (as evidenced in many publications), and beyond that, as the local analysis infrastructure adopted by many dozens of labs and institutes. Galaxy is flexible enough to be deployed on a variety of compute resources, which is particularly important as data production becomes increasingly decentralized. At Galaxy's core is a powerful, extensible framework that other important community resource projects are now integrating or building on. Thus Galaxy is ideally positioned to become a substrate for sharing and communicating analysis. We propose to expand the Galaxy resource with novel approaches for accessible, transparent, and reproducible analysis in a decentralized world. Driven by biological projects, we will build best-practice workflows for several sequencing-based experiments. We will create innovative methods to automate packaging and deploying analysis tools. We will build and maintain the Galaxy Tool Shed, a hub for sharing tools, best-practice workflows, and analysis strategies. We will develop a novel approach for publishing analysis. We will create a framework for visual analytics leveraging existing Galaxy tools. Finally, we will build a complete solution for managing sequencing workflows, including sample tracking and instrument integration.

Public Health Relevance

The rapid proliferation of genomic approaches is revolutionizing the medical field by creating novel diagnostic applications. This project will make cutting-edge biomedical analysis tools available to every clinical researcher, fulfilling the translational promise of sequencing technologies.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Biotechnology Resource Cooperative Agreements (U41)
Project #: 5U41HG006620-02
Application #: 8432034
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O2))
Program Officer: Bonazzi, Vivien
Project Start: 2012-02-22
Project End: 2015-12-31
Budget Start: 2013-01-01
Budget End: 2013-12-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $1,373,678
Indirect Cost: $330,438

Name: Pennsylvania State University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 003403953
City: University Park
State: PA
Country: United States
Zip Code: 16802
Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Stoler, Nicholas et al. (2014) Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc Natl Acad Sci U S A 111:15474-9
Blankenberg, Daniel; Johnson, James E; Galaxy Team et al. (2014) Wrangling Galaxy's reference data. Bioinformatics 30:1917-9
Dickins, Benjamin; Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei et al. (2014) Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach. Biotechniques 56:134-6, 138-41
Leo, Simone; Pireddu, Luca; Cuccuru, Gianmauro et al. (2014) BioBlend.objects: metacomputing with Galaxy. Bioinformatics 30:2816-7
Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil et al. (2014) Dissemination of scientific software with Galaxy ToolShed. Genome Biol 15:403
Goecks, Jeremy; Eberhard, Carl; Too, Tomithy et al. (2013) Web-based visual analysis for high-throughput genomics. BMC Genomics 14:397
Sandve, Geir Kjetil; Nekrutenko, Anton; Taylor, James et al. (2013) Ten simple rules for reproducible computational research. PLoS Comput Biol 9:e1003285