Overall

High-throughput data production technologies, particularly next-generation DNA sequencing, have ushered in the most disruptive changes to biomedical research in decades. Making sense of the large datasets these technologies produce requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in biomedical research, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has been working to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent, and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse.

In the proposed project, we will improve Galaxy in several specific ways. We will greatly increase Galaxy's usability for working with large numbers of datasets. Modern experiments often involve hundreds of datasets organized in complex ways; we will make analyzing such data simple and intuitive. We will improve the development and distribution of software tools, making it much easier for developers to distribute tools and for users to acquire them, all while preserving provenance. We will substantially improve access to a wide variety of computational resources, such as cloud computing and high-performance clusters, enabling biomedical researchers to use resources that have traditionally been difficult to work with. Finally, we will engage in training, outreach, and dissemination, including the development of scalable training materials that others can use to conduct biomedical data analysis training.

Public Health Relevance

Galaxy (http://galaxyproject.org) is a widely used resource for data analysis in genomics, high-throughput biology, and other areas. This project will make specific improvements to Galaxy that allow it to scale to the increasingly large and complex 'Big Data' now present across biomedical research, to serve an even larger user community, and to reach new kinds of users. The result will continue to enable accessible, transparent, and reproducible science across biomedical research and beyond.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Biotechnology Resource Cooperative Agreements (U41)
Study Section: Special Emphasis Panel (ZHG1-HGR-M (O1))
Program Officer: Wellington, Christopher
Johns Hopkins University
Schools of Arts and Sciences
United States
Batut, Bérénice; Hiltemann, Saskia; Bagnacani, Andrea et al. (2018) Community-Driven Data Analysis Training for Biology. Cell Syst 6:752-758.e1
Uritskiy, Gherman V; DiRuggiero, Jocelyne; Taylor, James (2018) MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6:158
Grüning, Björn; Chilton, John; Köster, Johannes et al. (2018) Practical Computational Reproducibility in the Life Sciences. Cell Syst 6:631-635
Nekrutenko, Anton; Galaxy Team; Goecks, Jeremy et al. (2018) Biology Needs Evolutionary Software Tools: Let's Build Them Right. Mol Biol Evol 35:1372-1375
Afgan, Enis; Baker, Dannon; Batut, Bérénice et al. (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 46:W537-W544
Grüning, Björn A; Rasche, Eric; Rebolledo-Jaramillo, Boris et al. (2017) Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers. PLoS Comput Biol 13:e1005425
Jung, Yoon Hee; Sauria, Michael E G; Lyu, Xiaowen et al. (2017) Chromatin States in Mouse Sperm Correlate with Embryonic and Adult Regulatory Landscapes. Cell Rep 18:1366-1382
Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried et al. (2016) Streamlined analysis of duplex sequencing data with Du Novo. Genome Biol 17:180
Afgan, Enis; Baker, Dannon; van den Beek, Marius et al. (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44:W3-W10
Børnich, Claus; Grytten, Ivar; Hovig, Eivind et al. (2016) Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics 32:1743-1745

Showing the most recent 10 out of 30 publications