Translating large volumes of experimental data into clinically relevant insights relies on sophisticated computational analysis tools that can handle enormous high-throughput sequence, polymorphism, and functional datasets. Developing appropriate tools is necessary but not sufficient, because independent analysis tools do not by themselves remove an increasingly problematic barrier on the bench-to-bedside path outlined in the NIH Roadmap for medical research: making powerful new computational tools readily accessible and useful for experimental biologists. Developing usable and consistent user interfaces requires significant effort, and few tool developers can afford to devote time and resources to this goal. Currently, many powerful, independent analysis tools exist but lack the integrated, easy-to-use interfaces that would allow experimental biologists to take advantage of them. Thus, developing tools to analyze overwhelming amounts of data is no longer the main challenge in biomedical research. Instead, the problem lies in making existing tools usable for bench biologists so that they can take full advantage of existing data. We have developed a system, GALAXY, that makes substantial progress toward solving this problem. For experimental biologists, it provides an intuitive and consistent interface for performing sophisticated analyses with minimal effort, regardless of the scale of data involved. For computational tool developers, it makes it easy to integrate existing tools with a modern user interface by writing a simple, concise interface description. For data providers, it features a simple, elegant data access protocol.
Thus, GALAXY bridges a critically important gap between data resources, computational tools and users, by making it easy to modernize the interfaces of any existing tool, freeing developers of new tools from the need to develop interfaces from scratch, and facilitating tool interoperability and complex analyses by seamlessly integrating analysis outputs, applications and external data. Here we propose to develop novel features specifically designed for translational research. First, we will engineer a tool integration framework streamlining delivery of analysis software to experimentalists. Second, we will develop a statistical genetics toolkit allowing clinicians to manipulate and interpret human variation data on any scale. Third, we will implement the first integrated system for analysis of short-read sequencing data. Fourth, we will design utilities for manipulation of the most valuable comparative genomics resource: multi-genome alignments. Finally, we will build a workflow system to enable reproducible and collaborative analysis of genomic data.

Public Health Relevance

Genomic data discovery is no longer a limiting factor for much of medical research. The NIH Roadmap recognizes that many challenges in biomedical research will only be overcome through appropriate investment to improve integrative access to existing data and tools, so that researchers can more effectively and rapidly translate their findings into practice. The proposed project addresses this challenge by allowing biomedical researchers to take advantage of the enormous sequence, polymorphism, and functional datasets easily and effectively.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Research Project (R01)
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Bonazzi, Vivien
Pennsylvania State University
Schools of Arts and Sciences
University Park
United States
Børnich, Claus; Grytten, Ivar; Hovig, Eivind et al. (2016) Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics 32:1743-5
Afgan, Enis; Baker, Dannon; van den Beek, Marius et al. (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44:W3-W10
Tang, Yin; Bouvier, Emil; Kwok, Chun Kit et al. (2015) StructureFold: genome-wide RNA secondary structure mapping and reconstruction in vivo. Bioinformatics 31:2668-75
Blankenberg, Daniel; Taylor, James; Nekrutenko, Anton (2015) Online resources for genomic analysis using high-throughput sequencing. Cold Spring Harb Protoc 2015:324-35
Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil et al. (2014) Dissemination of scientific software with Galaxy ToolShed. Genome Biol 15:403
Dickins, Benjamin; Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei et al. (2014) Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach. Biotechniques 56:134-141
Blankenberg, Daniel; Johnson, James E; Galaxy Team et al. (2014) Wrangling Galaxy's reference data. Bioinformatics 30:1917-9
Goecks, Jeremy; Eberhard, Carl; Too, Tomithy et al. (2013) Web-based visual analysis for high-throughput genomics. BMC Genomics 14:397
Sandve, Geir Kjetil; Nekrutenko, Anton; Taylor, James et al. (2013) Ten simple rules for reproducible computational research. PLoS Comput Biol 9:e1003285
Goecks, Jeremy; Coraor, Nate; Galaxy Team et al. (2012) NGS analyses by visualization with Trackster. Nat Biotechnol 30:1036-9

Showing the most recent 10 out of 28 publications