Modern biology continues to be revolutionized by high-throughput data production technologies. Nowhere is this more obvious than in the case of "next-generation" DNA sequencing technologies, which have dramatically higher throughput and lower cost than previous approaches. Not only do these technologies make genome sequencing and resequencing more widely available, they have also driven the development of a variety of novel genome-wide (and data-intensive) functional assays. But are these methods really accessible to experimentalists? Although the financial cost of sequencing has been substantially reduced, there is still a significant barrier preventing experimental biologists from making effective use of this data. Translating the data generated by these new technologies requires sophisticated computational infrastructure, for both large-scale data management and analysis, that is accessible to experimentalists. Genomic data discovery is no longer the limiting factor for much genomic research; instead, the problem lies in providing the data, analysis tools, and protocols in a form that is usable by bench biologists, so that they can take full advantage of their data. We have developed a framework, Galaxy, that makes it easy to provide accessible interfaces to computational tools, and provides experimental biologists with an intuitive and consistent interface for performing sophisticated analyses with minimal effort, regardless of the scale of data involved. Here we propose to build, using this existing framework, a complete "turnkey" solution for accessible management and analysis of next-generation sequence data. This solution will allow data produced by sequencing instruments to be automatically made available to bench biologists through Galaxy's user-friendly analysis environment. Into this environment we will integrate a large set of tools for sequence data analysis, along with predefined best-practice "workflows" for common analysis problems. The entire solution will be provided as a preconfigured, ready-to-run package that any lab or provider of sequencing services can easily deploy, enabling their users to truly realize the promise of next-generation sequencing technologies.

Public Health Relevance

A new generation of high-throughput DNA sequencing technologies has made a variety of novel data-intensive genome-scale experiments both possible and relatively inexpensive, putting these techniques within the reach of many more labs. However, these dramatic improvements in the availability and cost of sequencing have not yet been matched with easy-to-use, scalable, integrated and flexible data analysis capabilities. The proposed project will develop an integrated data management and analysis solution that allows biomedical researchers to easily and efficiently work with the data produced by these revolutionary new technologies.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants (R21)
Project #: 5R21HG005133-02
Application #: 7882283
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Good, Peter J
Project Start: 2009-07-01
Project End: 2012-06-30
Budget Start: 2010-07-01
Budget End: 2012-06-30
Support Year: 2
Fiscal Year: 2010
Total Cost: $189,750
Indirect Cost:

Name: Emory University
Department: Biology
Type: Schools of Arts and Sciences
DUNS #: 066469933
City: Atlanta
State: GA
Country: United States
Zip Code: 30322

Børnich, Claus; Grytten, Ivar; Hovig, Eivind et al. (2016) Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics 32:1743-5
Afgan, Enis; Baker, Dannon; van den Beek, Marius et al. (2016) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 44:W3-W10
Goecks, Jeremy; El-Rayes, Bassel F; Maithel, Shishir K et al. (2015) Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines. Cancer Med 4:392-403
Harris, Nomi L; Cock, Peter J A; Chapman, Brad A et al. (2015) The Bioinformatics Open Source Conference (BOSC) 2013. Bioinformatics 31:299-300
Blankenberg, Daniel; Taylor, James; Nekrutenko, Anton (2015) Online resources for genomic analysis using high-throughput sequencing. Cold Spring Harb Protoc 2015:324-35
Budd, Aidan; Corpas, Manuel; Brazas, Michelle D et al. (2015) A quick guide for building a successful bioinformatics community. PLoS Comput Biol 11:e1003972
Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil et al. (2014) Dissemination of scientific software with Galaxy ToolShed. Genome Biol 15:403
Blankenberg, Daniel; Johnson, James E; Galaxy Team et al. (2014) Wrangling Galaxy's reference data. Bioinformatics 30:1917-9
Goecks, Jeremy; Mortimer, Nathan T; Mobley, James A et al. (2013) Integrative approach reveals composition of endoparasitoid wasp venoms. PLoS One 8:e64125
Goecks, Jeremy; Eberhard, Carl; Too, Tomithy et al. (2013) Web-based visual analysis for high-throughput genomics. BMC Genomics 14:397

Showing the most recent 10 out of 24 publications