The era of big data is both exciting and challenging for many biomedical investigators: exciting because of the scientific and medical possibilities, and challenging because many investigators lack a background in information technology (IT). As DNA sequencing costs continue to drop, many investigators would like to sequence and analyze new genomes, and to build knowledge of genome function through technologies such as ChIP-seq and RNA-seq, but find themselves limited by the IT demands. Our goal here is to create a widely accessible Web-based application (christened G-OnRamp) for interactive analysis, visualization, and annotation of eukaryotic genomes. This project will develop integrated software that will enable any investigator to construct genome browsers with evidence tracks from multiple analysis tools, and to use the browser to interactively annotate the functional elements of a genome. Project utilization will be amplified by the use of the system in undergraduate education. Genome annotation projects have been shown to provide an effective vehicle both for bringing bioinformatics into the undergraduate curriculum and for engaging undergraduates in research. Once they experience some success and come to appreciate the power of such a system, biomedical researchers and undergraduates alike may be motivated to learn additional IT skills.
Approach: the project is a partnership between the Galaxy Project, a web-based workbench that provides access to a large suite of bioinformatics tools and shared workflows, and the Genomics Education Partnership (GEP), made up of faculty from >100 primarily undergraduate institutions with broad experience in introducing students to genomics.
Aim 1: a collaborative effort between biologists and computer scientists will create better pipelines for genomic analysis and integrate them with a genome browser construction pipeline that facilitates genome annotation, enabling numerous distributed projects for both existing and novel genomes (e.g., that of a parasite). The system will enable biomedical researchers and faculty to set up new, local projects; link users to the needed web resources; and provide a means for collaborative annotation projects. Crowd-sourcing genome annotation will benefit both students and the biomedical research community. Beta-testing of G-OnRamp will be done by GEP faculty and other early adopters.
Aim 2: develop training materials on the use of the system for both beginners and experts. This will be done through extensive dialogue between GEP and Galaxy. The final system will be accessible to a novice and useful to an expert.
Aim 3: dissemination. This will be done through Web-based announcements, short meeting workshops, and longer train-the-trainer workshops; introducing the system through the GEP will provide a starting pool of >100 campuses and >1000 undergraduates. Providing broad access to a system for genome analysis and annotation will greatly strengthen bioinformatics research and training in the biomedical sector, enabling both basic and applied studies that will contribute to NHGRI's "base pairs to bedside" charter.

Public Health Relevance

The goal of this proposal is to provide a better on-ramp for interactive analysis and visualization of genomes, enabling any biomedical investigator to make better use of the large genomic datasets that are now available. This will allow investigators to more easily analyze new genomes (for example, of a parasite or other organism causing infectious disease) and to add different kinds of data to previously sequenced genomes. The new system will enable undergraduates to participate, providing a means of 'crowd-sourcing' analysis of genomes.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Education Projects (R25)
Project #
5R25GM119157-02
Application #
9149280
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Ravichandran, Veerasamy
Project Start
2015-09-15
Project End
2018-06-30
Budget Start
2016-07-01
Budget End
2017-06-30
Support Year
2
Fiscal Year
2016
Total Cost
Indirect Cost
Name
Washington University
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
068552207
City
Saint Louis
State
MO
Country
United States
Zip Code
63130
Elgin, Sarah C R; Hauser, Charles; Holzen, Teresa M et al. (2017) The GEP: Crowd-Sourcing Big Data Analysis with Undergraduates. Trends Genet 33:81-85