Acquisition of system-wide molecular information on genomes, proteomes and metabolomes (multi-omics data) is now possible for biological researchers. Approaches that jointly analyze and interpret multi-omic data provide new understanding of the molecular interplay between genes, proteins and metabolites. For example, proteogenomics integrates DNA or RNA sequence data with mass spectrometry (MS)-based proteomics data to identify novel protein variants expressed under different conditions; metaproteomics integrates metagenomic data with MS-based proteomic data to identify proteins expressed by microbial communities; metabologenomics integrates genomic and proteomic data with metabolomic profiling data, connecting genotypes to function. To enable new discoveries from multi-omic data, the raw data must first be analyzed with specialized, domain-specific software. Unfortunately, the requirement to master and integrate different software for large-scale data analysis discourages many researchers who might otherwise use and benefit from multi-omic approaches in their work. The goal of this project is to solve this problem by developing a unified, web-based analytical hub for multi-omic data that is accessible to bench scientists. Broader Impact activities will include promotion and dissemination of these software tools to the greater research community and training of undergraduates in bioinformatics via internship projects.

This project will leverage the web-based, user-friendly and research-enabling Galaxy bioinformatics workbench. In addition to proteogenomics and metaproteomics, novel Galaxy extensions will be built to enable data analysis for metabolomics, a field in need of a unified resource to consolidate its many useful but scattered software tools. Combined with new interactive visualization tools, data exchange features, and deployment in scalable and accessible cloud-based resources, these extensions will significantly enhance Galaxy's functionality and empower more researchers in multi-omic data analysis. The project has these Specific Aims: 1) Enhance the Galaxy environment with new interactive visualization tools and data exchange functionalities necessary for effective multi-omic data analysis; 2) Extend the Galaxy environment to analyze and process diverse metabolomics data and support workflows for metabolic activity profiling; 3) Extend the Galaxy environment for integrative genomic-proteomic data analysis supporting proteogenomic and metaproteomic applications; 4) Ensure Broader Impact by promoting usage of Galaxy for multi-omics by the research community and providing undergraduate training opportunities in computational systems biology via a local area institutional network. Results from this project can be found at usegalaxyp.org.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 1458524
Program Officer: Peter McCartney
Project Start:
Project End:
Budget Start: 2015-09-01
Budget End: 2019-08-31
Support Year:
Fiscal Year: 2014
Total Cost: $1,780,233
Indirect Cost:
Name: University of Minnesota Twin Cities
Department:
Type:
DUNS #:
City: Minneapolis
State: MN
Country: United States
Zip Code: 55455