Improvements in sequencing technology have allowed genome and transcriptome profiling of large groups of research subjects. Projects such as The Cancer Genome Atlas (TCGA), the Encyclopedia of DNA Elements (ENCODE), the Genotype-Tissue Expression Project (GTEx), and others have placed large, complex, multi-omic data into the public domain. While these large projects and the use of new sequencing technologies have made an unprecedented quantity of data available, technical challenges such as moving and analyzing large multi-omic data sets, and the lack of intuitive, easy-to-use tools for data analysis, have limited broad exploration of the available data, often preventing experimental biologists and domain experts from directly exploring relationships within the data. More than 15 years ago, we began development of MeV, a freely available, open-source software tool for intuitive analysis of genomic data. The simple graphical user interface and the extensive library of state-of-the-art analytical methods made MeV one of the most widely used software tools in bioinformatics, with nearly 260,000 downloads since we began keeping statistics in 2008 and nearly 30,000 downloads per year for the past few years. Despite the success of MeV and its continued use, we recognized that large-scale, multi-omic data sets can no longer be analyzed easily using a desktop application. To keep pace with the data, we needed to develop a new platform that draws on modern computing technologies, including cloud-based computing and scalable data storage. The solution, funded by the NCI through the ITCR program (5U01CA151118), is a cloud-based, web-enabled version of MeV (WebMeV; http://mev.tm4.org). WebMeV uses Google Cloud Platform (GCP) and its Compute Engine infrastructure to leverage cloud-computing resources for analyzing large public genomic data sets. In April 2016, we released a robust version of WebMeV and have seen use of the system grow dramatically.
The system has already been used to perform more than 350,000 analyses. WebMeV currently performs more than 100 analyses per day, and 3,735 users have registered with the system, a group that is growing by 400 per month (registration is not required). To ensure wide use, we have given numerous online tutorials, including two "sold out" tutorials for intramural investigators at the NCI, where WebMeV has become a critical tool for genomic analysis. In this application, we propose to continue to maintain and improve WebMeV, to expand its capabilities by implementing methods for network inference and representation, to integrate with the Cancer Genomics Cloud Pilots program, and to implement methods that can advance reproducible research.

Public Health Relevance

As part of the Informatics Technology for Cancer Research (ITCR) program, we will develop and support WebMeV, a robust, scalable data analysis software tool that uses intuitive visual interfaces to provide users with access to advanced data analysis methods for large-scale genomic data. Our goal is to help ensure that analytical access to large public data is democratized, so that scientists and physicians can test hypotheses by interacting directly with the data in a way that is not limited by their available computational resources, and in a system that helps ensure their research is reproducible.

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Resource-Related Research Projects--Cooperative Agreements (U24)
Project #
5U24CA231846-02
Application #
10001456
Study Section
Special Emphasis Panel (ZCA1)
Program Officer
Li, Jerry
Project Start
2019-09-01
Project End
2024-08-31
Budget Start
2020-09-01
Budget End
2021-08-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Harvard University
Department
Biostatistics & Other Math Sci
Type
Schools of Public Health
DUNS #
149617367
City
Boston
State
MA
Country
United States
Zip Code
02115