Understanding the evolutionary history of living organisms is of central importance to every field of biological and biomedical research. Evolutionary information is critical to the discovery process across all biological scales, from proteins to populations. The CIPRES Science Gateway (CIPRES) is a global resource that speeds up inference of evolutionary history from DNA sequence data. CIPRES provides public access to community phylogenetics software run on high performance computing (HPC) resources at no cost to the user. CIPRES allows investigators to access all the capabilities of phylogenetics software efficiently through a browser interface, without having to install the codes, learn the details of schedulers, or construct command lines. CIPRES is the sole public resource for many of these codes, completes analyses 5- to 30-fold faster than a typical laptop computer, allows users to run many analyses simultaneously, and provides indefinite storage of the results. At present, approximately 2,000 users per month submit 20,000 jobs through CIPRES; they have produced more than 3,000 peer-reviewed publications in subject areas relevant to NIH priorities, from enzymology to epidemiology. CIPRES has provided support for research ranging from HIV transmission to the discovery of an entire new branch in the tree of life. All of these discoveries happen more quickly because CIPRES provides easy access to HPC resources. The project proposed here will improve the software used to create CIPRES, making it more effective and easier to access. It will provide an environment that allows researchers to easily collaborate, share data selectively, and make their results publicly available. It will expose access to CIPRES services through other important software environments, including Galaxy and Geneious.
The project will add a number of user-requested features to make work more efficient, including restarting jobs that have terminated prematurely, transferring large files, and validating input files with automatic configuration of jobs for optimal execution. The project will provide access to many new community codes that have been requested, and to those that appear during the project lifetime. Improvements to the interface will make it faster, more intuitive, and usable on smartphones and tablets. The project will also give users "cloud-bursting" capabilities. Users will be able to submit jobs to a commercial cloud provider on a fee-for-service basis or via an NIH Commons account when their job is too large for the standard CIPRES community resources. This capability means CIPRES can be scaled and sustained indefinitely for a user population of any size. These improvements (both in capabilities and compute capacity) are expected to greatly expand the number of users who incorporate CIPRES into their day-to-day workflow. The improvements made here will be available to the global research community through release of the underlying software as an open source, distributable package that can be used by any community of practice to access HPC resources. As a result, all improvements created for CIPRES users can be implemented simply and quickly in other online resources for other specific research communities.

Public Health Relevance

The CIPRES Science Gateway is an online resource that provides scientists around the world with tools to analyze DNA sequence data on large supercomputers at no cost. The analyses done on CIPRES speed discoveries in a wide range of health-related areas, from basic research (how do genomes evolve?) to epidemiology (how does HIV spread?). The project will improve CIPRES so that more scientists can use it: it will be easier to use, offer more tools for their analyses, and support larger, longer calculations.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM126463-04
Application #
10067556
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2017-12-15
Project End
2021-11-30
Budget Start
2020-12-01
Budget End
2021-11-30
Support Year
4
Fiscal Year
2021
Total Cost
Indirect Cost
Name
University of California, San Diego
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
804355790
City
La Jolla
State
CA
Country
United States
Zip Code
92093