The goal of the FaceBase III Hub is to create a FAIR data repository to serve the entire community of dental and craniofacial researchers by sharing diverse data related to craniofacial development and dysmorphia. To meet this goal, FaceBase is built on Deriva, an open-source data management system designed with FAIR data principles in mind. This platform has allowed FaceBase to evolve with changing requirements for data on new experimental methodologies and instruments, additional model organisms, cell characterization, integration of computational pipelines, and visualization interfaces. Currently, we implement Deriva on private and public clouds using a "data center-in-the-cloud" format; that is, we treat the cloud like a traditional remote computer that runs virtual machine images and conventional data storage. However, cloud platforms such as Amazon Web Services (AWS) offer a wide range of cloud-native services beyond virtual machines which, if fully leveraged, would drastically improve important aspects of Deriva and directly benefit FaceBase. Hence, we propose to enhance Deriva for cloud-based operations to address three key aspects of Deriva in support of FaceBase and the other NIH communities that use it: scalability, reliability, and sustainability. Specifically, we plan to use AWS native services to improve Deriva's scalability (Aim 1), decouple Deriva services to run in containerized execution environments to ensure its reliability (Aim 2), and develop cost management dashboards to monitor and predict the costs of operating in the cloud to achieve sustainability (Aim 3). The AWS native services are fully managed and highly scalable, and they offload much of the overhead of system operations and maintenance. Improvements in Deriva scalability, reliability, and sustainability achieved by these Aims will allow the FaceBase Hub to provide the growing community of data contributors and users with better service. In addition, many other user communities such as GUDMAP, (Re)Building a Kidney, the Kidney Precision Medicine Project (NIDDK), and the Common Fund Data Environment (OD) rely on Deriva, and all of the improvements resulting from these Aims would yield a direct and immediate benefit to thousands of additional users.

Public Health Relevance

Craniofacial dysmorphia is one of the leading causes of birth defects. In recognition of this, the major goal of the FaceBase III project is to advance research by creating comprehensive datasets of craniofacial development and dysmorphologies and disseminating these datasets to the wider craniofacial research community. As the usage of FaceBase grows, the data repository will be required to continuously scale up its storage and compute capability and to provide ever-increasing reliability and availability. Our proposed project will help FaceBase meet these needs by integrating scalable and reliable cloud services into the FaceBase repository, providing enhanced service to the entire FaceBase user community.

Agency
National Institutes of Health (NIH)
Institute
National Institute of Dental & Craniofacial Research (NIDCR)
Type
Research Project--Cooperative Agreements (U01)
Project #
3U01DE028729-02S2
Application #
10166440
Study Section
Special Emphasis Panel (ZDE1)
Program Officer
Wang, Lu
Project Start
2019-08-01
Project End
2021-07-31
Budget Start
2020-08-26
Budget End
2021-07-31
Support Year
2
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Southern California
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
072933393
City
Los Angeles
State
CA
Country
United States
Zip Code
90089