A Data Commons that realizes the goal of efficiency in research needs to transform the way we access, use, and generate data. This vision will require the efforts of a multidisciplinary, multi-institutional investigative team with complementary expertise in biomedicine, cloud architecture, software engineering, analytical tools, and data harmonization. Each of the eight Key Capabilities (KCs) addresses specific challenges faced by scientists working with large-scale biomedical data. The proposed projects are designed such that each KC has unique objectives and deliverables in the form of stand-alone Minimum Viable Products (MVPs), yet together, the KCs form a continuum of insights and approaches that capture the five V's of data and reflect FAIR principles. The specific scientific use case for KC8 is sex as a biological variable (SABV). SABV is agnostic with respect to any disease or medical condition, manifests across multiple clinical and model systems, is relevant to all types of data and datasets emphasized in the RFA, requires a data model and data harmonization across datasets, and addresses challenges in scientific rigor and transparency, as recently emphasized by NIH. Moreover, the use case achieves these goals by examining and computing over the data sources identified by NIH, namely, TOPMed, GTEx, and MODs. Furthermore, SABV as a use case enables each contributing KC to be examined in the context of real-life challenges. Indeed, a key challenge in data integration across multiple knowledge domains is the identification of commonalities and trends that can be used in a predictive manner. SABV serves as an exemplar that requires maximizing data utility for computational use. To exemplify cross-KC connectivity with the proposed work, consider a collaborative team with interest in determining whether specific dietary interventions differ in effectiveness by sex.
Using KC8.MVP1, the team examines the impact of SABV on gene expression in the pancreas (GTEx) and on metabolic gene products (MODs). Results are moved into the cloud environment provided by KC4 PIVOT and the compute capabilities provided by KC5 Data Science Stacks and CWL Execution Tools, where the team leverages whole-genome sequencing and phenotypic data from TOPMed via KC8.MVP2, using KC3 API and Tool Suite, KC2 GUIDs Best Practices and Registry, and KC7 Indexing/Search Capabilities to facilitate the process. The team conducts analyses to identify covariates and develop models for further analysis of loci that exhibit sexual dimorphisms and/or sex interactions with dietary interventions. All results are deposited into the Data Commons and used to guide the efficient design of randomized controlled clinical trials. The team's resources and products are assessed using KC1 FAIR-TLC METRICS, and KC6 Governance Council oversees team activities to ensure that all ethical, security, and privacy issues have been considered and that requirements are enforced. We envision a set of independent, yet interoperable, KCs designed to seamlessly address biomedical data challenges in the context of SABV, with likely complementarity to the KCs proposed by other groups. Our team was assembled specifically for its collaborative and open science values and has circulated a draft Consortium Agreement among the partnering institutions.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Project #
3OT3OD025464-01S1
Application #
9668320
Study Section
Data Coordination, Mapping, and Modeling (DCMM)
Program Officer
Kutkat, Lora
Project Start
2017-09-30
Project End
2018-11-30
Budget Start
2017-09-30
Budget End
2018-11-30
Support Year
1
Fiscal Year
2018
Total Cost
Indirect Cost
Name
University of North Carolina Chapel Hill
Department
Miscellaneous
Type
Organized Research Units
DUNS #
608195277
City
Chapel Hill
State
NC
Country
United States
Zip Code
27599