Perhaps the biggest challenge in designing and implementing a data sharing strategy for HEAL is its broad range of programs, projects and types of data being collected. We anticipate that programs and/or projects will vary according to the capability of their Data Management Center (DMC) to support broad sharing, and an effective platform will need to work smoothly and avoid interfering with the most capable DMCs while at the same time providing adequate, targeted support to the others. The range of data types is extensive, spanning multiple measurement modalities (e.g., clinical data, bioassays, wearables and self-report data) as well as varying in size and complexity. Finally, the range of scientific disciplines represented not only among the HEAL investigators but among other researchers likely to use HEAL data, together with the range of scientific questions that might be pursued, must inform the way data are organized, documented and made accessible. This substantial heterogeneity in data sources, types and uses has implications for which strategy is most appropriate. Specifically, a single monolithic platform is likely to fail, or at least be suboptimal, leading the larger and more capable programs to consider building their own systems. At the same time, asking all programs or projects to build their own systems conforming to a set of common requirements would be both expensive and likely to result in many inadequate systems, given the inherent difficulty and high failure rate of IT projects, especially those involving data. Moreover, even with a common set of requirements, disparate systems will create challenges when researchers try to build applications over them. The strategy we propose here is designed to avoid these problems by meeting each program or project exactly where it is.
First, we shall separate individual programs into those that are capable of building an adequate data sharing platform (call these platform competent) and those that are not (call these platform challenged). We will not interfere with the platform-competent programs except to help them integrate G3FS into their platforms so that they can interoperate within the HDE and, when necessary, to identify areas where further harmonization or other data enhancement would facilitate new uses of their data. In contrast, we shall further separate the platform-challenged programs into natural groups based on similarities of study design and data being collected, and work with each group to develop and operate a Gen3 Data Commons (G3DC) built over the G3FS to accommodate all of the data collected by that group. Finally, we shall implement an HDE Browser that can search across the HEAL data commons as well as the other data platforms that interoperate via their exposed APIs and their use of the G3FS.

Agency: National Institutes of Health (NIH)
Institute: Office of The Director, National Institutes of Health (OD)
Project #: 1OT2OD030208-01
Application #: 10167308
Study Section: Special Emphasis Panel (ZOD1)
Project Start: 2020-09-01
Project End: 2025-08-31
Budget Start: 2020-09-01
Budget End: 2023-08-31
Support Year: 1
Fiscal Year: 2020
Name: University of Chicago
DUNS #: 005421136
City: Chicago
State: IL
Country: United States
Zip Code: 60637