This project creates a platform that simplifies knowledge extraction from diverse scientific datasets by integrating multimodal data, computing software, and interactive tools for search and exploration. The effort provides a collaborative environment for sharing research data and computational and statistical algorithms. During research investigations, the environment uploads and organizes data, executes and tracks computational jobs, and links inputs and outputs to analysis results. When investigations are complete, the environment helps publish data, algorithms, and investigative workflows for public access. The environment also simplifies the discovery, search, and exploration of published datasets by providing interactive tools for viewing and analyzing data. Four distinct use cases (chemistry, electrical engineering, nutrition science, and environmental science) provide a testbed for platform usability and functionality. The use cases produce important datasets that are intended for use by HHS, USDA, EPA, DOE, and other government agencies to support decision-making about policy and regulations.
The project builds upon a 2014 DIBBs pilot demonstration award (#1443017 - DataHub), which created a data management platform for publishing and discovering scientific research datasets. The team extends this platform to support the full research investigation process by 1) connecting data to computational and statistical modeling software, 2) tracking research workflows to link data, algorithms, and results, 3) automatically capturing metadata and classifying data by type, and 4) providing interfaces to define complex hierarchical, structured data and operate on the data using analytical toolkits. Published datasets are explored with interactive tools that interpret data types for advanced navigation, viewing, search, analysis, and visualization. Collaborations with research projects in chemistry, electrical engineering, nutrition science, and environmental science produce system requirements and ensure that the general, discipline-neutral platform provides end-to-end support for their use cases. The project helps researchers curate their own findings and facilitates the sharing of data and findings for the purposes of preservation, replication, and extension.