A big obstacle to such sharing is the inordinate amount of time and effort that must be spent creating, communicating, receiving, and interpreting specifications of data, models, and associated knowledge. This inability to share quickly and conveniently is a particular problem in computational geoscience, where scientists spend a significant portion of their time managing the many input and output files typically associated with a model. When developing, testing, validating, and comparing models, particularly coupled models, the number of such data elements and the complexity of managing them soon outgrow human memory capacity. The unfortunate consequence is that researchers often narrow the scope of a model analysis, compromise research quality, or conduct analysis within restricted teams. This pilot project will demonstrate a mechanism to overcome this challenge in the scientific community.
The GeoDataspace pilot will develop a new data-centric approach to describing models and associated data resources for computational geoscience. This new approach will both simplify model use and enhance the shareability, reusability, and reproducibility of models, data, and computations, properties widely sought by computational geoscientists. Specifically, the project will develop methods for defining, sharing, and accessing geounits: collections of descriptive metadata that define the entire collection of files needed to run a computational model, including details about the model run. In the case of files, processing and manipulation scripts, manifests, spreadsheets, or one-off databases, the encapsulation may consist simply of the location and specification of each element.
The GeoDataspace team includes (a) experts in cyberinfrastructure, data management systems, and SaaS at UChicago; and (b) experienced, leading geoscientists in four domains (solid earth, climate, hydrology, and space science), together with a geoscientist who is also a leading expert on model coupling frameworks. Together the team has identified a critical, cross-cutting data management barrier that must be addressed in a domain-independent manner so as to extend capabilities to a broader set of geoscientists. All participating geoscientists are initiators, leaders, working-group chairs, and/or representatives of the five modeling communities we represent: Computational Infrastructure for Geodynamics (CIG), Community Earth System Models (CESM), Consortium of Universities for the Advancement of Hydrological Science, Inc. (CUAHSI), Community Coordinated Modeling Center (CCMC) at NASA, and Earth Systems Bridge (ESB), a community invested in developing model coupling frameworks. In total, the number of geoscientists directly or indirectly involved in GeoDataspace is in the hundreds, if not thousands.