This EAGER award creates an interoperability test bed to identify the components of an effective layered architecture for geoscience and environmental science research. In a layered architecture, every layer consists of different technologies, each of which uses different interaction protocols. The proposed project will examine a wide variety of existing technologies in terms of their effectiveness in working across present data silos. These technologies include data grids, workflow systems, policy management systems, web visualization services, and security protocols that work with various repository catalogs. Project goals are focused on developing cyberinfrastructure tools and approaches that allow geoscience data repositories to enable new science and to make their data holdings more effectively discoverable and available to the public. Essential elements of the project include collecting and comparing various approaches and existing tools to check their effectiveness in handling and integrating geoscience data, and automating the processes needed to integrate various databases and data types. The project is led by a team of experts in cyberinfrastructure and geoscience data management and employs a spiral software development approach. Broader impacts of the work include building infrastructure that facilitates data-enabled science in the geosciences. The project will also produce results that are likely to be applicable to fields outside of the geosciences. The work supports EarthCube, a new NSF initiative to establish a new paradigm for developing an integrative and interoperable data and knowledge management system for the geosciences.
The goal of this EAGER award (Award #1239678, Moore PI) was to explore standard interoperability mechanisms for building data cyberinfrastructure for the geoscience sub-disciplines. Each sub-discipline has developed community resources such as data repositories, information catalogs, and data manipulation services. Examples include a precipitation database created by the Consortium of Universities for the Advancement of Hydrologic Science, Inc., and climate data records stored at the National Climatic Data Center. Each community resource is accessed through a different web service, manages different data formats, and uses a different vocabulary for describing the data. This EAGER award explored the interoperability mechanisms that link the resources to collaboration environments. The Intellectual Merit of the work was 1) identifying a loosely coupled federation architecture that enables integration of community resources with collaboration environments, 2) demonstrating interoperability mechanisms that encapsulate the knowledge needed for data access or data processing, 3) identifying classes of interoperability mechanisms appropriate for Earth Sciences, 4) demonstrating how to chain the interoperability mechanisms into a workflow, and 5) demonstrating the ability to support reproducible data-driven research through the registration and sharing of workflows. Multiple demonstrations of interoperability mechanisms were developed for interacting with community resources by linking them to a collaboration environment. The community resources included workflow systems, data repositories, information catalogs, and analysis services. The work done under this award is already having a Broader Impact: the approach developed within this grant has been applied to address interoperability challenges in other NSF data management projects.
Collaboration environments are also being used as the unifying middleware by other disciplines, including plant biology, astronomy, astrophysics, high energy physics, genomics, neuroinformatics, and cognitive science.

The Outcomes of this EAGER award include:

1) Identification of the use of collaboration environments as middleware that enables interoperability. The collaboration environment serves as a unifying infrastructure that links community resources to compute resources. The collaboration environment enables sharing of data and workflows, and the re-execution of workflows.

2) Identification of mechanisms to enable reproducible data-driven research. These include the automation of data retrieval from community resources, the automated transformation of the data to the formats required for research analyses, and the archiving of the analysis workflow, the input files, and the output files. Workflows can be executed at the storage system, within the researcher’s computing environment, or within the XSEDE compute grid.

3) Demonstration of interoperability mechanisms at the EarthCube Charrettes. The demonstrations used the data cyberinfrastructure provided by the NSF DataNet Federation Consortium (DFC). The mechanisms that were implemented included re-usable functions (micro-services) that encapsulate the knowledge needed to interact with a community resource and retrieve data, and micro-services that encapsulate interactions with remote workflow systems.
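
As an illustration of the micro-service and workflow ideas described above, the following is a minimal sketch in Python, not the project's implementation. It assumes a micro-service can be modeled as a plain function that takes and returns a context dictionary. All names here (fetch_precipitation, csv_to_records, total_precipitation, run_workflow) and the sample data are hypothetical; a real micro-service would call the community resource's web service instead of returning inline text.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# Assumption: a micro-service is modeled as a function from context to context.
MicroService = Callable[[Dict[str, Any]], Dict[str, Any]]


def fetch_precipitation(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for retrieving data from a community resource."""
    out = dict(ctx)
    out["raw"] = "station,mm\nA,1.5\nB,2.5\n"  # made-up sample data
    return out


def csv_to_records(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Transform the retrieved text into the format the analysis expects."""
    out = dict(ctx)
    lines = ctx["raw"].strip().splitlines()
    header = lines[0].split(",")
    out["records"] = [dict(zip(header, row.split(","))) for row in lines[1:]]
    return out


def total_precipitation(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """A toy analysis step: sum the precipitation column."""
    out = dict(ctx)
    out["total_mm"] = sum(float(r["mm"]) for r in ctx["records"])
    return out


def run_workflow(steps: List[MicroService], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Chain the steps, then archive the workflow, inputs, and outputs together."""
    ctx = dict(inputs)
    for step in steps:
        ctx = step(ctx)
    record = {
        "workflow": [step.__name__ for step in steps],
        "inputs": inputs,
        "outputs": ctx,
    }
    # A checksum over the archived record lets a re-execution be compared.
    payload = json.dumps(record, sort_keys=True)
    record["checksum"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record


steps = [fetch_precipitation, csv_to_records, total_precipitation]
first_run = run_workflow(steps, {"source": "example"})
second_run = run_workflow(steps, {"source": "example"})
```

In this sketch, re-executing the registered workflow on the same inputs yields the same archived record, which is one simple way to check that a shared workflow reproduces an earlier result.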