This EAGER award creates an interoperability testbed to identify the components of an effective layered architecture for geoscience and environmental science research. In a layered architecture, each layer consists of multiple technologies, each of which may use different interaction protocols. The proposed project will examine a wide variety of existing technologies to assess how effectively they work across present data silos. These technologies include data grids, workflow systems, policy management systems, web visualization services, and security protocols that work with various repository catalogs. Project goals focus on developing cyberinfrastructure tools and approaches that allow geoscience data repositories to enable new science and to make their data holdings more discoverable and available to the public. Essential elements of the project include collecting and comparing various approaches and existing tools to assess their effectiveness in handling and integrating geoscience data, and automating the processes needed to integrate various databases and data types. The project is led by a team of experts in cyberinfrastructure and geoscience data management and employs a spiral software development approach. Broader impacts of the work include building infrastructure that facilitates data-enabled science in the geosciences. The project will also produce results that are likely to be applicable to fields outside the geosciences. The effort supports a larger NSF effort to establish a new paradigm for developing an integrative and interoperable data and knowledge management system for the geosciences, through a new NSF initiative called EarthCube.

Project Report

National data cyberinfrastructure in support of the Geo-Science Directorate can be implemented through data grid technology that integrates existing data management capabilities. Existing infrastructure includes distributed sensors, data repositories, metadata catalogs, web services, workflow systems, community physics models, and knowledge management systems. Data grids provide virtualization mechanisms that manage interactions with the security, access, authentication, authorization, accounting, and auditing protocols governing each system. Data grids also provide the mechanisms (soft links) for building a collaboration environment in which a researcher can register data from remote repositories, register workflows, and manage the provenance information needed to re-create research results. A reasonable goal is the creation of cyberinfrastructure that facilitates reproducible science.

Intellectual Merit: The interoperability testbed identifies the components of a layered architecture that enable Geo-Science research. The layers can be differentiated into applications and clients, virtualization and integration mechanisms, facilities, and federation systems. Each layer is represented by multiple existing technologies, which may use different interaction protocols. The data grid virtualization layer enables integration across these protocols and provides seamless access to the rich set of Geo-Science technologies that exist today. The technologies considered for integration were production systems supported by collaborating team members. They included data grids (GMU geospatial data grid, iRODS, DataONE OneDrive); workflow systems (iRODS, NCSA Cyberintegrator, UCSD Kepler, GMU BPELPower); policy management systems (iRODS, NCSA Cyberintegrator); web services (OGC Sensor Web Enablement standard (SWE), WHOI observation assessment (SWE), NCSA Semantic Geostreaming toolkit (SWE, W3C), GMU Geospatial client); web visualization services (GRASS, Colorado State University NextGen Network Enabled Weather); network and security protocols (iRODS: Grid Security Infrastructure, Kerberos, Shibboleth, Reliable Blast UDP, parallel TCP/IP); repositories (NOAA CLASS, NASA ECHO, GEOSS Clearinghouse); catalogs (GMU GI-Cat, CUAHSI); community physics models (University of Colorado Boulder Community Surface Dynamics Modeling System, UNC RHESSys); and federation systems (iRODS, CLASS).

Broader Impacts: A high-impact focus was the creation of infrastructure that helps researchers automate analysis tasks. This infrastructure is a collaboration environment in which a researcher can automate data retrieval, register workflows, manage sensor and workflow provenance, link research data sets to analysis workflows, share data and workflows, and re-execute workflows with revised input parameters. The goal is to facilitate research by providing an environment that manages interactions with community resources while enabling a researcher to manage active research products. The collaboration environment is highly extensible, supporting research initiatives that span multiple disciplines. It is also tightly controlled, so a researcher can choose to share research with collaborators, formally publish results, or preserve research in an archive. By federating with similar systems used in federal agencies, the project aims to provide seamless access to community resources for all Geo-Science researchers.

Agency: National Science Foundation (NSF)
Institute: Division of Earth Sciences (EAR)
Type: Standard Grant (Standard)
Application #: 1239702
Program Officer: Barbara Ransom
Budget Start: 2012-04-01
Budget End: 2013-03-31
Fiscal Year: 2012
Total Cost: $26,895
Name: Woods Hole Oceanographic Institution
City: Woods Hole
State: MA
Country: United States
Zip Code: 02543