This EAGER award promotes a new paradigm in the development of an integrative and interoperable data and knowledge management system for the geosciences for a new NSF initiative called EarthCube. Led by a team of experts from across the fields of geoscience, cyberinfrastructure, engineering, and computer science, this project uses a collaborative, as opposed to competitive, method to build community consensus on the most appropriate means of achieving interoperability among databases that contain disparate data types and are widely distributed across many large and small data repositories. To achieve this goal, the project will create registries of interoperable infrastructure components for sharing geoscience data; define user requirements for such a system; and carry out proof-of-concept, cross-domain interoperability workflows using federated catalogs, cross-linked vocabularies, common data services, and shared model semantics. One of the unique aspects of this project is its inclusive, community-driven, collaborative approach. It also engages individuals from most of the major NSF-funded geoscience data facilities, creating a much-needed forum for exchanging tools, services, and best practices, which allows for a better return on research investment. Broader impacts of the project include a focus on engaging students and young researchers in cross-domain, data-intensive research; the likelihood that impacts of the project will cross over to other science domains; and the fact that some core project members are from groups under-represented in science and engineering. An additional broader impact is an international component that engages UK scientists as members of the core project team.
EarthCube is a new initiative to create a community-driven cyberinfrastructure for the geosciences. It is expected to enable unprecedented data sharing across geoscience disciplines by leveraging and integrating many information systems already established in different domains, supporting collaborative research teams, and training geoscience researchers and students in advanced information technologies. The key result of this project is the development of an initial EarthCube Roadmap from the perspective of enabling cross-domain interoperability in the geosciences. The Roadmap defined challenges of cross-domain information sharing and re-use, specified requirements for enabling technologies and organizational arrangements, provided a status assessment (including an interoperability readiness assessment) of available approaches and infrastructure components, outlined processes and development timelines, and described potential governance mechanisms and risk mitigation strategies. The Roadmap is available at the EarthCube web site at http://workspace.earthcube.org/content/earthcube-cross-domain-interoperability-roadmap. Project results have been presented at a number of professional meetings, including several international conferences and workshops. The Roadmap was also submitted as a discussion document to the Open Geospatial Consortium, a leading standards development organization for spatial information and services.
Development of the Roadmap was organized as a collaborative effort involving geoscientists and cyberinfrastructure researchers. Several innovative approaches were explored in the project and became Roadmap components, including:
1) A novel methodology for evaluating building blocks of domain infrastructure from the interoperability perspective. The interoperability readiness model conceptualized infrastructure readiness both with respect to existing cyberinfrastructure products/components and with respect to processes aimed at better geoscience-wide interoperability.
2) Interoperability readiness assessment of existing systems. The readiness assessment was derived from an analysis of four basic infrastructure components: metadata catalogs, which describe available information resources of the domain and support resource discovery and navigation; disciplinary vocabularies, which support unambiguous interpretation of domain data and metadata; services for accessing data repositories and other resources, including models, visualizations, and workflows; and formal information models, which define the structure and semantics of the information returned on service requests. Results of the readiness assessment have been demonstrated in an online application that supported visual analysis and faceted search over an inventory of information system resources from multiple domains, and allowed users to contribute to the inventory.
3) A collection of environmental model registries, from several projects and federal agencies, assembled and jointly analyzed for the first time. The analysis and visualization of this collection, and exploration of data flows into the models from different disciplinary information systems, led to the development of a cross-domain connectivity map illustrating existing data linkages between geoscience disciplines. The model collections are available online as interactive applications supporting browsing and faceted search.
4) Fitness-for-use workflows, explored and implemented using Kepler, a workflow management system. Assessment of fitness for use is a critical requirement for data re-use, but it is difficult because information about previous usage, and the associated successes and failures, is not systematically collected. Several proof-of-concept demonstrations explored how usage-based dataset annotations can be created by scientific workflows to better inform subsequent data search and understanding.
5) An EarthCube research match-making application (http://connections.earthcube.org), developed by the project team. This application can be used to browse and search registered EarthCube members based on their disciplinary domains, research interests, and other characteristics, in order to find partners for collaborative projects.
In addition to the scientific and methodological results of the project, there have been notable education and outreach outcomes. The project involved a number of students at different educational levels, from high school to undergraduate. The students were mentored by project members and introduced to the challenges of information integration in the geosciences and to a range of potential solutions. The students also contributed to the compilation and analysis of information resources, including environmental models and cyberinfrastructure components. The inventory applications, which are accessible and searchable online, are expected to contribute to earth science curricula and generally enhance opportunities for research and teaching in the geosciences. Project results were also included in a half-day training workshop presented as part of the EarthCube Summer Institute (ECSITE’2013).
Interoperability across heterogeneous computer systems from multiple disciplinary domains is a key requirement of successful cyberinfrastructure development, which is likely to benefit society as a whole by enabling more efficient resource use and avoiding duplication of effort. Key societal benefits will come from more efficient and scientifically sound re-use of the different types of information collected in geoscience disciplines.
As EarthCube is a pioneering research endeavor, methodologies and roadmaps developed in this initial phase of the program are likely to influence cyberinfrastructure development and data re-use in multiple other disciplines, facilitate a collaborative and open cross-disciplinary science enterprise, and foster the development of a new type of researcher who adopts data integration and cross-domain information analysis as natural tools of the trade.