The seafloor serves as the primary conduit for mass and heat transfer between the sub-seafloor and the overlying ocean water column, with the two operating on vastly different time and mass scales. Dynamics at this interface drive global biogeochemical and geochemical elemental cycles, control global ocean chemistry, and shape the atmosphere and climate systems. The deep sea also hosts some of the most diverse and extreme ecosystems, including those inhabiting hydrothermal vents, cold seeps, mid-ocean ridges, ridge flanks, and plate margins. Without data from a wide variety of disciplines, such as geology, petrology, geophysics, hydrogeology, and micro/macro/evolutionary biology, it is not possible to realistically model these important systems and understand the complex interactions between their various physical, chemical, and biological components. Understanding the processes and dynamics of the deep sea requires integration of the disparate datasets and models that represent and predict the behavior of the components of this complex, important system. The goal of this workshop is to surface requirements in the field of deep sea processes for a major new NSF data and knowledge management initiative (i.e., EarthCube) that is dedicated to revolutionizing geoscience by providing easy access to, discovery of, and visualization of data from across the geo- and environmental sciences. This workshop will bring together ~55 oceanographers from across the relevant disciplines, along with cyber/computer science experts. Together, workshop participants will define future science goals in this important scientific area and focus on identifying the most critical, widespread cyberinfrastructure and data management issues and problems presently holding back scientific advances in deep sea science, in order to guide the development of NSF EarthCube cyberinfrastructure.
The workshop will also focus on strategies that help scientists, and the data they need, cross sub-discipline barriers so that more interdisciplinary research can take place. Workshop participants will address topics such as science drivers in deep sea process research over the next 15 years, data and data management needs and problems, and the software and visualization tools needed to model and understand data. Broader impacts of the work include support of an organization in an EPSCoR state, support of two PIs whose gender is under-represented in the sciences and engineering, and engagement of early career scientists. A virtual component of the workshop will be held to help broaden participation beyond those present on-site.
This award supported a workshop focused on the data management and cyberinfrastructure needs of scientists who work in the deep sea environment. A total of 61 scientists from all career stages participated in the workshop either in person or through the online broadcast of the meeting. As a group we identified (1) several scientific drivers that we anticipate will be guiding our work over the next decade, (2) the current challenges we face that impact our ability to do high-impact interdisciplinary science, and (3) next steps we can pursue as a community to advance our ability to create and make use of high-quality open access digital data. Challenges and opportunities we identified include the following. Data integration challenges: spatial and temporal co-registration of high-quality data sets is critical to enabling data integration, but is difficult to achieve in the deep sea environment. This is further complicated by different data quality needs for different kinds of questions. Data acquisition and completeness: data in the deep sea are sparse in both space and time, making it difficult to find the data needed to perform analyses. Ensuring that data are available and fit for re-use requires training and awareness coupled with easy-to-use tools that can be integrated into scientific workflows. Once data are online, there are endless possibilities for the development of new and exciting tools that will enable exploration and result in new discoveries. We recognize that several tools and databases already exist to support our community, and we need better training on how to use them. We also recognize the need for training on how to incorporate data management best practices into our scientific workflows. We identified several cost-effective rapid solutions to overcoming many of our data management obstacles, but are unsure of funding mechanisms despite the recognized impact these solutions would have on our community.
Finally, we recognized the growing need for a "data wrangler" to participate in field programs to handle data and metadata, ensure that standards are met, and facilitate contemporaneous data documentation. The role of the data scientist who sits at the intersection of domain science and geoinformatics is rising, but resources are necessary to ensure good data management practices.