This project develops a real-time processing system capable of handling a large mix of sensor observations. The focus of this system is automation of the detection of natural hazard events using machine learning, as the events are occurring. A four-organization collaboration (UNAVCO, University of Colorado, University of Oregon, and Rutgers University) develops a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. This work will support rapid analysis and understanding of data associated with hazardous events (earthquakes, volcanic eruptions, tsunamis).
This project uses a collaboration between computer scientists and geoscientists to develop a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. It focuses on the aggregation and integration of a large number of data streams into a coherent system that supports analysis of the data streams in real-time. The framework will offer machine-learning-based tools designed to detect signals of events, such as earthquakes and tsunamis, that might only be detectable when looking at a broad selection of observational inputs. The architecture sets up a fast data pipeline by combining a group of open source components that make big data applications viable and easier to develop. Data sources for the project draw primarily upon the 1500+ sensors from the EarthScope networks currently managed by UNAVCO and the Incorporated Research Institutions for Seismology (IRIS), as well as the Ocean Observatories Initiative (OOI) cabled array data managed by Rutgers University. Machine learning (ML) algorithms will be researched and applied to the tsunami and earthquake use cases. Initially, the project plans to employ an advanced convolutional neural network method in a multi-data environment. The method has only been applied to seismic waveforms, so the project will explore extending the method to a multi-data environment. The approach is expected to be extensible beyond detection and characterization of earthquakes to include the onset of other geophysical signals such as slow-slip events or magmatic intrusion, expanding the potential for new scientific discoveries. The framework is applied to use cases in the Cascadia subduction zone and Yellowstone: these locations combine the expertise of the science team with locations where EarthScope and OOI have the greatest concentration of instruments. The architecture will be transportable and scalable, running in a Docker environment on laptops, local clusters and the cloud. Integral to the project will be development, documentation and training using collaborative online resources such as GitLab and Jupyter Notebooks, and utilizing NSF XSEDE resources to make larger datasets and computational resources more widely available.
This award by the NSF Office of Advanced Cyberinfrastructure is jointly supported by the Cross-Cutting Program and Division of Earth Sciences within the NSF Directorate for Geosciences, the Big Data Science and Engineering Program within the Directorate for Computer and Information Science and Engineering, and the EarthCube Program jointly sponsored by the NSF Directorate for Geosciences and the Office of Advanced Cyberinfrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.