Project Proposed: This RAPID project aims to contribute to an effective recovery by collecting, processing, and disseminating appropriate sensor data. The work addresses the challenge of the flood of sensor data during an emergency through integration, evaluation, and enhancement of current data management tools, particularly with respect to metadata. Automation of data and metadata collection, processing, and dissemination is expected to alleviate the time pressure on human operators. The fundamental tools support quality of information dimensions such as provenance, timeliness, security, privacy, and confidentiality, enabling an appropriate interpretation of the sensor data in the long term (a minimal sketch of a metadata-carrying reading record follows this summary). In the short term, the tools are expected to help relief workers as data producers and consumers; in the long term, they will provide high quality information for disaster recovery decision support systems. Additionally, the cloud-based system architecture and its implementation on the CERCS cluster of Open Cirrus provide high availability and ease of access for recovery efforts in Japan as well as for researchers worldwide. The integration of techniques from several information dimensions (e.g., data provenance, security, and privacy) and the application of code generation techniques to automate the data and metadata management tools constitute the intellectual merit of the proposed research. New challenges are expected in the potential interference among the quality of information dimensions. It is also a new challenge to apply code generation techniques in the adaptation of software tools to accommodate changes imposed by environmental damage and by contextual as well as cultural differences among countries. The investigator collaborates with Prof. Masaru Kitsuregawa from the University of Tokyo, Japan, a leading researcher in data management and the first database researcher from Asia to win the ACM SIGMOD Innovation Award (2009). In addition to a letter of support and a biographical sketch from the Japanese collaborator, a support letter has been submitted by Intel to OISE, CISE, and Engineering. Intel has offered access to the Intel Open Cirrus cluster for conducting the research.

Broader Impacts: The proposed tools should help improve both the quantity and quality of data being collected by a variety of sensors, thus improving the effectiveness of short and long term decision making. For example, measured radiation levels in agricultural products can serve as an indication of spreading radioactive contamination that complements the direct readings of radiation in soil samples. The project enables informed decisions based on precise interpretation of real sensor data that may improve the quality of life at both the human and social levels, while reducing costs. The project will also contribute to graduate student education.
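The quality of information dimensions listed above can be made concrete as structured metadata attached to each sensor reading. The sketch below is a minimal illustration rather than the project's actual schema: all field names and values are hypothetical, and it only shows a reading record that carries provenance, timeliness, and calibration information alongside the measured value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RadiationReading:
    """One sensor reading plus the metadata needed to interpret it later.

    Hypothetical schema for illustration only; the field names do not come
    from the project's actual tools.
    """
    value: float                      # measured dose rate
    unit: str                         # e.g. "uSv/h" or "Bq/kg"
    latitude: float
    longitude: float
    captured_at: datetime             # timeliness: when the reading was taken
    device_model: str                 # provenance: brand/model of the meter
    calibration_factor: float = 1.0   # device-specific calibration parameter
    source: str = "unknown"           # provenance: data set or relief team

    def calibrated_value(self) -> float:
        """Apply the device calibration before long-term analysis."""
        return self.value * self.calibration_factor


# Example: a reading recorded during a field survey (made-up numbers).
reading = RadiationReading(
    value=0.31, unit="uSv/h",
    latitude=37.42, longitude=141.03,
    captured_at=datetime(2011, 4, 2, 9, 30, tzinfo=timezone.utc),
    device_model="ExampleMeter-100", calibration_factor=0.97,
    source="field-survey",
)
print(reading.calibrated_value())
```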
Large scale disasters such as Superstorm Sandy of 2012 (more than 100 deaths and estimated economic damages of $50B) and the Tohoku Earthquake of 2011 (more than 15,000 deaths and estimated economic damages of $235B) have tremendous social, economic, and human impact. Preparedness for, response to, and recovery from such disasters are difficult tasks because of the unpredictable ways in which disasters alter the environment. In this project, we explored improvements in the information technology (IT) infrastructure needed to support the use of big data, particularly detailed and large scale sensor data. By big data we mean the upcoming convergence of sensors, networks, and cloud computing, which will provide unprecedented new and high quality information for effective decision making in disaster management. To achieve the potential benefits of big data, we need to meet its primary challenges, generically described as the three V's (Volume, Velocity, and Variety). In this project, we focused on the Volume challenge (very large data set sizes) and the Variety challenge (availability of many data sources).

On the Volume challenge, we worked on efficient semantic query processing over large RDF data sets, queried through the SPARQL language. Currently, it is impractical to run SPARQL semantic queries over RDF data sets with more than a million entries, which is a serious constraint for real applications such as disaster management. Concretely, we applied our approach to the SPEEDI data set of radiation levels in Japan.

On the Variety challenge, many kinds of radiation sensor data are collected for a variety of recovery efforts, including radiation measurements in the air, in water, and in food safety checks. This variety of sensors introduces significant challenges in the interpretation of sensor readings. Although the readings have been reduced to common units (e.g., Becquerel and Sievert), and approximate readings are acceptable during an emergency, for long term tracing and analysis it is important to know the details about each reading as precisely as possible. Thus all metadata (e.g., the brand and model of the radiation meter and its calibration parameters) becomes important for long term analysis in applications such as insurance liabilities.

Intellectual Merit. The intellectual merit of the research conducted in the project is in the exploration of the big data challenges of Volume and Variety. Our first contribution is the optimization of semantic (SPARQL) query processing for RDF data stores. Our approach of separating data from metadata, by storing them in two dedicated database management systems, has been shown to achieve much higher efficiency than the conventional approach of storing them in the same database management system (a minimal sketch of this separation is given below). Our second contribution is the integration of related but different data sources without a global schema, using code generation techniques. Our approach is validated by a concrete demonstration of the potentially effective integration of several radiation data sets, including SPEEDI and Safecast.
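To make the data/metadata separation concrete, the sketch below keeps observation data and sensor metadata in two separate RDF graphs, standing in for two dedicated database management systems, and answers a question by running a small SPARQL query over each graph and joining the results in application code. This is an illustrative sketch only: the vocabulary, URIs, and readings are hypothetical, the Python rdflib library is used for convenience, and the project's actual storage systems and query optimizer are not reproduced here.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/radiation#")   # hypothetical vocabulary

# Two separate stores: one for the (large) observation data,
# one for the (much smaller) metadata about sensors.
data_graph = Graph()
meta_graph = Graph()

# A few made-up SPEEDI-style observations: (reading, value, sensor).
for i, (value, sensor) in enumerate([(0.31, "s1"), (2.75, "s2"), (0.12, "s1")]):
    r = URIRef(f"http://example.org/reading/{i}")
    data_graph.add((r, EX.microSievertPerHour, Literal(value, datatype=XSD.double)))
    data_graph.add((r, EX.measuredBy, EX[sensor]))

# Metadata about each sensor (brand/model) lives in the other store.
meta_graph.add((EX.s1, EX.model, Literal("ExampleMeter-100")))
meta_graph.add((EX.s2, EX.model, Literal("ExampleMeter-200")))

# Step 1: query only the data store for high readings.
high = data_graph.query("""
    PREFIX ex: <http://example.org/radiation#>
    SELECT ?reading ?value ?sensor WHERE {
        ?reading ex:microSievertPerHour ?value ;
                 ex:measuredBy ?sensor .
        FILTER (?value > 1.0)
    }""")

# Step 2: consult the metadata store only for the sensors that matter,
# joining the two result sets in application code.
for reading, value, sensor in high:
    for model in meta_graph.objects(sensor, EX.model):
        print(f"{reading} = {value} uSv/h, measured by {sensor} ({model})")
```

In this sketch, the large data graph is filtered first and the small metadata graph is consulted only for the sensors that survive the filter, which is one way such a separation can pay off.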
Ongoing research challenges include the integration of techniques from several quality of information dimensions (e.g., data provenance, security, and privacy) and the application of code generation techniques to automate the data and metadata management tools. For example, it is a non-trivial problem to apply code generation techniques to adapt software tools to changes imposed by environmental damage and by contextual as well as cultural differences among societies and countries.

Broader Impact. Effective utilization of big data for disaster management is one of the main new approaches to substantially improving the entire process. Big data sets such as SPEEDI and Safecast are already being collected. It is important that effective methods be found and used to apply these and other big data sets to next generation, mission-critical disaster management applications and tools. Concretely, we partially integrated the SPEEDI data with the Safecast data to produce a world radiation map by converting and averaging sensor readings over each region (a minimal sketch of this step closes this report). Our technical contributions will lead to the development of software tools capable of integrating a variety of big data sources, particularly radiation sensor reading data and metadata. In the long term, these techniques can be applied to the integration of other big data sets.

Finally, this project led to a collaborative activity that transcended the technical aspects described above. The significant investments being made by the Japanese government in the Tohoku earthquake recovery are producing significant research and development results throughout Japan. Our joint work with the University of Tokyo resulted in a much broader collaboration in the area of applying big data to disaster management, in the form of the SAVI (Science Across Virtual Institutes) for Global Research on Applying Information Technology to Disaster Management (GRAIT-DM), an effort funded by NSF/CISE/CNS. The SAVI activities can be seen on the web portal [http://grait-dm.org].
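As an illustration of the conversion and region averaging mentioned above, the following sketch converts Safecast-style counts-per-minute readings to µSv/h using an assumed tube-specific factor (the constant below is a commonly cited approximation for one detector type, not a value taken from the project) and then averages all readings over a coarse latitude/longitude grid. The grid resolution, field layout, and sample values are hypothetical.

```python
from collections import defaultdict

# Assumed conversion factor (counts per minute per uSv/h) for one common
# Geiger tube; an illustrative approximation, not a project constant.
CPM_PER_USV_H = 334.0


def to_usv_per_hour(value, unit):
    """Reduce a reading to the common unit uSv/h."""
    if unit == "usv/h":
        return value
    if unit == "cpm":
        return value / CPM_PER_USV_H
    raise ValueError(f"unsupported unit: {unit}")


def region_key(lat, lon, cell_degrees=1.0):
    """Bucket a coordinate into a coarse grid cell (hypothetical resolution)."""
    return (int(lat // cell_degrees), int(lon // cell_degrees))


def average_by_region(readings):
    """Average converted readings over each grid cell.

    `readings` is an iterable of (lat, lon, value, unit) tuples, standing in
    for rows drawn from SPEEDI- and Safecast-style data sets.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value, unit in readings:
        key = region_key(lat, lon)
        sums[key] += to_usv_per_hour(value, unit)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up sample rows mixing the two unit conventions.
sample = [
    (37.4, 141.0, 0.32, "usv/h"),   # SPEEDI-style reading
    (37.6, 141.2, 120.0, "cpm"),    # Safecast-style reading
    (35.7, 139.7, 40.0, "cpm"),
]
print(average_by_region(sample))
```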