Managing, automatically discovering, and disseminating information is critically important to national defense, homeland security, and emergency preparedness and response. Much of this data originates from online sensors that act as streaming data sources, providing a continuous flow of information. As sensor sources proliferate, the flow of data becomes a deluge, and extracting and delivering important features in a timely and comprehensible manner becomes increasingly difficult. More specifically, developing data mining and assimilation tools for data-deluged applications faces three fundamental challenges. First, the amount of distributed real-time streaming data is so large that even current extreme-scale computing cannot process it effectively. Second, today's broadly deployable network protocols and web services do not provide the low latency and high bandwidth required by high-volume real-time data streams and by distributed computing resources connected over networks with high bandwidth-delay products. Finally, the vast majority of today's statistical and data mining algorithms assume that all the data is co-located and at rest in files. Here, by contrast, the real-time data streams are distributed, and the applications that consume them must be optimized to process multiple high-volume real-time streams. The goal of this project is to develop novel algorithms and hardware acceleration schemes that allow real-time statistical modeling and change detection on such large-scale streaming data sets. Using Service Oriented Architecture principles, the project will develop and test a framework for integrating high-performance change detection software services, including accelerated versions of kernels commonly used in statistical modeling, into a Grid messaging substrate. Geographical Information System (GIS) services will be supported using Open Geospatial Consortium standards to enable geo-referencing.
This project has the potential for near-term and long-term impact in several important areas. In the near term, implementing kernels and modules of statistical modeling and change detection algorithms will allow end-user applications (e.g., homeland security, defense) to achieve one to two orders of magnitude improvement in performance for data-driven decision support. In the longer term, the availability of toolkits and kernels for change detection and data mining algorithms will facilitate the development of applications in many areas, including defense, security, science, and others. Furthermore, this research will introduce reconfigurable architectural acceleration of functions on streaming data, including change detection and data mining, thereby opening new avenues of research and enabling new data-driven applications on complex datasets. Both graduate and undergraduate students (the latter through undergraduate fellowships) are engaged in the research. In addition, team members actively engage with minority-serving institutions using audio/video and distance education tools.