Technical Description: This CAREER Award project will develop middleware to support Distributed Dynamic Data-intensive (D3) science on Distributed Cyberinfrastructure (DCI). Existing NSF-funded CI systems, such as the Extreme Science and Engineering Discovery Environment (XSEDE) and the Open Science Grid (OSG), use distributed computing to substantially increase the computational power available to research scientists around the globe; however, such distributed systems are limited in their ability to handle the large data volumes generated by today's scientific instruments and simulations. To address this challenge, the PI will develop and deploy extensible abstractions that facilitate the integration of high-performance computing and large-scale data sets. Building on previous work on pilot-jobs, these new abstractions will implement the analogous concept of "pilot-data" and the linking principle of "affinity." The result will be a unified conceptual framework for improving the matching of data and computing resources and for facilitating dynamic workflow placement and scheduling. This research has the potential to significantly advance multiple areas of science and engineering by generating production-grade middleware for scalable big-data science on a range of DCI systems.
Broader Importance: Increasingly, the high-performance computing resources available to scientific researchers are distributed across multiple machines in multiple locations. The integration of these resources requires a fabric of "middleware," upon which a wide variety of user applications, tools and services can be built and run. As more accurate and ubiquitous scientific instruments and models produce ever-larger volumes of data, however, this distributed cyberinfrastructure (DCI) is confronting unprecedented data-handling challenges that exceed the capabilities of existing DCI middleware. In this project, the PI will develop, test and implement new middleware solutions, specifically designed for the coming era of big-data distributed supercomputing.
The project will also develop new curricula and new teaching and outreach materials for introducing secondary and college students, secondary school teachers, and the general public to the emerging field of distributed data-intensive science. In partnership with FutureGrid, the PI will design simple and effective vehicles for sharing these resources with Historically Black Colleges and Universities (HBCUs) and other institutions where faculty might otherwise have relatively limited opportunity to develop advanced course materials. The PI will also partner with the Douglass Program for Women in Science, and The Academy at Rutgers for Girls in Engineering and Technology (TARGET), to increase engagement of, and support for, female students in the DCI research community.