A science gateway is a community-developed set of tools, applications, and data collections that are integrated via a portal or a suite of applications. It provides easy, typically browser-based, access to supercomputers, software tools, and data repositories, allowing researchers to focus more on their scientific goals and less on the cyberinfrastructure (CI). These gateways foster collaboration and the exchange of ideas among thousands of researchers from communities including atmospheric science, astrophysics, chemistry, biophysics, biochemistry, earthquake engineering, geophysics, neuroscience, and biology. However, because development and administrative personnel are limited, science gateways often leverage only a small subset of the NSF-funded CI in order to avoid the complexities of using multiple resources and services at scale, complexities that stem in part from software and hardware failures. Many successful science gateways have seen unprecedented growth in their user base and ever-increasing datasets; increasing their usage of CI resources without introducing additional complexity would help them meet this demand.
In response to this need, an Automated Monitoring AnalySis Service (AMASS) will be built to provide a flexible and extensible service for automated analysis of monitoring data, initially focused on science gateways. AMASS will apply data mining and machine learning techniques, together with emerging big data technologies, to analyze monitoring data in order to improve the reliability and operational efficiency of CI and to support progress on fundamental questions in the systematic and population biology, computational neuroscience, and biophysics communities. Along with AMASS, a simulation framework will be built for testing automated analysis algorithms and adaptive execution techniques. An intuitive query API will be provided for use by science gateway software and will be integrated into the three target science gateways that will drive the project's research and development: the Cyberinfrastructure for Phylogenetic Research (CIPRES), the Neuroscience Gateway (NSG), and UltraScan. The proposed approach does not require any changes to end-user applications. By integrating monitoring data into gateway infrastructure to enable adaptive execution of applications, the software developments will significantly enhance the science productivity and user satisfaction of science gateways, allowing scientists to answer more sophisticated questions without having to understand the complexities of a large-scale distributed environment. The resulting software will be released as open source under an Apache License and will be integrated into the NSF-funded SciGap project in order to reach a broader range of science gateways.