The majority of NSF-funded research occurs in small and medium-sized laboratories (SMLs) that often comprise a single PI and a few students and postdocs. For these small teams, the growing importance of cyberinfrastructure and its applications in discovery and innovation is as much a problem as an opportunity. With limited resources and expertise, even simple data discovery, collection, analysis, management, and sharing tasks are difficult. An unfortunate consequence is that in this "long tail" of science, modern computational methods often go unexploited, much valuable data goes unshared, and too much time is consumed by routine tasks. To date, research investments in science cyberinfrastructure have disproportionately emphasized big science projects, providing tools for use by IT staff and technology-savvy researchers rather than complete applications consumable by end users. This project's goal is to lay a foundation for a more balanced research agenda by focusing exclusively on the needs of SMLs.

This Software Institute Conceptualization project aims to determine whether these obstacles to discovery and innovation can be overcome via the use of software as a service (SaaS) methods. Such methods have proven immensely effective for small and medium businesses due to their ability to deliver advanced capabilities while streamlining the user experience and achieving economies of scale. To determine whether similar benefits can apply for SMLs, the project team will engage with multiple science communities to identify science practices, match science practices against candidate SaaS offerings, and evaluate business models that could permit sustainable development of those offerings. The outcome of this process is intended to be a compelling and competitive strategic plan for an NSF Software Innovation and Sustainability Institute that both meets immediate needs of the initial science communities and provides a basis for a new, more cost-effective method of addressing cyberinfrastructure needs across all NSF directorates.

Project Report

Intellectual Merit

In several key ways, the distribution of NSF grant funding follows a Pareto ("long tail") distribution. In 2011, for example, NSF made more than 10,000 awards, the largest at $50M. However, 80% of these awards had an average size of $160k [1], which at most institutions is roughly the cost of one full-time employee. Our analysis of 2007 NSF grants showed that 80% of awards were for less than $350k, or just over two full-time employees. These grants go to a broad diversity of institutions with wide geographic distribution.

Research teams can spend too much time wrangling data: finding relevant data, transforming it to a usable format, and processing it. Federal funding agencies now require funded projects to share data and publications as openly as possible. However, the data generated by a project may not be stored and distributed under the best curation practices, because the lab lacks the time, training, or computational resources to accomplish this important work efficiently. One solution to this dilemma is to identify data management and processing bottlenecks in laboratories and then to develop more streamlined processes in the cloud. Such services free researchers from managing software packages and inherently make data available to broader audiences.

As part of a broader collaborative team spanning five universities across the country, the University of Arizona team focused on understanding the cyberinfrastructure needs of the biodiversity community. Two main methods were used to collect data. First, an online survey of data practices and projected needs was sent to members of the field biology research community. The findings indicate that researchers need to manage both "big data" and smaller data sets simultaneously; for example, a small data set may need to be superimposed on an existing global map with existing data overlays. Needs exist in all phases of research, including data acquisition, processing, and dissemination.

Second, we held a multiday workshop with science users and administrators of biological research stations. Participants were selected based on recent publications using data from field stations distributed across NEON eco-climate domains. The group identified a set of ecological and environmental grand challenges whose solutions are bounded by the availability of data and computing resources. The results identify a number of climate and land-use scenarios in which the long-term historical data of research stations can be combined with (sometimes "big data") remote sensing and sensor data to provide predictive models at local and global scales. The computing and data requirements exceed both the computational resources available to most researchers and the capacity of the research stations themselves. Cloud-based Software-as-a-Service (SaaS) can address these needs.

Broader Impacts

While basic science questions can be addressed with this data and computing environment, the resulting data and understanding can directly support informed decision making about the intelligent use of natural resources for the health and wellbeing of the general population. This information will help us design buildings and roads, and plan agriculture, to maximize outcomes.
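As a closing illustration, the Pareto ("long tail") award-size pattern described under Intellectual Merit can be sketched with synthetic data. The shape parameter and dollar scale below are illustrative assumptions chosen to reproduce a rough 80/20 split (the scale echoes the $160k average small-award size cited above); they are not values fitted to actual NSF award data.

```python
import random

random.seed(42)

# Draw hypothetical award sizes from a Pareto ("long tail") distribution.
# A shape of ~1.16 yields roughly the classic 80/20 split; the $160k scale
# echoes the average small-award size cited above. Both values are
# illustrative assumptions, not figures fitted to real NSF awards.
alpha, scale = 1.16, 160_000
awards = sorted(scale * random.paretovariate(alpha) for _ in range(10_000))

# Split into the smallest 80% of awards (the "long tail") and the rest.
cutoff = int(0.8 * len(awards))
small, large = awards[:cutoff], awards[cutoff:]

share_small = sum(small) / sum(awards)
print(f"smallest 80% of awards hold {share_small:.0%} of total funding")
print(f"median award in the long tail: ${small[len(small) // 2]:,.0f}")
```

Under these assumptions the smallest 80% of awards account for only a small fraction of total dollars, which is the sense in which "long tail" is used throughout this report: most labs operate on individually modest awards even though a few large projects dominate the funding totals.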

Agency
National Science Foundation (NSF)
Institute
Division of Advanced CyberInfrastructure (ACI)
Type
Standard Grant (Standard)
Application #
1216884
Program Officer
Rajiv Ramnath
Project Start
Project End
Budget Start
2012-10-01
Budget End
2014-09-30
Support Year
Fiscal Year
2012
Total Cost
$49,819
Indirect Cost
Name
University of Arizona
Department
Type
DUNS #
City
Tucson
State
AZ
Country
United States
Zip Code
85719