The NSF Software Infrastructure for Sustained Innovation (SI2) program solicitation states that software is "central to NSF's vision of a Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21)," and goes on to emphasize that in general software is essential to computational and data-enabled science. Indeed, the SI2 program is one vehicle by which the NSF hopes to enable sustained and well supported software providing services and functionality needed by the US science and engineering community. This yearlong study of cyberinfrastructure projects will identify best practices in the development, deployment, and support of robust cyberinfrastructure software.

Through a combination of detailed case studies and surveys of software producers and users, the investigators will identify best practices for the process of moving software from a "discovery" process to well-maintained and sustainable infrastructure for 21st century science and engineering. The study focuses in particular on the following question: given a piece of software that provides interesting capabilities and a community that wants to use (and possibly contribute to the further development of) that software, what steps are necessary to transform that software from "interesting tool" to "robust and widely used element of national infrastructure, contributing to the NSF vision for CIF21" - and then support and maintain that tool sustainably? This research will lead to greater availability of widely usable software tools and curriculum materials, increasing the quality of education in computer science, computational science, and STEM disciplines.

Project Report

Scientific researchers in the US depend on many kinds of highly specialized software, so specialized that the user market is too small to support commercial software packages in most areas. (Statistical, mathematical, and engineering software are the exceptions; in those areas private companies offer several commercial packages.) Much of the software used by scientific researchers and engineers is produced by the scientific community and released to the community under a license that makes it free to use and modify. The idea behind such ‘open source’ software is that it gives the scientific community the software it needs even when a private company could not sustain itself by producing and supporting that software. The National Science Foundation (NSF) supports the creation of many open source scientific software packages.

The purpose of this study was to determine what factors lead to an open source software package being sustained over time, so that the scientific community and the NSF can identify and adopt strategies that keep such software sustained and usable over a long period. This should enable the community and the federal government to reduce the time and money spent reinventing software functions that already exist, and to improve software that has already been written and is in use now.

The first step in our research was to survey researchers funded by the NSF, to understand the factors that are important to them when selecting software to use in their research. A random sample of 5,000 individuals, drawn from a list of NSF researchers who received funding between 2007 and 2011, was invited to take part in a survey about software.
The factors most important to researchers were:
1) Capabilities and features of the software product, which were the most important factors to consider when adopting a software package
2) Total cost of ownership (purchase and annual license maintenance fees)
3) Long-term availability
4) Reliability and maturity
5) Initial purchase cost of the software

We also asked researchers to identify the software they used most often and depended upon the most. Based on this list, and on input from many leaders in the scientific computing community, we decided to do case studies of a number of successful software projects that had been maintained over several years. We investigated a number of factors related to software creation, maintenance, and how software projects were organized, looking particularly at software products that had managed to sustain themselves over a long period of time.

What we discovered was that all of the software projects that were successful over a long period of time employed good software engineering practices: such projects had definitive software repositories, where the definitive copies of the software code were kept; the software was well documented; and there were good practices for ensuring that the code itself was reusable, using criteria established by government agencies for code reusability.

In our study of software projects that had been well maintained and sustained over time, what really stood out was that successful projects, ones that had lasted a number of years and were important to the scientific community, had leadership that was deeply committed to continuing the development of the software and sustaining it. This factor was the one that seems to have distinguished many successful projects from those that were not successful over a long period of time. This finding echoes the standard wisdom one hears from venture capitalists when they are asked about investing in a company.
Venture capitalists will often say that the fate of a startup company in the long run depends more on the quality of the Chief Executive Officer than on the quality of the company’s product. Our finding is fairly similar: the details of software license terms, the way software is tested, and the way it is distributed and supported stand out less as factors that distinguish successful and sustained software than does the level of commitment of the leadership of the software project.

This has implications for the scientific community and potentially for funding agencies. For funding agencies, this study suggests that the track record and degree of commitment of the leaders of a software project should be among the most important factors in deciding what software to fund. For members of the scientific community, the implications of this research are also very clear: if a researcher wants a particular piece of software to be sustained and usable over a long period of time, then either the creator has to be dedicated to making this happen, or the creator has to find someone who will lead the software project over the long haul and who is committed to ensuring that the software is sustained and maintained.

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1147606
Program Officer: Daniel Katz
Budget Start: 2011-09-01
Budget End: 2013-08-31
Fiscal Year: 2011
Total Cost: $296,637

Name: Indiana University
City: Bloomington
State: IN
Country: United States
Zip Code: 47401