This study of software infrastructure for the NSF community analyzes and inventories the state of software technologies and their use in science and engineering environments that depend on cyberinfrastructure. Results include a taxonomy of software funded by NSF to support science and engineering research and education, recommended questions for NSF to consider when analyzing the impact of software investments, the identification of a target set within the NSF community to engage in a possible future survey intended to inform NSF on the use of software infrastructure, and a culminating presentation to NSF's Advisory Committee on Cyberinfrastructure (ACCI). The study addresses the need for a more accurate understanding of the state of comprehensive software infrastructure in current NSF community settings from a combined qualitative and quantitative perspective, and it attempts to provide new insight into and awareness of that infrastructure.
The purpose of this project was to study current use of the national NSF-funded software environment and to develop a software taxonomy framework that identifies NSF-funded projects involving software development and categorizes them by the type of software development conducted. In addition, a series of case studies was conducted to identify common sustainability characteristics of software that is used by the various science and engineering communities and is perceived as successfully sustained and integral to those communities. Based on the case study interactions and subsequent analysis of the results, a set of areas for further study, consideration, and analysis was developed. These areas concern how software environments add value and benefit, and they are intended to yield data and information that NSF may use in future funding decisions on software development investments.
Three major activities were accomplished during this project. First, a taxonomy was developed that categorizes NSF-funded projects by the type of software development activity. Using the publicly available NSF awards database, projects awarded from January 1, 2000 through June 30, 2012 were extracted, yielding 141,510 records. Second, a series of case studies was identified through interactions with the science and engineering community and with the project's advisory group. The case studies enabled the identification of a common set of sustainability characteristics found in software that is perceived to be successfully sustained and used throughout the research community. The selected case studies were diverse in size of user base, level of user base growth, disciplines supported, distribution model, and funding level. Lastly, based on the case studies, a recommended set of topics for future consideration by NSF was developed. These considerations were obtained by examining common themes that emerged while conducting the case studies.
Major results of this project include a taxonomy, built from NSF's public awards database, that identifies which NSF-awarded projects from January 1, 2000 through June 30, 2012 included software development activity. This searchable taxonomy enables an initial understanding of NSF investment by category, as well as the timing of investment, where investments were made, the discipline area involved, PI and organizational elements, the project abstract, and other aspects found in the NSF awards database. Further, the case study efforts were intended to provide insight into the practices and characteristics present in software products that are perceived to be successful, sustained, and used by the research community. The project found that the key factors contributing to sustainability were engineering and design practices; community formation and associated social infrastructure; a broad development base; an underlying governance paradigm; a viable economic model; capabilities that are fluid and address user needs; ease of use and local integration; and education, training, and support. Additionally, the elements of future study for NSF to consider were sustainability incentives; development of common best practices for success; addressing data gaps in future collections; the notion of scientists as programmers; learning from failures; investigating and modeling the timing effect of NSF investments in projects; software distribution and licensing models; and governance and user engagement models.