This award will support a 1.5-day workshop in Arlington, VA to bring together the community of SI2 awardees with the aims of: 1) serving as a forum for focused PI technical exchange through an early evening poster session; 2) serving as a forum for discussing topics relevant to the PIs, emerging both from within NSF and from the broader community, by informing attendees of emerging best practices and stimulating thinking on new ways of achieving sustainability and of ensuring that the foundation laid by SI2 is preserved into the future; and 3) gathering experiences and a shared sense of best practice, resulting in a published workshop report.

The workshop will bring together researchers who form a proto-community of NSF open source software developers. The meeting will examine the characteristics of the community and consider whether the products of the program can be enhanced by giving the community a new identity and a new way of looking at itself. The meeting will also address citation, attribution, and reproducibility, three related topics that are often discussed in the context of data but less so in the context of software. The attendees will consider practical steps that could be taken to advance software citation and science reproducibility. Finally, sustainability of software is a major topic for NSF and for the SI2 PIs. The meeting will highlight new ways of thinking about software sustainability, drawing on experts in the field and on recent SI2 EAGER-funded projects that are studying the community, to help the workshop attendees in their thinking about sustainability.

The community outputs of the workshop will be: posters developed by the SI2 PIs that will be shared amongst the attendees and shared more broadly on the workshop web site; an experiences report (licensed under a Creative Commons license) produced by the award PIs, distributed via the workshop web site, via email to participants who will be asked to disseminate among their project colleagues and peers, and via an archive repository through which it will be accessible through a persistent ID; and attendee journalism during the event in the form of a public Google doc and public Twitter stream.

Project Report

Software in Science: a Report of Outcomes of the 2014 SI2 PI Meeting

The second annual NSF Software Infrastructure for Sustained Innovation (SI2) PI meeting took place in Arlington, VA on February 24-25, 2014. It was hosted by Beth Plale, Indiana University; Douglas Thain, University of Notre Dame; and Matt Jones, National Center for Ecological Analysis and Synthesis. As stated in www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf, "software is fundamentally computer code. [...] Software must be reliable, robust, and secure; able to produce trustable and reproducible scientific results; yet its architecture must be flexible enough to easily incorporate new scientific algorithms, new capabilities, and new opportunities provided by emerging technologies. Software also must be supported, maintained, developed and eventually replaced in part or in entirety, over its lifecycle." The workshop identified challenges around the role and use of software in scientific research, and suggested ways forward.

The workshop focused on four major topics:

i) Attribution and citation: How do software developers who create software that is used in and advances scientific research get credit for their products, parts of which could also advance understanding in computer science? How are the impact and intellectual contribution of software developers measured?

ii) Reproducibility, reusability, and preservation: Software plays a central role in the long-term repeatability and reproducibility of computationally based science. But the mechanisms and best practices for handling the long-term availability and usability of software that contributed, for instance, to an important Nature article 20 years ago are still in the early stages of development.

iii) Sustainability: The long-term availability of software that is still serving a useful purpose in scientific and scholarly research is an ongoing issue, largely because much of the software developed for science use is grant funded. Additionally, software developed by domain experts who are able to write software but have not had formal training in software engineering can be hard to understand and maintain. The workshop report presents several options for long-term sustainability of research software.

iv) Career trajectories: Finally, the workshop considered the career trajectories of research programmers in academic and institutional research settings. The research programmer is a software developer who works in an academic or lab setting and develops software that is used in support of science and scholarship. The academic or lab setting may be oriented to software innovation, but is more likely oriented to non-IT research. The path that an individual takes to the research programmer role varies: 1) He or she may come out of a science discipline and, having taken an interest in technology, have acquired software development skills. These people have strong discipline knowledge, but do work that is of a software nature. 2) The person may come out of an informatics background, and have been trained in both discipline and computer science skills (e.g., the "bioinformatics" person). 3) The person may have come out of a computer science background, and have acquired enough expertise in one or a small number of science disciplines to be effective. Regardless of how they got there, the data science research programmer is characterized by working in an academic or lab setting where he or she architects and develops software and tools in support of science and scholarship.

The career of the research programmer is frequently not stable over the long term. Labs are grant funded, and this person often does not hold a tenure-track position and may not even be a research faculty member (a position that may include a small commitment to providing bridge support should grant funding hit a dry spell). Coupled with this, the incentives for this career path are not well structured: publications on the science or scholarship produced by the research group focus on the primary result and fail to acknowledge the innovation in the software, which may have research value in and of itself. Yet the role of software in science is becoming increasingly important, particularly as data sources grow. The workshop report suggests options to improve the career path, including Communities of Practice.

A copy of the workshop report is available for download at http://hdl.handle.net/2022/19760

Agency: National Science Foundation (NSF)
Institute: Division of Advanced CyberInfrastructure (ACI)
Type: Standard Grant (Standard)
Application #: 1419131
Program Officer: Daniel Katz
Project Start:
Project End:
Budget Start: 2014-01-15
Budget End: 2014-12-31
Support Year:
Fiscal Year: 2014
Total Cost: $66,224
Indirect Cost:

Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401