Large-scale science and engineering campaigns have typically considered data management from the inception of the project, and funds for data management have been included in the projects' budgets. This proposal is aimed at data acquisition and curation strategies in support of single PI or small group research projects at academic institutions, that is, data in the so-called "long tail". Long-term data management is particularly problematic in these projects. Smaller research projects are often strapped for funds to conduct the research that generates the data; in the past, management of the data was often an afterthought. With data management plans now being required by funding agencies, the issues must be considered as part of a proposal, but the funding available for data management is still frequently small, and economical resources available to researchers still need to be cultivated. At academic institutions, the institutional repository (IR) has emerged as the means of harnessing technology to improve scholarly communication, and it is the IR that offers the potential to address the data curation problems of smaller projects. Although institutional repositories have a broad intuitive appeal to all the stakeholders involved with science and engineering data management, they have met with very limited acceptance in practice. This proposal seeks to increase faculty contribution of their data to the IR by appealing to their needs directly and providing them with tools and support for developing personal repositories that can subsequently be federated into the IR. The strategy is to lower the barrier to entry to archiving facilities and to provide incentives for researchers to participate.

Broader impacts will be realized in two key areas. First, archiving and preservation of datasets will be enhanced by increasing the participation of faculty and researchers generating data at the nation's research institutions. Second, open source software will be available for deployment by other institutions beyond the project's partners, thereby increasing the effectiveness of dataset archiving and sharing across a growing set of participating institutions. The project will also offer research and training opportunities to undergraduate and graduate students involved as software developers and data consultants who interact with faculty and other researchers as part of the project.

Project Report

Data acquisition and archiving are necessary steps in the preservation of data for subsequent access and analysis by current researchers, as well as new generations of researchers. The NSF INSPIRE project and its OmniMea prototype (http://omnimea.org) were aimed at data acquisition and curation strategies in support of single PI or small group research projects at academic institutions. Long-term data management is particularly problematic in these projects. Smaller research projects are often strapped for funds to conduct the research that generates the data; in the past, management of the data was often an afterthought. The INSPIRE project considered ways in which a user-centric concept of personal repositories might be used to capture and organize intellectual output as a first step to placing selected material in an institutional repository for long-term curation.

Early in the project it became obvious that before we could adequately address the capture and submission of research data to the institutional repository, we would need to provide an appropriate user interface to personal research materials. This would be key to acquiring all research artifacts, not just research data. We chose the approach and organizational structure of individual CVs, motivated by our belief that the CV is a natural user interface to an academic's personal repository. The challenge is to get academics to use a personal repository-based CV in preference to their present choice, e.g., MS Word. Our approach was to provide extremely low barrier tools for importing information, and to attach sufficient value to the transition that one would elect to undertake a small amount of manual work for the subsequent benefits. We implemented both a straightforward single-item import capability and a bulk import capability. The power of our approach is that the imported CV is not treated as a monolithic entity.
Each item imported into the OmniMea personal repository, be it a journal article, a data set, or a professional activity, is represented as a digital object. This representation offers a great deal of flexibility and allowed us to support a number of key benefits, including the potential to crowd-source metadata for institutional repositories. This is possible because only one author of a multi-authored paper needs to enter it into OmniMea; the paper then accretes to the CVs of all authors registered with the service.

Over the course of the project, we developed two prototype demonstration systems to explore the issues involved and to provide a framework for gathering feedback from research communities. The first prototype led to one major realization and a common theme in user feedback. We found that flexibility of organization was required, even for a relatively structured format such as a CV. We realized that the information contained in a fully populated personal repository would be sufficient to populate annual reports, shorter versions of a CV, biographical sketches, etc. Researchers routinely expressed interest in tools that would help them automate these tasks. During the second year of the project we redesigned the CV prototype to operate on a generic infrastructural framework that supports "collections of collections." This change gave us the ability to easily incorporate new user services that can effectively transform a CV into some useful derived product.

Overall, we found that a successful system would require a robust service with a low barrier to entry and significant value to motivate entry. We identified five major points related to lowering the barrier to entry:

1. Provide a low-effort way for users to input information, for example, cut/paste entry of publications from a traditional CV.
2. Provide for ingest of publication metadata from other resources, e.g., ACM DL and IEEE DL.
3. To the maximum extent possible, provide access to digital copies of publications and data by looking up and adding DOIs for published articles and providing persistent IDs for other content.
4. Provide value-added features that allow users to use and re-use the information they have input, e.g., alternate display formats like annual reports and biographical sketches.
5. Provide a low-effort way for users to disseminate, print, or export the information they have previously input.

The project also worked to broaden participation. While the overall technical development was guided by a professional software developer, much of the work on the OmniMea prototype was done by five undergraduate interns. Four of those interns have since graduated with degrees in computer science and the fifth is currently pursuing a computer science major. None of these young women had any previous experience in web development and related technologies. All were trained on the job, and each made useful contributions to the overall development of the prototype and gained an additional useful skill set.
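
As an illustration of the digital-object model described in this report, the following minimal Python sketch shows how a paper entered once could appear on the CV of every registered co-author, and how a shorter derived product could be generated from the same objects. It is not OmniMea code; the class names, fields, author identifiers, and the default biosketch limit are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One CV item; every name in this sketch is hypothetical."""
    object_id: str
    kind: str        # e.g. "article", "dataset", "activity"
    title: str
    year: int
    authors: tuple   # identifiers of the item's authors


@dataclass
class Repository:
    """Toy stand-in for a shared store of digital objects."""
    objects: dict = field(default_factory=dict)
    registered: set = field(default_factory=set)

    def register(self, author):
        self.registered.add(author)

    def add(self, obj):
        # The item is entered once, by any one of its authors.
        self.objects[obj.object_id] = obj

    def cv(self, author):
        # A CV is a view over the store: every object that lists this
        # registered author, newest first, then by title.
        if author not in self.registered:
            return []
        items = [o for o in self.objects.values() if author in o.authors]
        return sorted(items, key=lambda o: (-o.year, o.title))

    def biosketch(self, author, limit=2):
        # A derived product: only the most recent articles from the CV.
        return [o for o in self.cv(author) if o.kind == "article"][:limit]


repo = Repository()
repo.register("alice")
repo.register("bob")
repo.add(DigitalObject("obj-1", "article", "Shared Paper", 2013, ("alice", "bob")))
repo.add(DigitalObject("obj-2", "dataset", "Survey Data", 2012, ("alice",)))

# "bob" never entered obj-1, but it still appears on his CV.
print([o.object_id for o in repo.cv("bob")])
```

Because a CV here is computed from shared objects rather than stored as a document, other derived products (an annual report, for instance) would be additional views over the same store.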

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1152481
Program Officer: Sylvia Spengler
Budget Start: 2011-09-01
Budget End: 2014-02-28
Fiscal Year: 2011
Total Cost: $360,000

Name: Corporation for National Research Initiatives (NRI)
City: Reston
State: VA
Country: United States
Zip Code: 20191