Evaluating the return on investment in science involves accurately associating research inputs (e.g., grants and contracts) with research outputs (e.g., publications and patents). Because many of today's significant discoveries are produced by large multi-institution, cross-disciplinary teams -- often supported by different sponsors, with potential impact spread across different fields -- accurately linking outputs to inputs can be challenging. This research investigates whether and to what extent an online profiling system that aggregates research data around the individual researcher facilitates the processes of linking research inputs to outputs and provides benefits to scientists, institutions, publishers, and agencies. It compares the value of using different data sources, such as federal systems, institutional repositories, commercial databases, manual data entry by librarians and administrators, and data entry by the scientists themselves, to populate a prototype website that profiles computer scientists from multiple institutions.
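The sketch below makes the linking problem concrete. It attaches outputs (publications) to inputs (grants) by matching researcher identifiers and a time window. This is a minimal illustration, not the project's method. The record fields, the identifiers, and the `lag_years` heuristic are all hypothetical assumptions. The example also shows why multi-sponsor teams complicate attribution, because one paper can link to grants from two agencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grant_id: str
    sponsor: str
    investigators: frozenset[str]  # hypothetical disambiguated researcher IDs
    start_year: int
    end_year: int


@dataclass(frozen=True)
class Publication:
    pub_id: str
    authors: frozenset[str]  # same hypothetical ID scheme as Grant.investigators
    year: int


def link_outputs_to_inputs(grants, pubs, lag_years=2):
    """Map each publication ID to a sorted list of candidate grant IDs.

    Assumed heuristic: a grant is a candidate input if it shares at least one
    investigator with the paper's authors and the paper appeared during the
    grant or within `lag_years` after it ended.
    """
    links = {}
    for pub in pubs:
        links[pub.pub_id] = sorted(
            g.grant_id
            for g in grants
            if g.investigators & pub.authors
            and g.start_year <= pub.year <= g.end_year + lag_years
        )
    return links


if __name__ == "__main__":
    grants = [
        Grant("NSF-1", "NSF", frozenset({"A", "B"}), 2010, 2012),
        Grant("NIH-7", "NIH", frozenset({"C"}), 2011, 2015),
    ]
    pubs = [
        Publication("P1", frozenset({"B", "C"}), 2013),
        Publication("P2", frozenset({"A"}), 2016),
    ]
    print(link_outputs_to_inputs(grants, pubs))
    # {'P1': ['NIH-7', 'NSF-1'], 'P2': []}
```

In this example, P1 links to both an NSF and an NIH grant, while P2 links to nothing because it appeared outside the assumed window. The whole approach depends on the author identifiers already being correctly disambiguated.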
Intellectual Merit: The research breaks new ground in a variety of ways. It determines the cost and effort required to obtain each data source, including the resources needed to disambiguate names in order to link data on scientific contributions to the correct people responsible for them; it calculates the potential reduction in administrative burden each data source provides by measuring the amount of time it takes for a scientist, without the help of a researcher profiling system, to manually locate the data and enter it into an online form; and it evaluates which data sources contain the most information about high-impact cross-institutional or multi-disciplinary research, using data mining techniques to generate collaboration and topic-cluster maps.
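The collaboration maps mentioned above are typically built as co-authorship graphs. The minimal sketch below counts weighted co-authorship ties and the share of ties that cross institutional lines. It assumes author names have already been disambiguated into stable IDs, the costly step the paragraph describes. The input format and institution labels are hypothetical, and the code is an illustration rather than the study's data mining pipeline.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(publications, affiliation):
    """Build a weighted co-authorship edge list.

    publications: iterable of author-ID lists (IDs assumed already disambiguated).
    affiliation: dict mapping author ID -> institution (hypothetical labels).
    Returns (edges, cross_share), where edges maps a sorted author pair to the
    number of papers they co-wrote, and cross_share is the fraction of total
    edge weight connecting authors at different institutions.
    """
    edges = Counter()
    for authors in publications:
        # Each unordered pair of distinct co-authors gets one tie per paper.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    total = sum(edges.values())
    cross = sum(w for (a, b), w in edges.items() if affiliation[a] != affiliation[b])
    return edges, (cross / total if total else 0.0)


if __name__ == "__main__":
    pubs = [["ana", "bo"], ["ana", "cy", "bo"], ["cy", "dee"]]
    aff = {"ana": "Inst-A", "bo": "Inst-A", "cy": "Inst-B", "dee": "Inst-C"}
    edges, share = collaboration_edges(pubs, aff)
    print(edges[("ana", "bo")], share)  # 2 0.6
```

The edge list can be handed to any graph-layout tool to draw the map. The cross-institution share gives one simple number for comparing how much multi-institution work each data source captures.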
This study is appropriate for the EAGER program because it is both high-risk -- in that the outcome depends on the coordination of multiple software products, institutions, agencies and data sources in a rapid timeframe -- and high-reward, in that a successful prototype would accelerate the implementation of a national researcher profiling system that benefits multiple stakeholders.
The field of Computer Science exemplifies all the challenges and potential rewards of a nationwide researcher profiling system. Computer scientists are funded by many different agencies (NSF, NIH, DOE, DOD, NASA, etc.); their research outputs take many forms (publications, conference presentations, software, databases, algorithms, patents, etc.); and they collaborate across many disciplines (such as medicine, economics, engineering, physics, and social science).
Broader impacts: A publicly accessible national researcher profiling system based on linked open data promises numerous benefits beyond enabling more accurate measures of return on federal investment in science. These benefits include the potential to streamline the grant application and reporting process for researchers, identify reviewers without conflicts of interest, help researchers find collaborators, match trainees and junior investigators with mentors and jobs, and enable scientists to showcase their work.
," was to determine whether a prototype website that profiles scientists from multiple institutions facilitates the process of linking research inputs to outputs. We collaborated with several synergistic projects: 1) Profiles Research Networking Software (RNS) (http://profiles.catalyst.harvard.edu) is the open source faculty profiling system used for the prototype website. 2) Harvard Faculty Finder is an instance of Profiles RNS that was recently launched by Harvard's Provost Office to profile faculty from the entire university. 3) ORCID is a non-profit company that creates a unique identifier for every researcher and links this to their publications and activities. 4) "Pathfinder: A Systems Approach to Advancing Workforce Inclusion and Diversity" is an NIH-funded project at Harvard, which collected data about faculty to understand the factors that influence the careers of women and underrepresented minorities in biomedicine. 5) The NIH Clinical and Translational Science Award (CTSA) consortium's Research Networking Affinity Group coordinated the development of the Direct2Experts federated expertise discovery tool. 6) SciENcv (Science Experts Network Curriculum Vitae) is a website developed by the NIH to create online profiles of federally funded investigators. In developing a prototype website, we found that there are trade-offs in different architectures. The first model copies data from universities' websites and stores them in a central repository. This approach enables the most functionality—complete faculty profiles can be displayed, along with network visualizations, and other advanced features. However, a technology used by many university faculty profiling systems limits how frequently the centrally stored data can be updated. Some institutions also have security and privacy concerns about having all their data copied. As a result, prototypes based on this architecture generally have only a handful of participating institutions. 
Federated prototypes, which query multiple institutions in real time and lack a central repository, are simpler to implement and require fewer resources from the participating institutions. They also address institutions' security and privacy concerns. As a result, prototypes based on this architecture have included dozens of institutions. However, because their functionality is so limited, they cannot support many of the typical use cases for faculty profiling systems.
An additional, unanticipated outcome of this study was the development of a visual and quantitative method to identify translational research: science that begins in one field and has an impact on another. This work was presented at conferences, published in the Journal of Translational Medicine, and provided preliminary data for another NSF SciSIP award to develop the technique further and use it to measure the impact of cross-disciplinary collaboration.
Faculty profiling systems serve many functions, including helping researchers find collaborators, matching students with mentors, giving institutions a way to showcase their faculty's expertise, and reducing administrative burden (e.g., generating biosketches). They also provide essential data needed to link investments in science to the products of research. The public often finds it difficult to understand how it benefits from individual research projects, publications, or patents. However, by bringing these data together in websites like Profiles RNS, users can see that research inputs and outputs do not exist in isolation. Instead, they are parts of large interconnected networks that, as a whole, advance science and make an impact on society.
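The federated architecture can be sketched as a coordinator that queries every institution at request time and merges whatever comes back within a deadline. Nothing is copied into a central repository. This is a minimal sketch under stated assumptions: each institution's search endpoint is simulated by a hypothetical local coroutine (`query_node`) instead of a real network call, and the profile data, topics, and timeout are illustrative only. It is not the Direct2Experts implementation.

```python
import asyncio


async def query_node(name, profiles, keyword, delay):
    """Hypothetical stand-in for one institution's search endpoint.

    A real federated tool would send an HTTP request here; `delay` simulates
    the institution's response time.
    """
    await asyncio.sleep(delay)
    return [(name, person) for person, topics in profiles.items() if keyword in topics]


async def federated_search(nodes, keyword, timeout=0.5):
    """Query all institutions concurrently and merge their answers.

    nodes: dict mapping institution name -> (profiles, simulated delay).
    Institutions that miss the per-node timeout or raise an error are skipped,
    so results reflect only the nodes that answered in time.
    """
    tasks = [
        asyncio.wait_for(query_node(name, profiles, keyword, delay), timeout)
        for name, (profiles, delay) in nodes.items()
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    merged = []
    for result in results:
        if isinstance(result, BaseException):
            continue  # timed-out or failing node: no stale central copy to fall back on
        merged.extend(result)
    return merged


if __name__ == "__main__":
    nodes = {
        "Inst-A": ({"ana": {"graphs", "ml"}}, 0.01),
        "Inst-B": ({"bo": {"ml"}, "cy": {"hci"}}, 0.02),
        "Inst-C": ({"dee": {"ml"}}, 2.0),  # too slow, will be dropped
    }
    print(asyncio.run(federated_search(nodes, "ml")))
```

The sketch reflects the trade-off described in the text. It is easy to add institutions, and their data never leaves home. However, the coordinator sees only the fields each node chooses to return at query time, which makes whole-network features such as cross-institution visualizations hard to support.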