The objective of this study is to create tools for analyzing topic lifecycles across heterogeneous corpora. While the growth of large-scale datasets has enabled examination within scientific datasets, there is a lack of research that looks across datasets to examine how different scientific activities enable or propel scientific discovery. This project will examine the development of topics in four domains: history of science, social network analysis, cognitive science, and digital humanities. Examining the lifecycle of topics in these domains should provide insights into how scholarship evolves across genres in the social sciences and humanities. Four methods (word analysis, topic modeling, burst detection, and survival analysis) will be triangulated to strengthen the validity of the findings.

This project is innovative in its combination of datasets: it will combine data from formal sources (dissertations, conference proceedings, journal articles, and grant proposals) and informal communication channels (listservs, blogs, and Twitter) in order to provide a more holistic lens on scientific communication. For years, our knowledge of the scholarly landscape, and subsequently our understanding of innovation, productivity, and impact, has been largely informed by data from a single source. However, the growth of diverse datasets that reflect unique areas of scholarly activity has altered the research landscape and provides an opportunity to create more accurate understandings of the nature of science. The results of this work will have implications for policy makers as they seek to identify emergent areas of research. The work will also indicate how important particular communication channels are for identifying emerging areas of knowledge: it will identify which scholarly activities are most indicative of emerging areas and, thereby, which datasets should no longer be marginalized but built into our understandings and measurements of scholarship.

One of the chief broader impacts of this proposal is the development of new researchers through Indiana University's student emissary program, which will allow students to work alongside the PIs, travel to meetings, and attend a retreat at which they will share their own research and engage in peer mentoring. Further, the project should help inform both scholars and policy makers about emergent areas of research and about ways of disseminating and sharing new knowledge.

This grant was made as part of the Digging Into Data Challenge, an international competition designed to foster research collaboration across countries and to encourage innovative approaches to analyzing large data sets in the social sciences and humanities. In addition to the US research team, this project includes researchers from Canada and the United Kingdom.

Project Report

The objective of this project was to create and examine large-scale heterogeneous datasets to increase our understanding of the scholarly communication system, to identify and analyze various scholarly activities for creating and disseminating new knowledge, and to further develop innovative software that collects, filters, and analyzes data from the web and social media to discover trends in science and in scholarly communication. For years, our knowledge of the scholarly landscape, and subsequently our understanding of innovation, productivity, and impact, has been largely informed by homogeneous and often biased corpora. However, the growth of datasets that reflect unique areas of scholarly activity (both informal and formal) has altered the research landscape and provides us with the opportunity to create more accurate understandings of the nature of science and how science is communicated. Also, while the growth of large-scale datasets has enabled examination within scientific datasets, there is a lack of research that looks across datasets. By looking across various datasets to investigate trends and correlations, this project aimed to identify the impact and visibility of various scholarly activities, and to identify datasets that should no longer be marginalized but built into our understandings and measurements of scholarship. The results from the project present an argument that transformations in the scholarly communication system affect not only how scholars interact, but also the very substance of these communications. For example, we demonstrated that journal articles and monographs are no longer the dominant forms of communication in certain disciplines. We demonstrated the relationship between various genres, such as handbooks, dissertations, and preprints. We also revealed growing interdisciplinarity in the social sciences and humanities.
We also examined new forms of dissemination of science, such as the extremely popular TED Talks initiative. We show that these talks have low academic value, but are highly popular and useful for pedagogy. However, we demonstrate a large bias in them towards male presenters and show that audiences react differently to female presenters. We also investigate new forms of impact and communication, such as Twitter and Mendeley. Our results suggest that disciplines vary in their use of Twitter. However, in a large-scale study of biomedical literature, we find low correlation between Twitter mentions and citations, the traditional metric of scholarly quality. This suggests that Twitter cannot be used to replace citations, but may provide a different measure of impact. Our studies of Mendeley demonstrate that it is largely an academic community; therefore, metrics from this platform should not be used as a proxy for public reception of science. The project produced many refereed journal articles, conference articles, and other publications that build on each other and help to triangulate a broad picture of the current scholarly communication system and the role of social media metrics. Some of the scientific publications written by the team members created interest beyond the scientific community and were recognized in the popular press (e.g., U.S. News, Bloomberg Businessweek, the Vancouver Sun, the Boston Globe) as well as the academic press (e.g., Science, Nature, the Chronicle of Higher Education). The project also created international ties between three institutions across three countries, establishing collaborative relationships that can be sustained beyond the duration of the grant.

Agency: National Science Foundation (NSF)
Institute: SBE Office of Multidisciplinary Activities (SMA)
Type: Standard Grant (Standard)
Application #: 1208804
Program Officer: Elizabeth Tran
Project Start:
Project End:
Budget Start: 2012-02-01
Budget End: 2014-01-31
Support Year:
Fiscal Year: 2012
Total Cost: $126,522
Indirect Cost:
Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401