Understanding how individual scientists interact with one another, and how such interaction affects research productivity and knowledge diffusion, is important for understanding the dynamics of scientific research collaboration. At the same time, information about patterns of collaboration and their consequences has implications for science policy. In quantitative research on collaboration networks, publication co-authorships and citation linkages have been the primary sources of data. However, as large data repositories, one of the signposts of cyberinfrastructure-enabled, data-driven science, become increasingly prevalent, they offer an alternative source of information about networks of scientific collaboration. This project investigates research collaboration networks emerging around one such international data repository, GenBank, and develops data products to support data-driven science policymaking and research. By utilizing this novel data source, the project provides an unprecedented opportunity to validate and expand the theory of complex networks while generating rich data outputs and products to support science policy research and policymaking. This study fills a number of theoretical and methodological gaps identified by the 2008 roadmap for Science of Science Policy (SoSP), with a specific focus on how scientific collaboration networks form and evolve. The outcomes of this study address the lack of models and tools for network analysis, visual analytics, and science mapping outlined in that roadmap. To accomplish the data collection and processing required for this project, new computational programs will be developed to parse, extract, store, transform, split, merge, and filter the data; these programs will be applicable to the analysis of other similar data sources for science policy and innovation research.

Broader impacts. By making dataset product prototypes available, the project will allow researchers, policymakers, and students to explore research networks in GenBank along longitudinal, thematic, geographical, institutional, and author dimensions. The multi-dimensional, interactive presentations of such datasets enable data-intensive science policy research and support science policymaking through filtering, sorting, associating, and visualization capabilities. The datasets and data products will be made available through an open access mechanism, so educators and undergraduate and graduate students will have ample opportunities to use these resources for teaching and research. Students enrolled in Syracuse University's newly established Certificate for Advanced Study in Data Science (CAS DS) program will be able to participate in the project and gain skills in programming for data collection and processing, data quality verification, analysis, and visualization. In addition, the collaboration network analysis provides interested doctoral students an opportunity to do independent study or dissertation research. Findings from studying cyberinfrastructure-supported data sharing and knowledge diffusion are expected to advance policymakers' ability to properly assess the outcomes of federally funded research.

Agency: National Science Foundation (NSF)
Institute: SBE Office of Multidisciplinary Activities (SMA)
Type: Standard Grant (Standard)
Application #: 1262535
Program Officer: Maryann Feldman
Budget Start: 2013-08-01
Budget End: 2016-07-31
Fiscal Year: 2012
Total Cost: $306,566