The National Center for Science and Engineering Statistics (NCSES) at the National Science Foundation collects high-quality data on the scientific workforce, but the larger Science of Science and Innovation Policy (SciSIP) community does not use these data widely because the survey instruments have changed over time and the data formats are restrictive. The NSF supports only SAS for NCSES data, whereas the majority of researchers use Stata. The proposed research will break down the barriers to using SESTAT and SDR data for the SciSIP community, which in the long run could yield new insights about science policy. This project will create data infrastructure and enhancements for NCSES data that will be posted to the National Bureau of Economic Research (NBER) website for use by the broader research community. The proposed data tools and data sets are public goods for the SciSIP research community. In the process of developing these tools, one graduate student will be trained.
First, this project creates the 1993–2013 Harmonized Survey of Doctorate Recipients (SDR) and Harmonized Science and Engineering Data System (SESTAT) data. In both the SDR and SESTAT, variable definitions have changed, major fields have been added, and the answer choices for questions have also changed. The project creates SAS and Stata code, for use with the restricted-use and public-use versions of the SDR and SESTAT microdata, that harmonizes variable definitions to the 2013 definitions where possible. This source code will be accompanied by a working paper and posted to the NBER website. Second, the project will create a data set of patents assigned to US campuses. The United States Patent and Trademark Office has matched patents to university assignees. While some of the university assignees are single campuses, several are large university systems (e.g., the Regents of the University of California). This project will use several sources to assign patents to individual campuses. In addition, the PI will identify whether these patents acknowledge federal research funding and the source of that funding. The patent data will be linked to campuses using IPEDS codes and can be merged onto the SDR or linked to other patent data. Third, the project will create a data set of NSF and NIH funding ranks to be used as quality measures. Previous research has found that NIH funding rank was a better measure of institution quality than NRC ranking or Carnegie ranking. This project will create data on the NIH and NSF funding rank of each institution by major research field and year, matched to institutional IPEDS codes. These data can then be merged onto SDR data by IPEDS code to provide improved measures of institutional quality.
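As an illustration of the third component, the ranking and merge steps might be sketched as follows. This is a minimal sketch in Python and not the project's own SAS or Stata code. The record layout, the field names, the IPEDS codes, and the dollar amounts are all hypothetical, and the tie-handling rule (tied institutions share a rank) is an assumption, since the abstract does not specify one. The sketch also assumes one funding row per institution within each field and year.

```python
from collections import defaultdict

# Hypothetical funding records: (ipeds_code, field, year, dollars).
# All codes and amounts are invented for illustration.
funding = [
    ("100001", "biology", 2013, 500.0),
    ("100002", "biology", 2013, 900.0),
    ("100003", "biology", 2013, 900.0),
    ("100001", "physics", 2013, 300.0),
]


def funding_ranks(records):
    """Rank institutions within each (field, year) cell; 1 = most funding.

    Assumed tie rule: tied institutions share the same rank.
    """
    cells = defaultdict(list)
    for ipeds, field, year, dollars in records:
        cells[(field, year)].append((ipeds, dollars))

    ranks = {}
    for (field, year), rows in cells.items():
        for ipeds, dollars in rows:
            # Rank is one plus the number of institutions with strictly more funding.
            rank = 1 + sum(1 for _, other in rows if other > dollars)
            ranks[(ipeds, field, year)] = rank
    return ranks


def merge_ranks(people, ranks):
    """Left merge by IPEDS code, field, and year.

    Every respondent row is kept; funding_rank is None when there is no match.
    """
    merged = []
    for person in people:
        key = (person["ipeds"], person["field"], person["year"])
        merged.append({**person, "funding_rank": ranks.get(key)})
    return merged


# Hypothetical respondent rows standing in for SDR microdata.
respondents = [
    {"id": 1, "ipeds": "100001", "field": "biology", "year": 2013},
    {"id": 2, "ipeds": "100002", "field": "biology", "year": 2013},
    {"id": 3, "ipeds": "999999", "field": "biology", "year": 2013},
]

ranks = funding_ranks(funding)
merged = merge_ranks(respondents, ranks)
```

Keeping unmatched respondents with a missing rank, rather than dropping them, preserves the survey sample and makes unmatched IPEDS codes easy to audit after the merge.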