Core B: Data Acquisition and Construction

Our projects' data requirements overlap extensively, and an aim of this program project is to provide data that will catalyze work on aging and innovation, and on person-based studies of innovation, in the broader research community. The data necessary to study these problems are currently scattered across sources and formats and have not been linked, posing a formidable barrier to research. The Data Acquisition and Construction Core will develop, maintain, and distribute a number of integrated, large-scale datasets and tools. These will provide infrastructure for the project and will be made available in perpetuity, freely, in a user-friendly form, and with support, to the scholarly research community (including graduate students and researchers at non-profits and government agencies). Generating this infrastructure centrally will ensure that it is fully integrated; minimize duplication of effort; ensure quality and uniformity; take greatest advantage of the expertise of program participants; and establish a common set of methods for all users. The availability of this data infrastructure and of established procedures will support a dynamic field studying aging and innovation and person-based studies of innovation.

A central component of our work will be the construction of a large-scale, disambiguated, individual-level, longitudinal database on biomedical researchers comprising: (1) publications, (2) patents, (3) grants, (4) citations, (5) biographic data, (6) research institution characteristics and quality rankings, and (7) journal quality. We will also develop: (1) a longitudinal dataset on research areas, including research effort, drug approvals, and health outcomes, which can stand alone and will also be combined with the individual-level dataset; (2) a set of data extraction and manipulation tools that will facilitate the use of these datasets; (3) estimates of the health and economic impacts of biomedical research; and (4) metrics to identify high-impact and transformative research. The project draws together a team with complementary skills that is uniquely suited to perform this work, along with a sophisticated group of end-users who can refine the data, add complementary components, and maximize usability.

Public Health Relevance

The US is increasingly emphasizing innovation, but the aging of our scientific workforce is expected to reduce innovative output. This Core will develop the data infrastructure to support both our work and future work that will provide policy-relevant information about how the aging of our scientific workforce will affect our biomedical innovative output, the associated health and economic consequences, and policy responses.

National Institutes of Health (NIH)
National Institute on Aging (NIA)
Research Program Projects (P01)
Study Section: Special Emphasis Panel (ZAG1-ZIJ-9)
National Bureau of Economic Research
United States
Shiffrin, Richard M; Börner, Katy; Stigler, Stephen M (2018) Scientific progress despite irreproducibility: A seeming paradox. Proc Natl Acad Sci U S A 115:2632-2639
Fortunato, Santo; Bergstrom, Carl T; Börner, Katy et al. (2018) Science of science. Science 359:
Börner, Katy; Simpson, Adam H; Bueckle, Andreas et al. (2018) Science map metaphors: a comparison of network versus hexmap-based visualizations. Scientometrics 114:409-426
Staudt, Joseph; Yu, Huifeng; Light, Robert P et al. (2018) High-impact and transformative science (HITS) metrics: Definition, exemplification, and comparison. PLoS One 13:e0200597
Azoulay, Pierre; Graff-Zivin, Joshua; Uzzi, Brian et al. (2018) Toward a more scientific science. Science 361:1194-1197
Marschke, Gerald; Nunez, Allison; Weinberg, Bruce A et al. (2018) Last Place? The Intersection of Ethnicity, Gender, and Race in Biomedical. AEA Pap Proc 108:222-227
Carpenter, Janet S; Laine, Tei; Harrison, Blake et al. (2017) Topical, geospatial, and temporal diffusion of the 2015 North American Menopause Society position statement on nonhormonal management of vasomotor symptoms. Menopause 24:1154-1159
Smalheiser, Neil R (2017) Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. J Data Inf Sci 2:43-64
Kehoe, Adam K; Torvik, Vetle I; Ross, Matthew B et al. (2017) Predicting MeSH Beyond MEDLINE. Proc 1st Workshop Sch Web Min (2017) 2017:49-56
Peng, Yufang; Bonifield, Gary; Smalheiser, Neil R (2017) Gaps within the Biomedical Literature: Initial Characterization and Assessment of Strategies for Discovery. Front Res Metr Anal 2:

Showing the most recent 10 out of 31 publications