Hundreds, possibly thousands, of research papers have used patent citation data to study innovative search and impact. Recent critiques, however, have raised substantial questions about whether patent citation measures accurately capture a patent's novelty and importance. Using big data techniques including LASSO content labels, semantic distance measures with those labels, and computational linguistics analysis of communication between patent applicants and their examiners, this research will supplement (or possibly even replace) existing approaches to measuring a patent's importance and its position relative to other patented inventions. The new measure of patent importance is based on how many future patent applications are "blocked" by a particular patent, and the assumption to be validated is that "blocking patents" are valuable pieces of technology space. To characterize patents in technology space, this research will use computerized text analysis to identify the most differentiating words for a given patent, relative to all other patents granted since 1975, and test the hypothesis that similarly "tagged" patents are close to one another in technology space, and thus represent local search. The techniques and public data will open up research in the history of innovation, policy assessments of social welfare and technology and science policy, inventor and assignee disambiguation, prior art search, and matching of treated and control patents for econometric estimation.
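The idea of tagging each patent with its most differentiating words and then measuring distance between tag sets can be illustrated with a small sketch. This is not the project's method: it swaps in plain TF-IDF scoring for the LASSO-based labels described above, uses Jaccard distance as one possible semantic distance, and runs on three invented one-line "abstracts". The function names `top_labels` and `jaccard_distance` and the parameter `k` are assumptions made for illustration only.

```python
import math
from collections import Counter


def top_labels(docs, k=2):
    """Return, for each document, the set of its k highest TF-IDF words.

    TF-IDF is a stand-in here for the LASSO content labels; it is an
    assumption of this sketch, not the approach named in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    labels = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels.append(set(ranked[:k]))
    return labels


def jaccard_distance(a, b):
    """Distance between two label sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy "patent abstracts", invented for this example.
docs = [
    "lithium battery anode coating battery",
    "lithium battery cathode coating",
    "antenna signal receiver antenna",
]
labels = top_labels(docs, k=2)
near = jaccard_distance(labels[0], labels[1])  # two battery patents
far = jaccard_distance(labels[0], labels[2])   # battery vs. antenna patent
print(labels, near, far)
```

In this toy corpus the two battery documents share a label and so sit closer together than the battery and antenna documents, which is the kind of "local search" proximity the hypothesis concerns. A real corpus would need proper tokenization, stop-word handling, and a validated labeling model.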

Broader impacts

The use of patent data to study innovation and its social and economic impact has become increasingly common with the widespread availability of large databases. The advantage of such approaches is that they yield easily calculated, consistent measures across millions of documents and thousands of technologies. The disadvantage is that the measures (prior art references by patents, for example) correlate only poorly and indirectly with the actual phenomena of interest (such as novelty, economic impact, job creation, or potential for patent litigation). Using computational social science, this research applies improved big data techniques, combined with survey validation, that could ameliorate these disadvantages and, once made available to researchers, significantly improve research in the innovation and technology strategy literatures. In addition to enabling better research, this work provides wider societal benefit by helping inventors, lawyers, examiners, and investors with prior art search, improved disambiguation of the patent database, and faster identification of important patents.

National Science Foundation (NSF)
SBE Office of Multidisciplinary Activities (SMA)
Standard Grant (Standard)
Program Officer
Maryann Feldman
University of California Berkeley
United States