RI: Small: Print and Probability - A Statistical Approach to Analysis of Clandestine Publication

Berg-Kirkpatrick, Taylor; G'Sell, Max; Warren, Christopher

Abstract

The Print and Probability project develops novel machine learning and computer vision techniques to identify thousands of book and pamphlet printers whose identities have eluded scholars for roughly 500 years. Before the modern era, the book trade was often dangerous and secretive. For fear of persecution and punishment, printers working between 1473 and 1800 declined to attach their names to about a quarter of all known books and pamphlets. However, now that over 130,000 books have been digitized by the Early English Books Online (EEBO) project, defects and variations in the printing tools of this era may hold the key to identifying these printers. Once an individual piece of metal type is damaged, it leaves a unique impression. Because sets of type belonged to specific printers, impressions of damaged type can serve as fingerprints for identifying the printers of tens of thousands of clandestine publications. The Print and Probability project automatically detects and tracks these unique pieces of damaged type in order to uncover new information about the history of books. The methods developed in this project could generalize to other important tasks and domains, such as digital forensics and authorship attribution. In addition, the Print and Probability project will train students in a multidisciplinary way, engaging them in collaboration across multiple fields.

By developing new techniques for visual anomaly detection, the Print and Probability project detects damaged letterforms that create consistent aberrations. Based on these extractions of damaged type, the project develops probabilistic models of both printer and damaged letterform identifications that allow direct inference of printers at scale. This framework also incorporates other sources of evidence into the identification model, most significantly the spelling, punctuation, and spacing habits of individual press-house compositors, whose distinctive practices lend themselves to clustering and automatic attribution across all pages of text in the collection. By integrating a new method for automatic compositor attribution, this project develops a statistical model for printer identification that leverages the same sources of evidence compiled manually by scholars of rare books, but at a scale and speed never before possible.
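
As a rough illustration of how damaged-type evidence could feed a probabilistic attribution, the sketch below computes a posterior over candidate printers from the damaged pieces of type detected in an anonymous book. It is not the project's model: it assumes a simple naive Bayes formulation in which detections are conditionally independent given the printer, and the function name, printer names, type identifiers, probabilities, and the `miss_prob` fallback are all hypothetical.

```python
import math


def printer_posterior(prior, match_prob, observed_types, miss_prob=0.01):
    """Posterior over printers given damaged type detected in one book.

    prior:          dict printer -> prior probability (hypothetical values)
    match_prob:     dict printer -> dict type_id -> P(type seen | printer)
    observed_types: iterable of damaged-type ids detected in the book
    miss_prob:      assumed fallback likelihood for a type that is not
                    associated with a given printer
    """
    log_scores = {}
    for printer, p in prior.items():
        score = math.log(p)
        # Naive Bayes assumption: each detection is independent given the printer.
        for type_id in observed_types:
            score += math.log(match_prob.get(printer, {}).get(type_id, miss_prob))
        log_scores[printer] = score

    # Normalize in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {k: math.exp(s - top) / total for k, s in log_scores.items()}


# Hypothetical example: two candidate printers, two detected pieces of damaged type.
prior = {"printer_A": 0.5, "printer_B": 0.5}
match_prob = {
    "printer_A": {"damaged_g_01": 0.6, "damaged_W_07": 0.5},
    "printer_B": {"damaged_g_01": 0.05},
}
posterior = printer_posterior(prior, match_prob, ["damaged_g_01", "damaged_W_07"])
print(posterior)
```

In this toy setting, almost all of the posterior mass falls on `printer_A`, because both detected pieces of type are far more likely under that printer. Other evidence, such as compositor spelling habits, could in principle be folded in as additional likelihood terms under the same independence assumption.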

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1816311
Program Officer: Tatiana Korelsky

Budget Start: 2018-09-01
Budget End: 2019-08-31
Fiscal Year: 2018
Total Cost: $499,770

Institution: Carnegie-Mellon University, Pittsburgh, PA, United States
