The Print and Probability project develops novel machine learning and computer vision techniques to infer the identities of thousands of book and pamphlet printers that have eluded scholars for roughly 500 years. Before the modern era, the book trade was often dangerous and secretive. For fear of persecution and punishment, printers between 1473 and 1800 declined to attach their names to about a quarter of all known books and pamphlets. However, now that over 130,000 books have been digitized by the Early English Books Online (EEBO) project, defects and variations in the printing tools of this era may hold the key to identifying these printers. Once an individual piece of metal type is damaged, every impression it makes carries a unique mark. Because typesets belonged to specific printers, impressions of damaged type can serve as fingerprints that identify the printers of tens of thousands of clandestine publications. The Print and Probability project automatically detects and tracks these unique pieces of damaged type in order to uncover new information about the history of books. The methods developed in this project could generalize to other important tasks and domains, such as digital forensics and authorship attribution. In addition, the Print and Probability project will train students in a multidisciplinary way, engaging them in collaboration across multiple fields.

By developing new techniques for visual anomaly detection, the Print and Probability project detects damaged letterforms that create consistent aberrations. Based on these extracted instances of damaged type, the project develops probabilistic models that identify both printers and damaged letterforms, allowing direct inference of printers at scale. This framework also incorporates other sources of evidence into the identification model, most significantly the spelling, punctuation, and spacing habits of individual press-house compositors, whose distinctive practices lend themselves to clustering and automatic attribution across all pages of text in the collection. Integrating a new method for automatic compositor attribution, this project develops a statistical model for printer identification that leverages the same sources of evidence compiled manually by scholars of rare books, but at a scale and speed never before possible.
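
As a rough illustration of how several sources of evidence might be combined into a single printer identification, the sketch below applies Bayes' rule under a naive independence assumption. It is not the project's actual model: the function name `printer_posterior`, the printer labels, and all probability values are hypothetical, and the abstract does not specify how the evidence is weighted.

```python
import math


def printer_posterior(prior, evidence_likelihoods):
    """Combine independent pieces of evidence into a posterior over printers.

    prior: dict mapping printer -> prior probability (hypothetical values).
    evidence_likelihoods: list of dicts, one per piece of evidence, each
        mapping printer -> P(evidence | printer). All values are assumed
        to be strictly positive.
    """
    # Work in log space so that many small likelihoods do not underflow.
    log_scores = {}
    for printer, p in prior.items():
        score = math.log(p)
        for likelihood in evidence_likelihoods:
            score += math.log(likelihood[printer])
        log_scores[printer] = score

    # Normalize by subtracting the maximum before exponentiating.
    peak = max(log_scores.values())
    total = sum(math.exp(s - peak) for s in log_scores.values())
    return {k: math.exp(s - peak) / total for k, s in log_scores.items()}


# Hypothetical example: two candidate printers and two pieces of evidence.
prior = {"printer_a": 0.5, "printer_b": 0.5}
evidence = [
    {"printer_a": 0.8, "printer_b": 0.1},  # a damaged-type match
    {"printer_a": 0.6, "printer_b": 0.3},  # compositor spelling habits
]
posterior = printer_posterior(prior, evidence)
```

Treating the damaged-type and compositor evidence as conditionally independent is a simplifying assumption made only for this sketch; a fuller model would also need to account for type that was shared or sold between printers.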

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1816311
Program Officer: Tatiana Korelsky
Budget Start: 2018-09-01
Budget End: 2019-08-31
Fiscal Year: 2018
Total Cost: $499,770
Name: Carnegie-Mellon University
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213