This project designs image analysis algorithms that extract salient image features, group images by the similarity of these features, classify the groups according to a priori knowledge, and optimize algorithmic steps and parameters. The research team applies the jointly developed algorithms to three collections of images and reports accuracy and computational requirements across all of them. The research activities address problems of individual and collective authorship by posing artistic, scientific and technological questions based on the datasets and by developing the corresponding image analyses, leading to computationally scalable and accurate data-driven discoveries of salient and discriminating characteristics. More specifically, the project (a) promotes the development and deployment of innovative image analyses that target the problem of authorship and are applied to large-scale data analysis; (b) fosters interdisciplinary collaboration among scholars in the humanities, computer sciences, and information sciences; (c) promotes international and domestic collaborations; and (d) leads to unique findings on accuracy and computational scalability over a set of large, diverse digital collections made available over the grid to a significant body of researchers from complementary disciplines who are keen to learn from each other. The project is part of an international, multi-institutional and multi-disciplinary effort that jointly explores authorship across three distinct but in some respects complementary digital collections: 15th-century manuscripts, 17th- and 18th-century maps, and 19th- and 20th-century quilts.
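The feature extraction, grouping and classification steps described above can be illustrated with a minimal sketch. It is not the project's software: the intensity-histogram features, the greedy threshold grouping, the tiny 2x2 "images", and the exemplar labels `hand_A` and `hand_B` are all assumptions chosen to keep the example self-contained. The `threshold` argument stands in for the kind of parameter the project would tune.

```python
from math import sqrt

def histogram_features(image, bins=4, max_value=255):
    """Normalized intensity histogram of a grayscale image (list of pixel rows)."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            index = min(pixel * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            total += 1
    return [c / total for c in counts]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def group_by_similarity(features, threshold):
    """Greedy grouping: an image joins the first group whose centroid is
    within `threshold`, otherwise it starts a new group."""
    groups = []  # each group is a list of image indices
    for i, feature in enumerate(features):
        for group in groups:
            centroid = mean_vector([features[j] for j in group])
            if distance(feature, centroid) <= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

def classify_groups(groups, features, exemplars):
    """Label each group with the nearest labeled exemplar (the a priori knowledge)."""
    labels = []
    for group in groups:
        centroid = mean_vector([features[j] for j in group])
        labels.append(min(exemplars, key=lambda name: distance(centroid, exemplars[name])))
    return labels

# Hypothetical 2x2 grayscale images: two dark, two bright.
images = [
    [[10, 20], [30, 40]],
    [[15, 25], [35, 70]],
    [[200, 220], [240, 250]],
    [[190, 210], [230, 255]],
]
# Hypothetical exemplars standing in for known authors.
exemplars = {"hand_A": [1.0, 0.0, 0.0, 0.0], "hand_B": [0.0, 0.0, 0.0, 1.0]}

features = [histogram_features(image) for image in images]
groups = group_by_similarity(features, threshold=0.5)
labels = classify_groups(groups, features, exemplars)
print(groups, labels)
```

A real analysis would replace the histogram with features suited to each collection and would choose the threshold by measuring accuracy against known attributions.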

Project Report

The DID-ARQ project demonstrated how applied computer vision can be used to ask humanistic research questions. The project explored several kinds of image-based identification algorithms for a diverse set of applications in the humanities, including author identification in manuscripts and quilts and shape analysis of lakes depicted on historical maps. In each case, specialized application-specific software was developed through cross-disciplinary interaction. That interaction pinned down assumptions that held within the given data set, and the resulting constraints narrowed the problem domain and increased the accuracy of the image analysis software. The software was developed as open source and is being made available for others to use and adapt to their own needs. This kind of work is important if image analysis is to play as large a part in the digital humanities as text mining does. Discovering which methods work well for humanities researchers and making the algorithms available are first steps toward making image analysis and data mining of images more accessible to humanities scholars. In addition, problems of accessibility and scale are becoming increasingly clear as libraries and archives digitize born-physical documents as part of their preservation efforts. In terms of accessibility, digitized documents are images rather than the ASCII text that search engines were built upon. In terms of scale, human annotation, though a common and costly practice, is becoming impractical if not impossible given the growing amount of digitized data available. In this light, even imperfect solutions involving computer vision, an active research field, offer a means of addressing the problems surrounding these collections where there would otherwise be nothing to offer.

Budget Start: 2010-08-01
Budget End: 2013-07-31
Fiscal Year: 2010
Total Cost: $108,000
Name: University of Illinois Urbana-Champaign
City: Champaign
State: IL
Country: United States
Zip Code: 61820