This project preserves and provides access to the image data associated with the ClueWeb09 dataset. The ClueWeb09 dataset was created by the PI in early 2009 to support research on the web, information retrieval, and related human language technologies. The dataset contains the text of about 1 billion of the most important web pages written in ten major languages. The research community adopted the dataset quickly: 55 copies were licensed in the first 3 months of availability, and it was used in 4 of the 7 tracks of the 2009 TREC evaluation of information retrieval research, organized by the National Institute of Standards and Technology (NIST).