Classical image processing has mostly disregarded semantic image representations in favor of more mathematically tractable representations based on low-level signal properties (frequency decompositions, mean squared error, etc.). This is unlike biological solutions to image processing problems, which rely extensively on an understanding of scene content. For example, regions containing faces are usually processed more carefully than the bushes in the background. The inability to tune image processing to the semantic relevance of image content frequently leads to the sub-optimal allocation of resources, such as bandwidth, error protection, or viewing time, to image areas that are perceptually irrelevant.

One of the main obstacles to the deployment of semantic image processing systems has been the difficulty of training content-understanding systems with large-scale vocabularies. This is due, in large part, to the large amounts of training data and intensive human supervision required by classical methods for vocabulary learning. This research aims to establish a foundation for semantic image processing systems that can learn large-scale vocabularies from informally annotated data, with no additional human supervision. It builds on recent advances in semantic image labeling, which have made it possible to learn vocabularies from noisy training data, such as that massively (and inexpensively) available on the web. The research studies both theoretical issues in vocabulary learning and the design of image processing algorithms that tune their behavior to the content of the images being processed. Semantic image processing could lead to transformative advances in areas such as image compression, enhancement, encryption, de-noising, and segmentation, which are of interest for applications as diverse as medical imaging, image search and retrieval, and security and surveillance.