The ability to make meaning out of a visual world, such as recognizing objects, scenes, and semantically meaningful activities and events, is a cornerstone of artificial intelligence. In computer vision, significant progress has been made recently in object- and scene-level recognition. However, such tasks are often performed without an integrated and coherent description of the scene. Moreover, very few current algorithms are capable of further interpreting higher-level semantic meanings of an image, such as an event or activity.

The goal of this project is to achieve event classification via integrated image understanding given a single unknown image. This project aims to push the frontier of integrated and descriptive understanding of images through the development of sophisticated learning frameworks suitable for training algorithms on large amounts of real-world data, such as images from the Internet. High-accuracy performance, minimal human supervision, flexibility, and scalable learning will be the focus of this endeavor. This project's theoretical framework ties together several areas of computer vision, offers interesting model representations for the machine learning field, and connects semantically driven visual recognition problems with the natural language processing field.

The results are vital for image understanding technology for the visually impaired; automatic annotation of images for large digital libraries as well as the next generation of image retrieval engines; and translation, education, and rehabilitation technology for language students and medical patients (e.g., those affected by aphasia or stroke). The project's long-term educational plan focuses on bringing the latest visual computation and cognition research directly into the classroom and the community at large, with an emphasis on reaching underrepresented groups of students.