In this project, the PIs and students study a probabilistic graphical representation, called the And-Or graph (AoG), for visual knowledge representation. The AoG embodies hierarchical and contextual models of visual objects and scenes and is key to robust object and scene recognition. More specifically, the project addresses two major technical challenges: (i) learning the AoG for representing objects and scenes in an unsupervised way; and (ii) developing effective inference algorithms that schedule top-down and bottom-up processes to extract semantic content, in the form of a parse graph, under the guidance of the AoG. The extracted semantics include the hierarchical decomposition of the image from scene to objects and parts, as well as the contextual relations among them. This content is crucial for bridging the semantic gap in large-scale image search and retrieval. The technologies studied in this project are key to a number of applications, such as image content extraction for security surveillance, information gathering, Internet image search, and situation awareness. One specific application studied in this project is autonomous driving assistance, for designing safer vehicles and reducing car accidents. The project also supports the training of three graduate students over the three-year period. Research results are disseminated through publications in major computer vision conferences and journals, institutional webpages, and data sets and code shared on the Internet.