This project integrates ideas from computer science, mathematics, and statistics and applies them to knowledge discovery and data mining of very large data sets. The project has two general research goals. The first is the development of novel methods for exploring and identifying structure in multivariate data, with particular emphasis on clustering and density estimation. The second is the development of novel methods for modeling sequential structure in data, in particular the use of graphical models to facilitate model construction and estimation. The technical approach couples ideas from modern statistical modeling with computational techniques. A key feature of this work is the use of large-scale scientific and engineering data sets as testbeds, including upper-atmosphere spatio-temporal data records; a large medical data set consisting of heterogeneous data types for the study of Alzheimer's disease; planetary image data sets and associated annotations and catalogs of geologic features; and multivariate engineering sensor data from online system monitoring. The educational component of the project consists of the development of new courses that emphasize a first-principles understanding of model exploration in the context of data analysis, as well as opportunities for students to participate in interdisciplinary, large-scale exploratory data mining projects. This project can have a significant impact on how large data sets are explored and analyzed across a wide variety of scientific, medical, and business disciplines.