This research project will extend the theoretical foundations of mixture modeling for statistical learning by developing novel mathematical tools that probe the precise geometry of mixture models. Building on these theoretical results, the investigators will develop new approaches to clustering, dimension reduction, variable selection, and temporal analysis. These methods will open promising paths toward interactive visualization and summarization of complex data. A suite of statistical tools will be integrated into a new visualization system as its technical backbone. Applications to very large-scale, high-dimensional, and temporally evolving data will be explored. The principal investigators, who have complementary backgrounds in theoretical statistics, computational statistics, and information visualization, will also work with colleagues across multiple departments at Penn State University to test their methods and prototype systems on real-world data sets.

In many scientific and engineering areas with a direct and substantial impact on everyday life, such as extreme weather prediction and manufacturing engineering design, researchers face enormous amounts of data of great complexity in terms of dimensionality, data types, statistical dependence, and temporal variation. Visualization plays an important role in the analysis of complex data. Visualization systems help users expand the spatial and cognitive resources available to them, improve search, enhance pattern recognition, and ultimately make sense of abstract phenomena. This research project aims to fundamentally advance the mathematical core of visualization systems. The investigators adopt a probabilistic framework for modeling data, specifically the mixture model. Mixture modeling provides a highly flexible and theoretically sound basis for summarizing data and automatically extracting patterns from it. This project will develop theories and algorithms for mixture modeling and use them to construct new statistical learning and data mining techniques. These statistical methods will fundamentally change the way visualization systems are designed, offering both more and better functionality. Software packages implementing advanced methods for statistical learning and interactive visualization will be developed and distributed for public use. The proposed research on data visualization and modeling techniques is expected to affect a wide range of fields in science, engineering, and commerce. Its applications to hurricane forecasting and engineering design can deeply influence daily life.
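To make the idea of a mixture model as a data summary concrete, the following is a minimal sketch, assuming a univariate Gaussian mixture fitted by the standard expectation-maximization (EM) algorithm. It is a generic textbook illustration, not the investigators' methods. The function names `normal_pdf` and `fit_gmm_1d`, the toy data, and the initial means are hypothetical choices made for illustration. Each component is summarized by a weight, a mean, and a variance, and the posterior probabilities (responsibilities) give a soft clustering of the points.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, n_iter=100, min_var=1e-6):
    """Fit a univariate Gaussian mixture by EM (hypothetical illustrative helper).

    `means` holds the initial component means; the number of components is
    len(means). Returns (weights, means, variances, responsibilities).
    """
    k, n = len(means), len(data)
    means = list(means)
    # Start every component with the overall sample variance and equal weight.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            # Guard against underflow: fall back to uniform responsibilities.
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # keep variances away from zero
    return weights, means, variances, resp


# Toy data with two well-separated groups (hypothetical example values).
data = [0.1, -0.2, 0.3, 0.0, -0.1, 5.0, 5.2, 4.9, 5.1, 4.8]
weights, means, variances, resp = fit_gmm_1d(data, means=[0.0, 5.0])
# Hard cluster labels: assign each point to its most probable component.
labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
```

Here the ten points are summarized by six numbers (two weights, two means, two variances), and the hard labels recover the two groups. Real systems would typically use multivariate components, several random restarts, and a model-selection criterion for the number of components.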

Budget Start: 2009-09-15
Budget End: 2014-08-31
Fiscal Year: 2009
Total Cost: $497,973
Name: Pennsylvania State University
City: University Park
State: PA
Country: United States
Zip Code: 16802