The overall goal of the project is to develop methodology for density estimation in multiple dimensions, and to build new tools on this methodology for selected problems in data compression, image analysis, and graphical model inference. Density estimation is a fundamental problem in statistics, but traditional approaches such as kernel density estimation are not well suited to the large multivariate data sets that arise in current applications. The research in this project therefore centers on both the methodology and the application of multivariate density estimation. By creating effective methods for this problem, the project will also benefit many other research problems in applied statistics and machine learning in which density estimation serves as a building block for the solution, such as image segmentation, data compression, and network modeling.
Specifically, the project will address the question of how to infer a partition of the sample space that reveals the structure of the underlying data distribution. The partition will be learned from the observed data using a Bayesian nonparametric approach that imposes minimal assumptions on the distribution to be estimated. Efficient and scalable algorithms for this inference will be designed for the analysis of large data sets in multiple dimensions. The theoretical properties of the resulting estimates, such as asymptotic consistency and convergence rates, will also be investigated.
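
To make the idea of a data-driven partition concrete, the following is a minimal sketch of a piecewise-constant density estimate on a recursive binary partition of the unit hypercube. It is an illustration under stated assumptions, not the project's algorithm: the Bayesian nonparametric scoring described above is replaced here by a simple log-likelihood gain with a fixed threshold, splits are restricted to cell midpoints, and the names `split_gain`, `fit_partition`, `density_at`, and the parameters `min_points`, `min_gain`, and `max_depth` are hypothetical.

```python
import math
import random


def split_gain(n_left, n_right):
    """Log-likelihood gain from halving a cell at its midpoint
    (piecewise-constant density; 0 * log 0 is treated as 0)."""
    n = n_left + n_right
    gain = 0.0
    for k in (n_left, n_right):
        if k > 0:
            gain += k * math.log(2.0 * k / n)
    return gain


def fit_partition(points, lower, upper, n_total,
                  min_points=20, min_gain=3.0, max_depth=8):
    """Return a list of cells (lower, upper, density).

    Assumption: a greedy likelihood-gain rule stands in for the
    Bayesian partition inference described in the text.
    """
    volume = math.prod(u - l for l, u in zip(lower, upper))
    leaf = [(tuple(lower), tuple(upper), len(points) / (n_total * volume))]
    if len(points) < min_points or max_depth == 0:
        return leaf

    # Try a midpoint split along every coordinate; keep the best one.
    best = None
    for d in range(len(lower)):
        mid = 0.5 * (lower[d] + upper[d])
        left = [p for p in points if p[d] < mid]
        gain = split_gain(len(left), len(points) - len(left))
        if best is None or gain > best[0]:
            best = (gain, d, mid, left)

    gain, d, mid, left = best
    if gain < min_gain:  # the split does not improve the fit enough
        return leaf

    right = [p for p in points if p[d] >= mid]
    left_upper = list(upper)
    left_upper[d] = mid
    right_lower = list(lower)
    right_lower[d] = mid
    return (fit_partition(left, lower, left_upper, n_total,
                          min_points, min_gain, max_depth - 1)
            + fit_partition(right, right_lower, upper, n_total,
                            min_points, min_gain, max_depth - 1))


def density_at(cells, x):
    """Evaluate the piecewise-constant estimate at point x."""
    for lower, upper, dens in cells:
        if all(l <= xi < u for l, xi, u in zip(lower, x, upper)):
            return dens
    return 0.0


# Synthetic example: 80% of the mass sits in the lower-left quadrant.
random.seed(0)
data = []
for _ in range(500):
    scale = 0.5 if random.random() < 0.8 else 1.0
    data.append((scale * random.random(), scale * random.random()))

cells = fit_partition(data, [0.0, 0.0], [1.0, 1.0], n_total=len(data))
```

In this sketch each cell's density is its share of the sample divided by its volume, so the estimate integrates to one by construction. The learned cells are finer where the data concentrate, which is the sense in which a partition can reveal the structure of the underlying distribution.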