The overall goal of the project is to develop methodologies for density estimation in multiple dimensions, and to develop new tools based on these methodologies for selected problems in data compression, image analysis and graphical model inference. Density estimation is a fundamental problem in statistics, but traditional approaches such as kernel density estimation are not well suited to the large multivariate data sets that arise in current applications. The research in this project centers on the methodology and application of multivariate density estimation. By creating effective methods for this problem, the project will also benefit many other research problems in applied statistics and machine learning where density estimation can serve as a building block for the solution, for example image segmentation, data compression and network modeling.

Specifically, the project will address the question of how to infer a partition of the sample space that reveals the structure of the underlying data distribution. The partition will be learned from the observed data using a Bayesian nonparametric approach that imposes minimal assumptions on the distribution to be estimated. Efficient and scalable algorithms for such inference will be designed for the analysis of large data sets in multiple dimensions. The theoretical properties of the estimates, such as asymptotic consistency and convergence rates, will also be investigated.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1407557
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2014-08-01
Budget End: 2018-07-31
Support Year:
Fiscal Year: 2014
Total Cost: $599,539
Indirect Cost:
Name: Stanford University
Department:
Type:
DUNS #:
City: Stanford
State: CA
Country: United States
Zip Code: 94305