Bayesian Recursive Partitioning and Inference on the Structure of High-Dimensional Distributions

Ma, Li

Abstract

This research concerns one of the most important and pervasive classes of problems in modern data analysis: inference on the structure of probability distributions. The specific inference problems to be addressed fall into two broad categories. The first involves inference on the structure of a single probability distribution, including estimation of joint and conditional densities, variable selection in linear regression, and testing of independence and conditional independence among variables. The second involves inference on the relationship among multiple distributions. This includes testing whether two or more data samples have the same underlying distribution and learning the structure of their difference, with particular interest in finding local structures, that is, differences confined to small subsets of large high-dimensional spaces. To address these problems, the investigator puts forward a novel framework for constructing Bayesian priors on multivariate distributions through recursive partitioning. Inference under this framework is flexible and adaptive. Moreover, the generative nature of these priors facilitates modeling the dependence structure across multiple distributions, which leads to powerful methods for comparing distributions. To address the computational challenges of high-dimensional problems, the investigator lays out a set of computational strategies and proposes to develop several algorithms that can drastically improve the efficiency of Bayesian posterior inference. These strategies exploit the recursive nature of the proposed framework to efficiently explore the global landscape of the corresponding posterior distributions.
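
To make the idea of a prior built by recursive partitioning concrete, the following is a minimal sketch of a classical Pólya tree on the unit interval. It is not the investigator's framework, which the abstract does not specify. The function name `polya_tree_density`, the dyadic splitting rule, the depth cutoff, and the Beta parameters `c * level**2` are all assumptions chosen for illustration. At each level the current interval is split in half, and the probability of falling in each half receives a symmetric Beta prior that is updated by the counts of observations in each half.

```python
def polya_tree_density(x, data, depth=6, c=1.0):
    """Posterior mean density at x in [0, 1) under a dyadic Polya tree prior.

    Illustrative assumptions: the sample space is [0, 1), every interval is
    split at its midpoint, and the split probability at a given level has a
    Beta(alpha, alpha) prior with alpha = c * level**2.
    """
    lo, hi = 0.0, 1.0
    points = [d for d in data if 0.0 <= d < 1.0]
    density = 1.0
    for level in range(1, depth + 1):
        mid = (lo + hi) / 2.0
        alpha = c * level ** 2
        n_parent = len(points)
        left = [d for d in points if d < mid]
        if x < mid:
            # Descend into the left half of the current interval.
            n_child = len(left)
            points = left
            hi = mid
        else:
            # Descend into the right half of the current interval.
            n_child = n_parent - len(left)
            points = [d for d in points if d >= mid]
            lo = mid
        # Posterior mean of the branch probability, divided by the relative
        # width of the child interval (1/2), gives the density factor.
        density *= 2.0 * (alpha + n_child) / (2.0 * alpha + n_parent)
    return density


sample = [0.10, 0.11, 0.12, 0.15]
near = polya_tree_density(0.12, sample)   # inside the cluster of observations
far = polya_tree_density(0.90, sample)    # far from every observation
```

With no data, every factor equals one and the estimate is the uniform density. With data, the estimate rises where observations concentrate, which is the adaptive behavior the abstract describes in general terms.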

Inference on the structure of probability distributions lies at the heart of many scientific inquiries, and new statistical theory and methods are urgently needed to accommodate the ever-increasing dimensionality of the data sets that are commonplace in modern scientific investigations. Two specific applications motivate this project: the analysis of high-dimensional flow cytometry data in systems biology to unravel the functional relationships among proteins, and the mapping of human genes to various qualitative and quantitative traits, in particular those of common diseases such as cancer and diabetes. The concepts, theory, methodology, and algorithms developed in this project will be directly applicable to these problems, as well as to the analysis of data sets arising in a wide variety of other fields, ranging from environmental science to economics.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1309057
Program Officer: Gabor Szekely

Budget Start: 2013-07-01
Budget End: 2016-06-30
Fiscal Year: 2013
Total Cost: $159,873

Institution: Duke University, Durham, NC, United States
