It is widely accepted that in many high dimensional settings, model selection has to be performed either before or simultaneously with parameter estimation in order to reduce the number of parameters under consideration. Indeed, model selection is one of the major challenges facing statisticians working with high dimensional data. Regularization and sparsity are among the common tools used to obtain parsimonious models that explain observed data. In recent years, the field of statistics has witnessed an explosion of frequentist and Bayesian methods for high dimensional problems. Despite these and other advances, Bayesian model selection in an "objective" sense in high dimensional problems remains an important problem that has yet to be solved satisfactorily. The need for objectivity translates into a need for specifying noninformative improper priors. Because an improper prior is defined only up to an arbitrary multiplicative constant, the resulting Bayes factors are also determined only up to an arbitrary constant, which renders traditional Bayes factors unusable. This project proposes to derive objective Bayesian estimation and model selection procedures for a large class of high dimensional graphical models. The proposed methodology therefore aims to contribute much needed theory to the area of objective Bayesian model selection for high dimensional graphical models. In the process, the project studies the benefits and shortcomings of objective Bayesian methods in this context. The theory developed in turn informs the development of algorithms and computational techniques for model selection and estimation in high dimensional settings.

The availability of high-throughput or high dimensional data has touched almost every field of science, and the need to formulate correct models that explain such data permeates many scientific fields. Indeed, data in which the number of variables is much larger than the number of samples, referred to as the "large p small n" problem, are now more pervasive than ever. Discovering statistical signals in high dimensional data, proposing correct models that can explain such data, and estimating parameters in these settings are some of the major challenges that modern statisticians have to contend with. Moreover, such challenges also feature in high stakes debates such as climate change, the effectiveness of certain drugs in clinical trials, and the relevance of various biomarkers in cancer studies. This project proposes to develop statistical methodology specifically targeted at identifying, in an objective manner, models that explain high dimensional data. In particular, the project is designed to develop better objective Bayesian model selection and parameter estimation methods for high dimensional problems, with widespread applications. The PI and co-PI collaborate with scientists in applied fields, especially with faculty and researchers in their Medical Schools, Schools of Engineering, and Environmental Sciences. Training and mentoring of graduate students is an integral part of this collaborative research. Scientific output from the project is intended for publication in high impact peer-reviewed journals.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1106642
Program Officer: Gabor J. Szekely
Budget Start: 2011-10-01
Budget End: 2014-09-30
Fiscal Year: 2011
Total Cost: $99,191

Institution Name: Stanford University
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304