This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

The covariance parameter is the natural parameter of interest when exploring complex relationships between many variables in parametric models. Current methodology for high-dimensional covariance estimation has focused on regularizing the covariance matrix or its inverse, or on placing zeros in it, using methods based on the lasso. Though very useful, these methods leave several significant gaps in the literature. First, it is well known that the lasso and similar penalization methods yield sparse models and estimators, yet a formal study of the spectral properties of regularized covariance estimators, or of the random matrices that arise naturally in graphical models, is not available in the literature. This project addresses that gap. Second, an important class of models that has recently received much attention is the class of so-called covariance graph models. These models encode marginal independences in multivariate distributions and thus can yield more parsimonious representations. A comprehensive framework for Bayesian inference and model selection for this class of models is not available, and this project investigates that class of problems. Third, one of the original justifications for covariance-regularized estimation is that the covariance matrix features in the mean estimation problem, in constructing confidence intervals for the mean (for instance in MANOVA), and in regression. Yet there is relatively little work on covariance regularization tailored to the specific needs of regression. The project develops a general framework that investigates the merits of using the covariance matrix of the explanatory variables for regression purposes, thereby providing insight into obtaining better estimators of regression coefficients than those suggested by standard methods.

In recent years, the availability of high-throughput data from genomics, finance, environmental science, marketing, and other applications has created an urgent need for methodology and tools for analyzing high-dimensional data. Making sense of the many complex relationships in such data, formulating correct models, and developing inferential procedures are among the major challenges facing statisticians today, as well as those working in applied fields. This project proposes to tackle some of the pressing questions that arise when exploring multivariate dependencies in high dimensions. As a concrete application, the methodology developed in this project will be used to understand the interconnectedness of genes in cancer studies and cardiovascular medicine, while maintaining the statistical rigor and ease of interpretability of previously developed methods. A project of this nature will have widespread applications, since understanding relationships between many variables or players is an endeavor common to many scientific disciplines. The proposed work, though rooted in the principles of statistics, is interdisciplinary and involves collaborations with biomedical scientists, engineers, and environmental scientists.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0906392
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2009-07-15
Budget End: 2011-06-30
Support Year:
Fiscal Year: 2009
Total Cost: $103,764
Indirect Cost:

Name: Stanford University
Department:
Type:
DUNS #:
City: Palo Alto
State: CA
Country: United States
Zip Code: 94304