Resampling Methods for High-Dimensional and Large-Scale Data

Lopes, Miles

Abstract

Resampling methods are a broad class of tools that serve to measure the variability of statistical results, for example, allowing a researcher to determine whether the outcome of an experiment is significant. Over the last few decades, these methods have been extensively studied, and they have become fundamental to the practice of statistics, in large part because they can solve complex problems while relying on relatively few assumptions. Nevertheless, much remains to be understood about the performance of resampling methods in the context of modern data analysis, where observations tend to have large numbers of features (high-dimensional data), or where the quantity of data is so large that it outstrips computational resources (large-scale data). In both of these challenging settings, the proposed research will extend the applicability of resampling methods, and these efforts will be guided by two research themes discussed below.
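As a point of reference for how resampling measures variability, the following is a minimal sketch of a percentile bootstrap confidence interval. It is a generic textbook illustration, not a method taken from the proposal; the function name `bootstrap_ci`, its parameters, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def bootstrap_ci(data, statistic=np.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    replicates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw n observations with replacement and recompute the statistic.
        idx = rng.integers(0, n, size=n)
        replicates[b] = statistic(data[idx])
    # The spread of the replicates approximates the statistic's variability.
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return lower, upper


if __name__ == "__main__":
    # Synthetic data, used here only to exercise the function.
    sample = np.random.default_rng(1).normal(loc=0.5, scale=1.0, size=200)
    print(bootstrap_ci(sample))
```

The interval is built from the data alone, without a distributional formula for the statistic, which is the sense in which such methods rely on relatively few assumptions.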

First, in the setting of high-dimensional data, the understanding of inference problems, including tests and confidence intervals, remains underdeveloped in comparison with estimation and prediction problems. Given that resampling methods are a general-purpose approach to inference, it is important to know how they are influenced by the effects of low-dimensional structure and regularization. In particular, the proposed research will study the performance of resampling methods in high-dimensional models involving structured covariance matrices.

Second, in the setting of large-scale data, randomized algorithms have received growing attention for their ability to produce fast approximate solutions. Although the outputs of such algorithms are random, their fluctuations can often be reduced at the expense of greater computation. This general trait of randomized algorithms leads to the problem of optimizing a tradeoff between precision and computational cost. Towards a solution, the proposed research will investigate how resampling methods can be used to measure this tradeoff for a collection of popular randomized algorithms.
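The precision-versus-cost tradeoff in the second theme can be illustrated with a small sketch. The abstract does not name specific algorithms, so the example below assumes one: approximating the matrix product A^T B by uniformly sampling m rows, then bootstrapping the sampled rows to estimate the size of the approximation error. The helper names `sketched_product` and `bootstrap_error_quantile`, the matrix sizes, and the 0.95 quantile level are hypothetical choices for this illustration, not details from the proposal.

```python
import numpy as np


def sketched_product(A, B, m, rng):
    """Sample m rows uniformly with replacement; rescale so SA.T @ SB estimates A.T @ B."""
    n = A.shape[0]
    idx = rng.integers(0, n, size=m)
    scale = np.sqrt(n / m)
    return scale * A[idx], scale * B[idx]


def bootstrap_error_quantile(SA, SB, n_boot=200, q=0.95, rng=None):
    """Estimate the q-quantile of the Frobenius-norm error using only the sketch."""
    rng = np.random.default_rng() if rng is None else rng
    m = SA.shape[0]
    estimate = SA.T @ SB
    errors = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the sketched rows and measure the fluctuation of the product.
        idx = rng.integers(0, m, size=m)
        errors[b] = np.linalg.norm(SA[idx].T @ SB[idx] - estimate)
    return np.quantile(errors, q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20000, 5))  # synthetic inputs for illustration
    B = rng.normal(size=(20000, 4))
    exact = A.T @ B
    for m in (100, 400, 1600):
        SA, SB = sketched_product(A, B, m, rng)
        estimated = bootstrap_error_quantile(SA, SB, rng=rng)
        actual = np.linalg.norm(SA.T @ SB - exact)
        print(f"m={m}: estimated error quantile={estimated:.1f}, actual error={actual:.1f}")
```

In this sketch the error estimate is computed from the sampled rows alone, so it never requires the exact product. Increasing m costs more computation and should shrink the estimated error, which gives a rough way to decide when the sketch is large enough.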

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1613218
Program Officer: Gabor Szekely

Budget Start: 2016-07-01
Budget End: 2020-06-30
Fiscal Year: 2016
Total Cost: $150,000

Institution

University of California Davis, Davis, CA, United States
