Correlated and high-dimensional data appear routinely in many areas of science, including atmospheric sciences, finance, and molecular genetics, as well as in an ever-increasing number of everyday activities such as social networking. Although vast amounts of data are being generated and are available for analysis, traditional methods often fail to extract useful information in such applications. This research project has two major goals. First, it seeks to develop new mathematical tools for analyzing a recently developed, complex statistical approach for correlated data that has produced remarkably accurate results in empirical studies but lacks a rigorous theoretical justification. It is hoped that these new theoretical tools will lead to further refinements of existing statistical methodology for correlated data. Second, the project addresses complex inferential issues for high-dimensional data in which the number of unknown parameters far exceeds the sample size, such as identifying the few important genes among several thousand candidates using data from a few hundred patients. The project seeks to develop theoretical and methodological statistical tools that enable researchers to address important inference questions without stringent model assumptions.

The project aims to develop critical theoretical tools and nonparametric statistical methodology for the analysis of time series and high-dimensional data. Specifically, the project will focus on (i) developing asymptotic expansion results for the "fixed-b" asymptotic approach in time series, which has shown significant improvement over traditional methods in several empirical studies but has very little theoretical underpinning; (ii) investigating higher-order properties of some general classes of statistical tests (e.g., Wald tests) and of some more recently proposed nonstandard empirical likelihood tests, both under the "fixed-b" formulation; (iii) developing new pivotal quantities for the block bootstrap in time series that nearly match the accuracy of the bootstrap under independence; (iv) developing asymptotic expansion results in high dimensions under sparsity by exploiting novel tools from approximation theory and Banach space theory; (v) applying the asymptotic expansion results from (iv) to investigate the "phase transition" phenomenon in the asymptotic properties of statistical methods in high dimensions; and (vi) investigating properties of resampling methods for post-variable-selection inference in high dimensions.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 2006475
Program Officer: Gabor Szekely
Budget Start: 2019-09-01
Budget End: 2021-06-30
Fiscal Year: 2020
Total Cost: $92,407
Name: Washington University
City: Saint Louis
State: MO
Country: United States
Zip Code: 63130