Owing to the rapid development of information technologies and their applications in scientific fields such as climate science, medical imaging, and finance, statistical analysis of high-dimensional data and infinite-dimensional functional data has become increasingly important. A key challenge in analyzing such big data is how to measure and infer complex dependence structure, a fundamental step in statistics that becomes more difficult as the dimensionality and size of the data grow. The main goal of this research project is to develop new dependence measures for quantifying dependence in large-scale data sets, such as temporally dependent functional data and high-dimensional data, and to use these measures to develop novel statistical tools for sparse principal component analysis, dimension reduction, and simultaneous hypothesis testing. Building on the new dependence metrics, which can capture nonlinear and non-monotonic dependence, the methodologies under development are expected to lead to more accurate prediction and inference, as well as more effective dimension reduction, in the analysis of functional and high-dimensional data.

The research consists of three projects addressing different challenges in the analysis of functional and high-dimensional data. In Project 1, the investigators introduce a new operator-valued quantity to characterize the conditional mean (in)dependence of one function-valued random element given another, and apply the newly developed dependence metrics to perform dimension reduction for functional time series under a new framework of finite dimensional functional data. In Project 2, the investigators explore a new dimension reduction framework for regression models with a high-dimensional response, which relies on less stringent linear model assumptions and is more flexible in capturing possible nonlinear dependence between the response and the covariates. In Project 3, the investigators develop new tests for the mutual independence of high-dimensional data via distance covariance and rank distance covariance, using both sum-of-squares and maximum-type test statistics. The three lines of research are all related to big data and touch upon various aspects of modern statistics; the project aims to advance the current frontiers in areas including sparse principal component analysis, inference for dependent functional data, and high-dimensional multivariate analysis.
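
As a rough illustration of the kind of statistic Project 3 refers to, the sketch below computes the standard biased (V-statistic) sample squared distance covariance between two univariate samples, then aggregates it over all pairs of coordinates in sum-type and maximum-type form. It is a minimal sketch under stated assumptions, not the investigators' procedure: the function names are hypothetical, no rank transformation is applied, and no null distribution or critical value is derived.

```python
import numpy as np


def double_centered_distances(x):
    """Pairwise absolute distances of a 1-D sample, double centered."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d = np.abs(x - x.T)
    row_means = d.mean(axis=1, keepdims=True)
    col_means = d.mean(axis=0, keepdims=True)
    return d - row_means - col_means + d.mean()


def distance_covariance_sq(x, y):
    """Biased (V-statistic) sample squared distance covariance of x and y."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    return float((a * b).mean())


def pairwise_dcov_statistics(data):
    """Sum-type and maximum-type aggregates over all coordinate pairs.

    data has shape (n, p). This aggregation is an illustrative assumption,
    not the test statistic defined in the funded research.
    """
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    centered = [double_centered_distances(data[:, j]) for j in range(p)]
    values = [
        float((centered[j] * centered[k]).mean())
        for j in range(p)
        for k in range(j + 1, p)
    ]
    return float(np.sum(values)), float(np.max(values))


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 50)
    y = x ** 2  # non-monotonic dependence on x
    print("Pearson correlation:", np.corrcoef(x, y)[0, 1])
    print("Squared distance covariance:", distance_covariance_sq(x, y))
```

In the usage example, the Pearson correlation of `x` and `x ** 2` is essentially zero because `x` is symmetric about zero, while the squared distance covariance is strictly positive, which is the sense in which such measures capture non-monotonic dependence.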

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1607489
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2016-06-01
Budget End: 2020-05-31
Support Year:
Fiscal Year: 2016
Total Cost: $184,996
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820