To pursue the promise of the big data revolution, this project focuses on a common form of data, high dimensional high frequency data (HDHFD), in which each snapshot of the data involves a large number of variables while new data stream in every fraction of a millisecond. With technological advances in data collection, HDHFD occurs in medical applications from neuroscience to patient care; finance and economics; geosciences such as earthquake data; marine science including fishing and shipping; turbulence; internet data; and other areas where data streaming is available. The Principal Investigators' (PIs') research focuses on how to extract information from complex big data and how to turn data into knowledge. In particular, the project seeks to develop cutting-edge mathematics and statistical methodology to uncover the structure governing HDHFD systems. This structure is characterized by a web of dependence across both time and dimension, and the role of analysis is to provide guidance on how to reduce the complexity while retaining the important features of the data architecture. An integral part of this research is quantifying the uncertainty in estimates and forecasts in HDHFD systems. In addition to developing a general theory, the project is concerned with applications to financial data, including risk management, forecasting, and portfolio management. More precise estimators, with improved margins of error, will be useful in all these areas of finance. The results are of interest to main-street investors, regulators, and policymakers, and are entirely in the public domain.
The purpose of this project is to explore high dimensional high frequency data (HDHFD) from several angles. A fundamental approach is to extend the PIs' contiguity theory. Under a contiguous probability, the structure of the observations is often more accessible (frequently Gaussian) in local neighborhoods, facilitating statistical analysis. This is achieved without altering current models. As a contribution to factor modeling of HDHFD, the PIs will explore time-varying matrix decompositions, including the development of a singular value decomposition (SVD) for high frequency data, as a more direct path to a factor model. The PIs plan to compare the new SVD with PCA-based methods, as well as L1-type methods such as nonnegative matrix factorization. The PIs have discovered a new way to look at dependence across time and dimension, which they originally developed in connection with their observed asymptotic variance (observed AVAR). They will now look into the possibility of "borrowing" information across time and dimension. This tool will be used for matrix decompositions, as well as to develop volatility matrices for the drift part of a financial process, which will interface with their planned work on matrix decompositions. The PIs will also explore a path to an observed AVAR that takes place in continuous time, thereby improving accuracy and simplifying both implementation and theoretical analysis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.