Recent developments have made large-scale multidimensional data readily available in science and engineering applications. One example is multi-tissue, multi-individual gene expression studies, in which expression profiles are collected from multiple tissues of different individuals. Another is the DBLP database, which is organized into a three-way author-by-word-by-venue tensor whose entries record the co-occurrence of each (author, word, venue) triplet. Despite the prevalence of tensor data, applying statistical methods to higher-order tensors remains challenging. In particular, the classical spectral theory for matrices does not carry over directly to tensors, and many of the associated computational problems are NP-hard in the worst case. Analyzing tensor data of increasing dimensionality and ever-growing complexity therefore requires novel statistical methods, and developing them is the aim of this project.
In this project, the PI plans to develop a framework of statistical models, scalable algorithms, and supporting theory for analyzing tensor-valued data. This framework will allow researchers to examine complex interactions among tensor entries and between multiple tensors, thereby addressing questions that traditional matrix analysis cannot answer. The project will focus on three major areas: (i) spectral theory for specially structured or random tensors; (ii) estimation of low-rank tensors from non-Gaussian observations; and (iii) joint estimation of mean and covariance for tensor-valued data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.