With big data come bigger goals: many modern statistical applications not only involve large datasets that may offer new insights but also require deciphering intricate relationships among a large number of variables, often characterized by complex, high-dimensional functions. As more data are acquired, these functions inevitably become more complex. Often the most fundamental challenge in these applications is to quantify the complexity of such tasks and to learn these functions from data in a way that is both statistically and computationally efficient. Despite impressive progress in recent years, the current approach toward this goal is limited by the discrete nature of the classical notion of computational complexity and is ill-suited to statistical problems that are inherently continuous. This project aims to develop an information-based approach that better accounts for both statistical and computational efficiencies. This new framework of complexity is expected to offer insights into the potential trade-off between statistical and computational efficiencies and to reveal the role of experimental design in alleviating computational burden. The project also provides training for graduate students through their involvement in the research.
Traditional nonparametric techniques based solely on smoothness are known to suffer from the so-called "curse of dimensionality." In many scientific and engineering applications, however, the underlying high-dimensional object may possess additional structure which, if appropriately accounted for, can lift this barrier and allow for efficient estimation. This project aims to develop a coherent framework to quantify the complexity of high-dimensional models that appropriately accounts for both statistical accuracy and computational cost and clarifies the role of such additional structure. The project will use this notion of complexity to examine several common yet notoriously difficult high-dimensional nonparametric regression problems: one based solely on smoothness, another based on smoothness and sparsity, and one based on low-rank tensors. This exercise is designed to reveal relationships between the statistical and computational aspects of these problems and to lead to novel, optimal sampling and estimation strategies. The research will also develop the new framework of complexity in more general statistical contexts and investigate its role in characterizing statistically and computationally optimal inference schemes. This will be achieved by developing new statistical methods and computational algorithms, studying their performance and fundamental limits theoretically, and building related mathematical tools and computational software.
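To make the barrier and the potential gains from structure concrete, a classical benchmark (stated here only as an illustration, with α the smoothness level, d the ambient dimension, s the number of active variables, and n the sample size) is the minimax risk for estimating an α-smooth regression function of d variables under squared-error loss,
\[
\inf_{\hat f}\,\sup_{f\in\mathcal{H}^{\alpha}([0,1]^d)} \mathbb{E}\,\|\hat f - f\|_2^2 \;\asymp\; n^{-2\alpha/(2\alpha+d)},
\]
so the sample size required to reach a fixed accuracy grows exponentially in d. Under additional structure, for instance a sparse additive model that depends on only s of the d variables, the attainable rate improves to roughly
\[
s\,n^{-2\alpha/(2\alpha+1)} \;+\; \frac{s\log d}{n},
\]
whose dependence on the ambient dimension is only logarithmic.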
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.