In recent years, the amount of available data in science and technology has exploded and is currently expanding at an unprecedented rate. The general task of drawing accurate inferences from large and complex datasets has become a major bottleneck across disciplines. A natural formalization of such inference tasks views the data as random samples drawn from a probabilistic model -- a model that we believe describes the process generating the data. The overarching goal of this project is to obtain a refined understanding of these inference tasks from both statistical and computational perspectives. The questions addressed in this project arise from pressing challenges in modern data analysis. A crucial component of the project involves fostering collaboration between different research communities. Furthermore, the PI will mentor high-school and undergraduate students and design several new theory courses integrating research and teaching at the undergraduate and graduate levels.
The PI will investigate several fundamental algorithmic questions in unsupervised learning and testing for which there is an alarming gap in our current understanding. These include designing efficient algorithms that are stable in the presence of deviations from the assumed model, circumventing the curse of dimensionality in distribution learning, and testing high-dimensional probabilistic models. This set of directions could lead to new algorithmic and probabilistic techniques and offer insights into the interplay between structure and computational efficiency in unsupervised estimation. The proposed research connects to a broad range of related work across computer science, probability, statistics, and information theory.
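To make the first of these directions concrete: when data is assumed to come from an idealized model such as a Gaussian, even a small fraction of corrupted samples can render classical estimators useless, and the goal is to design efficient estimators whose guarantees degrade gracefully. The following minimal sketch (a hypothetical one-dimensional toy example in Python, not drawn from the proposal itself, with illustrative parameters) contrasts the empirical mean with the median under a small amount of contamination.

    import numpy as np

    # Toy illustration (hypothetical example): estimating the mean of a
    # one-dimensional Gaussian when a small fraction eps of the samples
    # deviates arbitrarily from the assumed model.
    rng = np.random.default_rng(0)
    n, eps = 10_000, 0.05
    samples = rng.normal(loc=0.0, scale=1.0, size=n)  # idealized model: N(0, 1)
    samples[: int(eps * n)] = 100.0  # adversarial outliers far from the true mean

    # The empirical mean is dragged to roughly eps * 100 = 5 by the outliers,
    # while the median, a simple robust estimator, stays close to the true mean 0.
    print("empirical mean:", samples.mean())
    print("median:        ", np.median(samples))

In one dimension, the median already suffices; in high dimensions, such simple coordinate-wise fixes incur error that grows with the dimension, which is one reason efficient robust estimation remains algorithmically challenging.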