Scientific and engineering disciplines ranging from structural biology to computer vision rely on data collection and analysis to guide discovery. Critically, in such applications, the systems under study constrain and govern the structure of the information in the collected data. The goal of this research project is to develop a family of statistical models that enables the systematic extraction of relevant information from these datasets by bringing together concepts from statistics and optimization. The approach under development aims to provide a new set of statistical tools that is adapted to this class of problems and that could have a transformative impact on several scientific disciplines.
This project is built around a core set of techniques for analyzing datasets in the presence of latent algebraic structure, which often arises from the physical laws underlying the data collection process. Unlike more traditional statistical problems, where a linear underlying structure is often built into the model, data-driven science generates problems whose structure is algebraic but often non-linear. The project focuses on problems of central importance in a variety of scientific and engineering disciplines, including signal processing, structural biology, and computer vision, that share a common feature: the need to leverage algebraic structure in order to extract information from data. The project aims to develop a systematic approach to analyzing this family of problems, together with a general procedure for constructing computationally efficient algorithms using low-rank tensor decomposition. Importantly, these methods can be proved to be statistically optimal and therefore make the most efficient use of collected data.