Obtaining information from data is one of the most fundamental problems of modern science and technology. The aim of machine learning is to develop algorithms that automatically extract useful information from complex, high-dimensional data. Making progress toward this aim requires understanding which aspects of data are amenable to analysis and can be learned using computationally efficient methods. In particular, modeling non-linear structures in high-dimensional data has become a challenging and active line of research that has seen significant progress over the last ten years.
The goal of this project is to develop and analyze new mathematical representations for data, based on spectral and algebraic methods. We will explore how different structures in the data, such as cluster, manifold, or parametric model structures, are reflected in the spectral and algebraic properties of the data and how they can be extracted algorithmically, paying particular attention to the issues of high dimensionality and non-linearity. These insights will be used to build better and more adaptive algorithms for inference and data analysis tasks.
We will also analyze, both experimentally and theoretically, the properties of these algorithms when the data deviates from the posited model structure. This is a key issue in practical applications, which nearly always involve uncertainty and noise.