Pairwise comparison of objects is an important way in which human beings learn to reason from massive data sets. In many modern science and engineering fields, large-scale, high-dimensional data sets are generated with abundant structural information within each object, allowing detailed pairwise comparisons between individual objects. To preserve the fine structural information of the data, it is important to take into account both the scalar similarity measure and the transformations that describe the relations between data points. When the transformations admit an algebraic structure such as a group, the additional algebraic rigidity constraints shed new light on efficient learning and inference strategies that remain largely unexplored in the existing literature. The PIs aim to exploit two sources of low-dimensional structure in data, (i) the manifold underlying the data and (ii) the algebraic consistency among the group transformations, to devise highly accurate and computationally efficient statistical methods for extracting patterns in massive, complex data sets emerging from the social, biomedical, and comparative biological sciences. The project will involve educating and training the next wave of students and equipping them with the tools necessary to work in data science. Disseminating research results and building connections among different fields by organizing workshops are also important aspects of the proposed work.

The goal of the project is to develop novel geometric harmonic analysis methods for extracting information from, and performing inference on, large-scale data sets equipped with group transformations. This will involve foundational theoretical work and algorithm development toward three interrelated objectives: (i) angular synchronization across frequency channels, (ii) extended vector diffusion maps on multiple associated vector bundles of a common principal bundle, and (iii) community detection in conformation spaces of molecules and shape spaces of biological anatomical surfaces through multiple irreducible representations of group-valued pairwise interactions. On the practical side, the PIs propose to apply these newly developed techniques to high-impact domain applications in the biomedical and comparative biological sciences, including (1) cryo-EM and cryo-electron tomography (ET) image denoising, (2) shape space analysis in evolutionary and comparative biology, and (3) learning conformation spaces and dynamical structures of biomolecular machines. The techniques developed during the project period will be broadly applicable across disciplines in which observations are noisy, incomplete, and possibly modified by a latent transformation through the action of an unknown group element.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1854791
Program Officer: Yong Zeng
Project Start:
Project End:
Budget Start: 2019-08-01
Budget End: 2022-07-31
Support Year:
Fiscal Year: 2018
Total Cost: $140,380
Indirect Cost:
Name: University of Illinois Urbana-Champaign
Department:
Type:
DUNS #:
City: Champaign
State: IL
Country: United States
Zip Code: 61820