Learning latent structures in complex data sets is a crucial task in computational and data-enabled sciences. For applications in computer vision, genomics, and social networks, the hidden structures are often of discrete nature, making traditional algorithms not scalable to modern large data sets. To address this computational challenge, this research will introduce statistical methodologies to develop new models and fast algorithms for recovering discrete structures in data. The integration of computational and statistical perspectives will lead to not only advancement in theory, but also statistical packages for learning tasks. Moreover, multiple components of the research will bring societal benefits such as uncovering threats to online anonymity and understanding social polarization. This project will also provide high-quality training and research opportunities to next-generation data scientists.

More specifically, the research will focus on three types of problems: graph and shape matching, graph layout problems, and mixture models. Central to all these problems is the inference of permutations from noisy, incomplete observations. Due to the combinatorial nature of permutations and other hidden structures, the associated optimization problems are highly non-convex and intractable in the worst case. To develop efficient algorithms, this research will take an average-case perspective and employ a variety of techniques including spectral methods, convex relaxations, and non-convex local search. Theoretically, the fundamental limits of the proposed problems and algorithms will be characterized in terms of the trade-off between statistical and computational efficiency. On the practical front, all implementations of new methods will be made open-source for interdisciplinary applications such as alignment of biological networks and object matching in computer vision.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
2053333
Program Officer
Yong Zeng
Project Start
Project End
Budget Start
2021-06-01
Budget End
2024-05-31
Support Year
Fiscal Year
2020
Total Cost
$64,306
Indirect Cost
Name
Georgia Tech Research Corporation
Department
Type
DUNS #
City
Atlanta
State
GA
Country
United States
Zip Code
30332