Modern scientific applications produce an immense number of measurements, each contaminated by multiple sources of error. Notable examples include state-of-the-art technologies for imaging of biomedical molecules - such as single-particle reconstruction using cryo-electron microscopy, X-ray free-electron lasers, and X-ray crystallography - that produce vital knowledge to the process of drug design and expand our understanding of the mechanisms of life. The proposed research will develop various computational and mathematical tools to process and analyze massively large and complex datasets. In particular, the devised algorithms will assist researchers in extracting and analyzing information from data acquired by modern devices to exploit their full potential. A specific focus is given to establishing a solid theoretical framework to address the mathematical challenges arising from such datasets. User-friendly software will be made available in public repositories for the use of the scientific community.

The first part of the research studies applications of the method of moments, an appealing computational technique to process and analyze massively large datasets, to a variety of models that appear in scientific and engineering fields. This part includes deriving information-theoretic limits and establishing provable algorithms for models with intrinsic algebraic structures, for example, group and convolution actions, and high noise levels. The second part of the research advances numerical techniques to recover signals from their moments, entailing solving systems of polynomial equations. The novelty of the project lies in the development of new computational methods, equipped with sound mathematical theory, to recover signals from their statistical moments; the focus is on problems that involve massively large, highly corrupted datasets. The study will provide a variety of noise-tolerant solutions for prominent signal and image processing tasks, such as alignment, classification, detection, super-resolution, and phase retrieval. Particular objectives include modifying the method of moments to sustain massive and noisy datasets with outliers, developing scalable computational schemes for recovering signals from invariant and approximately invariant polynomials, numerical methods for solving large systems of polynomial equations, deriving new information-theoretic bounds for high-dimensional data in extreme noise levels, and analyzing non-convex optimization algorithms. These proposed methods have the potential to become leading techniques for solving some of today's leading scientific problems.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Mathematical Sciences (DMS)
Application #
2009753
Program Officer
Eun Heui Kim
Project Start
Project End
Budget Start
2020-07-15
Budget End
2024-06-30
Support Year
Fiscal Year
2020
Total Cost
$300,000
Indirect Cost
Name
Princeton University
Department
Type
DUNS #
City
Princeton
State
NJ
Country
United States
Zip Code
08544