Many machine learning problems deal with networks that encode similarities or relationships among different objects, for which observational data may be limited in extent and noisy. Learning the desired information requires highly efficient algorithms that can process large-scale network data and detect tenuous statistical signatures. This project involves modeling large-scale networks and observations, devising learning algorithms, analyzing the performance of the algorithms, deriving bounds on the possible performance of best algorithms, and deploying theoretically-grounded algorithms to real network data. The research aims to significantly advance the theoretical and algorithmic understanding of graphical inference and provide key enabling technologies for high-impact applications such as ordering of short DNA sub-sequences for genetic sequencing. Improvements in the ability to sequence DNA can accelerate the use of genomics with applications in health care. The associated mechanisms for broadening participation in computing include: (a) Explorations in computing and statistics for K-12 with broad participation; (b) Career and life skills guidance for graduate students at the Annual Allerton Conference on Communications, Control, and Computing; and (c) Mentoring female and minority students in research.

The research is grouped into four interrelated areas, ranging from inference problems for single graphs, to inference involving two graphs, in order to study classification of graphs from general families: (a) learning community structure in dynamic graphs with heavy-tailed degree distribution, specifically, in a new variation of the Barabasi-Albert preferential attachment model; (b) recovering graphical structures beyond communities, including but not limited to recovery of hidden Hamiltonian cycles arising in a genetic sequencing problem and hidden matchings in bipartite graphs arising in a particle tracking problem; (c) matching two graphs to each other by identifying vertex correspondences, in particular, matching of perturbed versions of Erdos-Renyi random graphs and Barabasi-Albert preferential attachment graphs; and (d) learning properties of graphs using sampling, including sampling along random walks on graphs. Computationally efficient algorithms that estimate the number of both local structures (e.g., edges and triangles) and global structures are designed. Network dynamics and subsampling, as well as inference of network structures that are not necessarily low rank or static, are addressed by employing techniques ranging from information theory, message passing, spectral and non-convex methods, and convex methods including linear, quadratic, and semi-definite relaxations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Application #
1900507
Program Officer
Armand Makowski
Project Start
Project End
Budget Start
2019-07-01
Budget End
2023-06-30
Support Year
Fiscal Year
2019
Total Cost
$106,940
Indirect Cost
Name
Yale University
Department
Type
DUNS #
City
New Haven
State
CT
Country
United States
Zip Code
06520