Science, medicine, business, and engineering are increasingly data-driven. Hypotheses, diagnoses, decisions, and designs are made by gathering and analyzing a wealth of data in search of meaningful patterns. The goal of the field of machine learning is to develop methods for inferring and reasoning about patterns in data. The research supported by this award addresses the question of learning from unlabeled data, one of the more vexing problems of data science. The PIs analyze and develop algorithms for problems such as clustering, i.e., finding groups of similar objects, as well as understanding continuously changing attributes in data. By integrating machine learning, modeling, and geometric data analysis, this work brings new ideas and methodologies to modern data analysis, helps build practical algorithms for unsupervised and supervised learning, and analyzes their properties and domains of applicability. Students working on this project have a unique opportunity to be exposed to a broad spectrum of topics including machine learning, statistics, geometry, and applied mathematics.

On a more technical level, the unifying perspective for the proposed research is that many of these unsupervised learning problems can be viewed as recovering structure or invariants of the underlying continuous space through the lens of the discrete data. This work takes that point of view to consider a number of important aspects of unsupervised learning including hierarchical clustering in the density model, data quantization, graphon clustering and estimation, as well as learning metric structure from data. The project also considers applications of these ideas to supervised learning, particularly in helping to scale algorithms to large data. While the work on this project concentrates on theoretical analyses, these are developed with a view toward practical algorithms, implementations and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start:
Project End:
Budget Start: 2018-08-15
Budget End: 2020-10-31
Support Year:
Fiscal Year: 2018
Total Cost: $450,000
Indirect Cost:
Name: Ohio State University
Department:
Type:
DUNS #:
City: Columbus
State: OH
Country: United States
Zip Code: 43210