The analysis of large and complicated data sets is a task of increasing importance in several brunches of science and engineering. In many application scenarios, the data consists of a set of objects endowed with some information that quantifies similarity or dissimilarity of certain pairs. Examples of such inputs include statistical distributions, user preferences in social networks, DNA sequences, and so on. A natural way to interpret such data sets is as metric spaces. That is, each object is treated as a point, and the dissimilarity between two objects is encoded by their distance. The successful application of this data-analytic framework requires finding a distance function that faithfully encodes the ground truth. The project seeks to develop new algorithms for computing such faithful metrical representations of data sets. The underlying research problems lead to new connections between the areas of computational geometry, algorithm design, and machine learning. The project aims at transferring ideas between these scientific communities, and developing new methods for data analysis in practice. The project will be part of the investigator's curriculum development, teaching, and educational and outreach activities. The main research problems that the project seeks to study will form the basis for the training of both undergraduate and graduate students. The investigator is committed to working with minorities and underrepresented groups.

This metrical interpretation of data sets has been successful in practice and has lead to a plethora of algorithmic methods and ideas, such as clustering, dimensionality reduction, nearest-neighbor search, and so on. The area of metric learning is concerned mainly with methods for discovering an underlying metric space that agrees with a given set of observations. Specifically, the problem of learning the distance function is cast as an optimization problem, where the objective function quantifies the extend to which the solution satisfies the input constraints. A common thread in many works on metric learning is the use of methods and ideas from computational geometry and the theory of metric embeddings. Over the past few decades, these ideas have also been the subject of study within the theoretical computer science community. However, there are several well-studied problems in metric learning that have not received much attention in the algorithms and computational geometry communities. Moreover, there are many metric embedding tools and techniques, that where developed within the algorithms community, that have not yet been used in the context of metric learning. The project aims at bridging this gap by a systematic study of algorithmic metric learning problems from the point of view of computational geometry and approximation algorithms. The major goals of this effort are transferring algorithmic tools to the metric learning framework, as well as providing theoretical justification for metric learning methods that are successful in practice but currently lack provable guarantees.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Start
Project End
Budget Start
2018-06-01
Budget End
2022-05-31
Support Year
Fiscal Year
2018
Total Cost
$399,899
Indirect Cost
Name
University of Illinois at Chicago
Department
Type
DUNS #
City
Chicago
State
IL
Country
United States
Zip Code
60612