Collections of distributions arise naturally in the analysis of large data sets. Since storing more than a small fraction of such data is impractical, distributional representations are typically used to summarize it in compact form. For example, a document in a corpus is commonly represented by a normalized vector of keyword frequencies, an image by a histogram of gradient features, and a speech signal by a spectral density over a frequency domain.
Representing data sets as collections of distributions enables analysis via powerful concepts from statistics, learning theory, and information theory. Notions like strength of belief, information content, and pattern likelihood are used to extract meaning and structure from the data, and are quantified using information measures such as the Kullback-Leibler divergence and the broader class of Bregman divergences to which it belongs.
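For concreteness, the standard definitions run as follows: given a strictly convex, differentiable function \(\varphi\), the Bregman divergence between points \(x\) and \(y\) is

\[
D_\varphi(x, y) \;=\; \varphi(x) \;-\; \varphi(y) \;-\; \langle \nabla \varphi(y),\, x - y \rangle,
\]

and the choice \(\varphi(x) = \sum_i x_i \log x_i\) recovers, for normalized distributions, the Kullback-Leibler divergence \(D(x \,\|\, y) = \sum_i x_i \log (x_i / y_i)\).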
These measures capture meaning in data in a way that traditional metrics cannot, by tying abstract notions of information loss and transfer to concrete geometric quantities like distances. However, they lack properties such as symmetry and the triangle inequality that traditional geometric algorithms for data analysis rely on.
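The failure of both properties is easy to exhibit numerically. The sketch below (pure Python; the three distributions are illustrative, not drawn from any real data set) shows that the Kullback-Leibler divergence is asymmetric and can violate the triangle inequality:

```python
from math import log

def kl(p, q):
    """D(p || q) for discrete distributions with full support."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q))

p = [0.70, 0.20, 0.10]
q = [0.40, 0.40, 0.20]
r = [0.55, 0.30, 0.15]

print(kl(p, q))             # ~0.184
print(kl(q, p))             # ~0.192 -> kl(p, q) != kl(q, p): not symmetric
print(kl(p, r) + kl(r, q))  # ~0.093 -> less than kl(p, q): triangle inequality fails
```

Here the detour through r is "shorter" than the direct divergence from p to q, so algorithms that prune search spaces or bound errors via the triangle inequality cannot be applied as-is.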
In this project, the PI will develop a systematic, rigorous, and general algorithmic framework for manipulating these divergences. This framework will provide the foundation for efficient and accurate analysis of spaces of distributions, and will lead to deeper insights into analysis problems across a wide range of applications.