CAREER: Algorithms for understanding data

Valiant, Gregory

Abstract

Given samples from some unknown distribution, what can one infer about the underlying distribution, and how efficiently can these inferences be made? Â In many of the most fundamental settings, our understanding of the computational and information theoretic possibilities and barriers is still startlingly poor. Â This project tackles two broad research objectives: developing efficient algorithms for probing data, and understanding how to efficiently estimate properties of distributions. Â The first line of research seeks to understand which questions about a dataset can be answered extremely efficiently, requiring computational resources (time, or memory) that are sublinear in the size of the dataset or distribution. Â The second research objective is to understand the minimal amount of information necessary to ascertain, with high probability, whether or not a distribution or dataset possesses a given property. Â In the context of statistical property estimation, this problem asks how few samples are needed to estimate the property in question to a desired accuracy, with high probability. Â This research pursues both new estimation algorithms, and new information theoretic tools and lower bounds.

With vast and important datasets emerging across many disciplines, from genetic, biological, and medical databases, to databases documenting our economic and social behaviors, the challenge of how to make sense of them has particular immediate relevance and has rapidly become the bottleneck in scientific understanding. Â The specific problems investigated in this project arise in the analysis of these datasets; algorithmic advances on these problems have the potential to very quickly be adopted and transform ongoing data analysis efforts. Â Beyond the immediate implications for the data sciences, these questions are extremely basic and foundational. As such, new techniques, perspectives, and insights gleaned from their study are likely to have broad implications for other problems throughout computer science, statistics, information theory, and the data sciences.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 1351108
Program Officer: Tracy Kimbrel

Project Start
Project End
Budget Start: 2014-07-01
Budget End: 2019-06-30
Support Year
Fiscal Year: 2013
Total Cost: $500,000
Indirect Cost

CAREER: Algorithms for understanding data
Valiant, Gregory
Stanford University, Stanford, CA, United States

Abstract

Funding Agency

Institution

Comments

Recent in Grantomics:

Recently viewed grants:

Recently added grants:

Abstract

Funding Agency

Institution

Comments