The sizes of problems arising in modern applications have grown enormously. Sampling a (small) subset of the data to store in main memory and process in detail is thus a natural alternative to computing on the entire data set. The project addresses how well answers computed from a sample approximate answers to the full problem, for many optimization problems that go under the name of Constraint Satisfaction Problems. The project also considers problems whose input data consists of matrices or higher-dimensional arrays. Here adaptive sampling (where the probability of sampling a piece of the data depends on its relative importance) has recently been used successfully by the PI and others, and the project aims to improve these existing results. Since adaptive sampling calls for making more than one pass through the data, the project will also study models of computation in which the number of passes is measured as an important resource. This is expected to contribute to the discussion of models for handling large data.
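The abstract does not spell out a particular adaptive-sampling scheme; a well-known instance for matrix data is "length-squared" row sampling, where a row's sampling probability is proportional to its squared norm (its relative importance) and sampled rows are rescaled so the sketch is unbiased. The sketch below is purely illustrative; the function name and the specific rescaling convention are my own, not taken from the proposal.

```python
import numpy as np

def length_squared_sample(A, s, rng=None):
    """Sample s rows of A with probability proportional to squared row norm.

    Rows are rescaled by 1/sqrt(s * p_i) so that S.T @ S is an unbiased
    estimator of A.T @ A -- the usual setup for fast low-rank approximation.
    """
    rng = np.random.default_rng(rng)
    norms = np.einsum('ij,ij->i', A, A)        # squared row norms
    p = norms / norms.sum()                    # adaptive sampling distribution
    idx = rng.choice(A.shape[0], size=s, p=p)  # importance-weighted draw
    return A[idx] / np.sqrt(s * p[idx, None])  # rescale for unbiasedness

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
S = length_squared_sample(A, 100, rng=1)
# The small sketch S approximately preserves the Gram matrix A.T @ A.
err = np.linalg.norm(S.T @ S - A.T @ A) / np.linalg.norm(A.T @ A)
```

Because the sampling probabilities themselves depend on a first look at the data (to compute the row norms), realizing this on a data stream needs more than one pass, which is the motivation for pass-counting models mentioned above.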

A third part of the proposal is to continue the PI's study of rapidly mixing Markov chains, both by developing new general techniques and by applying them to specific problems such as computing the volumes of convex sets.
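To make the Markov-chain ingredient concrete: volume algorithms for convex sets rest on chains whose stationary distribution is (nearly) uniform over the body, such as the ball walk. The following is a minimal illustrative sketch of a ball walk against a membership oracle, not the proposal's algorithm; the function and parameter names are my own.

```python
import numpy as np

def ball_walk(inside, x0, steps, delta, rng=None):
    """Ball walk: a Markov chain whose stationary distribution is uniform
    over the convex body given by the membership oracle `inside`.

    Each step proposes a point uniform in a delta-ball around the current
    point; proposals leaving the body are rejected (the chain stays put),
    which makes the uniform distribution stationary.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = rng.standard_normal(x.shape)
        d *= delta * rng.random() ** (1.0 / x.size) / np.linalg.norm(d)
        y = x + d          # proposal: uniform in the delta-ball around x
        if inside(y):      # reject moves that leave the body
            x = y
    return x

# Example: (approximately) uniform sample from the cube [-1, 1]^5.
cube = lambda p: bool(np.all(np.abs(p) <= 1.0))
sample = ball_walk(cube, np.zeros(5), steps=2000, delta=0.5, rng=0)
```

How many steps suffice for the chain to be close to stationary is exactly the "rapid mixing" question the proposal studies; the step count above is arbitrary.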

Broader Impact: The research is expected to have broad impact, since processing massive data via sampling is of enormous importance in modern computing. The PI teaches specialized courses at Yale and hopes to disseminate the results of this research to students through these courses.

Agency: National Science Foundation (NSF)
Institute: Division of Computer and Communication Foundations (CCF)
Application #: 0310805
Program Officer: Richard Beigel
Budget Start: 2003-05-01
Budget End: 2007-04-30
Fiscal Year: 2003
Total Cost: $250,000
Name: Yale University
City: New Haven
State: CT
Country: United States
Zip Code: 06520