Modern imaging devices, sensors, data acquisition systems allow to gather data with unprecedented speed and accuracy. Most of the times, however, we are not interested in accumulating data per se, but rather to uncover some hidden patterns in the data. For instance, given a large network, we might want to discover a small subset of notes that are tightly connected to each other. Such highly connected substructures are of interest in biological datasets, but also in social network analysis, and in signal processing. Finding such patterns requires highly efficient algorithms that can process large amount of data and uncover tenuous statistical signatures. The investigators develop new algorithms that simultaneously optimize both metrics: statistical efficiency and computational efficiency.

Consider in particular the problem of finding an anomalous submatrix in a large data matrix with independent random entries. If the anomalous submatrix has entries with a different distribution, this can be done via principal component analysis, as long as the submatrix has dimensions of the order of the square root of the ambient dimensions. The investigators introduce a class of first order methods with linear complexity, and determine the optimal algorithm within this class. This appears to provably outperform existing approaches. The same framework is generalized to several other classes of high-dimensional estimation problems. Optimal iterative procedures are developed under strict computational constraints.

Project Start
Project End
Budget Start
2013-06-01
Budget End
2018-05-31
Support Year
Fiscal Year
2013
Total Cost
$416,160
Indirect Cost
Name
Stanford University
Department
Type
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94305