The unprecedented explosion in the volume of readily accessible data in recent years, although highly desirable, comes at a heavy price - it is now considerably more difficult to locate the answer to a specific question, or to find patterns and correlations that are central to a phenomenon. The PI's propose to study general-purpose clustering as a solution to the problem of locating relevant information and organizing it in an intelligible way.
The proposal falls into two tracks. The first track is to develop measures and criteria that accurately re the quality of a clustering. The PI's have initiated work in this direction, and plan to build a theory of clustering criteria firmly grounded in the intuition of practitioners.
The second track is the design of efficient algorithms to find near-optimal clustering under the right measures. The PI's propose to develop algorithms based on the so-called "spectral methods" and other techniques. They plan to make them efficient by the use of random (non-uniform) sampling. The PI's also plan to implement some of the algorithms and empirically evaluate them.
The PI's propose to ensure that any positive impact from their research is maximized by (a) widely disseminating the results via conferences and also by inviting other researchers to discuss the ideas, (b) by making the preliminary implementations of clustering algorithms available.