This proposal aims to develop new and effective algorithms for dimensionality reduction that blend techniques from numerical linear algebra and approximation theory. Current implementations of Latent Semantic Indexing (LSI) and other dimensionality reduction methods rely on matrix decompositions such as the Singular Value Decomposition (SVD). SVD-based methods explicitly compute a basis of the dominant singular vectors and then project the data onto this basis. This has the desirable effect of filtering out noise and redundancy inherent in the data while retaining its main structural features (e.g., the "semantic content" in LSI). However, SVD-based methods tend to be expensive, both in computational cost and in storage, and become impractical for very large data sets. The premise of this proposal is that there is no need to compute a (partial) SVD in order to perform dimensionality reduction. The projection of a given vector onto the space associated with the largest singular values can be accurately reproduced by a polynomial filtering technique. This technique, which entails repeated multiplication of a vector by the original data matrix and its transpose, offers several advantages, including low computational and storage requirements. In addition, "relevance feedback", which significantly enhances the quality of LSI results, can easily be adapted to polynomial filtering. Perhaps more important is the excellent flexibility of polynomial filtering in enabling various desired reduction features. For example, an appropriate choice of filter yields an arbitrarily smooth transition from the unwanted components (small singular values) to the wanted ones (large singular values), in contrast with the discontinuous cut-off that characterizes the Truncated SVD (TSVD). Likewise, the polynomial degree can be tuned: some applications may require an accurate projection (a high-degree polynomial), while for others such accuracy would be wasteful or even counter-productive.
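
To make the mechanics concrete, the sketch below illustrates the general idea under stated assumptions; it is not code from the proposal. The function name poly_filter_scores and the parameters cutoff and degree are illustrative, and a least-squares Chebyshev fit of a step function is just one possible choice of filter among many. The point it demonstrates is that the action of the dominant-subspace projector on a query vector can be approximated using only repeated products with the data matrix and its transpose, with no SVD ever formed.

```python
# Minimal sketch of polynomial filtering for LSI-style retrieval (assumed
# setup: A is the m x n term-by-document matrix, q is an m-vector query).
import numpy as np

def poly_filter_scores(A, q, cutoff, degree=30, npts=400):
    """Approximate document scores (P_k q)^T A, where P_k projects onto the
    left singular vectors whose singular values exceed `cutoff`."""
    m, n = A.shape
    # Estimate the largest eigenvalue of A A^T to map its spectrum into [-1, 1].
    sigma_max = np.linalg.norm(A, 2)      # fine for a sketch; for large sparse A,
    lmax = 1.01 * sigma_max**2            # a few power iterations would be used
    # Least-squares Chebyshev fit of the step function h(t) = 1 for t >= cutoff^2
    # (in practice a smoothed/damped filter is preferable to reduce oscillations).
    t = np.linspace(0.0, lmax, npts)
    h = (t >= cutoff**2).astype(float)
    x = 2.0 * t / lmax - 1.0              # map [0, lmax] onto [-1, 1]
    c = np.polynomial.chebyshev.chebfit(x, h, degree)
    # Apply p(A A^T) q via the Chebyshev three-term recurrence; each degree costs
    # one product with A^T and one with A.
    def op(v):                            # v -> (2/lmax) * A A^T v - v
        return 2.0 / lmax * (A @ (A.T @ v)) - v
    T_prev, T_curr = q, op(q)
    y = c[0] * T_prev + c[1] * T_curr
    for j in range(2, degree + 1):
        T_next = 2.0 * op(T_curr) - T_prev
        y += c[j] * T_next
        T_prev, T_curr = T_curr, T_next
    return A.T @ y                        # scores of all documents for the filtered query
```

Each degree of the polynomial costs one product with A and one with its transpose, so the work and storage scale with the number of nonzeros of A rather than with the cost of a partial SVD, which is precisely the advantage the proposal targets.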

Society is currently facing an explosive surge of exploitable information in scientific, engineering, and economic applications. The rapidly increasing sizes of the available data sets are starting to render inadequate many of the algorithms used in data exploration, in spite of their merits when computational cost is set aside. The methods investigated in this research will address the issue of cost by taking a new approach that completely avoids the bottleneck of the classical algorithms. If fully successful, the methods to be developed may significantly enhance the capabilities of current state-of-the-art methods used in key areas of information technology. For example, preliminary studies have shown that when processing a query in a database, the proposed methods can sometimes offer a tenfold gain in speed relative to standard methods, without any loss of accuracy. The investigating team will also extend this approach to the problem of face recognition. The method also has excellent prospects in other potential applications, including image processing and medical tomography.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant
Application #: 0510131
Program Officer: Junping Wang
Budget Start: 2005-07-01
Budget End: 2009-06-30
Fiscal Year: 2005
Total Cost: $271,647
Name: University of Minnesota Twin Cities (Minneapolis, MN 55455, United States)