Matrix computations are ubiquitous in scientific and engineering applications. The existence of high-quality algorithms and resulting software packages, such as LAPACK and MATLAB, greatly alleviates the burden on scientists and engineers, allowing them to devote their energy to the particular application at hand.

This proposal is addressed towards identifying and solving some novel and fundamental matrix problems that arise in modern applications. The last decade has seen significant advances in communications technology and infrastructure; prime examples are the world-wide web, wireless computing and sensor networks. These modern applications lead to a vast variety of interesting problems --- these range from complex design constructions, data warehousing and retrieval problems, to the need for intelligent analysis of huge amounts of data. As a response to these needs, many new matrix problems arise that need to be solved robustly and efficiently.

Some of the most challenging problems in the analyses of large data matrices are the related problems of clustering, co-clustering (simultaneous clustering of rows and columns), and matrix approximations. In many applications, the data is not Gaussian, and hence traditional techniques, such as singular value decomposition (SVD) and classical Euclidean k-means are often inadequate. The proposed project formulates new matrix approximation problems related to co-clustering: these formulations appeal to a generalized maximum entropy principle to search for structured matrix approximations that arise naturally for an important class of statistical distributions, known as exponential families. The aim of the project is to develop the theory as well as robust, numerical software for these matrix computations.

Another important class of problems contains varied inverse problems: given a set of desirable properties, construct a data matrix that satisfies these properties. Such inverse problems arise in diverse applications, such as the design of kernel similarity matrices for statistical learning and the design of Gram matrices with low cross-correlations for wireless communications. The plan is to use alternate projection schemes, and numerical optimization to solve such design problems. The outcome of the project would be a unified framework for these inverse problems, including modeling, algorithms and software.

It is anticipated that there will be two broad research impacts of the proposed project. The first contribution would be to applied mathematics and computer science --- the formulation and proposed solutions of the novel matrix problems that arise naturally in applications. It is hoped that the proposed work will vitalize the search for better algorithms, and further deepen the connections between applied mathematics and engineering. The second contribution would be to the applications themselves. Posing particular application tasks in terms of well-defined matrix problems should lead to faster and more reliable methods.

The project will support the graduate education of 2 or 3 students. These students will be trained in the multidisciplinary areas of computational linear algebra, data mining, statistical pattern recognition and wireless computing. Specialized courses in all these areas are available in the Computer Science, Applied Mathematics and Electrical & Computer Engineering departments at UT Austin. Such a broad education will fill the need for substantial academic and industrial demand in these multidisciplinary areas for educated personnel. The project will also support research experience for one undergraduate, and a conscious effort will be made to recruit women and minority students. Results of the project will be published in the linear algebra, data-mining, and information theory literatures, contributing to each of these areas as well as their cross-fertilization. Data and software developed under the project will be shared with the scientific community via a public web site.

Agency
National Science Foundation (NSF)
Institute
Division of Computer and Communication Foundations (CCF)
Type
Standard Grant (Standard)
Application #
0431257
Program Officer
Lenore M. Mullin
Project Start
Project End
Budget Start
2004-08-15
Budget End
2009-07-31
Support Year
Fiscal Year
2004
Total Cost
$230,000
Indirect Cost
Name
University of Texas Austin
Department
Type
DUNS #
City
Austin
State
TX
Country
United States
Zip Code
78712