Nonnegative matrix factorization (NMF) factorizes a nonnegative input matrix into two nonnegative matrices of lower rank. It was recently discovered that NMF in its most basic form is equivalent to a relaxed form of K-means clustering, the most widely used pattern discovery algorithm in data mining. This direct link between mathematics and data mining has set in motion a large number of developments in the use of matrix factorizations for pattern discovery. It turns out that NMF provides more consistent and mathematically well-defined optimization formulations for many fundamental and emerging data-mining problems. NMF algorithms have well-understood properties; they are simple, easy to implement, and well suited to distributed parallel architectures. This research aims to formally establish a comprehensive NMF-based framework for data mining. In particular, we will (1) extend matrix factorization data-mining methodology from its current focus on clustering (pattern discovery) to newer problems: semi-supervised clustering (extending partial knowledge to the whole data set) and classification (pattern prediction, such as distinguishing cancerous tumor tissue from normal tissue); (2) develop fast numerical algorithms and incorporate state-of-the-art numerical optimization techniques; and (3) apply and evaluate the NMF algorithms in different real-world applications, including text mining and bioinformatics.
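
To make the factorization described above concrete, the following is a minimal sketch and not the project's own method: it assumes the common squared-error objective ||X - WH||_F and uses the multiplicative update rules of Lee and Seung, which the abstract does not name. The function names (`nmf`, `matmul`, `scale`, `frobenius_error`), the toy matrix `X`, and the choice of reading a cluster label from the largest entry in each row of `W` are illustrative assumptions.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Return ||X - WH||_F, the reconstruction error of the factorization."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def scale(M, num, den, eps):
    """Element-wise M * num / (den + eps); nonnegative inputs stay nonnegative."""
    return [[m * a / (b + eps) for m, a, b in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def nmf(X, rank, iters=200, seed=0, eps=1e-9):
    """Approximate X (n x m, nonnegative) by W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random start, so no entry is stuck at zero initially.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    history = [frobenius_error(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        H = scale(H, matmul(Wt, X), matmul(matmul(Wt, W), H), eps)
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        W = scale(W, matmul(X, Ht), matmul(W, matmul(H, Ht)), eps)
        history.append(frobenius_error(X, W, H))
    return W, H, history


# Toy data (hypothetical): rows are samples, columns are features.
X = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
]

W, H, history = nmf(X, rank=2)

# Clustering reading (an assumption of this sketch): assign each sample
# to the factor with the largest weight in its row of W.
labels = [max(range(len(row)), key=lambda k: row[k]) for row in W]

print("labels:", labels)
print("error at start:", history[0], "error at end:", history[-1])
```

In this reading, each row of `W` acts as a soft cluster membership vector and each row of `H` as a nonnegative prototype, which is the sense in which basic NMF relaxes the hard assignments of K-means. On a block-structured matrix like the toy `X`, the first two and last two samples would typically fall into different groups, though the result of a multiplicative-update run depends on the random start.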

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0844497
Program Officer: Thomas F. Russell
Project Start:
Project End:
Budget Start: 2008-09-15
Budget End: 2009-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $56,000
Indirect Cost:
Name: University of Texas at Arlington
Department:
Type:
DUNS #:
City: Arlington
State: TX
Country: United States
Zip Code: 76019