Nonnegative matrix factorization (NMF) factorizes a nonnegative input matrix into two nonnegative matrices of lower rank. It was recently discovered that NMF has a unique ability to solve challenging data mining and machine learning problems. The advantages of NMF over existing unsupervised learning methods are that (1) NMF can model widely varying data distributions; (2) NMF performs both hard and soft clustering simultaneously; and (3) many other data mining problems, such as semi-supervised clustering, can be reformulated as NMF problems. Building upon these foundations, the investigators propose to establish a comprehensive NMF-based framework for data mining: (a) provide a deeper understanding of NMF's clustering capability; (b) extend the data mining capability of NMF to solve various data mining and machine learning problems; (c) develop fast numerical algorithms that incorporate state-of-the-art developments from numerical optimization for various matrix factorization models; (d) develop novel and rigorous proof strategies to establish the correctness and convergence properties of the numerical algorithms; and (e) apply and evaluate these new algorithms in real-world applications.
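
To make the factorization and the hard/soft clustering reading concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the investigators' algorithms: it uses the classical Lee-Seung multiplicative updates for the Frobenius-norm objective as one well-known way to compute an NMF, and the function name `nmf`, the toy matrix `X`, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (n x m) as W @ H with
    W (n x rank) >= 0 and H (rank x m) >= 0.

    Assumption: Lee-Seung multiplicative updates for the squared
    Frobenius error; other NMF solvers exist and may be preferable.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random start so the updates stay nonnegative.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Elementwise updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    return W, H, errors

# Hypothetical toy data: 6 "documents" (rows) over 4 "terms" (columns),
# built so that rows 0-2 and rows 3-5 use disjoint terms.
X = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 3, 4],
    [0, 0, 4, 3],
    [0, 0, 4, 4],
], dtype=float)

W, H, errors = nmf(X, rank=2)

# Soft clustering: normalize each row of W into membership weights.
soft = W / W.sum(axis=1, keepdims=True)
# Hard clustering: assign each row to its dominant component.
hard = W.argmax(axis=1)

print("reconstruction error:", errors[-1])
print("soft memberships:\n", np.round(soft, 3))
print("hard assignments:", hard)
```

The same factor `W` yields both views at once: its normalized rows act as soft memberships, and the index of the largest entry in each row acts as a hard cluster label. Which component index corresponds to which group depends on the random initialization.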

The proposed work creates a new paradigm for analyzing vast amounts of data and discovering new knowledge from the data by transforming established matrix computational methodologies. This new technology can automatically group news articles into meaningful categories, discover protein modules in protein networks, extract weather patterns from climate data, segment pictures into distinct objects, detect communities on the Web, and enable many other scientific discoveries and the creation of new technologies. On a fundamental level, the proposed work establishes that a simple matrix factorization in fact solves challenging data mining problems. This research reinforces the importance of mathematics in today's data-centric world and encourages students to learn mathematics.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 0915228
Program Officer: Junping Wang
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2009
Total Cost: $200,031
Indirect Cost:
Name: University of Texas at Arlington
Department:
Type:
DUNS #:
City: Arlington
State: TX
Country: United States
Zip Code: 76019