Nonnegative matrix factorization (NMF) factorizes a nonnegative input matrix into two nonnegative matrices of lower rank. It was recently discovered that NMF has a unique ability to solve challenging data mining and machine learning problems. The advantages of NMF over existing unsupervised learning methods are that (1) NMF can model widely varying data distributions; (2) NMF performs both hard and soft clustering simultaneously; and (3) many other data mining problems, such as semi-supervised clustering, can be reformulated as NMF problems. Building upon these foundations, the investigators propose to establish an NMF-based comprehensive framework for data mining: (a) provide a deeper understanding of NMF's clustering capability; (b) extend the data mining capability of NMF to solve various data mining and machine learning problems; (c) develop fast numerical algorithms that incorporate state-of-the-art developments from numerical optimization for various matrix factorization models; (d) develop novel and rigorous proof strategies to establish the correctness and convergence properties of the numerical algorithms; and (e) apply and evaluate these new algorithms in real-world applications.
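The factorization described above can be illustrated with a minimal sketch. It assumes the Frobenius-norm objective and the classic multiplicative update rules, which are one common choice; the abstract does not say which algorithms the investigators use. The function names (`nmf`, `matmul`, `transpose`, `frobenius_error`), the parameters, and the toy matrix are all hypothetical, and the code uses plain Python lists only to stay dependency-free. Reading soft cluster memberships from the normalized rows of `W`, and hard labels from the largest entry in each row, is one common interpretation rather than the only one.

```python
import random

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate nonnegative X (m x n) by W (m x k) times H (k x n), both nonnegative.

    Uses multiplicative updates for the squared Frobenius error (an assumption
    of this sketch, not a method stated in the text).
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5

# Toy data: four rows that form two visible groups (illustrative only).
X = [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 3, 4],
     [0, 0, 4, 3]]

W, H = nmf(X, k=2)

# Soft clustering: each row of W, normalized to sum to one.
soft = [[w / (sum(row) + 1e-12) for w in row] for row in W]
# Hard clustering: index of the largest entry in each row of W.
hard = [max(range(len(row)), key=row.__getitem__) for row in W]

print("reconstruction error:", frobenius_error(X, W, H))
print("soft memberships:", soft)
print("hard labels:", hard)
```

Both factors stay nonnegative because each update multiplies a nonnegative entry by a ratio of nonnegative quantities. A practical implementation would use a numerical library and a convergence test instead of a fixed iteration count.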
The proposed work creates a new paradigm for analyzing vast amounts of data and discovering new knowledge from them by transforming established matrix computational methodologies. This new technology can automatically group news articles into meaningful categories, discover protein modules in protein networks, extract weather patterns from climate data, segment pictures into distinct objects, detect communities on the Web, and enable many other scientific discoveries and the creation of new technologies. On a fundamental level, the proposed work establishes that a simple matrix factorization in fact solves challenging data mining problems. This research reinforces the importance of mathematics in today's data-centric world and encourages students to learn mathematics.