Nonnegative matrix factorization (NMF) has become a tool of choice for numerous data analytic problems in text analysis, imaging, and computer vision. It provides a mathematical framework for tasks such as dimensionality reduction and clustering. A distinguishing feature of NMF is the requirement that the factors representing the matrix at lower rank be nonnegative. This property greatly enhances interpretability and modeling capability in the many applications where preserving non-negativity is important. This project studies foundational properties of NMF and uses the NMF framework to produce new algorithmic methods for efficient and effective hierarchical clustering and topic modeling of large-scale text data, supporting multi-scale analysis, generation of topic labels, and interactive analysis. New multi-scale hierarchical methods for generating clusters, discovering topics in documents, and detecting topic changes over time are being explored to enable computationally efficient and perceptually effective ways of exploring text data and discovering latent group structure. In addition, interactive visual analytic systems built on this foundational work are being developed to make these theoretical and algorithmic discoveries readily available to the research and applications communities and to enable more effective and informed discovery of topics in large-scale document collections.
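To make the non-negativity requirement described above concrete, the following is a minimal sketch of a rank-2 NMF on a toy term-by-document matrix. It uses the classic Lee-Seung multiplicative update rules, which are an illustrative choice and not necessarily the algorithms developed in this project. The function names (`nmf`, `matmul`, `frobenius_error`) and the toy matrix `X` are hypothetical and introduced only for this example.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of X - W H, the usual NMF reconstruction error."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate a nonnegative m x n matrix X by W H with W, H >= 0.

    Assumption: Lee-Seung multiplicative updates for the Frobenius objective.
    Because every update multiplies by a ratio of nonnegative quantities,
    W and H stay nonnegative throughout.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random initialization.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Toy data: rows are terms, columns are documents; two obvious "topics".
X = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]
W, H = nmf(X, rank=2)
print("reconstruction error:", frobenius_error(X, W, H))
```

In this sketch, each column of `W` can be read as a topic (a nonnegative weighting over terms) and each column of `H` as a document's nonnegative mixture of those topics, which is the interpretability benefit the abstract refers to.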

This project will have a significant impact on the analysis and development of NMF algorithms and on new problem formulations for applications that use NMF, including 'Big Data' applications. The project is yielding effective computational methods, backed by solid analysis, that will enhance the analysis of high-dimensional data across science, engineering, medicine, and business, beyond the application areas considered within this project.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
1348152
Program Officer
Ephraim P. Glinert
Project Start
Project End
Budget Start
2013-08-15
Budget End
2017-07-31
Support Year
Fiscal Year
2013
Total Cost
$175,000
Indirect Cost
Name
Georgia Tech Research Corporation
Department
Type
DUNS #
City
Atlanta
State
GA
Country
United States
Zip Code
30332