CIF: Small: Cluster Analysis for Highly-correlated, Heavy-tailed, and Higher-order Data

Mai, Qing; Zhang, Xin

Abstract

Rapid advances in modern science and technology are producing data sets of unprecedented size and complexity. A common source of complexity in data sets is the presence of subpopulations. For example, a disease may have several subtypes, and customers may be attracted by different features of the same product. Cluster analysis is a popular tool for identifying subpopulations, which enables a refined investigation of each one. This project develops novel clustering methods to reveal the increasingly complex patterns within contemporary data sets. In addition to allocating subjects to clusters, the clustering methods in this research also identify the defining features of each subpopulation. The research team will apply these methods to various real-world problems, with the potential to affect multiple fields that rely on such data sets. Open-source, user-friendly software will also be provided. Moreover, this project will be integrated with educational and outreach activities, including new courses, interdisciplinary training, and mentoring of underrepresented student groups in the mathematical and statistical sciences.

Classical clustering methods tend to be inefficient, inaccurate, or both when data are highly correlated, heavy-tailed, or comprise higher-order tensors. To address these challenges in high-dimensional unsupervised learning, the investigators pursue new probabilistic models and statistical methods for clustering large and complex data. The investigators promote parsimony in the models by combining the sparsity principle, implemented through variable selection, with the dimension reduction principle, implemented through linear projections. The resulting probabilistic frameworks enable simultaneous variable selection or dimension reduction, parameter estimation, and prediction. Separating out and excluding the noise in the data greatly enhances efficiency in estimation and prediction. At the same time, parsimony in the models leads to scalable algorithms and new statistical insights.
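The abstract does not specify the investigators' models. As a hedged illustration of why variable selection and linear projections fit together in model-based clustering, consider the textbook two-component Gaussian mixture with a shared covariance matrix Σ. There, the posterior cluster membership depends on x only through the projection β'x, with β = Σ^{-1}(μ2 - μ1). A variable whose entry in β is zero can be excluded from clustering even if its mean differs between clusters, which is exactly what happens with highly correlated variables. The sketch below is a standard fact about Gaussian mixtures, not the project's method; the function names are hypothetical, and it assumes two variables so the 2x2 inverse can be written by hand.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for small lists of lists."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def discriminant_direction(mu1, mu2, sigma):
    """Hypothetical helper: beta = Sigma^{-1} (mu2 - mu1)."""
    return matvec(inv2(sigma), [b - a for a, b in zip(mu1, mu2)])

def posterior_cluster2(x, mu1, mu2, sigma, pi2=0.5):
    """P(cluster 2 | x) in a two-component Gaussian mixture with shared covariance.

    The log posterior odds equal log(pi2 / pi1) + beta'(x - (mu1 + mu2) / 2),
    so x enters only through the linear projection beta'x.
    """
    beta = discriminant_direction(mu1, mu2, sigma)
    mid = [(a + b) / 2 for a, b in zip(mu1, mu2)]
    log_odds = math.log(pi2 / (1 - pi2)) + sum(
        bj * (xj - mj) for bj, xj, mj in zip(beta, x, mid)
    )
    return 1 / (1 + math.exp(-log_odds))

if __name__ == "__main__":
    # Illustrative (made-up) parameters: correlation 0.8, and both means differ.
    mu1, mu2 = [0.0, 0.0], [2.0, 1.6]
    sigma = [[1.0, 0.8], [0.8, 1.0]]
    print(discriminant_direction(mu1, mu2, sigma))  # second entry is (numerically) zero
    # Changing the second variable does not change the cluster posterior.
    print(posterior_cluster2([1.5, 0.0], mu1, mu2, sigma))
    print(posterior_cluster2([1.5, 5.0], mu1, mu2, sigma))
```

In this made-up setting, the second variable has a mean shift only because it is correlated with the first, so the sparse direction β discards it as noise. This is one simple way to see how sparsity and linear projections can coexist in a single clustering model.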

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Type: Standard Grant (Standard)
Application #: 1908969
Program Officer: Scott Acton

Budget Start: 2019-10-01
Budget End: 2022-09-30
Fiscal Year: 2019
Total Cost: $478,522

Institution

Florida State University, Tallahassee, FL, United States
