III: Small: Unsupervised Feature Selection in the Era of Big Data

Tang, Jiliang; Zhou, Jiayu

Abstract

Feature selection has proven efficient and effective in preparing high-dimensional data for data mining and machine learning applications, especially when the original features are important for model interpretation and knowledge extraction. The growth of data in both size and complexity is accelerating as the capacity to collect data increases dramatically. Such big data poses tremendous challenges for traditional feature selection methods, which are usually designed to handle homogeneous, static data in a centralized fashion. Meanwhile, in many real-world domains big data is unlabeled, which further exacerbates the difficulty. As a result, the majority of existing feature selection methods are not well prepared for big data, which calls for the development of novel unsupervised feature selection methods for unlabeled big data. The project extends state-of-the-art feature selection research to the new frontier of taming big data. It has the potential to benefit a number of real-world applications in disciplines such as Computer Science, Business, Education, Politics, Healthcare and Bioinformatics.

This project proposes a suite of novel approaches for unsupervised feature selection to facilitate the computational understanding of big data, investigating the associated fundamental research issues and developing effective algorithms. It consists of three major thrusts. First, it studies various strategies for scaling unsupervised feature selection to large-scale and distributed data, and investigates distributed unsupervised feature selection with structured features and under asynchronous updates. Second, it develops a family of heterogeneous unsupervised feature selection methods that handle multiple types of heterogeneity. Third, it defines unsupervised feature selection under various streaming scenarios and develops new algorithms to improve the capability of unsupervised feature selection in the corresponding streaming settings. The project and its findings will be disseminated through a variety of means, including web-enabled data and software repositories, books, journal and conference publications, special-purpose workshops or tutorials, and external collaborations. The project lies at the confluence of feature selection, big data analysis, machine learning and data mining. It can be effectively integrated into undergraduate and graduate courses as well as student research projects.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 1714741
Program Officer: Wei Ding

Budget Start: 2017-08-15
Budget End: 2021-07-31
Fiscal Year: 2017
Total Cost: $480,398

Institution

Michigan State University, East Lansing, MI, United States
