Social media data are growing rapidly in size and variety as more people use platforms such as Facebook, Twitter, and LinkedIn. These data are a massive "treasure trove" of interest to researchers and practitioners across disciplines, and a rich source for data mining. However, although both are large-scale, social media data differ from the attribute-value data of classic data mining. Social media data are noisy, incomplete, drawn from multiple sources, and form multi-modal and multi-attributed networks. Furthermore, such data are not independent and identically distributed (i.i.d.). These unique properties present new challenges for mining social media data.

This project investigates novel approaches to feature selection in linked data in general and social media data in particular. Specifically, it seeks to exploit link information in supervised as well as unsupervised feature selection for social media data. Because social media data are drawn from multiple noisy, partial, or redundant sources, the proposed approach seeks to select relevant sources and use them together to guide linked feature selection in multi-modal, multi-attributed social media data.

The project lies at the confluence of feature selection, social media analysis, and data mining. It offers an opportunity to engage students who are adept users of social media in developing computational tools that can harness the power of social media. Broader impacts of this research include integration of social media analytics into undergraduate and graduate courses as well as student research projects; enhanced research-based training opportunities for students from under-represented groups; and powerful social media analytics tools for understanding collective behavior in social media, employing social media for crisis response and disaster relief, and studying social and political movements. The results of the project (including publications, software, etc.) will be made available through the project web site.

National Science Foundation (NSF)
Division of Information and Intelligent Systems (IIS)
Standard Grant (Standard)
Program Officer: Sylvia Spengler
Arizona State University
United States