A large fraction of social media content on the internet resides in thousands of specialized communities hosted by news outlets, typically in the form of reader forums or comments on news articles. The users of such a site are said to form a vertical social community (VSC), because they engage deeply with a single media source. While each VSC is tiny compared to broad communities such as Facebook, VSCs are important because they expose how different segments of society feel about various world events. This can be a very useful resource for downstream intelligence and predictive analytics. However, current web crawlers cannot effectively access VSCs. As a result, their data is invisible to search engines and remains hidden from analytics tools. The goals of this project are to enable effective access to the vertical social communities that coalesce around online news reports, and to mine their comments and debates. This project will provide researchers with tools to collect and analyze data from these communities. The educational component of the project includes training and involving graduate and undergraduate students in research, and incorporating research projects and results into courses.

The researchers will develop algorithms to unearth the content generated in thousands of vertical social communities and make it transparently accessible to data management and analytics tools. The researchers will develop novel deep learning techniques for content detection, and build a novel scalable end-to-end system for real-time access and collective mining of these communities, capable of handling large parallel data streams based on shifting ideas. The specific algorithms will include user population estimation, bootstrap communication patterns for automatic crawling of content, and fine-grained sentiment analysis for intelligence and predictive analytics. Software tools will be made available to researchers in academia and industry. Distribution of free, open-source software implementing the techniques developed will enhance existing research infrastructure.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Budget Start: 2018-09-15
Budget End: 2022-08-31
Fiscal Year: 2018
Total Cost: $427,912
Name: Temple University
City: Philadelphia
State: PA
Country: United States
Zip Code: 19122