U.S. health officials are struggling to manage health conditions and disease outbreaks (e.g., flu) affecting underserved communities. Yet early warnings can be found in public posts made by members of these communities on social apps such as Twitter. Recent studies from the Pew Research Center indicate that minorities are as likely as non-minorities to own a mobile phone and are avid users of social apps. Because public Twitter posts can be searched and accessed without a friendship relation with the author, the platform could give health officials the analytics capability to track which diseases are being discussed in a given region at a specific time. Unfortunately, the software tools to perform these analytic tasks are still at an early stage of development. Very often, the expertise needed to use the required big data software and machine learning programs is not readily available, which limits access for officials working with underserved communities. In this project, we seek to conduct basic research aimed at designing, implementing, and testing an open-source research prototype of an integrated, scalable platform that searches Twitter posts and analyzes their contents for clues about health conditions. The goal is to understand the health issues affecting underserved communities and to predict health conditions that might affect them in the future.
In Aim 1, we will build an automated Twitter data warehouse to collect, index, and query public posts.
In Aim 2, we will build a predictive analytics engine that uses social data to predict possible outbreaks of conditions, the regions that might be affected, and at-risk groups.
Finally, in Aim 3, we will build mobile and web apps, with a map-based interface, to query and visualize the health data.
The value-added capability of our system is its ability to work as an integrated system that analyzes tweets, visualizes data along disease and spatio-temporal attributes, and performs predictive analytics, all under one roof. This could have a significant impact on public health disease tracking and response. The University of Puerto Rico, Mayagüez (UPRM) is a Hispanic-serving institution with the second-largest Hispanic-serving engineering school in the U.S. and 35% female enrollment. This AREA project provides a unique opportunity to train students in social media analysis, big data systems, machine learning, and predictive analytics.

Public Health Relevance

U.S. health officials are struggling to manage health conditions and disease outbreaks (e.g., flu) affecting underserved communities. Yet early warnings can be found in public posts made by members of these communities on social apps such as Twitter. In this project, we shall build an open-source system that searches public Twitter posts and analyzes their contents for clues about health conditions, in order to understand the health issues affecting underserved communities and to predict health conditions that might affect them in the future.

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Academic Research Enhancement Awards (AREA) (R15)
Project #: 1R15LM012275-01
Application #: 9022751
Study Section: Special Emphasis Panel (ZRG1-HDM-X (81))
Program Officer: Sim, Hua-Chuan
Project Start: 2015-09-17
Project End: 2016-08-31
Budget Start: 2015-09-17
Budget End: 2016-08-31
Support Year: 1
Fiscal Year: 2015
Total Cost: $312,651
Indirect Cost: $95,151
Name: University of Puerto Rico Mayaguez
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 175303262
City: Mayaguez
State: PR
Country: United States
Zip Code: 00681
Rodríguez-Martínez, Manuel (2017). Experiences with the Twitter Health Surveillance (THS) System. Proc IEEE Int Congr Big Data 2017:376-383.