Patterns in Twitter data have revolutionized understanding of public health events such as influenza outbreaks. While researchers have begun to examine messaging related to substance use on Twitter, this project will strengthen the use of Twitter as an infoveillance tool to more rigorously examine nicotine, tobacco, and cancer-related communication. Twitter is particularly suited to this work because its users are commonly adolescents, young adults, and racial and ethnic minorities, all of whom are at increased risk for nicotine and tobacco product (NTP) use and related health consequences. Additionally, due to the openness of the platform, searches are replicable and transparent, enabling large-scale systematic research. Therefore, our multidisciplinary team of experts in diverse relevant fields (including public health, behavioral science, computational linguistics, computer science, biomedical informatics, and information privacy and security) will build upon our previous research to develop and validate structured algorithms providing automated surveillance of Twitter's multifaceted and continuously evolving information related to NTPs. First, we will qualitatively assess a stratified random sample of relevant NTP-related tweets for specific coded variables, such as the message's primary sentiment and other key information of potential value (e.g., whether a message involves buying/selling, policy/law, or cancer-related communication). Tweets will be obtained directly from Twitter using software we developed that leverages a comprehensive list of Twitter-optimized search strings related to NTPs. Second, we will statistically determine what message characteristics (e.g., the presence of certain words, punctuation, and/or structures) are most strongly associated with each of the coded variables for each search string.
Using this information, we will create specialized Machine Learning (ML) algorithms based on state-of-the-art methods from Natural Language Processing (NLP) to automatically assess and categorize future Twitter data. Third, we will use these algorithms to provide automatic assessment of current and future streaming data. Time series analyses using seasonal Autoregressive Integrated Moving Average (ARIMA) models will determine whether there are significant changes over time in the volume of messaging related to each coded variable of interest. Trends will be examined at the daily, weekly, and monthly levels, because each of these levels is potentially valuable for intervention. To maximize the translational value of this project, we will partner with public health department stakeholders who are experts in streamlining dissemination of actionable trends data. In summary, this project will substantially advance our understanding of representations of NTPs on social media, as well as our ability to conduct automated surveillance and analysis of this content. This project will result in important and concrete deliverables, including open-source algorithms for future researchers and processes to quickly disseminate actionable data for tailoring community-level interventions.
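
As an illustration only, two of the data-handling steps described above can be sketched in Python: drawing a stratified random sample of tweets by search string, and counting coded-tweet volume at the daily, weekly, and monthly levels. This is a minimal sketch under stated assumptions, not the project's software. The record fields (`search_string`, `date`, `labels`), the function names, and the toy data are all hypothetical, and the sketch only prepares the count series; it does not fit the seasonal ARIMA models.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def stratified_sample(tweets, per_stratum, seed=0):
    """Draw up to `per_stratum` tweets at random from each search-string stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is replicable
    strata = defaultdict(list)
    for tweet in tweets:
        strata[tweet["search_string"]].append(tweet)
    sample = []
    for key in sorted(strata):  # sorted so stratum order is deterministic
        group = strata[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def volume_by_period(tweets, label, level):
    """Count tweets carrying `label`, bucketed by day, ISO week, or month."""
    counts = Counter()
    for tweet in tweets:
        if label not in tweet["labels"]:
            continue
        d = tweet["date"]
        if level == "daily":
            bucket = d.isoformat()
        elif level == "weekly":
            iso_year, iso_week, _ = d.isocalendar()
            bucket = f"{iso_year}-W{iso_week:02d}"
        elif level == "monthly":
            bucket = f"{d.year}-{d.month:02d}"
        else:
            raise ValueError(f"unknown level: {level}")
        counts[bucket] += 1
    return dict(counts)


# Hypothetical toy records; the real search strings and coded variables
# are not listed in this summary.
TWEETS = [
    {"search_string": "vape", "date": date(2018, 1, 1), "labels": {"policy"}},
    {"search_string": "vape", "date": date(2018, 1, 2), "labels": {"buying"}},
    {"search_string": "vape", "date": date(2018, 1, 9), "labels": {"policy"}},
    {"search_string": "hookah", "date": date(2018, 2, 5), "labels": {"policy"}},
]

if __name__ == "__main__":
    print(len(stratified_sample(TWEETS, per_stratum=1)))
    for level in ("daily", "weekly", "monthly"):
        print(level, volume_by_period(TWEETS, "policy", level))
```

The per-period counts produced this way are the kind of series a seasonal time series model could then be fitted to, one series per coded variable and per level of aggregation.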

Public Health Relevance

For this project, we gathered a team of public health researchers and computer scientists to leverage the power of Twitter as a novel surveillance tool to better understand communication about nicotine and tobacco products (NTPs) and related messages about cancer and cancer prevention. We will gather a random sample of Twitter messages ("tweets") related to NTPs, examine them in depth, and use this information to create specialized computer algorithms that can automatically categorize future Twitter data. Then, we will examine changes over time in attitudes toward and interest in NTPs, as well as in cancer-related discussion around various NTPs, which will dramatically improve our understanding of Twitter as a tool for this type of surveillance.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Blake, Kelly D
University of Pittsburgh
Internal Medicine/Medicine
Schools of Medicine
United States
Colditz, Jason B; Chu, Kar-Hai; Emery, Sherry L et al. (2018) Toward Real-Time Infoveillance of Twitter Health Messages. Am J Public Health 108:1009-1014
Chu, Kar-Hai; Colditz, Jason B; Primack, Brian A et al. (2018) JUUL: Spreading Online and Offline. J Adolesc Health 63:582-586