Electronic cigarettes (e-cigarettes) are a popular emerging tobacco product. Because e-cigarettes do not generate the toxic combustion products produced by smoking regular cigarettes, they are perceived, and sometimes promoted, as a less harmful alternative to smoking and as a means to quit smoking. Although they may be less harmful, their efficacy for smoking cessation has not been demonstrated conclusively, with studies reporting evidence both for and against such use. Furthermore, owing to their recent introduction, there are also safety concerns given reported adverse events. The US Food and Drug Administration (FDA) has introduced regulations, effective August 8, 2016, that require FDA review of e-cigarette products, ban sales to minors and free samples, and require warning labels on certain products. In this context, surveillance of evolving themes and of factors contributing to message popularity in e-cigarette chatter on social media platforms is an important activity. Twitter has become the favorite network for teenagers and young adults owing to its short message size and associated ease of use on smartphones. For an emerging product like e-cigarettes, the asymmetric follower-friend connections and hashtag functionality in Twitter offer a convenient way to propagate information and facilitate discussion. Among online forums, Reddit allows longer messages from users, inviting specific feedback from other users. Within Reddit, the e-cigarette subreddit facilitates focused discussions on e-cigarette use and products. In this project, we propose to computationally analyze the contents and user profiles available in a dataset of all e-cigarette tweets generated between 7/2016 and 6/2017 and all e-cigarette subreddit posts/comments generated since 9/2016. We will continue such analyses with data collected through free but rate-limited API access throughout the duration of the project.
Our first aim is to surface specific themes of interest directly from e-cigarette messages using phrase-based online and binned topic models. We expect these themes to complement the familiar broad themes that researchers currently consider when analyzing online messages. Next, we will identify factors (involving message content and profile characteristics) that contribute to different notions of popularity (#retweets, #replies, #up-votes) of e-cigarette tweets/messages. We expect these results will help health agencies, the FDA, and researchers gain insights into the observed viral nature of certain messages and design effective strategies to maximize diffusion of their own messages. Finally, we will conduct these analyses along the dimensions of gender, race, and age to grasp variations in themes and popularity factors specific to different vulnerable demographic segments.

Public Health Relevance

Electronic cigarettes (e-cigarettes) have emerged as the main smoke-free alternative to regular cigarettes over the past few years. While the ongoing, healthy scientific debate about their long-term health effects and their suitability for smoking cessation is important, in this project we propose computational approaches toward fine-grained surveillance of specific themes, factors influencing message popularity, and demographic variations. The overarching goal is to create new affordances for researchers and health agencies to leverage online social media platforms for knowing and reaching their audience in effective ways.

National Institutes of Health (NIH)
National Cancer Institute (NCI)
Exploratory/Developmental Grants (R21)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Blake, Kelly D
University of Kentucky
Internal Medicine/Medicine
Schools of Medicine
United States