Social "big data" holds information with wide-ranging implications for addressing issues along the HIV care continuum. Social big data refers to information from social media and online platforms on which individuals and communities create, share, and discuss content. One in four people worldwide, or over a billion people, are publicly documenting their activities, intentions, moods, opinions, and social interactions on these sites. They are doing so with increasing volume and velocity, including 400 million "tweets" per day on Twitter and 4.75 billion content items shared per day on Facebook. With an increasing number of these platforms supporting access to publicly available user data, social big data analysis is a promising new approach for obtaining organic observations of behavior that can be used to monitor and predict real-world public health problems, such as HIV incidence. New tools such as social data analysis are therefore needed to supplement existing HIV data collection methods. In preliminary research, our team developed the first approach that identifies psychological and behavioral characteristics from social big data (>550 million tweets) found to be associated with HIV diagnoses. Since groups at the highest risk for HIV (e.g., minority populations) are the fastest-growing groups of Twitter users, and because social media users have been found to publicly share personal information, we identified and collected tweets suggesting HIV risk behaviors (e.g., drug use and high-risk sexual behaviors) and modeled them alongside CDC statistics on HIV diagnoses. We found a significant positive relationship between HIV-related tweets and county-level HIV cases, controlling for socioeconomic status measures and other variables. The problem is that this approach is not currently scalable for use by HIV researchers and public health organizations.
Although public health agencies are interested in mining social data to address HIV, current tools are not accessible to most health scientists because they require advanced computer science expertise. For example, analyzing 500 million tweets a day requires expertise in big data engineering, advanced machine learning, natural language processing, and artificial intelligence. A single platform for mining social data that has been designed and tested by and for HIV researchers could have a significant impact on HIV prevention, testing, and treatment. We seek to create a single automated platform that collects social media data; identifies, codes, and labels tweets that suggest HIV-related behaviors; and ultimately predicts regional HIV incidence. Because of the potential ethical issues associated with mining people's data, we also seek to interview staff at local and regional HIV organizations and participants affected by HIV to gain their perspectives on the ethical issues associated with this approach. The software developed from this application will be shared with HIV researchers and health care workers to provide additional tools that can be used to combat the spread of HIV.

Public Health Relevance

Surveillance and monitoring of HIV and related risk behaviors are a top priority for public health organizations. This project is of particularly high impact because it seeks to 1) develop software that allows non-technical researchers to analyze real-time conversations from social media big data to monitor HIV diagnoses, and 2) interview participants to learn about the ethical issues that need to be addressed when building this software. The software developed from this application will be shared with HIV researchers and health care workers to provide additional tools that can be used to combat the spread of HIV.

National Institutes of Health (NIH)
National Institute of Allergy and Infectious Diseases (NIAID)
Research Project (R01)
Study Section
Behavioral and Social Consequences of HIV/AIDS Study Section (BSCH)
Program Officer
Mckaig, Rosemary G
University of California Los Angeles
Family Medicine
Schools of Medicine
Los Angeles
United States
Garett, Renee; Liu, Sam; Young, Sean D (2018) The Relationship Between Social Media Use and Sleep Quality among Undergraduate Students. Inf Commun Soc 21:163-173
Young, Sean D; Mercer, Neil; Weiss, Robert E et al. (2018) Using social media as a tool to predict syphilis. Prev Med 109:58-61
Liu, Sam; Young, Sean D (2018) A survey of social media data analysis for physical activity surveillance. J Forensic Leg Med 57:33-36
Young, Sean D; Zhang, Qingpeng (2018) Using search engine big data for predicting new HIV diagnoses. PLoS One 13:e0199527
Young, Sean D; Torrone, Elizabeth A; Urata, John et al. (2018) Using Search Engine Data as a Tool to Predict Syphilis. Epidemiology 29:574-578
Garett, Renee; Liu, Sam; Young, Sean D (2017) A longitudinal analysis of stress among incoming college freshmen. J Am Coll Health 65:331-338
Huang, Emily; Marlin, Robert W; Young, Sean D et al. (2016) Using Grindr, a Smartphone Social-Networking Application, to Increase HIV Self-Testing Among Black and Latino Men Who Have Sex With Men in Los Angeles, 2014. AIDS Educ Prev 28:341-50