Major depressive disorder is one of the most common debilitating illnesses in the United States, with a lifetime prevalence of 16.2%. Currently, nationwide mental health surveillance takes the form of large-scale telephone-based surveys. These surveys have high running costs and require teams of human telephone operators. Even the largest system, the Behavioral Risk Factor Surveillance System, reaches only 0.13% of the US population. Twitter (and other microblog services) offers a rich, if terse, multilingual source of real-time data for public health surveillance. Natural Language Processing (NLP) provides techniques and resources to "unlock" data from text. We propose using Twitter and NLP as a cost-effective and flexible approach to augmenting current telephone-based surveillance methods for population-level depression monitoring. This grant application has two major strands. The first investigates the ethical issues and challenges to privacy that emerge with the use of Twitter data for public health surveillance (Aim One). The second develops techniques and resources for real-time public health surveillance of mental illness from Twitter (Aims Two and Three).
Aim One seeks to investigate and codify our responsibilities as researchers towards Twitter users by engaging with those users directly.
With Aim Two, we will build and evaluate Natural Language Processing resources - algorithms, lexicons and taxonomies - to support the identification of depression symptoms in Twitter data.
For Aim Three, we will build and evaluate Natural Language Processing modules and services that use Twitter as a data source for monitoring depression levels in the community.
The significance of the proposed work lies in three areas. First, our investigations - both empirical and theoretical - will explore ethical issues in the use of Twitter for public health surveillance. This work has the potential to guide future research in the area. Second, in developing and evaluating algorithms and resources for identifying depression from tweets, we will contribute foundational work to the field of NLP. Third, these algorithms and resources will provide the bedrock for building social-media-based surveillance systems, which will offer a cost-effective means of augmenting current mental health surveillance practice.
This proposal is innovative in three respects: its application area (microblogs have not been used before for mental health surveillance), its focus on using NLP to identify depressive symptoms for public health, and the central role that qualitative bioethical research will play in guiding the work.
The proposed research focuses on using advanced Natural Language Processing methods to mine microblog data - in this case, Twitter - for mental health surveillance (specifically, depression surveillance), in order to augment current telephone-based mental health surveillance systems. The research has public health at its core.