Understanding mechanisms of action is key to improving psychosocial interventions for cancer and other chronic disease conditions. In cancer, emotional expression has been identified as one possible mediator of the effect of psychosocial intervention on patient-reported outcomes. However, scientific evaluations of psychological mechanisms of adjustment to cancer and other chronic diseases are constrained by limitations associated with self-report measures. Because self-care resources, peer-to-peer networks, and more recent forms of psychosocial intervention are increasingly being delivered online, linguistic and behavioral data can be used to characterize internal coping processes, social interactions, and other manifest behaviors. Few tools are currently available for harnessing text as a potential data source, and signal detection indices of existing tools leave room for considerable improvement in these methodologies (Bantum &Owen, 2009). In the present study, natural language processing and other tools of computational linguistics will be used to develop a machine-learning classifier to identify emotional expression in electronic text data.
The aims of the study are: 1) to annotate a large text corpus from cancer survivors using an objective and reliable emotion-coding procedure, 2) to incorporate linguistic and psychological features into a machine-learning classification method and identify which of these features are most strongly associated with codes assigned by trained human raters, and 3) to develop combined psychological and natural language processing (NLP) methods for identifying linguistic markers of emotional coping behaviors. To accomplish these aims, a comprehensive corpus of emotionally-laden cancer communications will be developed from 5 existing linguistic datasets. Five raters will be selected and undergo a rigorous training procedure for coding emotional expression using an emotion-coding system previously developed by the research. Coding will take place using an Internet-based coding interface that will allow the investigators to continuously monitor inter-rater reliability. Simultaneous with the coding process, the investigators will link the electronic text data with key linguistic and psychological features, including Linguistic Inquiry and Word Count (LIWC), Affective Norms for English Words (ANEW), WordNet, part of speech tags, patterns of capitalization and punctuation, emoticons, and textual context. A machine-learning classifier, using tools of natural language processing, will then be applied to the text/feature data and validated against human-rated emotion codes. The long-term objective of this research is to advance a methodology for objectively identifying coping behavior, particularly emotional expression, in order to supplement self-report measures and improve scientific understanding of adjustment to chronic disease, trauma, or other psychological conditions. This work is essential for identifying mechanisms of action in psychosocial interventions for cancer survivors and others and has significance for the fields of medicine, psychology, computational linguistics, and artificial intelligence.

Public Health Relevance

Identifying specific emotional, cognitive, and behavioral factors that contribute to adjustment to cancer and other chronic diseases is essential for being able to develop and improve effective interventions to promote health and well-being. To date, the study of these factors as mechanisms of action has been limited to self-report measures that may not correlate well with other more objective indicators. The proposed study will improve our ability to identify mechanisms of action by supplementing self-report measures with objectively identified markers of coping behaviors such as emotional expression in natural language used by individuals living with cancer.

National Institute of Health (NIH)
National Cancer Institute (NCI)
Exploratory/Developmental Grants (R21)
Project #
Application #
Study Section
Special Emphasis Panel (ZRG1-BBBP-D (52))
Program Officer
Arora, Neeraj K
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Loma Linda University
Schools of Arts and Sciences
Loma Linda
United States
Zip Code
Zhang, Shaodian; Bantum, Erin; Owen, Jason et al. (2014) Does sustained participation in an online health community affect sentiment? AMIA Annu Symp Proc 2014:1970-9