The World Wide Web (Web 1.0) and online social media (Web 2.0) have revolutionized the ways in which medical knowledge is disseminated and health information is exchanged among patients, supporters, and health care providers. Online patient communities have been expanding at an impressive rate, with millions of active participants from all age groups. Recent studies analyzing social media content for health-related applications show that this emerging cyber-trend can yield valuable knowledge of the kind traditionally acquired through scientific methods such as observational epidemiological studies. This new mode of information acquisition is particularly advantageous for studies that require long periods of data curation. We propose to leverage online content, including user-generated content on social networking sites, to tackle NCI's second provocative question on complex migration patterns and their effect on environmental cancer risk. We hypothesize that the wealth of personal information shared openly online among cancer patients and cancer-free people can be effectively mined to generate new knowledge on this topic, knowledge that cannot easily be uncovered with conventional migrant studies because population mobility patterns in our modern economy are far more complex and dynamic than those observed in the past. To achieve this goal, we will build on our unique cyber-informatics experience at Oak Ridge National Laboratory (ORNL) in ultra-scale searching, identification, and understanding of free-structured web content. Specifically, we will develop domain-specific informatics tools to automatically reconstruct people's spatiotemporal lifelines, link them to spatiotemporal environmental data available from online sources such as the Environmental Protection Agency (EPA), and mine them with machine learning methods to search for salient associations between migration-driven changes in environmental exposure and cancer risk.
These tools will be individually validated, and the overall approach will be carefully tested to understand its capabilities, methodological challenges, and practical limitations (if any) for knowledge discovery and scientific exploration in environmental cancer epidemiology. This study has the potential to provide a powerful complementary approach to the standard paradigm of observational epidemiological research. It will offer a fully automated, cost-effective way to discover new trends, and to monitor evolving ones, in the impact of modern population migration patterns on environmental cancer risk. Such information could help cancer epidemiologists and health policy makers generate and prioritize hypotheses worth testing in carefully controlled, adequately powered (but also long-term and costly) epidemiological studies.
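The step of linking a reconstructed lifeline to spatiotemporal environmental data can be illustrated with a minimal Python sketch. It assumes a residential lifeline has already been extracted as a list of dated residence segments, and that environmental data have been reduced to one value per location and year. The `Residence` class, the function names, and the location keys and values in the example are hypothetical placeholders, not the project's actual tools or real EPA data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Residence:
    """One segment of a hypothetical residential lifeline."""
    location: str    # hypothetical location key (e.g., a county identifier)
    start_year: int  # first year at this residence (inclusive)
    end_year: int    # last year at this residence (inclusive)

# Assumed format: (location, year) -> annual environmental value.
EnvTable = Dict[Tuple[str, int], float]

def exposure_profile(lifeline: List[Residence], env: EnvTable) -> Dict[int, float]:
    """Map each lifeline year to the environmental value at that year's residence.

    Years without an environmental record are skipped rather than imputed.
    If segments overlap, the segment appearing later in the list wins.
    """
    profile: Dict[int, float] = {}
    for seg in lifeline:
        for year in range(seg.start_year, seg.end_year + 1):
            value = env.get((seg.location, year))
            if value is not None:
                profile[year] = value
    return profile

def mean_exposure(profile: Dict[int, float]) -> Optional[float]:
    """Per-year (time-weighted) mean exposure, or None if no years are covered."""
    if not profile:
        return None
    return sum(profile.values()) / len(profile)

def exposure_change(profile: Dict[int, float], split_year: int) -> Optional[float]:
    """Mean exposure after split_year minus mean exposure up to and including it."""
    before = {y: v for y, v in profile.items() if y <= split_year}
    after = {y: v for y, v in profile.items() if y > split_year}
    if not before or not after:
        return None
    return mean_exposure(after) - mean_exposure(before)

if __name__ == "__main__":
    # Toy inputs for illustration only; not real residential or EPA data.
    lifeline = [Residence("A", 2000, 2002), Residence("B", 2003, 2004)]
    env: EnvTable = {}
    env.update({("A", y): 10.0 for y in range(2000, 2003)})
    env.update({("B", y): 6.0 for y in range(2003, 2005)})
    profile = exposure_profile(lifeline, env)
    print(mean_exposure(profile))          # 8.4
    print(exposure_change(profile, 2002))  # -4.0
```

Here a move in 2003 from location "A" to location "B" shows up as a negative exposure change across the split year. Skipping uncovered years keeps the sketch simple; a real pipeline would need an explicit policy for missing environmental records and for residence dates that are known only approximately.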

Public Health Relevance

Web mining has emerged in various domains as a powerful approach to harvesting knowledge of unprecedented quantity, comprehensiveness, and diversity. In this study, we propose to apply web mining to the domain of environmental cancer risk. We will develop dedicated cyber-informatics algorithms and tools to (i) automatically search disparate online sources to retrieve and integrate content related to individuals' cancer history and spatiotemporal environmental exposure profiles, and (ii) effectively synthesize this information to accelerate knowledge discovery on changes in environmental cancer risk due to individuals' migration.

Agency: National Institutes of Health (NIH)
Institute: National Cancer Institute (NCI)
Type: Research Project (R01)
Project #: 5R01CA170508-04
Application #: 8876609
Study Section: Special Emphasis Panel (ZCA1)
Program Officer: Hesse, Bradford
Project Start: 2012-09-21
Project End: 2016-06-30
Budget Start: 2015-07-01
Budget End: 2016-06-30
Support Year: 4
Fiscal Year: 2015
Organization: UT-Battelle, LLC-Oak Ridge National Lab
DUNS #: 099114287
City: Oak Ridge
State: TN
Country: United States
Zip Code: 37831

Publications

Yoon, Hong-Jun; Tourassi, Georgia (2016) Investigating the Association Between Sociodemographic Factors and Lung Cancer Risk Using Cyber Informatics. IEEE EMBS Int Conf Biomed Health Inform 2016:557-560
Yoon, Hong-Jun; Xu, Songhua; Tourassi, Georgia (2016) Predicting Lung Cancer Incidence from Air Pollution Exposures Using Shapelet-based Time Series Analysis. IEEE EMBS Int Conf Biomed Health Inform 2016:565-568
Zhao, Baoquan; Xu, Songhua; Lin, Shujin et al. (2016) A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos. J Am Med Inform Assoc 23:e34-e41
Xu, Songhua; Markson, Christopher; Costello, Kaitlin L et al. (2016) Leveraging Social Media to Promote Public Health Knowledge: Example of Cancer Awareness via Twitter. JMIR Public Health Surveill 2:e17
Tourassi, Georgia; Yoon, Hong-Jun; Xu, Songhua (2016) A novel web informatics approach for automated surveillance of cancer mortality trends. J Biomed Inform 61:110-118
Liu, Yang; Xu, Songhua; Tourassi, Georgia (2015) Detecting Rumors Through Modeling Information Propagation Networks in a Social Media Environment. Soc Comput Behav Cult Model Predict (2015) 9021:121-130
Yoon, Hong-Jun; Tourassi, Georgia; Xu, Songhua (2015) Residential Mobility and Lung Cancer Risk: Data-Driven Exploration Using Internet Sources. Soc Comput Behav Cult Model Predict (2015) 9021:464-469
Yoon, Hong-Jun; Tourassi, Georgia (2014) Analysis of Online Social Networks to Understand Information Sharing Behaviors Through Social Cognitive Theory. Annu ORNL Biomed Sci Eng Cent Conf 2014:
Xu, Songhua; Yoon, Hong-Jun; Tourassi, Georgia (2014) A user-oriented web crawler for selectively acquiring online content in e-health research. Bioinformatics 30:104-114
Liu, Yang; Xu, Songhua; Yoon, Hong-Jun et al. (2014) Extracting patient demographics and personal medical information from online health forums. AMIA Annu Symp Proc 2014:1825-1834

Showing the most recent 10 out of 12 publications