The World-Wide Web (Web 1.0) and online social media (Web 2.0) have revolutionized the ways medical knowledge is disseminated and health information is exchanged among patients, supporters, and health care providers. Online patient communities have been expanding at an impressive rate, with millions of active participants from all age groups. Recent studies that analyze social media content for health-related applications show that this rising cyber-trend yields valuable knowledge of the kind traditionally acquired with scientific methods such as observational epidemiological studies. This new mode of information acquisition is particularly advantageous for studies requiring long periods of data curation. We propose to leverage the power of online content, including user-generated content on social network sites, to tackle NCI's second provocative question on complex migration patterns and their effect on environmental cancer risk. We hypothesize that the rich personal information shared openly online by cancer patients and cancer-free people can be effectively mined to generate new knowledge on the topic. Such knowledge cannot be easily uncovered with conventional migrant studies, because population mobility patterns in our modern economy are far more complex and dynamic than those observed in the past. To achieve our goal, we will build upon our unique cyber-informatics experience at the Oak Ridge National Laboratory (ORNL) in ultra-scale searching, identifying, and understanding of free-structured web content. Specifically, we will develop domain-specific informatics tools to automatically reconstruct people's spatiotemporal lifelines, link them to spatiotemporal environmental data available from online sources such as the Environmental Protection Agency, and mine them using machine learning methods to search for salient associations between migration-influenced changes in environmental exposure and cancer risk.
These tools will be individually validated, and the overall approach will be carefully tested to understand its capabilities, methodological challenges, and practical limitations (if any) for knowledge discovery and scientific exploration in environmental cancer epidemiology. This study has the potential to provide a powerful complementary approach to the standard paradigm of observational epidemiological research. It will offer a fully automated and cost-effective way to discover new trends, and monitor evolving ones, in the impact of modern population migration patterns on environmental cancer risk. Such information could help cancer epidemiologists and health policy makers generate and prioritize study hypotheses worth testing with carefully controlled and properly powered (but also long-term and costly) epidemiological studies.
Web mining has emerged in different domains as a powerful approach to harvesting knowledge of unprecedented quantity, comprehensiveness, and diversity. In this study we propose to pursue web mining in the environmental cancer risk domain. We will develop dedicated cyber-informatics algorithms and tools to (i) automatically search disparate online sources to retrieve and integrate content related to individuals' cancer history and spatiotemporal environmental exposure profiles, and (ii) effectively synthesize this information to accelerate knowledge discovery on changes in environmental cancer risk due to an individual's migration activities.
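The linking step described above, joining a person's spatiotemporal lifeline to environmental data by place and year, can be illustrated with a minimal sketch. This is not the project's tooling: the `Residence` record, the `yearly_exposure` and `cumulative_exposure` helpers, the place labels, and all exposure values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Residence:
    """One stay in a hypothetical lifeline (all fields are illustrative)."""
    place: str       # location label, e.g. a county name
    start_year: int  # first year at this place (inclusive)
    end_year: int    # last year at this place (inclusive)


def yearly_exposure(
    lifeline: List[Residence],
    exposure: Dict[Tuple[str, int], float],
) -> Dict[int, float]:
    """Map each year of the lifeline to the exposure value at that place.

    Years with no environmental record are skipped rather than guessed.
    If two stays share a year, the later stay in the list wins.
    """
    profile: Dict[int, float] = {}
    for stay in lifeline:
        for year in range(stay.start_year, stay.end_year + 1):
            value = exposure.get((stay.place, year))
            if value is not None:
                profile[year] = value
    return profile


def cumulative_exposure(profile: Dict[int, float]) -> float:
    """Sum the yearly values into a single cumulative figure."""
    return sum(profile.values())


# Made-up example: two residences and an incomplete exposure table.
lifeline = [Residence("A", 2000, 2001), Residence("B", 2002, 2003)]
exposure = {("A", 2000): 1.0, ("A", 2001): 1.5, ("B", 2002): 4.0}

profile = yearly_exposure(lifeline, exposure)
total = cumulative_exposure(profile)
```

In this sketch the year 2003 is absent from the resulting profile because the made-up table has no record for place "B" in that year; a real pipeline would need an explicit policy for such gaps and for uncertainty in the reconstructed lifelines.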
Xu, Songhua; Yoon, Hong-Jun; Tourassi, Georgia (2014) A user-oriented web crawler for selectively acquiring online content in e-health research. Bioinformatics 30:104-14.