Individuals with chronic diseases increasingly rely on online forums, blogs, and mailing lists to exchange information, practical tips, and stories about their conditions, and to get emotional support from their peers. Although this type of social networking has become central to the daily lives and decision-making of many patients, there has been little research on the quality of the content it conveys or on its use and impact in medicine and public health. On the patients' side, forums offer surprisingly little technological support: users often have no choice but to browse through massive numbers of posts to find a particular piece of information. The lack of appropriate tools to organize, analyze, and ultimately understand the overwhelming number of health-related, patient-written posts prevents researchers from investigating this medium and keeps patients from using it to its full potential.

This project aims to help both patients and health professionals access online patient-authored information by creating tools to search patient forums. The proposed work spans several fields (natural language processing, data management, information retrieval, public health, and behavioral medicine) and will build the foundations for understanding the posts that patients share with their peers through online forums and mailing lists.

This proposal aims to bring together information processing and the medical understanding of patient-centric resources. The project will process text from an emerging medium that directly addresses patients' immediate concerns. The tools designed as part of this project will benefit researchers who study the online behaviors and information needs of patients. These tools, such as the intelligent search engine for forum posts, will also improve the experience of the patients themselves, who are avid users of this medium. In addition to the research agenda, this proposal presents an education plan consistent with the overall goal of bridging the gap between researchers in computer science and researchers in the medical fields. In particular, it describes a course that introduces methods of intelligent information processing in the context of research questions important to public health and medicine.

Project Report

Online forums consist mostly of user-authored free text, usually with very little structured metadata. Users often face the daunting task of reading a large quantity of text to discover potentially useful information. This project focused on automatically leveraging user-authored text to improve search in health forums, where users frequently connect with one another to find the right person to answer their questions. It addressed several challenges: identifying the connections, or similarities, between users; combining these connections into an integrated scoring mechanism; ranking short snippets of forum posts while taking these personal connections into account; and deciding how much pertinent information to return to the user. As part of this project, we designed techniques that compute user similarities from multiple indicators, such as shared information needs, user profiles, and topics of interest. We developed a novel, graph-based, multidimensional model that uniformly incorporates these heterogeneous user relations and identifies similar participants, both to predict future social interactions and to enhance keyword search. In addition, we improved on traditional keyword search by re-ranking results according to the authority of the users contributing to the forums. Our experimental evaluation showed that the quality of the ranked results improves when the various types of connections between users are taken into account in scoring answers.

A critical challenge for search over user-authored data is to return results that are as complete as possible, without missing relevant information, yet not overly broad. We addressed the problem of presenting textual search results concisely in response to user queries, and designed a novel hierarchical representation and scoring technique for objects at multiple granularities.
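The report does not specify how the hierarchical scores are computed. As a minimal sketch, assuming a thread, post, and sentence hierarchy and a hypothetical score that multiplies coverage (the fraction of query terms found anywhere under an object) by density (the fraction of its sentences that mention any query term), the trade-off between completeness and breadth can be illustrated as follows. The class `Node` and the functions `tokens` and `score` are illustrative names, not part of the project's software.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A forum object at one granularity (hypothetical levels: thread, post, sentence)."""
    level: str
    text: str = ""  # in this sketch only leaf sentences carry text
    children: list[Node] = field(default_factory=list)

    def sentences(self) -> list[str]:
        """All leaf sentences under this node, in order."""
        if not self.children:
            return [self.text]
        return [s for child in self.children for s in child.sentences()]


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric terms of a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(node: Node, query: set[str]) -> float:
    """Hypothetical multi-granularity score: coverage * density.

    coverage rewards completeness: fraction of query terms found under the node.
    density penalizes breadth: fraction of its sentences mentioning any query term.
    """
    if not query:
        raise ValueError("query must contain at least one term")
    sentence_terms = [tokens(s) for s in node.sentences()]
    covered = set().union(*sentence_terms) & query
    coverage = len(covered) / len(query)
    density = sum(1 for terms in sentence_terms if terms & query) / len(sentence_terms)
    return coverage * density


# Toy thread: two posts of two sentences each (invented example text)
s1 = Node("sentence", "Metformin upset my stomach at first.")
s2 = Node("sentence", "Taking it with dinner helped.")
s3 = Node("sentence", "Anyone tried yoga for stress?")
s4 = Node("sentence", "My A1C went down after I started walking.")
post1 = Node("post", children=[s1, s2])
post2 = Node("post", children=[s3, s4])
thread = Node("thread", children=[post1, post2])

query = tokens("metformin stomach dinner")
for name, node in [("thread", thread), ("post1", post1), ("post2", post2), ("s1", s1), ("s2", s2)]:
    print(name, node.level, round(score(node, query), 2))
```

On this toy thread, the first post scores 1.0, higher than both the whole thread (0.5) and its best single sentence (about 0.67), so under these assumptions the post is the best level of focus for the query.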
Using these scores, we can generate results that contain a mix of objects at different granularities, dynamically selecting the best level of focus on the data. We also presented a score-optimization algorithm that efficiently chooses the best non-overlapping result set of size k, ensuring that no redundant information is returned. Extensive user studies showed that a mixed-granularity result set is more relevant to users than standard post-only approaches.
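Choosing the best k-sized non-overlapping result set can be framed as picking k objects from the hierarchy such that no selected object contains another, while maximizing the sum of their scores. The report does not describe its optimization algorithm; the sketch below uses a standard tree-knapsack dynamic program, which is one possible approach and not necessarily the authors'. Node scores are assumed to be precomputed by some granularity-aware scoring function, and the names `Node`, `best_table`, and `top_k_non_overlapping` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NEG = float("-inf")  # marks an infeasible selection size


@dataclass
class Node:
    """A scored forum object; children are the finer-grained objects it contains."""
    name: str
    score: float
    children: list[Node] = field(default_factory=list)


def best_table(node: Node, k: int) -> list[tuple[float, list[str]]]:
    """table[j] = (best total score, chosen names) using exactly j non-overlapping
    objects from this node's subtree, or (NEG, []) if no such set exists."""
    table: list[tuple[float, list[str]]] = [(0.0, [])] + [(NEG, [])] * k
    # Combine children one at a time (max-plus knapsack merge): objects in
    # different child subtrees never overlap, so their selections can be added.
    for child in node.children:
        child_table = best_table(child, k)
        merged: list[tuple[float, list[str]]] = [(NEG, [])] * (k + 1)
        for i, (score_i, names_i) in enumerate(table):
            if score_i == NEG:
                continue
            for j, (score_j, names_j) in enumerate(child_table):
                if score_j == NEG or i + j > k:
                    continue
                if score_i + score_j > merged[i + j][0]:
                    merged[i + j] = (score_i + score_j, names_i + names_j)
        table = merged
    # Choosing the node itself rules out all of its descendants, so it can only
    # compete for the single-object slot.
    if k >= 1 and node.score > table[1][0]:
        table[1] = (node.score, [node.name])
    return table


def top_k_non_overlapping(root: Node, k: int) -> tuple[float, list[str]]:
    """Highest-scoring set of exactly k objects in which none contains another."""
    total, names = best_table(root, k)[k]
    if total == NEG:
        raise ValueError(f"fewer than {k} non-overlapping objects in the hierarchy")
    return total, names


# Toy hierarchy with hypothetical precomputed scores
thread = Node("thread", 0.5, [
    Node("post1", 1.0, [Node("s1", 0.6), Node("s2", 0.3)]),
    Node("post2", 0.2, [Node("s3", 0.4), Node("s4", 0.1)]),
])
print(top_k_non_overlapping(thread, 1))
print(top_k_non_overlapping(thread, 2))
```

For k = 2 the program selects the first post together with sentence s3 (total score about 1.4), preferring this mixed-granularity pair over the two sentences of post1 (0.9). It examines O(n·k²) score combinations for n objects; a production version would store back-pointers instead of copying name lists.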

Budget Start: 2010-10-01
Budget End: 2014-09-30
Fiscal Year: 2010
Total Cost: $302,268
Institution: Rutgers University
City: Piscataway
State: NJ
Country: United States
Zip Code: 08854