The importance of understanding interactions among social, behavioral, environmental, and genetic factors and their relationship to health has led to greater interest in studying these determinants of disease in the biomedical research community. While some knowledge exists regarding the contributions of specific determinants, such as socioeconomic status, educational background, tobacco and alcohol use, and genetic susceptibility, to particular diseases or conditions, enhanced methods are needed to analyze and ascertain interrelationships among multiple determinants and to discover potentially unexpected relationships that may ultimately contribute to improving patient care and population health. The increased adoption of electronic health record (EHR) systems has the potential to enhance the collection of, and access to, a wide range of information about an individual's lifetime health status and health care, supporting a range of "secondary uses" such as biomedical, behavioral and social science, and public health research. Traditionally, clinicians document an individual's health history in clinical notes, recording social and behavioral factors in the "social history" section and familial factors in the "family history" section. While some EHR systems have specific modules for collecting social and family history in structured or semi-structured formats, much of this information is recorded primarily in narrative form, necessitating automated methods to facilitate the extraction and integration of social, behavioral, and familial factors for subsequent uses. Once these factors are extracted, knowledge acquisition and discovery methods can be applied both to confirm known relationships relative to specific diseases or conditions and to discover potentially new relationships.
We hypothesize that advanced computational methods can transform social, behavioral, and familial factors from the EHR into a rich longitudinal resource for generating knowledge regarding various determinants of health, including their temporal progression, severity, and relationship to health conditions. Towards this goal, the specific aims are to: (1) develop comprehensive information models and natural language processing (NLP) techniques to represent, extract, and integrate social, behavioral, and familial factors from social and family history information in the EHR, (2) adapt and extend data mining techniques to identify non-temporal and temporal relationships among these factors and diseases, and (3) evaluate and validate known and candidate new relationships for specific conditions (pediatric asthma and epilepsy). This multi-site proposal will involve a transdisciplinary team of investigators from the University of Vermont and the University of Minnesota, use of EHR data from both institutions, and collaborative development and evaluation of the NLP and data mining techniques. Ultimately, this work has the potential to provide a generalizable approach for supporting and enhancing existing knowledge regarding the interactions among social, behavioral, and familial factors and diseases.

Public Health Relevance

The ability to systematically collect and analyze social, behavioral, and familial factors from the electronic health record using automated methods could assist in developing a rich longitudinal resource for enhancing knowledge regarding the interactions among these factors and diseases. This knowledge could ultimately contribute to improving patient care and population health.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
University of Vermont & St Agric College
Internal Medicine/Medicine
Schools of Medicine
United States
Chen, Elizabeth S; Sarkar, Indra Neil (2014) Mining the electronic health record for disease knowledge. Methods Mol Biol 1159:269-86
Chen, Elizabeth S; Garcia-Webb, M (2014) An analysis of free-text alcohol use documentation in the electronic health record: early findings and implications. Appl Clin Inform 5:402-15