The importance of understanding interactions among social, behavioral, environmental, and genetic factors and their relationship to health has led to greater interest in studying these determinants of disease in the biomedical research community. While some knowledge exists regarding contributions of specific determinants such as socioeconomic status, educational background, tobacco and alcohol use, and genetic susceptibility to particular diseases or conditions, enhanced methods are needed to analyze and ascertain interrelationships among multiple determinants and to discover potentially unexpected relationships that may ultimately contribute to improving patient care and population health. The increased adoption of electronic health record (EHR) systems has the potential for enhanced collection of and access to a wide range of information about an individual's lifetime health status and health care to support a range of "secondary uses" such as biomedical, behavioral and social science, and public health research. Traditionally, clinicians document an individual's health history in clinical notes, including social and behavioral factors within the "social history" section and familial factors in the "family history" section. While some EHR systems have specific modules for collecting social and family history in structured or semi-structured formats, a large amount of this information is recorded primarily in narrative format, thus necessitating automated methods to facilitate the extraction and integration of social, behavioral, and familial factors for subsequent uses. Once extracted, knowledge acquisition and discovery methods can be applied both to confirm known relationships relative to specific diseases or conditions and to potentially discover new relationships.
We hypothesize that advanced computational methods can transform social, behavioral, and familial factors from the EHR into a rich longitudinal resource for generating knowledge regarding various determinants of health including their temporal progression, severity, and relationship to health conditions. Towards this goal, the specific aims are to: (1) develop comprehensive information models and natural language processing (NLP) techniques to represent, extract, and integrate social, behavioral, and familial factors from social and family history information in the EHR, (2) adapt and extend data mining techniques to identify non-temporal and temporal relationships among these factors and diseases, and (3) evaluate and validate known and candidate new relationships for specific conditions (pediatric asthma and epilepsy). This multi-site proposal will involve a transdisciplinary team of investigators from the University of Vermont and University of Minnesota, use of EHR data from both institutions, and collaborative development and evaluation of the NLP and data mining techniques. Ultimately, this work has the potential to provide a generalizable approach for supporting and enhancing existing knowledge regarding the interactions among social, behavioral, and familial factors and diseases.

Public Health Relevance

The ability to systematically collect and analyze social, behavioral, and familial factors from the electronic health record using automated methods could assist in developing a rich longitudinal resource for enhancing knowledge regarding the interactions among these factors and diseases. This knowledge could ultimately contribute to improving patient care and population health.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
University of Vermont & St Agric College
Internal Medicine/Medicine
Schools of Medicine
United States
Lindemann, Elizabeth A; Chen, Elizabeth S; Rajamani, Sripriya et al. (2017) Assessing the Representation of Occupation Information in Free-Text Clinical Documents Across Multiple Sources. Stud Health Technol Inform 245:486-490
Haddad, Jessica M; Chen, Elizabeth S (2017) Identifying Psychiatric Comorbidities for Obstructive Sleep Apnea in the Biomedical Literature and Electronic Health Record. AMIA Jt Summits Transl Sci Proc 2017:502-511
Rajamani, Sripriya; Chen, Elizabeth S; Lindemann, Elizabeth et al. (2017) Representation of occupational information across resources and validation of the occupational data for health model. J Am Med Inform Assoc :
Aldekhyyel, Ranyah; Chen, Elizabeth S; Rajamani, Sripriya et al. (2016) Content and Quality of Free-Text Occupation Documentation in the Electronic Health Record. AMIA Annu Symp Proc 2016:1708-1716
Hu, Zhen; Melton, Genevieve B; Moeller, Nathan D et al. (2016) Accelerating Chart Review Using Automated Methods on Electronic Health Record Data for Postoperative Complications. AMIA Annu Symp Proc 2016:1822-1831
Wang, Yan; Chen, Elizabeth S; Leppik, Ilo et al. (2016) Identifying Family History and Substance Use Associations for Adult Epilepsy from the Electronic Health Record. AMIA Jt Summits Transl Sci Proc 2016:250-9
Finley, Gregory P; Pakhomov, Serguei V S; McEwan, Reed et al. (2016) Towards Comprehensive Clinical Abbreviation Disambiguation Using Machine-Labeled Training Data. AMIA Annu Symp Proc 2016:560-569
Winden, Tamara J; Chen, Elizabeth S; Melton, Genevieve B (2016) Representing Residence, Living Situation, and Living Conditions: An Evaluation of Terminologies, Standards, Guidelines, and Measures/Surveys. AMIA Annu Symp Proc 2016:2072-2081
Wang, Yan; Chen, Elizabeth S; Pakhomov, Serguei et al. (2016) Investigating Longitudinal Tobacco Use Information from Social History and Clinical Notes in the Electronic Health Record. AMIA Annu Symp Proc 2016:1209-1218
McEwan, Reed; Melton, Genevieve B; Knoll, Benjamin C et al. (2016) NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes. AMIA Jt Summits Transl Sci Proc 2016:150-9

Showing the most recent 10 out of 26 publications