The eTfor2 project will develop and evaluate open-source programs and knowledge representations to better characterize patients for translational and clinical research studies. The project addresses National Library of Medicine (NLM) RFA initiatives for: (a) information and knowledge processing, including natural language processing (NLP) and text summarization; (b) approaches for linking phenomic and genomic information; and (c) integration of information from heterogeneous sources. Translational studies correlate clinical patient descriptors (the phenome) with results of genomic investigations, e.g., genome-wide association studies (GWAS). Standard methods for defining phenotypes require costly, labor-intensive cohort enrollments to identify patients with diseases and appropriate controls. Recently, translational and clinical researchers have used electronic medical record (EMR) data as an alternative source for identifying patient characteristics. However, EMR case extraction requires substantial manual review and "tuning" for case selection, due to the inaccuracies inherent in ICD-9 billing codes. While relevant and useful NLP approaches to facilitate EMR text extraction have proliferated, the target patient descriptors these approaches employ typically remain non-standard and locally defined, and vary from disease to disease, project to project, and institution to institution. At best, such NLP applications use standard terminology descriptors such as SNOMED CT as EMR extraction targets. Yet there is no generally used "standard" knowledge base that links such "extractable" descriptors to an academic-quality knowledge source detailing which findings have been reliably reported to occur in each disease.
To facilitate translational and clinical research, the eTfor2 project will make available an open-source, evidence-based, electronic clinical knowledge base (KB) and related NLP tools enabling researchers at any site to extract a standard "target" set of EMR-based phenomic descriptors at both the finding and disease levels. It will further include diagnostic decision support logic to confirm the degree of support for patients' diagnoses in their EMRs. The eTfor2 project will decrease the effort required to harvest EMR patient descriptors for clinical and translational studies, and enable new translational work that identifies genomic associations at both the finding and disease levels. The eTfor2 resources should improve the quality and cross-institutional validity of EMR-based translational and clinical studies.

Public Health Relevance

Evidence-based Diagnostic Tools for Translational and Clinical Research (eTfor2): Project Narrative

When completed successfully, the eTfor2 project will enable researchers at disparate institutions to extract from their respective EMR systems a shared target set of common phenomic descriptors, in a standard, reproducible manner. Doing so should improve the quality and cross-institutional validity of EMR-based translational and clinical studies.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
Vanderbilt University Medical Center
Internal Medicine/Medicine
Schools of Medicine
United States
Rosenbaum, Benjamin P; Silkin, Nikolay; Miller, Randolph A (2014) Easily configured real-time CPOE Pick Off Tool supporting focused clinical research and quality improvement. J Am Med Inform Assoc 21:564-8
Atreya, Ravi V; Smith, Joshua C; McCoy, Allison B et al. (2013) Reducing patient re-identification risk for laboratory results within research datasets. J Am Med Inform Assoc 20:95-101
Mitchell, J A; Gerdin, U; Lindberg, D A B et al. (2011) 50 years of informatics research on decision support: what's next. Methods Inf Med 50:525-35