? ? The electronic medical record (EMR) offers impressive opportunities for increasing care quality, but challenges stand in the way of realizing this vision. For example, coded EMR data readily available for analysis typically are incomplete (due to the prevalence of free-text clinical notes in EMR implementations), and data from different EMRs are often incommensurate due to differences in standard vocabularies and system implementations. While informatics research has shown the feasibility of automatically coding specific aspects of clinical text using Natural Language Processing (NLP), challenges remain for translating these informatics developments into large-scale care quality assessments. To date, successful NLP solutions for automated quality assessment have tended to be applications that are specific to (a) the target problem or clinical focus, (b) the EMR data system, and (c) the person or team that implements the NLP solution. In this study, we propose to begin addressing the problem of implementation team specificity by developing, evaluating, and making freely available a generalizable NLP development tool suite. The tools will enable widespread adoption of NLP systems to extract and code data from free text clinical notes. The Knowledge Editing Toolkit will simplify development of problem-specific knowledge by helping the user define the rules, concepts, and terms that constitute a domain-specific knowledge module, thus allowing any informaticist to develop an NLP application. The NLP Application Validation Toolkit will allow rapid testing and evaluation of the application against a gold standard of independently-coded test records from any EMR. To evaluate the effects of the toolkits on NLP generalizability, we will have three clinical informaticists each build two NLP applications (for a total of six distinct applications). One of their applications will identify a constellation of common clinical signs or symptoms (e.g., """"""""persistent cough"""""""") that are relatively discrete concepts using simple language terms for many different clinical purposes. Their second application will assess behavioral counseling (e.g., """"""""alcohol counseling""""""""), which uses complex language constructs for dedicated clinical purposes. We will describe and evaluate the accuracy of the solutions against independently coded test sets of medical records. We will quantify and compare the difficulty of creating these solutions as measured by the time, number of iterations required to build the applications, and the number of concepts and rules employed, as well as analyze variability in content and accuracy of the solutions created. In addition, we will use qualitative techniques to assess the ease of using the development tools; the difficulty in learning the tools; and specific types of problems, limitations, and bugs encountered. Such an NLP development tool suite has the potential to allow simple, elegant, and reliably good NLP solutions regardless of the clinical problem domain or the person developing the solution. ? ? ?

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Exploratory/Developmental Grants (R21)
Project #
1R21LM009728-01A1
Application #
7529967
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Sim, Hua-Chuan
Project Start
2008-09-30
Project End
2010-09-29
Budget Start
2008-09-30
Budget End
2009-09-29
Support Year
1
Fiscal Year
2008
Total Cost
$213,300
Indirect Cost
Name
Kaiser Foundation Research Institute
Department
Type
DUNS #
150829349
City
Oakland
State
CA
Country
United States
Zip Code
94612
Sittig, Dean F; Hazlehurst, Brian L; Brown, Jeffrey et al. (2012) A survey of informatics platforms that enable distributed comparative effectiveness research using multi-institutional heterogenous clinical data. Med Care 50 Suppl:S49-59
Sittig, Dean F; Hazlehurst, Brian L (2012) Informatics grand challenges in multi-institutional comparative effectiveness research. J Comp Eff Res 1:373-6