An emerging trend in computational linguistics is melding natural language processing (NLP) and machine learning (ML) to help computers make sense of human-generated free text. The blending of these disciplines is relatively rare in biomedical informatics. Past medical NLP/ML research work is biased heavily towards linguistic methods that attempt to reason about grammar and syntax aided by a domain-focal knowledge base (e.g., one for radiology or one for clinical pathology).
The aim of the work proposed here takes a different tack: exploring the utility of a statistical approach to clinical NLP, one augmented by machine learning and concentrating on general progress notes from across multiple clinical domains. The specific clinical goal will be to identify adverse drug events described implicitly or explicitly in inpatient progress notes. Rather than relying on a narrow domain focus to provide enough context restriction to make text interpretation tractable, this approach will use statistical patterns in note author information (e.g., profession, note type, treating ward) and patient information (e.g., admit diagnosis, procedures performed, temporal note relationships) for context restriction. The research component of this proposal is divided into two categories: three small-scale projects designed to rapidly hone new skills developed under the training component, and a large-scale project that assesses the feasibility of cross-discipline clinical text analysis.
Hurdle, John F; Botkin, Jeffery; Rindflesch, Thomas C (2007) Leveraging semantic knowledge in IRB databases to improve translation science. AMIA Annu Symp Proc :349-53 |