The adoption of electronic health record (EHR) systems is a national healthcare priority. However, studies show a substantial drop in physician productivity, as much as 25-40%, upon transition to an EHR. Most of the workflow delay stems from the manual operations needed to fill in structured forms within the EHR, as opposed to the simple unstructured narratives used in traditional written notes and transcriptions. Vanguard Medical Technologies (VMT), under NIH grant 1R43LM010750, proved feasibility for DocTalk, a real-time, speech-driven, open-source-augmented encounter recording system for small practices. DocTalk converts voice to text, text to structured medical data, and structured data to EHR input, using integrated automated speech recognition (ASR) and natural language processing (NLP) in the cloud. While NLP accuracy in Phase I was high, voice accuracy prior to physician review was inadequate. Fortunately, the tight integration of ASR and NLP, combined with the formal structure of physician notes, offers unique context-based approaches to address the challenge. Current speech recognition methods train the recognizer on a single general-purpose medical lexicon and ignore medical context-specific probabilities. The four Specific Aims of this Phase I SBIR project are to:
1. Create a textual corpus for each section of a patient encounter note by processing 1 million text-based, narrative, structured encounter notes.
2. Build a family of Section-Specific Statistical Language Models (SS-SLMs), each specialized in recognizing speech pertaining to one section of a patient encounter note, using industry-standard open-source statistical language modeling tools.
3. Use NLP techniques to infer patterns of language usage from the text of each section:
   a. To detect section boundaries to be used as trigger words for invoking SS-SLMs
   b. To determine characteristic word distributions of each section
4. Assess the improvement in accuracy per section due to the use of SS-SLMs, with the goal of a 50% overall reduction in errors compared to non-section-specific SLMs in the same medical dictation system.
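
The idea behind Aims 2 and 3 can be illustrated with a minimal sketch. It is not the DocTalk implementation: the toy corpus, the trigger words, and the names `BigramLM`, `detect_section`, and `rescore` are all hypothetical, and a simple add-one smoothed bigram model stands in for whatever statistical language modeling tools the project would actually use. The sketch trains one model per note section, switches models when a trigger word is heard, and uses the active model to choose among competing ASR hypotheses.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a few narrative fragments per note section.
SECTION_CORPUS = {
    "history": [
        "patient reports chest pain for two days",
        "patient denies fever or cough",
    ],
    "medications": [
        "lisinopril ten milligrams daily",
        "aspirin eighty one milligrams daily",
    ],
}

# Hypothetical trigger words that mark a section boundary in dictation.
TRIGGERS = {"history": "history", "medications": "medications"}

# Shared vocabulary across all sections, used for smoothing.
VOCAB = {w for texts in SECTION_CORPUS.values() for t in texts for w in t.split()}


class BigramLM:
    """Add-one smoothed bigram model trained on one section's text."""

    def __init__(self, sentences, vocab):
        self.vocab_size = len(vocab)
        self.bigrams = Counter()
        self.contexts = Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.split()
            for prev, word in zip(words, words[1:]):
                self.bigrams[(prev, word)] += 1
                self.contexts[prev] += 1

    def log_prob(self, sentence):
        """Log probability of a sentence under this section's model."""
        words = ["<s>"] + sentence.split()
        return sum(
            math.log(
                (self.bigrams[(prev, word)] + 1)
                / (self.contexts[prev] + self.vocab_size)
            )
            for prev, word in zip(words, words[1:])
        )


# One language model per section of the encounter note.
MODELS = {name: BigramLM(texts, VOCAB) for name, texts in SECTION_CORPUS.items()}


def detect_section(utterance, current_section):
    """Switch sections when the utterance starts with a trigger word."""
    words = utterance.split()
    first_word = words[0] if words else ""
    for section, trigger in TRIGGERS.items():
        if first_word == trigger:
            return section
    return current_section


def rescore(section, hypotheses):
    """Pick the ASR hypothesis the active section model finds most likely."""
    return max(hypotheses, key=MODELS[section].log_prob)
```

For example, after the trigger word "medications" is detected, `rescore("medications", ["aspirin eighty won milligrams daily", "aspirin eighty one milligrams daily"])` prefers the second hypothesis, because the medications model has seen "eighty one milligrams" and has not seen "eighty won".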

Public Health Relevance

Successful completion of this innovative proposed program of NLP-enhanced, context-based ASR will provide the accuracy required to deploy an integrated, interactive, intuitive, low-cost data entry system for small-practice primary care physicians. The augmented DocTalk system will enable physicians to increase usable information, avoid third-party transcription errors, and mitigate workflow delays. Increased EHR adoption among small practices directly addresses national healthcare goals.

National Institutes of Health (NIH)
National Center for Advancing Translational Sciences (NCATS)
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Study Section
Special Emphasis Panel (ZRG1-HDM-R (11))
Program Officer
Sawczuk, Andrea
Health Fidelity, Inc.
Menlo Park
United States