The goal of this research is to develop improved techniques, both fully automated and computer-assisted, for classification of medical text. The technical approach is exemplar-based: robust information retrieval methods find similar, previously-classified texts, and corresponding codes are used to suggest likely classifications for a new text. Phase I focused upon implementing experimental software to establish baseline performance with several variations of the exemplar-based approach. Phase II builds upon this work to implement a complete Coder's Workstation (CWS). Based upon Phase I results and assessments of commercial opportunities, Phase II will focus upon shorter texts (<12 words), which are best suited for automated methods. A "short-similarity" capability will be added to the Phase I approach to further enhance performance with shorter texts. To evaluate and refine the CWS, Phase II will include extensive "beta testing" of the software at the Brigham and Women's Hospital and the Mayo Clinic. The major technical innovation of this project is the development of highly automated classification software that is sensitive to term similarities. The major health-related contributions are large potential savings in coding expenses, reduced time demands upon physicians for coding, and improved consistency in classification of free text for research studies.
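To make the exemplar-based idea concrete, the following is a minimal sketch, not the project's actual retrieval method. It assumes TF-IDF weighting with cosine similarity as the "information retrieval method" and a similarity-weighted vote over the k nearest exemplars as the way codes are suggested. The function names, the parameters `k` and `top_n`, and the code labels `C1`-`C3` are all hypothetical placeholders rather than real ICD-9 or CPT-4 codes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would also normalize medical terms.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the tokenized exemplar texts.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    # TF-IDF vector as a sparse dict; terms unseen in the exemplars get weight 0.
    tf = Counter(tokens)
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def suggest_codes(new_text, exemplars, k=3, top_n=2):
    """Rank codes for new_text by similarity-weighted votes from the k most similar exemplars."""
    docs = [tokenize(text) for text, _ in exemplars]
    idf = build_idf(docs)
    query = vectorize(tokenize(new_text), idf)
    scored = sorted(
        ((cosine(query, vectorize(doc, idf)), code) for doc, (_, code) in zip(docs, exemplars)),
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, code in scored[:k]:
        if sim > 0:  # ignore neighbors that share no weighted terms with the new text
            votes[code] += sim
    return sorted(votes, key=votes.get, reverse=True)[:top_n]


# Hypothetical previously-classified texts paired with placeholder code labels.
EXEMPLARS = [
    ("acute myocardial infarction", "C1"),
    ("acute anterior myocardial infarction", "C1"),
    ("community acquired pneumonia", "C2"),
    ("pneumonia right lower lobe", "C2"),
    ("type 2 diabetes mellitus", "C3"),
]

if __name__ == "__main__":
    print(suggest_codes("acute inferior myocardial infarction", EXEMPLARS))  # ['C1']
    print(suggest_codes("pneumonia left lower lobe", EXEMPLARS))  # ['C2']
```

Returning a short ranked list rather than a single code fits the computer-assisted mode, where a human coder confirms a suggestion; a fully automated mode could instead accept the top code only when its vote exceeds some threshold.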
The proposed technology will have important commercial application within hospitals, insurance companies, and pharmaceutical companies, which currently expend significant resources on coding free text into standard schemes (ICD-9, CPT-4, COSTART, etc.). The founders of Belmont Research Inc. have extensive experience in creating and marketing software to support biomedical applications.