Most medical knowledge and patient record data are represented as natural language. Record-based clinical research in outcome analysis, epidemiology, and health services research depends on the organization of patient data into analyzable categories. Thus, classifying patient events (diagnoses, procedures, or findings) is critical for the conduct of research based on the patient record. Any progress in computer-assisted medical text classification would directly contribute to the efficient conduct of clinical research deriving from patient data. This proposal seeks to bring state-of-the-art information retrieval techniques to bear on the problem of computer classification of clinical phrases about patients. Success in this effort will make possible patient record-based research that includes text descriptions, a practice presently too costly or tedious to conduct widely in most medical centers. We outline experimental variations on lexicon-based word and phrase mapping into canonical form using the CLARIT system from Carnegie Mellon University. This work will include synonym mapping, phrase recognition, and the assignment of term weights for information matrix construction. We have evaluated a modification of the Latent Semantic Indexing (LSI) information retrieval technique to exploit the rich structure of the UMLS Metathesaurus. We propose refinements on our preliminary work, which constitute testable strategies for incorporating several weighting options, multidimensional structures, and ancillary information resources such as the complete ICD-9-CM. Because this task depends on the computationally demanding singular value decomposition (SVD) to create principal components for statistical mapping, we include a consortium agreement with the University of Minnesota to address algorithmic variations suited to our sparse information matrix structure. This aspect of our proposal will make the initial solution of SVD practical, removing its present dependence on supercomputers. Once a solution is computed, our proposed techniques can be applied on personal computers. Our proposal promises to improve computer-assisted classification of medical text by using the structured knowledge sources of the UMLS and its contributing nosologies in an application of LSI. This research minimizes dependence on hand-built semantic networks, focusing on statistical decomposition of existing classification structures, enriched by lexicon-based preprocessing of medical text sources. These techniques apply equally to classifying patient records and processing natural language queries against these databases, thereby broadening the scope and opportunity for research based on clinical records.
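As a rough illustration of the general LSI idea named in the abstract, the sketch below builds a small weighted term-by-phrase matrix, truncates its SVD, and assigns a new phrase to the category of its nearest training phrase in the reduced space. It is plain textbook LSI, not the UMLS-based modification the proposal describes. The training phrases, category labels, the idf weighting, the choice of `k`, and the function names `build_lsi` and `classify` are all assumptions made for this example.

```python
import numpy as np

# Hypothetical training phrases paired with hypothetical category labels.
TRAINING = [
    ("myocardial infarction", "myocardial infarction"),
    ("cardiac infarction", "myocardial infarction"),
    ("femur fracture", "fracture of femur"),
    ("hip fracture", "fracture of femur"),
    ("bacterial pneumonia", "pneumonia"),
    ("viral pneumonia", "pneumonia"),
]


def build_lsi(training, k):
    """Build a rank-k LSI model from (phrase, label) pairs."""
    docs = [phrase.lower().split() for phrase, _ in training]
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}

    # Raw term counts: rows are terms, columns are training phrases.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            counts[index[term], j] += 1.0

    # Assumed weighting: term count times inverse document frequency.
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(len(docs) / df)
    weighted = counts * idf[:, None]

    # Full SVD, then keep only the k largest singular triplets.
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return index, idf, u[:, :k], s[:k], vt[:k, :].T


def classify(phrase, training, model):
    """Return the label of the nearest training phrase, or None."""
    index, idf, u_k, s_k, doc_vecs = model

    # Weight the query terms the same way as the training matrix.
    q = np.zeros(len(index))
    for term in phrase.lower().split():
        if term in index:
            q[index[term]] += idf[index[term]]

    # Fold the query into the reduced space: q_hat = q^T U_k S_k^{-1}.
    q_hat = (q @ u_k) / s_k
    q_norm = np.linalg.norm(q_hat)
    if q_norm == 0.0:
        return None  # no known terms in the query

    # Cosine similarity against every training phrase vector.
    doc_norms = np.maximum(np.linalg.norm(doc_vecs, axis=1), 1e-12)
    sims = (doc_vecs @ q_hat) / (doc_norms * q_norm)
    return training[int(np.argmax(sims))][1]


model = build_lsi(TRAINING, k=3)
print(classify("hip", TRAINING, model))
```

In this toy data the query "hip" shares no term with "femur fracture", yet both fall on the same reduced dimension because each co-occurs with "fracture". A dense SVD is used here only for brevity; a matrix of realistic size would call for a sparse solver, which is the computational concern the abstract raises.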

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM005416-03
Application #
2237806
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Project Start
1992-09-30
Project End
1995-09-29
Budget Start
1994-09-30
Budget End
1995-09-29
Support Year
3
Fiscal Year
1994
Total Cost
Indirect Cost
Name
Mayo Clinic, Rochester
Department
Type
DUNS #
City
Rochester
State
MN
Country
United States
Zip Code
55905
Brown, S H; Lincoln, M; Hardenbrook, S et al. (2001) Derivation and evaluation of a document-naming nomenclature. J Am Med Inform Assoc 8:379-90
McDonald, F S; Chute, C G; Ogren, P V et al. (1999) A large-scale evaluation of terminology integration characteristics. Proc AMIA Symp :864-7
Chute, C G; Elkin, P L; Sherertz, D D et al. (1999) Desiderata for a clinical terminology server. Proc AMIA Symp :42-6
Chute, C G; Elkin, P L; Fenton, S H et al. (1998) A clinical terminology in the post modern era: pragmatic problem list development. Proc AMIA Symp :795-9
Chute, C G; Elkin, P L (1997) A clinically derived terminology: qualification to reduction. Proc AMIA Annu Fall Symp :570-4
Elkin, P L; Mohr, D N; Tuttle, M S et al. (1997) Standardized problem list generation, utilizing the Mayo canonical vocabulary embedded within the Unified Medical Language System. Proc AMIA Annu Fall Symp :500-4
Chute, C G; Cohn, S P; Campbell, K E et al. (1996) The content coverage of clinical classifications. For The Computer-Based Patient Record Institute's Work Group on Codes & Structures. J Am Med Inform Assoc 3:224-33
Chute, C G; Crowson, D L; Buntrock, J D (1995) Medical information retrieval and WWW browsers at Mayo. Proc Annu Symp Comput Appl Med Care :903-7
Chute, C G; Yang, Y (1995) An overview of statistical methods for the classification and retrieval of patient events. Methods Inf Med 34:104-10
Yang, Y; Chute, C G (1995) Sampling strategies in a statistical approach to clinical classification. Proc Annu Symp Comput Appl Med Care :32-6

Showing the most recent 10 out of 13 publications