LLSF Mapping for Indexing and Retrieval of MEDLINE

Yang, Yiming

Abstract

The long-term objective of our research group is to facilitate automatic or semi-automatic classification and retrieval of natural language texts, in order to reduce the cost and improve the quality of computerized medical information. This proposal further develops a novel approach, the Linear Least Squares Fit (LLSF) mapping, and applies it to document indexing and document retrieval in the MEDLINE database. LLSF mapping is a statistical method, developed by the PI, that learns human knowledge about matching queries, documents, and canonical concepts from training data. The goal is to improve the quality (recall and precision) of automatic document indexing and retrieval beyond what can be achieved by surface-based matching, which uses no human knowledge, or by thesaurus-based matching, which depends on manually developed synonyms. This project applies LLSF to MEDLINE, the world's largest and most frequently used on-line database, to evaluate the effectiveness of the method and to explore its practical potential on large-scale databases.
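The core LLSF idea can be illustrated with a small sketch. Assume each training document is a column of term weights in a matrix A (terms × documents) and that its human-assigned canonical concepts form the matching column of a matrix B (concepts × documents). An LLSF mapping is then a matrix W that minimizes the squared Frobenius error ||WA - B||², and the minimum-norm solution is W = BA⁺, where A⁺ is the Moore-Penrose pseudo-inverse. The function names `llsf_fit` and `rank_concepts` and the toy data are hypothetical; this is an illustration under those assumptions, not the PI's implementation.

```python
import numpy as np

def llsf_fit(A, B):
    """Return W minimizing the Frobenius norm ||W @ A - B||.

    A: terms x documents matrix (one column per training document)
    B: concepts x documents matrix (human-assigned concepts per document)
    """
    # Minimum-norm least-squares solution: W = B A^+ (A^+ = pseudo-inverse).
    return B @ np.linalg.pinv(A)

def rank_concepts(W, x):
    """Map a term vector x into concept space and rank concepts by score."""
    scores = W @ x
    order = np.argsort(-scores)  # indices of concepts, highest score first
    return order, scores

# Toy training data: 4 terms, 3 documents, 2 concepts (illustrative only).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

W = llsf_fit(A, B)
order, scores = rank_concepts(W, np.array([1.0, 0.0, 0.0, 1.0]))
```

A new document or query vector x is indexed by ranking the concept scores Wx; the same mapping could also place queries and documents in a shared concept space for retrieval matching.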
The specific aims and methods are:
1. To collect the data needed to train and evaluate the LLSF method. A collaboration with another research institute is planned to use and refine a large collection of MEDLINE retrieval data, and a sample of MEDLINE searches at the Mayo Clinic will provide additional tasks.
2. To develop automatic noise reduction techniques that improve both the accuracy of the LLSF mapping and the efficiency of its computation. A multi-step noise reduction in the LLSF training process will be investigated: statistical term weighting to remove non-informative terms, a truncated singular value decomposition (SVD) to reduce noise at the level of semantic structure, and truncation of insignificant elements in the LLSF solution matrix to reduce noise at the level of term-to-concept mapping.
3. To scale up the training capacity so that LLSF can accommodate the size of the MEDLINE data. A split-merge approach decomposes a large training sample into tractable subsets, computes an LLSF mapping function for each subset, and then merges the local mapping functions into a global one.
4. To improve computational efficiency by using algorithms optimized for sparse matrices and for noise reduction. Candidate solutions include the Block Lanczos truncated SVD algorithm, which can reduce the cubic time complexity of standard SVD on dense matrices to quadratic complexity; a QR decomposition, which solves the LLSF problem without SVD; a sparse matrix algorithm that has sped up matrix multiplication and cosine computation by one to four orders of magnitude; and parallel computing.
5. To evaluate the effectiveness of LLSF on large MEDLINE document sets and compare its performance with that of alternative indexing and retrieval systems.
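Several techniques named in aims 2 to 4 can be sketched on the same least-squares formulation (terms × documents matrix A, concepts × documents matrix B, mapping W minimizing ||WA - B||). The sketch below assumes NumPy and small dense matrices. It shows a rank-k truncated SVD solution, zeroing of small mapping weights, a QR-based solution that avoids SVD, and a split-merge fit. The proposal does not say how local mappings are merged, so the size-weighted average used here is an assumption, as are all function names, thresholds, and the toy data.

```python
import numpy as np

def llsf_fit_truncated(A, B, k):
    """LLSF using a rank-k truncated SVD of A: W = B V_k S_k^-1 U_k^T.
    Dropping small singular values filters noise in the semantic structure."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-12 * s[0])))  # never invert ~zero values
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    return B @ Vtk.T @ np.diag(1.0 / sk) @ Uk.T

def truncate_small(W, threshold):
    """Zero out mapping weights with magnitude below threshold
    (noise reduction at the term-to-concept level; threshold is hypothetical)."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return W

def llsf_fit_qr(A, B):
    """Solve min ||W A - B|| through a QR factorization of A^T, without SVD.
    Assumes A^T (documents x terms) has full column rank, i.e. at least
    as many training documents as terms and linearly independent terms."""
    Q, R = np.linalg.qr(A.T)                 # A^T = Q R, R upper triangular
    return np.linalg.solve(R, Q.T @ B.T).T   # W^T = R^-1 Q^T B^T

def split_merge_fit(A, B, n_subsets, fit):
    """Fit a local mapping on each column (document) subset, then merge them
    by a size-weighted average (an assumed merge rule)."""
    n_docs = A.shape[1]
    merged = np.zeros((B.shape[0], A.shape[0]))
    for idx in np.array_split(np.arange(n_docs), n_subsets):
        merged += fit(A[:, idx], B[:, idx]) * (len(idx) / n_docs)
    return merged

# Toy check: B is generated exactly from a known mapping, so each solver
# should recover it (illustrative random data, not MEDLINE).
rng = np.random.default_rng(0)
A = rng.random((3, 12))                      # 3 terms x 12 training documents
W_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, 0.0]])         # 2 concepts x 3 terms
B = W_true @ A
W_svd = truncate_small(llsf_fit_truncated(A, B, k=3), threshold=1e-6)
W_qr = llsf_fit_qr(A, B)
W_merged = split_merge_fit(A, B, n_subsets=3, fit=llsf_fit_qr)
```

The QR route needs each subset to contain at least as many documents as terms, which is why the toy data uses 12 documents for 3 terms split into subsets of 4. For large, sparse term-document matrices, the dense `np.linalg.svd` call would be replaced by an iterative partial SVD such as `scipy.sparse.linalg.svds`; that is a substitute for, not an implementation of, the Block Lanczos algorithm named in the proposal.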

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: First Independent Research Support & Transition (FIRST) Awards (R29)
Project #: 5R29LM005714-05
Application #: 2897374
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Bean, Carol A

Project Start: 1995-04-01
Project End: 2000-03-31
Budget Start: 1999-04-01
Budget End: 2000-03-31
Support Year: 5
Fiscal Year: 1999

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Other Domestic Higher Education
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213

Related projects

NIH 1999 R29 LM	LLSF Mapping for Indexing and Retrieval of MEDLINE Yang, Yiming / Carnegie-Mellon University
NIH 1998 R29 LM	LLSF Mapping for Indexing and Retrieval of MEDLINE Yang, Yiming / Carnegie-Mellon University
NIH 1997 R29 LM	LLSF Mapping for Indexing and Retrieval of MEDLINE Yang, Yiming / Carnegie-Mellon University
NIH 1996 R29 LM	LLSF Mapping for Indexing and Retrieval of MEDLINE Yang, Yiming / Carnegie-Mellon University
NIH 1995 R29 LM	LLSF Mapping for Indexing and Retrieval of MEDLINE Yang, Yiming / Mayo Clinic, Rochester

Publications

Yang, Y; Chute, C G (1995) Sampling strategies in a statistical approach to clinical classification. Proc Annu Symp Comput Appl Med Care:32-6.
