It is becoming increasingly difficult for biologists to keep pace with the information being published within their own fields, let alone biology as a whole. Rapidly accessing specific, current biomedical information and quickly gaining an overview of current knowledge in a given field are becoming more difficult and, at the same time, more important. Traditional methods of keeping up with advances are therefore becoming inadequate. This project will involve a unique collaboration between a computational linguist at Brandeis University and two biologists at Tufts University School of Medicine. We propose to make use of recent advances in the computational analysis of text to organize and summarize the biological literature. Building on our previous language technology research at Brandeis, we propose to integrate the domain knowledge of the National Library of Medicine's Unified Medical Language System (UMLS) with Brandeis' semantic lexicon, CoreLex, toward the development of normalized structured representations of the semantic content of abstracts in the Medline database. These data structures, called lexical webs, accelerate the availability of information in a richly hyperlinked index that facilitates rapid navigation and information access. Automated analysis of biological abstracts will be combined with information derived from sequence databases to provide an up-to-date and comprehensive database of information regarding known genes and proteins. The results of this analysis will be used to construct a web-accessible database organized on a gene-by-gene basis. Other unique aspects of this database will be the visualization of motifs and features extracted from Medline abstracts through the generation of annotated structure-function maps of proteins and genes, and the construction of gene-specific semantic indexes to the relevant biological literature.
This system, called MedStract, will reduce the time required for biomedical researchers to find information of interest and should facilitate the development of new research insights.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM006649-02
Application #
6165092
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
1999-03-01
Project End
2002-02-28
Budget Start
2000-03-01
Budget End
2001-02-28
Support Year
2
Fiscal Year
2000
Total Cost
$297,119
Indirect Cost
Name
Brandeis University
Department
Biostatistics & Other Math Sci
Type
Schools of Arts and Sciences
DUNS #
616845814
City
Waltham
State
MA
Country
United States
Zip Code
02454
Pustejovsky, J; Castano, J; Zhang, J et al. (2002) Robust relational parsing over biomedical literature: extracting inhibit relations. Pac Symp Biocomput :362-73
Pustejovsky, J; Castano, J; Cochran, B et al. (2001) Automatic extraction of acronym-meaning pairs from MEDLINE databases. Medinfo 10:371-5