As the pace of biological research increases, computers are increasingly used to manage the rapidly growing volume of biological information. Much of the information relevant to biological research is recorded either as coded data in biological databases or as free text in journal articles and in the annotation fields of biological databases. Natural language processing tools have been shown to have the potential to reduce the difficulty of managing information in biomedical free text. This project aims to use online resources (e.g., genetic databases, free-text corpora, or machine-readable dictionaries) and machine learning techniques to construct a biological entity tagging system that associates terms mentioned in text with entries in databases. Biological entity tagging is extremely challenging because of the novelty, synonymy, and ambiguity associated with terms representing biological entities in text: new names appear constantly, one entity may have many names, and one name may refer to several entities. The project includes the construction of a biological entity dictionary and the acquisition of disambiguation knowledge using online resources. It also includes the development of a dictionary lookup method and the use of machine learning techniques for resolving ambiguity, discovering novelty, and recognizing synonymy. The research will generate several deliverables, and the enriched information on gene/protein names, bibliography, and other annotation fields will be integrated into the UniProt/PIR databases, which are part of an ongoing international effort on protein databases. The project provides an opportunity to further the collaborations among Columbia University, Georgetown University Medical Center, and University of Maryland at Baltimore County. The project also integrates educational and research activities by involving graduate and undergraduate students in the overall project.
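
The dictionary lookup step described above can be illustrated with a minimal sketch. It is not the project's implementation: the entry identifiers, the toy name list, and the functions `normalize`, `build_index`, and `tag` are all hypothetical, and the greedy longest-match strategy is an assumption chosen for simplicity. The sketch shows how synonymy (several names for one entry) and ambiguity (one name for several entries) surface in a lookup; it does not attempt the machine learning disambiguation or novelty detection the abstract mentions.

```python
import re
from collections import defaultdict

# Hypothetical toy dictionary: database entry id -> known names (synonyms).
# The ids are placeholders, not real accession numbers.
TOY_ENTRIES = {
    "ID_A": ["tumor protein p53", "p53", "TP53"],
    "ID_B": ["cellular tumor antigen p53", "p53"],
    "ID_C": ["insulin"],
}


def normalize(name):
    """Lowercase and split on non-alphanumerics so spelling variants match."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def build_index(entries):
    """Map each normalized name to the set of entry ids that use it."""
    index = defaultdict(set)
    for entry_id, names in entries.items():
        for name in names:
            index[normalize(name)].add(entry_id)
    return index


def tag(text, index, max_len=5):
    """Greedy longest-match lookup over word tokens.

    Returns a list of (mention, candidate_ids). More than one candidate id
    means the mention is ambiguous and would need a disambiguation step.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    results = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            key = " ".join(t.lower() for t in window)
            if key in index:
                results.append((" ".join(window), sorted(index[key])))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return results


index = build_index(TOY_ENTRIES)
mentions = tag("Insulin regulates p53 and tumor protein p53.", index)
for mention, candidates in mentions:
    status = "ambiguous" if len(candidates) > 1 else "unique"
    print(mention, candidates, status)
```

In this toy run, the bare mention "p53" maps to two candidate entries, while the longer name "tumor protein p53" resolves to one. That gap is where the disambiguation knowledge the project proposes to acquire would be applied.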

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0430743
Program Officer: Sylvia J. Spengler
Project Start:
Project End:
Budget Start: 2004-09-01
Budget End: 2006-08-31
Support Year:
Fiscal Year: 2004
Total Cost: $823,109
Indirect Cost:
Name: University of Maryland Baltimore County
Department:
Type:
DUNS #:
City: Baltimore
State: MD
Country: United States
Zip Code: 21250