Biological databases provide a rich source of training data for statistical and machine learning approaches to text mining; they also provide expert-curated, "gold standard" data for evaluating system performance. The strategy is to focus on problems of importance to working biologists, such as overcoming the curation bottleneck for biological literature, providing better mappings between biological ontologies and text, and giving biologists better access to textual information both in the literature and in curated databases. This proposal focuses on developing mechanisms to promote progress in applying text mining to problems of biological significance. The short-term focus is to continue organizing BioCreAtIvE: Critical Assessment for Information Extraction in Biology. The long-term focus is to improve text mining tools that support expert curators in the cost-effective acquisition of information for biological databases, and to improve access to biological information through shared semantics (ontologies), with particular emphasis on interactive tools and the extraction of complex relations, such as host-pathogen or ecosystem interactions. The specific tasks proposed here are: 1) running the Gene Normalization task for BioCreAtIvE II (to take place in 2006-2007) and analyzing and disseminating the data and results of BioCreAtIvE II; 2) providing input into the creation of a Roadmap for BioCreAtIvE; and 3) defining new evaluation tasks to meet the needs of a wider range of biological curators. The third task will include an evaluation of interactive curation tools, done in conjunction with the RegCreative Jamboree, and methods for the representation and capture of complex biological relations, such as host-pathogen and ecosystem interactions, in conjunction with standards consortia.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0640153
Program Officer: Sylvia J. Spengler
Budget Start: 2006-10-01
Budget End: 2008-09-30
Fiscal Year: 2006
Total Cost: $296,174
Name: Mitre Corporation Virginia
City: McLean
State: VA
Country: United States
Zip Code: 22102