Interest has grown in recent years in methods that automatically identify Gene Ontology (GO) concepts in the unstructured text of scientific articles. This interest is motivated in part by the need to automate model-organism database curation. Beyond curation, such methods will enable data mining tools that compile and interpret information extracted from text, tools that will benefit many people across the scientific enterprise. This project builds on recently completed work in which we used the literature of S. cerevisiae and annotations in the Saccharomyces Genome Database (SGD) to develop methods that determine what molecular function claims an article makes and what experimental evidence the article offers for those claims. The data generated in that work contain a wealth of information that could lead to greatly improved methods for identifying GO concepts in text.
The specific aims of this project are: (1) to develop a representation for GO molecular function concepts that captures information not only about the language of a GO term but also about the biomedical entity the term refers to; and (2) to analyze the results of the S. cerevisiae data mining project using the GO representations formulated in (1) to determine which representations are likely to produce improved GO term recognition. The analysis will be performed on 276 true positive results, 29,276 false positive results, and 336 false negative results to determine whether a new GO concept representation can reduce the number of false positives or false negatives without losing any true positives. The data mining tools of this proposal can be extended to ontologies other than GO, thereby leveraging the effort expended on ontology development.
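To put the three counts above in perspective, the following minimal sketch computes baseline precision and recall from them. It assumes the counts come from a single evaluation and that the standard definitions of precision and recall apply; the abstract itself does not name these metrics, and the function name `precision_recall` is purely illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) using the standard definitions.

    Assumption: these metrics are not named in the abstract; they are
    used here only to illustrate the scale of the reported counts.
    """
    precision = tp / (tp + fp)  # share of reported results that are correct
    recall = tp / (tp + fn)     # share of correct results that were reported
    return precision, recall


# Counts reported in the abstract.
TRUE_POSITIVES = 276
FALSE_POSITIVES = 29_276
FALSE_NEGATIVES = 336

baseline_p, baseline_r = precision_recall(
    TRUE_POSITIVES, FALSE_POSITIVES, FALSE_NEGATIVES
)
print(f"precision = {baseline_p:.4f}, recall = {baseline_r:.4f}")
```

Under these assumptions, precision is roughly 0.9% and recall roughly 45%, which suggests that false positives dominate the error profile and that holding the 276 true positives fixed while removing false positives would raise precision without affecting recall.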

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Small Research Grants (R03)
Project #
5R03LM009752-02
Application #
7918188
Study Section
Special Emphasis Panel (ZLM1-ZH-S (O1))
Program Officer
Ye, Jane
Project Start
2009-09-01
Project End
2012-08-31
Budget Start
2010-09-01
Budget End
2012-08-31
Support Year
2
Fiscal Year
2010
Total Cost
$50,000
Indirect Cost
Name
Converspeech, LLC
Department
Type
DUNS #
803686435
City
Palo Alto
State
CA
Country
United States
Zip Code
94301