Beyond information extraction: Identifying Gene Ontology concepts in text

Crangle, Colleen

Abstract

There has been growing interest in recent years in developing methods that automatically identify Gene Ontology (GO) concepts in the unstructured text of scientific articles. This interest is motivated in part by the need to automate the task of model-organism database curation. Beyond curation, methods that automatically identify GO concepts in text will enable data mining tools that compile and interpret information extracted from text, tools that will benefit a large number of people across the scientific enterprise. This project builds on recently completed work in which we used the literature of S. cerevisiae and annotations in the Saccharomyces Genome Database (SGD) to develop methods that determine what molecular function claims are being made in an article and what experimental evidence there is in the article for those claims. The data generated in this project contains a wealth of information that could lead to greatly improved methods for identifying GO concepts in text.
The specific aims of this project are: (1) to develop a representation for GO molecular function concepts that captures information not only about the language of a GO term but also about the biomedical entity the term refers to; and (2) to analyze the results of the S. cerevisiae data mining project using the GO representations formulated in (1) to determine which are likely to produce improved GO term recognition. The analysis will be performed on 276 true positive results, 29,276 false positive results, and 336 false negative results to determine whether a new GO concept representation can reduce the number of false positives or false negatives without losing any true positives. The data mining tools of this proposal can be extended to ontologies other than GO, thereby leveraging the effort expended on ontology development.
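To put the counts in the abstract in perspective, the sketch below computes precision, recall, and F1 from the 276 true positives, 29,276 false positives, and 336 false negatives. The abstract reports only the raw counts; the metrics here are derived with the standard definitions and are not figures stated by the project. The function names, including the `improves_without_losing_tp` check that mirrors the stated goal, are hypothetical and chosen only for illustration.

```python
# Counts as reported in the abstract for the S. cerevisiae data mining results.
TRUE_POSITIVES = 276
FALSE_POSITIVES = 29_276
FALSE_NEGATIVES = 336


def precision(tp: int, fp: int) -> float:
    """Fraction of reported GO term matches that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true GO term occurrences that were found."""
    return tp / (tp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def improves_without_losing_tp(old: tuple, new: tuple) -> bool:
    """Hypothetical check of the stated goal.

    Each argument is a (tp, fp, fn) tuple. A new representation counts as an
    improvement if it keeps every true positive and lowers the number of
    false positives or false negatives.
    """
    old_tp, old_fp, old_fn = old
    new_tp, new_fp, new_fn = new
    keeps_true_positives = new_tp >= old_tp
    fewer_errors = new_fp < old_fp or new_fn < old_fn
    return keeps_true_positives and fewer_errors


if __name__ == "__main__":
    p = precision(TRUE_POSITIVES, FALSE_POSITIVES)
    r = recall(TRUE_POSITIVES, FALSE_NEGATIVES)
    f = f1_score(TRUE_POSITIVES, FALSE_POSITIVES, FALSE_NEGATIVES)
    print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

With these counts, precision is below 1% while recall is about 45%, which suggests why reducing false positives without losing true positives is a central concern of the analysis.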

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Small Research Grants (R03)
Project #: 5R03LM009752-02
Application #: 7918188
Study Section: Special Emphasis Panel (ZLM1-ZH-S (O1))
Program Officer: Ye, Jane

Project Start: 2009-09-01
Project End: 2012-08-31
Budget Start: 2010-09-01
Budget End: 2012-08-31
Support Year: 2
Fiscal Year: 2010
Total Cost: $50,000

Institution

Name: Converspeech, LLC
DUNS #: 803686435

City: Palo Alto
State: CA
Country: United States
Zip Code: 94301

Related projects


NIH 2010 R03 LM	Beyond information extraction: Identifying Gene Ontology concepts in text	Crangle, Colleen Elizabeth / Converspeech, LLC	$50,000
NIH 2009 R03 LM	Beyond information extraction: Identifying Gene Ontology concepts in text	Crangle, Colleen Elizabeth / Converspeech, LLC	$50,000
