This proposal describes a new tool for text data mining: a biomedical language ontology with integrated natural-language-processing methods. Our long-term goal is to provide resources for biomedical knowledge discovery from text. Our immediate goal is to provide a knowledge discovery tool for the curation of model organism databases such as the Saccharomyces Genome Database (SGD). The proposed research not only serves the research needs of the SGD community but also helps the broader biomedical community exploit the strengths of the comparative approach to biological research.

The hypothesis of this proposal is that knowledge discovery from biomedical text requires a knowledge base that integrates both genomic and linguistic information. This hypothesis rests on two observations: (a) the language of biomedicine, like all natural language, is complex in structure and morphology (the formation of words from their basic units of meaning) and poses problems of synonymy (several terms having the same meaning), polysemy (one term having more than one meaning), hypernymy (one term being more general than another), hyponymy (one term being more specific than another), denotation (what a term refers to, as opposed to what it means), and the distinction between denotation and description (different ways of referring to the same thing); and (b) important biomedical knowledge sources, such as the Gene Ontology (GO), are themselves expressed in natural language.
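To make the lexical relations above concrete, the following is a minimal sketch of how an ontology might record them. It is not the proposed tool's design. The class `LanguageOntology`, its methods, and the concept IDs such as `"gene:CDC28"` are hypothetical. The is-a chain is a simplified illustration (the real GO hierarchy has intermediate terms), and the entries are illustrative, not taken from SGD. Synonymy appears as several terms mapping to one concept, polysemy as one term mapping to several concepts, and hypernymy/hyponymy as transitive is-a links between concepts.

```python
from collections import defaultdict


class LanguageOntology:
    """Toy store linking surface terms to concepts (a hypothetical design)."""

    def __init__(self):
        self.senses = defaultdict(set)   # lower-cased term -> concept IDs it can denote
        self.parents = defaultdict(set)  # concept ID -> direct, more general concepts

    def add_term(self, term, concept):
        self.senses[term.lower()].add(concept)

    def add_is_a(self, child, parent):
        self.parents[child].add(parent)

    def is_polysemous(self, term):
        """Polysemy: one term denotes more than one concept."""
        return len(self.senses.get(term.lower(), set())) > 1

    def synonyms(self, term):
        """Synonymy: other terms that share at least one concept with `term`."""
        key = term.lower()
        concepts = self.senses.get(key, set())
        return {t for t, cs in self.senses.items() if t != key and cs & concepts}

    def hypernyms(self, concept):
        """Hypernymy: every more general concept reachable via is-a links."""
        seen, stack = set(), list(self.parents.get(concept, ()))
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return seen

    def hyponyms(self, concept):
        """Hyponymy: every more specific concept (the inverse of hypernyms)."""
        return {c for c in self.parents if concept in self.hypernyms(c)}


# Illustrative entries only (hypothetical concept IDs, simplified hierarchy).
onto = LanguageOntology()
onto.add_term("CDC28", "gene:CDC28")
onto.add_term("CDC28", "protein:Cdc28p")  # a gene symbol often also names its product
onto.add_term("Cdc28p", "protein:Cdc28p")
onto.add_is_a("protein kinase activity", "kinase activity")
onto.add_is_a("kinase activity", "catalytic activity")

print(onto.is_polysemous("CDC28"))                        # True
print(sorted(onto.synonyms("Cdc28p")))                    # ['cdc28']
print(sorted(onto.hypernyms("protein kinase activity")))  # ['catalytic activity', 'kinase activity']
print(sorted(onto.hyponyms("catalytic activity")))        # ['kinase activity', 'protein kinase activity']
```

Storing senses per term, rather than a single concept per term, is what lets the sketch represent gene-versus-product ambiguity. A text-mining step would then have to choose among the senses from context.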
The specific aims of the proposed project are to:
1. Extend an existing biomedical language ontology to include genomic and linguistic data from SGD;
2. Use this ontology to discover, in full-text articles made available by SGD, information about the molecular function of yeast gene products that can be inferred from direct experimental assays; and
3. Evaluate the effectiveness of the new tool and methods by comparing their results with those of the SGD curators for gene products that have GO molecular function annotations with evidence code IDA (Inferred from Direct Assay).
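The comparison in the third aim could be scored in several ways. One common choice, assumed here and not stated in the proposal, is to use precision, recall, and F1 over (gene product, GO term) pairs. The gold standard is restricted to curator annotations whose evidence code is IDA. The function names and all data below are hypothetical placeholders, not SGD records.

```python
def ida_pairs(annotations):
    """Keep only (gene product, GO term) pairs whose evidence code is IDA."""
    return {(gene, term) for gene, term, evidence in annotations if evidence == "IDA"}


def evaluate(predicted, curated):
    """Precision, recall and F1 of tool output against curator pairs (exact match)."""
    predicted, curated = set(predicted), set(curated)
    tp = len(predicted & curated)  # pairs found by both the tool and the curators
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(curated) if curated else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}


# Made-up example data; gene names and annotations are placeholders.
curator_annotations = [
    ("GENE1", "protein kinase activity", "IDA"),
    ("GENE2", "DNA binding", "IDA"),
    ("GENE3", "ATPase activity", "ISS"),  # not IDA, so excluded from the gold standard
]
tool_output = {
    ("GENE1", "protein kinase activity"),
    ("GENE2", "RNA binding"),
}

scores = evaluate(tool_output, ida_pairs(curator_annotations))
print(scores)  # {'tp': 1, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Exact matching is strict. A more lenient variant might give partial credit when the tool reports a GO term that is a hypernym of the curated one.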

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Project #: 1R43HG003600-01
Application #: 6885487
Study Section: Special Emphasis Panel (ZRG1-BDMA (01))
Program Officer: Bonazzi, Vivien
Project Start: 2005-03-11
Project End: 2006-09-30
Budget Start: 2005-03-11
Budget End: 2006-09-30
Support Year: 1
Fiscal Year: 2005
Total Cost: $99,250
Organization: Converspeech, LLC
DUNS #: 803686435
City: Palo Alto
State: CA
Country: United States
Zip Code: 94301