The long-term goal of our research is to develop resources and tools for knowledge retrieval and management in the biomedical domain. As the pace of biomedical research accelerates, researchers depend more and more on computers to manage the explosive growth of published biomedical information. The high quality of many databases is maintained by database curators who extract and synthesize information from the literature and from other databases. It is therefore important to accurately recognize biomedical entity names in text and to map the identified names to the corresponding records in biomedical databases. A biomedical database usually provides a list of names, either entered by curators or extracted from other databases, and NLP systems can use these names to retrieve database records or to map names in text to records. However, biomedical entity names have several characteristics that make both tasks very daunting: synonymy (different names refer to the same database entry), ambiguity (one name is associated with different entries), and novelty (names or entities are not present in databases or knowledge bases). In addition, biomedical entities can appear in text as short forms (SFs) abbreviated from their long forms (LFs). Because SFs are highly ambiguous, their prevalent use to represent biomedical entities is a further challenge for end users and NLP applications. Recently, ontology-based knowledge management has become increasingly popular, since ontologies provide formal, machine-processable, and human-interpretable representations of biomedical entities and their relations. We hypothesize that biomedical ontologies can reduce the difficulty of retrieving records by name and of mapping names in text to database records.
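The three name characteristics above can be made concrete with a small lookup table. The following is a minimal Python sketch for illustration only, not the project's software: it inverts a toy set of database records into a name index and labels a lookup as unique, ambiguous (one name, several entries), or novel (no entry). Synonymy shows up as several names pointing to the same record. The record IDs, the names, and the functions `build_name_index` and `classify` are hypothetical, and lower-casing is assumed to be the only name normalization.

```python
from collections import defaultdict


def build_name_index(records):
    """Invert a hypothetical record ID -> names map into name -> set of record IDs.

    Names are lower-cased; this is the only normalization assumed here.
    """
    index = defaultdict(set)
    for record_id, names in records.items():
        for name in names:
            index[name.lower()].add(record_id)
    return index


def classify(name, index):
    """Label a name lookup as 'novel', 'ambiguous', or 'unique'."""
    hits = set(index.get(name.lower(), set()))  # copy so callers cannot mutate the index
    if not hits:
        return "novel", hits        # name absent from the database
    if len(hits) > 1:
        return "ambiguous", hits    # one name, several entries
    return "unique", hits           # exactly one entry


# Hypothetical toy records: REC2 has three synonymous names, and "EP1" is shared.
records = {
    "REC1": ["example protein one", "EP1"],
    "REC2": ["enzyme of pathway 1", "EP1", "EOP1"],
    "REC3": ["sample receptor"],
}
index = build_name_index(records)
```

With these toy records, `classify("EP1", index)` reports "ambiguous" with the records REC1 and REC2, `classify("eop1", index)` reports "unique" with REC2, and `classify("EP9", index)` reports "novel" with an empty set. A real system would need richer normalization (punctuation, Greek letters, spacing) than lower-casing.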
The specific aims and corresponding hypotheses are:
i) develop onto-BioThesaurus by enriching BioThesaurus with gene/protein-related ontologies (Hypothesis: aligning gene/protein names to gene/protein-related ontologies can reduce the complexity associated with gene/protein names);
ii) harvest synonyms for gene/protein classes and entities from online resources and text (Hypothesis: harvesting synonyms, especially gene/protein SFs, is critical because SFs are frequently used to represent gene/protein entities);
iii) build a web user interface for searching and querying gene/protein names and entries through the ontology-enabled onto-BioThesaurus (Hypothesis: enhancing BioThesaurus with gene/protein-related ontologies will allow us to build heuristic rules that support machine reasoning); and
iv) evaluate and distribute the research methods and outcomes (Hypothesis: evaluating and distributing research methods and outcomes is critical to advancing both basic and applied biomedical science).
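Aim ii involves harvesting SFs from text. One well-known technique for this, offered here only as an illustration and not necessarily the method used in this project, is the abbreviation-definition algorithm of Schwartz and Hearst (2003): for a pattern "long form (SF)", it scans the SF and the preceding words right to left, requiring every alphanumeric SF character to appear in order and the first one to start a word. The Python sketch below is a simplified rendering under stated assumptions: only the "LF (SF)" pattern is handled, the SF filters and the candidate window of min(|SF| + 5, 2|SF|) words approximately follow that algorithm's heuristics, and the function names `find_best_long_form` and `extract_pairs` are hypothetical.

```python
import re

# Parenthesized text without nested parentheses, e.g. "(TNF)".
PAREN = re.compile(r"\(([^()]+)\)")


def find_best_long_form(short_form, long_form):
    """Match SF characters right to left inside a candidate LF.

    Every alphanumeric SF character must occur in order (case-insensitively),
    and the SF's first character must start a word. Returns the matching
    suffix of the candidate, or None if there is no match.
    """
    s_idx = len(short_form) - 1
    l_idx = len(long_form) - 1
    while s_idx >= 0:
        c = short_form[s_idx].lower()
        if not c.isalnum():
            s_idx -= 1
            continue
        # Move left until the character matches; for the first SF character,
        # the match must also sit at the start of a word.
        while l_idx >= 0 and (
            long_form[l_idx].lower() != c
            or (s_idx == 0 and l_idx > 0 and long_form[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend left to just after the nearest preceding space.
    start = long_form.rfind(" ", 0, l_idx + 1) + 1
    return long_form[start:]


def extract_pairs(sentence):
    """Return (SF, LF) pairs for the pattern 'long form (SF)' in one sentence."""
    pairs = []
    for m in PAREN.finditer(sentence):
        sf = m.group(1).strip()
        # Heuristic SF filters: 2-10 characters, starts with a letter or digit,
        # contains at least one letter, and has at most two words.
        if not (2 <= len(sf) <= 10 and sf[0].isalnum()
                and any(ch.isalpha() for ch in sf) and len(sf.split()) <= 2):
            continue
        words = sentence[:m.start()].split()
        max_words = min(len(sf) + 5, len(sf) * 2)
        candidate = " ".join(words[-max_words:])
        lf = find_best_long_form(sf, candidate)
        if lf and len(lf) > len(sf):
            pairs.append((sf, lf))
    return pairs
```

For example, `extract_pairs("Short forms such as tumor necrosis factor (TNF) are common.")` returns `[("TNF", "tumor necrosis factor")]`, while `"The (p53) gene"` yields no pair because "The" does not contain the SF's characters. Scanning right to left favors the shortest candidate ending just before the parenthesis, which keeps unrelated leading words out of the LF.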

Public Health Relevance

The proposed research is critical for biomedical knowledge retrieval and management. It serves as one of the foundations for storing, retrieving, and extracting knowledge and information in the biomedical domain. In addition, the proposed research will help biomedical researchers and the general community understand and manage biomedical text through web interfaces and automated systems.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM009959-01A1
Application #
7654995
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2009-09-01
Project End
2011-08-31
Budget Start
2009-09-01
Budget End
2010-08-31
Support Year
1
Fiscal Year
2009
Total Cost
$608,650
Indirect Cost
Name
Georgetown University
Department
Biostatistics & Other Math Sci
Type
Schools of Medicine
DUNS #
049515844
City
Washington
State
DC
Country
United States
Zip Code
20057

Publications

Elayavilli, Ravikumar Komandur; Liu, Hongfang (2016) Ion Channel ElectroPhysiology Ontology (ICEPO) - a case study of text mining assisted ontology development. AMIA Jt Summits Transl Sci Proc 2016:42-51
Li, Dingcheng; Okamoto, Janet; Liu, Hongfang et al. (2015) A bibliometric analysis on tobacco regulation investigators. BioData Min 8:11
Ravikumar, Komandur Elayavilli; Wagholikar, Kavishwar B; Li, Dingcheng et al. (2015) Text mining facilitates database curation - extraction of mutation-disease associations from Bio-medical literature. BMC Bioinformatics 16:185
Li, Ding-Cheng; Rastegar-Mojarad, Majid; Okamoto, Janet et al. (2015) A Bibliometric Analysis on Cancer Population Science with Topic Modeling. AMIA Jt Summits Transl Sci Proc 2015:102-6
Ravikumar, K E; Wagholikar, Kavishwar B; Liu, Hongfang (2014) Towards pathway curation through literature mining--a case study using PharmGKB. Pac Symp Biocomput :352-63
Liu, Hongfang; Sohn, Sunghwan; Murphy, Sean et al. (2014) Facilitating post-surgical complication detection through sublanguage analysis. AMIA Jt Summits Transl Sci Proc 2014:77-82
Wu, Stephen T; Juhn, Young J; Sohn, Sunghwan et al. (2014) Patient-level temporal aggregation for text-based asthma status ascertainment. J Am Med Inform Assoc 21:876-84
Moosavinasab, Soheil; Rastegar-Mojarad, Majid; Liu, Hongfang et al. (2014) Towards Transforming Expert-based Content to Evidence-based Content. AMIA Jt Summits Transl Sci Proc 2014:83-90
Li, Ding Cheng; Therneau, Terry; Chute, Christopher et al. (2014) Discovering associations among diagnosis groups using topic modeling. AMIA Jt Summits Transl Sci Proc 2014:43-9
Zhang, Yuji; Tao, Cui (2014) Network Analysis of Cancer-focused Association Network Reveals Distinct Network Association Patterns. Cancer Inform 13:45-51

Showing the most recent 10 out of 47 publications