Knowledge of protein function is a cornerstone of biomedical research and is fundamental to understanding biological systems, the mechanisms of disease and, ultimately, human health. Decades of biomedical research have accumulated a great wealth of such knowledge in the form of biomedical literature. An important task of biomedical informatics is to acquire and represent knowledge from the free text of the literature and transform it into languages that computational agents can understand, so that the knowledge can be stored, retrieved and used for knowledge discovery. Currently, all protein annotations are assigned manually, which, unfortunately, is extremely labor-intensive and cannot keep pace with the growth of information. Indeed, with the completion of the genome sequences of several model organisms, manual annotation of proteins has already become a major bottleneck between the large number of proteins and the exploding amount of information in the biomedical literature. In this application, we propose to develop methods to facilitate automatic annotation of protein functions based on the functional information buried in the biomedical literature. The proposed methods adapt and extend state-of-the-art probabilistic semantic analysis, information retrieval and machine learning methodologies, which serve as principled approaches to modeling uncertainties in natural language text. The project will develop algorithmic building blocks for a future automatic annotation system that, when given a brief description of a protein (e.g., a protein name and symbol), will be capable of retrieving relevant literature articles about the protein, extracting biological concepts from the articles and mapping the concepts to a controlled vocabulary.
We envision that achieving these goals will result in advances with broader impact, facilitating not only automatic protein annotation but also biomedical literature indexing, one of the important areas of biomedical informatics. Efficient knowledge acquisition and management will enhance biomedical research on the mechanisms of disease and on drug discovery.
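
The three steps named in the abstract (retrieve articles for a protein, extract concepts, map them to a controlled vocabulary) can be illustrated with a toy sketch. This is not the project's method: it swaps in plain TF-IDF ranking and bag-of-words matching for the probabilistic semantic analysis the proposal describes. The article sentences, the `TERM:n` identifiers and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Toy corpus and vocabulary (hypothetical; not taken from the project).
ARTICLES = [
    "TP53 encodes the tumor suppressor p53, which regulates apoptosis and DNA repair after damage.",
    "Hexokinase catalyzes the first step of glycolysis in yeast.",
    "Loss of p53 impairs cell cycle arrest in response to DNA damage.",
]
VOCABULARY = {
    "TERM:1": "apoptosis",
    "TERM:2": "DNA repair",
    "TERM:3": "glycolysis",
    "TERM:4": "cell cycle arrest",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {token: weight} dict per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, top_k=2):
    """Step 1: rank documents against the protein description."""
    vectors, idf = tfidf_vectors(docs)
    q = {tok: tf * idf.get(tok, 0.0) for tok, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:top_k] if score > 0.0]


def map_to_vocabulary(text, vocabulary):
    """Steps 2-3: keep terms whose label tokens all occur in the text.

    This is bag-of-words matching, so word order and adjacency are ignored.
    """
    tokens = set(tokenize(text))
    return sorted(
        term_id
        for term_id, label in vocabulary.items()
        if all(tok in tokens for tok in tokenize(label))
    )


def annotate(query, docs, vocabulary, top_k=2):
    """Run retrieval, then map the retrieved text to vocabulary terms."""
    hits = retrieve(query, docs, top_k)
    return map_to_vocabulary(" ".join(docs[i] for i in hits), vocabulary)


if __name__ == "__main__":
    print(annotate("p53 TP53", ARTICLES, VOCABULARY))
```

A real system would replace each function with a stronger component, for example a topic model for concept extraction and a curated ontology as the vocabulary; the sketch only shows how the three stages connect.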

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 3R01LM009153-02S1
Application #: 7840891
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2007-07-15
Project End: 2010-08-31
Budget Start: 2009-07-15
Budget End: 2010-08-31
Support Year: 2
Fiscal Year: 2009
Total Cost: $39,294
Indirect Cost:

Name: Medical University of South Carolina
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 183710748
City: Charleston
State: SC
Country: United States
Zip Code: 29425

Lu, Songjian; Lu, Xinghua (2012) Integrating genome and functional genomics data to reveal perturbed signaling pathways in ovarian cancers. AMIA Jt Summits Transl Sci Proc 2012:72-8
Karimzadehgan, Maryam; Zhai, Chengxiang (2012) Integer Linear Programming for Constrained Multi-Aspect Committee Review Assignment. Inf Process Manag 48:725-740
Qin, Tingting; Tsoi, Lam C; Sims, Kellie J et al. (2012) Signaling network prediction by the Ontology Fingerprint enhanced Bayesian network. BMC Syst Biol 6 Suppl 3:S3
Richards, Adam J; Schwacke, John H; Rohrer, Bärbel et al. (2012) Revealing functionally coherent subsets using a spectral clustering and an information integration approach. BMC Syst Biol 6 Suppl 3:S7
Li, Xiaoyun; Bandyopadhyay, Dipankar; Lipsitz, Stuart et al. (2011) Likelihood methods for binary responses of present components in a cluster. Biometrics 67:629-35
Jin, Bo; Chen, Vicky; Chen, Lujia et al. (2011) Mapping annotations with textual evidence using an scLDA model. AMIA Annu Symp Proc 2011:834-42
Cowart, L Ashley; Shotwell, Matthew; Worley, Mitchell L et al. (2010) Revealing a signaling role of phytosphingosine-1-phosphate in yeast. Mol Syst Biol 6:349
Asbury, Thomas M; Mitman, Matt; Tang, Jijun et al. (2010) Genome3D: a viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome. BMC Bioinformatics 11:444
Richards, Adam J; Muller, Brian; Shotwell, Matthew et al. (2010) Assessing the functional coherence of gene sets with metrics based on the Gene Ontology graph. Bioinformatics 26:i79-87
Jin, Bo; Lu, Xinghua (2010) Identifying informative subsets of the Gene Ontology with information bottleneck methods. Bioinformatics 26:2445-51

Showing the most recent 10 out of 16 publications