Knowledge in molecular biology consists of assertions about the relationships among molecular entities, qualified by context that describes when and where those assertions apply. The vast majority of this knowledge resides in the primary research literature, and only a small fraction is currently accessible through well-structured databases. This is a pilot project to develop automated knowledge extraction technology, using the regulation of gene expression in hematopoiesis as a test domain. Knowledge acquisition will be accomplished through a multi-stage process: parsing the document and sentence structure; recognizing the names of known biological entities; matching sentences to verb-based templates to capture assertions (e.g., "A binds B", "A contains B", or "A regulates B"); and matching preposition templates to capture the context in which these assertions apply. A multi-disciplinary approach will be used, drawing on experts in bioinformatics, databases, information science, and computational linguistics. Four unique aspects of this project are the definition of a multi-dimensional description of molecular biological context; the use of preposition templates and hierarchical document structure to capture and make inferences about context; the development of domain-specific parsing techniques; and the use of probabilistic representations, explicitly encoded in XML, throughout text processing, parsing, knowledge acquisition, and information integration.
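The template-matching stage described above can be illustrated with a minimal sketch. It is not the project's implementation: the entity lexicon, the verb and preposition tables, the `Assertion` class, and the `extract` function are all hypothetical names chosen for illustration, and the example sentence is invented. It assumes a sentence has already been isolated and that entity names are single tokens.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entity lexicon (assumption); a real system would use curated name lists.
ENTITIES = {"GATA1", "FOG1", "EKLF", "beta-globin"}

# Verb-based templates: surface verb -> normalized relation (illustrative only).
VERB_TEMPLATES = {"binds": "binds", "contains": "contains", "regulates": "regulates"}

# Preposition templates: preposition -> context dimension (illustrative only).
PREPOSITION_TEMPLATES = {"in": "location", "during": "stage"}


@dataclass
class Assertion:
    subject: str
    relation: str
    obj: str
    context: dict = field(default_factory=dict)


def extract(sentence: str):
    """Return the first 'A <verb> B' assertion in the sentence, or None."""
    tokens = re.findall(r"[\w-]+", sentence)
    for i in range(1, len(tokens) - 1):
        verb = tokens[i]
        # Verb template: a known verb flanked by two recognized entities.
        if (verb in VERB_TEMPLATES
                and tokens[i - 1] in ENTITIES
                and tokens[i + 1] in ENTITIES):
            context = {}
            rest = tokens[i + 2:]
            for j, tok in enumerate(rest):
                dimension = PREPOSITION_TEMPLATES.get(tok.lower())
                if dimension is None:
                    continue
                # Preposition template: collect tokens up to the next preposition.
                phrase = []
                for nxt in rest[j + 1:]:
                    if nxt.lower() in PREPOSITION_TEMPLATES:
                        break
                    phrase.append(nxt)
                if phrase:
                    context[dimension] = " ".join(phrase)
            return Assertion(tokens[i - 1], VERB_TEMPLATES[verb], tokens[i + 1], context)
    return None
```

Under these assumptions, calling `extract("GATA1 binds FOG1 in erythroid cells during differentiation.")` yields an assertion with subject `GATA1`, relation `binds`, object `FOG1`, and context `{"location": "erythroid cells", "stage": "differentiation"}`. A sentence with no matching template returns `None`. The sketch omits the probabilistic scoring and hierarchical document structure the abstract mentions.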

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project (R01)
Project #: 5R01LM008106-02
Application #: 6805725
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
Project Start: 2003-09-30
Project End: 2007-09-29
Budget Start: 2004-09-30
Budget End: 2005-09-29
Support Year: 2
Fiscal Year: 2004
Total Cost: $333,902
Indirect Cost:

Name: University of Michigan Ann Arbor
Department: Genetics
Type: Schools of Medicine
DUNS #: 073133571
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109
Hur, Junguk; Sullivan, Kelli A; Schuyler, Adam D et al. (2010) Literature-based discovery of diabetes- and ROS-related targets. BMC Med Genomics 3:49
States, David J; Ade, Alex S; Wright, Zachary C et al. (2009) MiSearch adaptive PubMed search tool. Bioinformatics 25:974-6
Chen, Yili; Lin, Grace; Huo, Jeffrey S et al. (2009) Computational and functional analysis of growth hormone (GH)-regulated genes identifies the transcriptional repressor B-cell lymphoma 6 (Bcl6) as a participant in GH-regulated transcription. Endocrinology 150:3645-54
Hur, Junguk; Schuyler, Adam D; States, David J et al. (2009) SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics 25:838-40
Menon, Rajasree; Zhang, Qing; Zhang, Yan et al. (2009) Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res 69:300-9
Tarcea, V Glenn; Weymouth, Terry; Ade, Alex et al. (2009) Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic Acids Res 37:D642-6
Gao, Jing; Ade, Alex S; Tarcea, V Glenn et al. (2009) Integrating and annotating the interactome using the MiMI plugin for cytoscape. Bioinformatics 25:137-8
Sarntivijai, Sirarat; Ade, Alexander S; Athey, Brian D et al. (2008) A bioinformatics analysis of the cell line nomenclature. Bioinformatics 24:2760-6
Ashkenazi, Maital; Bader, Gary D; Kuchinsky, Allan et al. (2008) Cytoscape ESP: simple search of complex biological networks. Bioinformatics 24:1465-6
Ozgur, Arzucan; Vu, Thuy; Erkan, Gunes et al. (2008) Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics 24:i277-85

Showing the most recent 10 out of 19 publications