Knowledge in molecular biology consists of assertions about the relationships among molecular entities, qualified by context that describes when and where those assertions apply. The vast majority of this knowledge resides in the primary research literature, and only a small fraction is currently accessible through well-structured databases. This is a pilot project to develop automated knowledge extraction technology, using the regulation of gene expression in hematopoiesis as a test domain. Knowledge acquisition will be accomplished through a multi-stage process: parsing the document and sentence structure, recognizing the names of known biological entities, and matching sentences to verb-based templates to capture assertions (e.g. "A binds B", "A contains B", or "A regulates B") and to preposition templates to capture the context in which these assertions apply. A multi-disciplinary approach will be used, drawing on experts in bioinformatics, databases, information science, and computational linguistics. Four unique aspects of this project are the definition of a multi-dimensional description of molecular biological context; the use of preposition templates and hierarchical document structure to capture and make inferences about context; the development of domain-specific parsing techniques; and the use of probabilistic representations, explicitly represented in XML, throughout text processing, parsing, knowledge acquisition, and information integration.
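
As a rough illustration of the template-matching stage described in the abstract, the following Python sketch pairs a verb-based template with a preposition template. It is not the project's implementation: the entity lexicon, the regular expressions, the choice of prepositions, and the function name `extract` are all assumptions made for this example, and the placeholder entities "A" and "B" are taken from the abstract's own examples.

```python
import re

# Hypothetical lexicon of known entities (placeholders from the abstract's examples).
# A real system would draw on curated lists of gene and protein names.
KNOWN_ENTITIES = {"A", "B"}

# Verb-based template: captures the assertion (subject, verb, object).
VERB_TEMPLATE = re.compile(r"\b(\w+) (binds|contains|regulates) (\w+)\b")

# Preposition template: captures context, assumed here to be a phrase
# introduced by "in" or "during" and ending at a comma, period, or end of text.
PREP_TEMPLATE = re.compile(r"\b(in|during) ([\w ]+?)(?=,|\.|$)")


def extract(sentence):
    """Return (subject, verb, object, context) tuples found in one sentence."""
    results = []
    for subj, verb, obj in VERB_TEMPLATE.findall(sentence):
        # Keep only assertions whose arguments are both recognized entities.
        if subj in KNOWN_ENTITIES and obj in KNOWN_ENTITIES:
            context = {prep: phrase for prep, phrase in PREP_TEMPLATE.findall(sentence)}
            results.append((subj, verb, obj, context))
    return results


print(extract("A binds B in erythroid cells."))
```

The sketch attaches every context phrase in a sentence to every assertion in that sentence, which is the simplest possible scoping rule; the abstract's use of hierarchical document structure suggests that context would in practice also be inherited from enclosing sections or paragraphs.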

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM008106-01
Application #
6709640
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2003-09-30
Project End
2007-09-29
Budget Start
2003-09-30
Budget End
2004-09-29
Support Year
1
Fiscal Year
2003
Total Cost
$335,521
Indirect Cost
Name
University of Michigan Ann Arbor
Department
Genetics
Type
Schools of Medicine
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109
Hur, Junguk; Sullivan, Kelli A; Schuyler, Adam D et al. (2010) Literature-based discovery of diabetes- and ROS-related targets. BMC Med Genomics 3:49
States, David J; Ade, Alex S; Wright, Zachary C et al. (2009) MiSearch adaptive PubMed search tool. Bioinformatics 25:974-6
Chen, Yili; Lin, Grace; Huo, Jeffrey S et al. (2009) Computational and functional analysis of growth hormone (GH)-regulated genes identifies the transcriptional repressor B-cell lymphoma 6 (Bcl6) as a participant in GH-regulated transcription. Endocrinology 150:3645-54
Hur, Junguk; Schuyler, Adam D; States, David J et al. (2009) SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics 25:838-40
Menon, Rajasree; Zhang, Qing; Zhang, Yan et al. (2009) Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res 69:300-9
Tarcea, V Glenn; Weymouth, Terry; Ade, Alex et al. (2009) Michigan molecular interactions r2: from interacting proteins to pathways. Nucleic Acids Res 37:D642-6
Gao, Jing; Ade, Alex S; Tarcea, V Glenn et al. (2009) Integrating and annotating the interactome using the MiMI plugin for Cytoscape. Bioinformatics 25:137-8
Ozgur, Arzucan; Vu, Thuy; Erkan, Gunes et al. (2008) Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics 24:i277-85
Leitner, Florian; Krallinger, Martin; Rodriguez-Penagos, Carlos et al. (2008) Introducing meta-services for biomedical information extraction. Genome Biol 9 Suppl 2:S6
Sarntivijai, Sirarat; Ade, Alexander S; Athey, Brian D et al. (2008) A bioinformatics analysis of the cell line nomenclature. Bioinformatics 24:2760-6

Showing the most recent 10 out of 19 publications