In the three years since the original proposal was submitted, the claims we made about the impending readiness of knowledge-based approaches and natural language processing to address pressing problems of information overload in molecular biology have been resoundingly confirmed, and such methods have become increasingly accepted within the computational bioscience and systems biology communities. We are now well into the era of broad use of semantic representation technology to support biomedical research, and on the cusp of using biomedical natural language processing software to create the enormous number of necessary formal representations automatically from biomedical texts. The work of the last funding period has not only contributed innovative and significant new methods, but has also helped us identify a set of specific research issues that we claim are now the rate-limiting factors in building an extensive, high-quality computational knowledge base of molecular biology.
The aims of this competitive renewal are to address those factors, making it possible to scale our impressive results on intentionally narrow applications to much larger (and more significant) tasks. Specifically, we aim (1) to create an enriched, relationally decomposed set of conceptual frames, hewing closely to multiple community-curated ontologies; (2) to develop language processing tools capable of recognizing and populating instances of those conceptual frames; and (3) to develop systems for integrating and using diverse knowledge from multiple sources to generate scientific insights, focusing on the analysis of sets of dozens to hundreds of genes produced by diverse high-throughput methodologies. An innovative aspect of this proposal is the creation and application of novel, insight-based extrinsic evaluation techniques for such systems.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Special Emphasis Panel (ZLM1-ZH-C (J2))
Program Officer: Ye, Jane
University of Colorado Denver
Schools of Medicine
United States
Funk, Christopher S; Hunter, Lawrence E; Cohen, K Bretonnel (2014) Combining heterogenous data for prediction of disease related and pharmacogenes. Pac Symp Biocomput :328-39
Cohen, K Bretonnel; Hunter, Lawrence E (2013) Chapter 16: text mining for translational bioinformatics. PLoS Comput Biol 9:e1003044
Comeau, Donald C; Islamaj Doğan, Rezarta; Ciccarese, Paolo et al. (2013) BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford) 2013:bat064
Liu, Haibin; Hunter, Lawrence; Kešelj, Vlado et al. (2013) Approximate subgraph matching-based literature mining for biomedical events and relations. PLoS One 8:e60954
Hill, David P; Adams, Nico; Bada, Mike et al. (2013) Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics 14:513
Sarkar, Suparna A; Lee, Catherine E; Tipney, Hannah et al. (2012) Synergizing genomic analysis with biological knowledge to identify and validate novel genes in pancreatic development. Pancreas 41:962-9
Epperson, L Elaine; Karimpour-Fard, Anis; Hunter, Lawrence E et al. (2011) Metabolic cycles in a circannual hibernator. Physiol Genomics 43:799-807
Galligan, James J; Fritz, Kristofer S; Tipney, Hannah et al. (2011) Profiling impaired hepatic endoplasmic reticulum glycosylation as a consequence of ethanol ingestion. J Proteome Res 10:1837-47
Grabek, Katharine R; Karimpour-Fard, Anis; Epperson, L Elaine et al. (2011) Multistate proteomics analysis reveals novel strategies used by a hibernator to precondition the heart and conserve ATP for winter heterothermy. Physiol Genomics 43:1263-75
Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan et al. (2011) The gene normalization task in BioCreative III. BMC Bioinformatics 12 Suppl 8:S2

Showing the most recent 10 out of 59 publications