Building on 8 years of highly productive work in technology development that included the creation of the Colorado Richly Annotated Full Text corpus (CRAFT), we hypothesize that text mining resources and methods are approaching the level of maturity required to productively process a significant proportion of the full text biomedical literature to create a well-represented formal knowledge base of molecular biology. We propose a detailed, integrated plan to achieve this long-standing goal. Success in this effort will make possible a transformative new way for the biomedical research community to identify, access, and integrate existing knowledge, breaking down disciplinary boundaries and other silos that have kept scientists from fully exploiting relevant prior results in their research. Our successes in the prior funding period broadened the applicability of biomedical concept identification systems to a much wider set of tasks, demonstrating the ability to target multiple community-curated ontologies in text mining and to generate scientifically significant insights from the results. The proposed work would take advantage of the resources we produced to transcend several of the limitations of previous efforts. We propose innovative new approaches to formal knowledge representation and to characterizing relationships between textual elements and semantic content. We will design, implement, and evaluate computational systems that have the potential to transform enormous text collections into semantically rich, logic-based, standards-compliant, formal representations of biomedical knowledge with clearly identified provenance. The resulting representations will express complex assertions about a very wide range of entities, processes, qualities, and, most importantly, their specific relationships with one another.

Public Health Relevance

Project narrative (Hunter, Lawrence E.): This project will affect public health by increasing the access of physicians, researchers, and the general public to highly targeted information from published research and electronic health records.

National Institutes of Health (NIH)
Research Project (R01)
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
University of Colorado Denver
Schools of Medicine
United States
Funk, Christopher S; Hunter, Lawrence E; Cohen, K Bretonnel (2014) Combining heterogenous data for prediction of disease related and pharmacogenes. Pac Symp Biocomput :328-39
Cohen, K Bretonnel; Hunter, Lawrence E (2013) Chapter 16: text mining for translational bioinformatics. PLoS Comput Biol 9:e1003044
Comeau, Donald C; Islamaj Doğan, Rezarta; Ciccarese, Paolo et al. (2013) BioC: a minimalist approach to interoperability for biomedical text processing. Database (Oxford) 2013:bat064
Liu, Haibin; Hunter, Lawrence; Kešelj, Vlado et al. (2013) Approximate subgraph matching-based literature mining for biomedical events and relations. PLoS One 8:e60954
Hill, David P; Adams, Nico; Bada, Mike et al. (2013) Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics 14:513
Sarkar, Suparna A; Lee, Catherine E; Tipney, Hannah et al. (2012) Synergizing genomic analysis with biological knowledge to identify and validate novel genes in pancreatic development. Pancreas 41:962-9
Epperson, L Elaine; Karimpour-Fard, Anis; Hunter, Lawrence E et al. (2011) Metabolic cycles in a circannual hibernator. Physiol Genomics 43:799-807
Galligan, James J; Fritz, Kristofer S; Tipney, Hannah et al. (2011) Profiling impaired hepatic endoplasmic reticulum glycosylation as a consequence of ethanol ingestion. J Proteome Res 10:1837-47
Grabek, Katharine R; Karimpour-Fard, Anis; Epperson, L Elaine et al. (2011) Multistate proteomics analysis reveals novel strategies used by a hibernator to precondition the heart and conserve ATP for winter heterothermy. Physiol Genomics 43:1263-75
Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan et al. (2011) The gene normalization task in BioCreative III. BMC Bioinformatics 12 Suppl 8:S2

Showing the most recent 10 out of 59 publications