Building on 8 years of highly productive work in technology development that included the creation of the Colorado Richly Annotated Full Text corpus (CRAFT), we hypothesize that text mining resources and methods are approaching the level of maturity required to productively process a significant proportion of the full-text biomedical literature to create a well-represented formal knowledge base of molecular biology. We propose a detailed, integrated plan to achieve this long-standing goal. Success in this effort will make possible a transformative new way for the biomedical research community to identify, access, and integrate existing knowledge, breaking down disciplinary boundaries and other silos that have kept scientists from fully exploiting relevant prior results in their research. Our successes in the prior funding period broadened the applicability of biomedical concept identification systems to a much wider set of tasks, demonstrating the ability to target multiple community-curated ontologies in text mining and to generate scientifically significant insights from the results. The proposed work will use the resources we produced to overcome several limitations of previous efforts. We propose innovative new approaches to formal knowledge representation and to characterizing relationships between textual elements and semantic content. We will design, implement, and evaluate computational systems that have the potential to transform enormous text collections into semantically rich, logic-based, standards-compliant, formal representations of biomedical knowledge with clearly identified provenance. The resulting representations will express complex assertions about a very wide range of entities, processes, and qualities and, most importantly, about their specific relationships with one another.

Public Health Relevance

Hunter, Lawrence E. Project Narrative: This project will affect public health by increasing the access of physicians, researchers, and the general public to highly targeted information from published research and electronic health records.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Ye, Jane
University of Colorado Denver
Schools of Medicine
United States

Publications

Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young et al. (2017) Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. BMC Bioinformatics 18:372
Greene, Casey S; Garmire, Lana X; Gilbert, Jack A et al. (2017) Celebrating parasites. Nat Genet 49:483-484
Hooper, Joan E; Feng, Weiguo; Li, Hong et al. (2017) Systems biology of facial development: contributions of ectoderm and mesenchyme. Dev Biol 426:97-114
Hirschman, Lynette; Fort, Karën; Boué, Stéphanie et al. (2016) Crowdsourcing and curation: perspectives from biology and natural language processing. Database (Oxford) 2016:
Eberlein, Jens; Davenport, Bennett; Nguyen, Tom et al. (2016) Aging promotes acquisition of naive-like CD8+ memory T cell traits and enhanced functionalities. J Clin Invest 126:3942-3960
Funk, Christopher S; Cohen, K Bretonnel; Hunter, Lawrence E et al. (2016) Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition. J Biomed Semantics 7:52
Névéol, Aurélie; Cohen, K Bretonnel; Grouin, Cyril et al. (2016) Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016. CEUR Workshop Proc 1609:28-42
Andrew, Audra L; Card, Daren C; Ruggiero, Robert P et al. (2015) Rapid changes in gene expression direct rapid shifts in intestinal form and function in the Burmese python after feeding. Physiol Genomics 47:147-57
Livingston, Kevin M; Bada, Michael; Baumgartner Jr, William A et al. (2015) KaBOB: ontology-based semantic integration of biomedical databases. BMC Bioinformatics 16:126
Vehlow, Corinna; Kao, David P; Bristow, Michael R et al. (2015) Visual analysis of biological data-knowledge networks. BMC Bioinformatics 16:135

The list above shows the 10 most recent of 79 publications associated with this project.