Knowledge-based biomedical data science In the previous funding period, we designed and constructed breakthrough methods for creating a semantically coherent and logically consistent knowledge-base by automatically transforming and integrating many biomedical databases, and by directly extracting information from the literature. Building on decades of work in biomedical ontology development, and exploiting the architectures supporting the Semantic Web, we have demonstrated methods that allow effective querying spanning any combination of data sources in purely biological terms, without the queries having to reflect anything about the structure or distribution of information among any of the sources. These methods are also capable of representing apparently conflicting information in a logically consistent manner, and tracking the provenance of all assertions in the knowledge-base. Perhaps the most important feature of these methods is that they scale to potentially include nearly all knowledge of molecular biology. We now hypothesize that using these technologies we can build knowledge-bases with broad enough coverage to overcome the ?brittleness? problems that stymied previous approaches to symbolic artificial intelligence, and then create novel computational methods which leverage that knowledge to provide critical new tools for the interpretation and analysis of biomedical data. To test this hypothesis, we propose to address the following specific aims: 1. Identify representative and significant analytical needs in knowledge-based data science, and refine and extend our knowledge-base to address those needs in three distinct domains: clinical pharmacology, cardiovascular disease and rare genetic disease. 2. Develop novel and implement existing symbolic, statistical, network-based, machine learning and hybrid approaches to goal-driven inference from very large knowledge-bases. Create a goal- directed framework for selecting and combining these inference methods to address particular analytical problems. 3. Overcome barriers to broad external adoption of developed methods by analyzing their computational complexity, optimizing performance of knowledge-based querying and inference, developing simplified, biology-focused query languages, lightweight packaging of knowledge resources and systems, and addressing issues of licensing and data redistribution.

Public Health Relevance

Knowledge-based biomedical data science In the previous funding period, we designed and constructed breakthrough methods for creating a semantically coherent and logically consistent knowledge-base by automatically transforming and integrating many biomedical databases, and by directly extracting information from the literature. We now hypothesize that using these technologies we can build knowledge-bases with broad enough coverage to overcome the ?brittleness? problems that stymied previous approaches to symbolic artificial intelligence, and then create novel computational methods which leverage that knowledge to provide critical new tools for the interpretation and analysis of biomedical data.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
5R01LM008111-14
Application #
9743225
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2004-09-30
Project End
2022-06-30
Budget Start
2019-07-01
Budget End
2020-06-30
Support Year
14
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Colorado Denver
Department
Pharmacology
Type
Schools of Medicine
DUNS #
041096314
City
Aurora
State
CO
Country
United States
Zip Code
80045
Boguslav, Mayla; Cohen, K Bretonnel; Baumgartner, William A et al. (2018) Improving precision in concept normalization. Pac Symp Biocomput 23:566-577
Cohen, K Bretonnel; Xia, Jingbo; Zweigenbaum, Pierre et al. (2018) Three Dimensions of Reproducibility in Natural Language Processing. LREC Int Conf Lang Resour Eval 2018:156-165
Callahan, Tiffany J; Baumgartner, William A; Bada, Michael et al. (2018) OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput 23:133-144
Hooper, Joan E; Feng, Weiguo; Li, Hong et al. (2017) Systems biology of facial development: contributions of ectoderm and mesenchyme. Dev Biol 426:97-114
Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young et al. (2017) Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles. BMC Bioinformatics 18:372
Kao, David P; Stevens, Laura M; Hinterberg, Michael A et al. (2017) Phenotype-Specific Association of Single-Nucleotide Polymorphisms with Heart Failure and Preserved Ejection Fraction: a Genome-Wide Association Analysis of the Cardiovascular Health Study. J Cardiovasc Transl Res 10:285-294
Hunter, Lawrence E (2017) Knowledge-based biomedical Data Science. EPJ Data Sci 1:19-25
Greene, Casey S; Garmire, Lana X; Gilbert, Jack A et al. (2017) Celebrating parasites. Nat Genet 49:483-484
Oellrich, Anika; Collier, Nigel; Groza, Tudor et al. (2016) The digital revolution in phenotyping. Brief Bioinform 17:819-30
Cohen, K Bretonnel; Fort, Karën; Adda, Gilles et al. (2016) Ethical Issues in Corpus Linguistics And Annotation: Pay Per Hit Does Not Affect Effective Hourly Rate For Linguistic Resource Development On Amazon Mechanical Turk. LREC Int Conf Lang Resour Eval 2016:8-12

Showing the most recent 10 out of 91 publications