Realizing precision medicine requires an unprecedentedly rapid translation of biomedical discoveries into clinical practice. However, although many non-canonical disease pathways and uncommon drug actions, which are vital for understanding patient-specific disease pathways, have accumulated in the literature, most are not organized in databases. Currently, such knowledge is curated manually or semi-automatically, and only in a very limited scope. Meanwhile, the volume of biomedical information in PubMed (currently 28 million publications) keeps growing by more than a million articles per year, which demands more efficient and effective biocuration approaches. To address this challenge, a novel biocuration method for automatic extraction of disease pathways from the figures and text of biomedical articles will be developed.
Specific Aim 1: To develop focused benchmark sets of articles to assess the performance of the biocuration pipeline.
Specific Aim 2: To develop a method for extraction of components of disease pathways from articles' figures based on deep-learning techniques.
Specific Aim 3: To develop a method for reconstruction of disease-specific pathways through enrichment and through graph neural network (GNN) approaches.
Specific Aim 4: To conduct a comprehensive evaluation of the pipeline.
The overarching goal of this project is to develop a computer-based automatic biocuration ecosystem for rapid transformation of free-text biomedical literature into a machine-processable format for medical applications. The overall impact of the proposed project will be to significantly improve health outcomes in individualized patient cases by efficiently bringing the latest biomedical discoveries into a precision medicine setting. It will especially benefit cancer patients, for whom up-to-date knowledge of newly discovered molecular mechanisms and drug actions is critical.

Public Health Relevance

The overall impact of the proposed project will be to significantly improve health outcomes in individualized patient cases by efficiently bringing the latest biomedical discoveries into a precision medicine setting. In this project, a novel biocuration method for automatic extraction of disease mechanisms from figures and text in the scientific literature will be developed. These mechanisms will be stored in a queryable database to assist in medical diagnosis and treatment.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013392-01
Application #
9987133
Study Section
Special Emphasis Panel (ZLM1)
Program Officer
Vanbiervliet, Alan
Project Start
2020-05-01
Project End
2024-02-29
Budget Start
2020-05-01
Budget End
2021-02-28
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Missouri-Columbia
Department
Pathology
Type
Schools of Medicine
DUNS #
153890272
City
Columbia
State
MO
Country
United States
Zip Code
65211