Realization of precision medicine ideas requires an unprecedented rapid pace of translation of biomedical discoveries into clinical practice. However, while many non-canonical disease pathways and uncommon drug actions, which are of vital importance for understanding individual patient-specific disease pathways, are accumulated in the literature, most are not organized in databases. Currently, such knowledge is curated manually or semi-automatically in a very limited scope. Meanwhile, the volume of biomedical information in PubMed (currently 28 million publications) keeps growing by more than a million articles per year, which demands more efficient and effective biocuration approaches. To address this challenge, a novel biocuration method for automatic extraction of disease pathways from figures and text of biomedical articles will be developed.
Specific Aim 1 : To develop focused benchmark sets of articles to assess the performance of the biocuration pipeline.
Specific Aim 2 : To develop a method for extraction of components of disease pathways from articles? figures based on deep-learning techniques.
Specific Aim 3 : To develop a method for reconstruction of disease-specific pathways through enrichment and through graph neural network (GNN) approaches.
Specific Aim 4 : To conduct a comprehensive evaluation of the pipeline. The overarching goal of this project is to develop a computer-based automatic biocuration ecosystem for rapid transformation of free-text biomedical literature into a machine-processable format for medical applications. The overall impact of the proposed project will be to significantly improve health outcomes in individualized patient cases by efficiently bringing the latest biomedical discoveries into a precision medicine setting. It will especially benefit cancer patients for which up-to-date knowledge of newly discovered molecular mechanisms and drug actions is critical.
The overall impact of the proposed project will be to significantly improve health outcomes in individualized patient cases by efficiently bringing the latest biomedical discoveries into a precision medicine setting. In this project, a novel biocuration method for an automatic extraction of disease mechanisms from figures and text in scientific literature will be developed. These mechanisms will be stored in a database for further querying to assist in medical diagnosis and treatment.