Plants have been acknowledged as the basis of medicines since the most ancient civilizations. To complement synthetic drug discovery processes, there remains a significant opportunity to identify potential new therapies from plant-based sources (phyto-therapies). Current approaches to discovering potential phyto-therapies are laborious, time-consuming, and mostly manual. The increased availability of ethnobotanical and biomedical knowledge in digital formats suggests that automated techniques could be leveraged to facilitate the phyto-therapy discovery process. The long-term goal of this initiative is thus to develop a semantically integrated framework that could be used to identify and validate potential phyto-therapies embedded within ethnobotanical and biomedical knowledge sources, and thereby encourage the conservation of this knowledge and of biodiversity. The overall project is built around three major aims: (1) develop a standards-driven gold standard that can be used for benchmarking automated phyto-therapy identification approaches; (2) develop an automated approach to identify potential phyto-therapies from digitized biodiversity literature (Biodiversity Heritage Library), biomedical literature citations (MEDLINE) or digital full text (PubMed Central), genomic (GenBank), clinical trial (ClinicalTrials.gov), and chemical (PubChem) resources; and (3) leverage vector space modeling techniques to predict the relevance of potential phyto-therapies. The success of this endeavor will set the stage for translating a growing, but currently disjointed, evidence base of medicinal plant knowledge into tools for the elucidation of potential phyto-therapies. Furthermore, by achieving these aims, this project will establish a first-of-its-kind in silico platform that could be extended to identify additional therapeutics from a broad spectrum of biodiversity sources.
The core aspects of this project will build on experience with developing computational techniques to bridge biodiversity and biomedical knowledge, including techniques pioneered by the research team. This project will bring together biomedical informatics, library science, and ethnobotany expertise from two institutions: the University of Vermont and The New York Botanical Garden. The multi-institutional and multi-PI structure of this project supports the feasibility of the proposed aims and will furthermore enable essential tasks to be load-balanced so that the milestones set for each aim can be met. To this end, the success of the proposed endeavor will be built on a foundation of experience in gathering ethnobotanical knowledge, analyzing and linking biodiversity and biomedical knowledge sources, and developing approaches for systematically annotating corpora for subsequent use in natural language processing and data mining.
The identification of potential therapies is a significant area of research with direct public health implications. Integrating knowledge from traditionally disjoint sources may therefore offer a more holistic view of ethnobotanical and biomedical research that can support the development of new disease treatment regimens.