Plants have been acknowledged as forming the basis of medicines dating back to the most ancient civilizations. To complement synthetic drug discovery processes, there remains a significant opportunity for identifying potential new therapies from plant-based sources ("phyto-therapies"). Current approaches used for the discovery of potential phyto-therapies are laborious, time-consuming, and mostly manual. The increased availability of ethnobotanical and biomedical knowledge in digital formats suggests that automated techniques could be leveraged to facilitate the phyto-therapy discovery process. The long-term goal of this initiative is thus to develop a semantically integrated framework that could be used to identify and validate potential phyto-therapies embedded within ethnobotanical and biomedical knowledge sources, and thus encourage the conservation of this knowledge and biodiversity. The overall project is built around three major aims, which are to: (1) develop a standards-driven gold standard that can be used for benchmarking automated phyto-therapy identification approaches; (2) develop an automated approach to identify potential phyto-therapies from digitized biodiversity literature (Biodiversity Heritage Library), biomedical literature citations (MEDLINE) or digital full-text (PubMed Central), genomic (GenBank), clinical trial, and chemical (PubChem) resources; and (3) leverage vector space modeling techniques to predict the relevance of potential phyto-therapies. The success of this endeavor will set the stage for the translation of a growing, but currently disjointed, evidence base of medicinal plant knowledge into tools for the elucidation of potential phyto-therapies. Furthermore, through achieving these aims, this project will also establish a first-of-its-kind in silico platform that could be extended to identify additional therapeutics from a broad spectrum of biodiversity sources.
The core aspects of this project will build on experience with developing computational techniques to bridge biodiversity and biomedical knowledge, including those that have been pioneered by the research team. This project will bring together biomedical informatics, library science, and ethnobotany experience and expertise from two institutions: the University of Vermont and The New York Botanical Garden. The multi-institutional and multi-PI aspects of this project support the feasibility of the proposed aims and will furthermore enable the load-balancing of essential tasks so that the milestones set for each aim can be met. To this end, the success of the proposed endeavor will be built on a foundation of experience in gathering ethnobotanical knowledge, analyzing and linking biodiversity and biomedical knowledge sources, and developing approaches for systematically annotating corpora in support of natural language processing and data mining pursuits.

Public Health Relevance

The identification of potential therapies is a significant area of research with direct public health implications. As such, the integration of knowledge from traditionally disjoint knowledge sources may offer a more holistic view of the ethnobotanical and biomedical research knowledge that can support the development of new disease treatment regimens.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
University of Vermont & St Agric College
Schools of Medicine
United States
Sharma, Vivekanand; Sarkar, Indra Neil (2018) Identifying natural health product and dietary supplement information within adverse event reporting systems. Pac Symp Biocomput 23:268-279
Zhang, Patrick M; Sarkar, Indra Neil (2018) Exploring the Potential of Direct-To-Consumer Genomic Test Data for Predicting Adverse Drug Events. AMIA Jt Summits Transl Sci Proc 2017:247-256
Sharma, Vivekanand; Sarkar, Indra Neil (2018) Identifying Supplement Use Within Clinical Notes: An Application of Natural Language Processing. AMIA Jt Summits Transl Sci Proc 2017:196-205
Sharma, Vivekanand; Law, Wayne; Balick, Michael J et al. (2017) Harnessing Biomedical Natural Language Processing Tools to Identify Medicinal Plant Knowledge from Historical Texts. AMIA Annu Symp Proc 2017:1537-1546
Sharma, Vivekanand; Law, Wayne; Balick, Michael J et al. (2016) Identifying Plant-Human Disease Associations in Biomedical Literature: A Case Study. AMIA Jt Summits Transl Sci Proc 2016:84-93
Sharma, Vivekanand; Holmes, John H; Sarkar, Indra N (2016) Identifying Complementary and Alternative Medicine Usage Information from Internet Resources. A Systematic Review. Methods Inf Med 55:322-32