Most biomedical text mining systems target only text and do not provide intelligent access to other important data such as figures. More than any other form of documentation, figures usually represent the "evidence" of discovery in the biomedical literature. Full-text biomedical articles nearly always incorporate images, which are crucial content for biomedical knowledge discovery. Biomedical scientists need access to images to validate research facts and to formulate or test novel research hypotheses. Evaluation has shown that textual statements reported in the literature are frequently noisy (i.e., contain "false facts"). Capturing images that are essentially the experimental "evidence" supporting a textual "fact" will benefit biomedical information systems, databases, and biomedical scientists. We are developing BioFigureSearch, a search engine for figures in the biomedical literature. To build it, we will develop innovative algorithms and models in natural language processing, image processing, machine learning, and user interfacing. The deliverables will be novel biomedical natural language figure processing (bNLfP) algorithms; iBioFigureSearch, which will allow biomedical scientists to access figure data effectively; and open-source tools that will enhance biomedical information retrieval, summarization, and question answering. The bNLfP algorithms we develop can be applied to or integrated into other biomedical text-mining systems.

Public Health Relevance

This project proposes innovative algorithms and models in natural language processing, image processing, machine learning, and user interfacing to return figures in response to biomedical queries. It is anticipated that the algorithms, models, and tools developed will significantly enhance biomedical scientists' access to figures reported in the literature, and thereby expedite biomedical knowledge discovery.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Brazhnik, Paul
University of Massachusetts Medical School Worcester
Other Health Professions
Schools of Medicine
United States
Polepalli Ramesh, Balaji; Sethi, Ricky J; Yu, Hong (2015) Figure-associated text summarization and evaluation. PLoS One 10:e0115671
Yin, Xu-Cheng; Yang, Chun; Pei, Wei-Yi et al. (2015) DeTEXT: A Database for Evaluating Text Extraction from Biomedical Literature Figures. PLoS One 10:e0126200
Li, Yanpeng; Yu, Hong (2014) A robust data-driven approach for gene ontology annotation. Database (Oxford) 2014:bau113
Zhang, Qing; Yu, Hong (2014) Computational approaches for predicting biomedical research collaborations. PLoS One 9:e111795
Liu, Feifan; Yu, Hong (2014) Learning to rank figures within a biomedical article. PLoS One 9:e61567
Rastegar-Mojarad, Majid; Bales, Michael E; Yu, Hong (2013) Researchermap: a tool for visualizing author locations using Google maps. Stud Health Technol Inform 192:1187
Polepalli Ramesh, Balaji; Houston, Thomas; Brandt, Cynthia et al. (2013) Improving patients' electronic health record comprehension with NoteAid. Stud Health Technol Inform 192:714-8
Liu, Feifan; Moosavinasab, Soheil; Agarwal, Shashank et al. (2013) Automatically identifying health- and clinical-related content in wikipedia. Stud Health Technol Inform 192:637-41
Zhang, Qing; Yu, Hong (2013) CiteGraph: a citation network system for MEDLINE articles and analysis. Stud Health Technol Inform 192:832-6
Ramesh, Balaji Polepalli; Prasad, Rashmi; Miller, Tim et al. (2012) Automatic discourse connective detection in biomedical text. J Am Med Inform Assoc 19:800-8

Showing the most recent 10 out of 21 publications