Most biomedical text mining systems target only text and do not provide intelligent access to other important data such as figures. More than any other documentation, figures usually represent the "evidence" of discovery in the biomedical literature. Full-text biomedical articles nearly always include images, which are crucial content for biomedical knowledge discovery. Biomedical scientists need access to images to validate research facts and to formulate or test novel research hypotheses. Evaluation has shown that textual statements reported in the literature are frequently noisy (i.e., contain "false facts"). Capturing the images that serve as experimental "evidence" for a textual "fact" will benefit biomedical information systems, databases, and biomedical scientists. We are developing a biomedical literature figure search engine, BioFigureSearch. We will develop innovative algorithms and models in natural language processing, image processing, machine learning, and user interfacing. The deliverables will be novel biomedical natural language figure processing (bNLfP) algorithms and iBioFigureSearch, which will allow biomedical scientists to access figure data effectively, as well as open-source tools that will enhance biomedical information retrieval, summarization, and question answering. The bNLfP algorithms we develop can be applied to or integrated into other biomedical text-mining systems.

Public Health Relevance

This project proposes innovative algorithms and models in natural language processing, image processing, machine learning, and user interfacing to return figures in response to biomedical queries. It is anticipated that the algorithms, models, and tools developed will significantly enhance biomedical scientists' access to figures reported in the literature, and thereby expedite biomedical knowledge discovery.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section
Special Emphasis Panel (ZRG1-HDM-C (02))
Program Officer
Brazhnik, Paul
University of Massachusetts Medical School Worcester
Other Health Professions
Schools of Medicine
United States
Liu, Feifan; Yu, Hong (2014) Learning to rank figures within a biomedical article. PLoS One 9:e61567
Zhang, Qing; Yu, Hong (2014) Computational approaches for predicting biomedical research collaborations. PLoS One 9:e111795
Zhang, Qing; Yu, Hong (2013) CiteGraph: a citation network system for MEDLINE articles and analysis. Stud Health Technol Inform 192:832-6
Liu, Feifan; Moosavinasab, Soheil; Agarwal, Shashank et al. (2013) Automatically identifying health- and clinical-related content in Wikipedia. Stud Health Technol Inform 192:637-41
Polepalli Ramesh, Balaji; Houston, Thomas; Brandt, Cynthia et al. (2013) Improving patients' electronic health record comprehension with NoteAid. Stud Health Technol Inform 192:714-8
Kim, Daehyun; Ramesh, Balaji Polepalli; Yu, Hong (2011) Automatic figure classification in bioscience literature. J Biomed Inform 44:848-58
Zhang, Qing; Cao, Yong-Gang; Yu, Hong (2011) Parsing citations in biomedical articles using conditional random fields. Comput Biol Med 41:190-4