Semi-Automating Data Extraction for Systematic Reviews (Renewal)

Evidence-based medicine (EBM) aims to inform patient care using all available evidence. Realizing this aim in practice would require access to concise, comprehensive, and up-to-date structured summaries of the evidence relevant to a particular clinical question. Systematic reviews of biomedical literature aim to provide such summaries, and are a critical component of the EBM arsenal and modern medicine more generally. However, such reviews are extremely laborious to conduct. Furthermore, owing to the rapid expansion of the biomedical literature base, they tend to go out of date quickly as new evidence emerges. These factors hinder the practice of evidence-based care. In this renewal proposal, we seek to continue our ground-breaking efforts on developing, evaluating, and deploying novel machine learning (ML) and natural language processing (NLP) methods to automate or semi-automate the evidence synthesis process. This will extend our innovative and successful efforts developing RobotReviewer and related technologies under the current grant. Concretely, for this renewal we propose to move from extraction of clinically salient data elements from individual trials to synthesis of these elements across trials.
Our first aim is to extend our ML and NLP models to produce (as one deliverable) a publicly available, continuously and automatically updated semi-structured evidence database, comprising extracted data for all evidence, both published and unpublished. Unpublished trials will be identified via trial registries. Taking this up-to-date evidence repository as a starting point, we then propose cutting-edge ML and NLP models that will automatically generate first drafts of evidence syntheses. More specifically, we propose novel neural cross-document summarization models that will capitalize on the semi-structured information automatically extracted by our existing models, in addition to article texts. These models will be deployed in a new version of RobotReviewer, called RobotReviewerLive, intended to be a prototype for "living" systematic reviews. To rigorously evaluate the practical utility of the proposed methodological innovations, we will pilot their use to support real, ongoing, exemplar living reviews.

Public Health Relevance

We propose novel machine learning and natural language processing methods that will aid biomedical literature summarization and synthesis, and thereby support the conduct of evidence-based medicine (EBM). The proposed models and technologies will motivate core methodological innovations and support real-time, up-to-date, semi-automated biomedical evidence syntheses ("systematic reviews"). Such approaches are necessary if we are to have any hope of practicing evidence-based care in our era of information overload.

National Institutes of Health (NIH)
National Library of Medicine (NLM)
Research Project (R01)
Study Section: Biomedical Library and Informatics Review Committee (BLR)
Program Officer: Sim, Hua-Chuan
Northeastern University
Schools of Arts and Sciences
United States
Marshall, Iain J; Noel-Storr, Anna; Kuiper, Joël et al. (2018) Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide. Res Synth Methods 9:602-614
Marshall, Iain J; Kuiper, Joël; Banner, Edward et al. (2017) Automating Biomedical Evidence Synthesis: RobotReviewer. Proc Conf Assoc Comput Linguist Meet 2017:7-12
Singh, Gaurav; Marshall, Iain J; Thomas, James et al. (2017) A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation. Proc ACM Int Conf Inf Knowl Manag 2017:1519-1528
Zhang, Ye; Marshall, Iain; Wallace, Byron C (2016) Rationale-Augmented Convolutional Neural Networks for Text Classification. Proc Conf Empir Methods Nat Lang Process 2016:795-804
Wallace, Byron C; Kuiper, Joël; Sharma, Aakash et al. (2016) Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision. J Mach Learn Res 17:
Yu, Zhiguo; Bernstam, Elmer; Cohen, Trevor et al. (2016) Improving the utility of MeSH® terms using the TopicalMeSH representation. J Biomed Inform 61:77-86