The Text Mining Pipeline to Accelerate Systematic Reviews in Evidence-Based Medicine project will combine research from several areas of biomedical text mining that are needed to improve the process of conducting systematic reviews through a text-mining-enhanced workflow. Our consortium will pursue three specific aims to support this work:
Aim 1. Study how to create a metasearch engine and database that collects information from important systematic review sources, indexes this information consistently, and provides a robust information retrieval system with high recall and precision for accessing this expanded literature collection.
Aim 2. Study how to create a literature classification and ranking system that is customizable and trainable for each user, systematic review group, and systematic review topic. This supervised-learning-based system takes as input the list of articles retrieved for a given query and outputs them grouped by article type and ordered by predicted probability of relevance to an individual writing a systematic review on the given topic.
Aim 3. Study how to create a study aggregator that collects together articles that refer to the same underlying clinical trial. This will save reviewers work and time by giving them automated assistance in determining whether two articles are independent data sources or derive their evidence from the same primary data.
Taken together, these results will inform construction of a text mining pipeline system that will decrease the manual burden on systematic reviewers during the literature collection and review process, and increase the proportion of reviewer time spent synthesizing evidence and performing meta-analyses. The system will make a real difference in the rate at which high-quality evidence reports can be compiled. Ultimately, the coverage, dissemination, and acceptance of evidence-based medicine in the biomedical community will increase, resulting in better and more cost-effective clinical care.
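As a rough illustration of the output described in Aim 2, the sketch below groups retrieved articles by article type and orders each group by a relevance score. It is not the project's system: the record fields (`pmid`, `title`, `type`), the function names, and the keyword-overlap scorer are all hypothetical, and the scorer is only a stand-in for the predicted probability that a trained classifier would supply.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical article record with "pmid", "title", and "type" fields.
Article = Dict[str, str]


def keyword_score(article: Article, topic_terms: set) -> float:
    """Stand-in scorer: fraction of topic terms found in the title.

    A real system would return a trained classifier's predicted
    probability of relevance instead.
    """
    words = set(re.findall(r"[a-z]+", article["title"].lower()))
    return len(topic_terms & words) / len(topic_terms)


def group_and_rank(
    articles: List[Article],
    score: Callable[[Article], float],
) -> Dict[str, List[Tuple[float, str]]]:
    """Group articles by type, then sort each group by descending score."""
    grouped = defaultdict(list)
    for article in articles:
        grouped[article["type"]].append((score(article), article["pmid"]))
    return {
        article_type: sorted(items, key=lambda item: item[0], reverse=True)
        for article_type, items in grouped.items()
    }


# Made-up example records, for illustration only.
retrieved = [
    {"pmid": "2", "title": "Aspirin dosing in adults", "type": "RCT"},
    {"pmid": "1", "title": "Statins for stroke prevention: a randomized trial", "type": "RCT"},
    {"pmid": "3", "title": "Statins and stroke: a case report", "type": "case report"},
]
terms = {"statins", "stroke"}
ranked = group_and_rank(retrieved, lambda a: keyword_score(a, terms))
```

Passing the scorer in as a function keeps the grouping and ordering logic independent of whichever model produces the relevance estimate, so a per-user or per-topic classifier could be swapped in without changing the rest.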
This project will improve the process of summarizing the best available medical evidence for a wide range of medical conditions. These summaries are utilized by both medical practitioners and policy makers as an essential component of providing higher quality, more cost-effective medical care for everyone.
Smalheiser, Neil R; Bonifield, Gary (2016) Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation. J Biomed Discov Collab 7:e1
Smalheiser, Neil R; Shao, Weixiang; Yu, Philip S (2015) Nuggets: findings shared in multiple clinical case reports. J Med Libr Assoc 103:171-6
Shao, Weixiang; Adams, Clive E; Cohen, Aaron M et al. (2015) Aggregator: a machine learning approach to identifying MEDLINE articles that derive from the same underlying clinical trial. Methods 74:65-70
Cohen, Aaron M; Smalheiser, Neil R; McDonagh, Marian S et al. (2015) Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc 22:707-17
Jiang, Yu; Lin, Can; Meng, Weiyi et al. (2014) Rule-based deduplication of article records from bibliographic databases. Database (Oxford) 2014:bat086
D'Souza, Jennifer L; Smalheiser, Neil R (2014) Three journal similarity metrics and their application to biomedical journals. PLoS One 9:e115681
Edinger, Tracy; Cohen, Aaron M (2013) A large-scale analysis of the reasons given for excluding articles that are retrieved by literature search during systematic review. AMIA Annu Symp Proc 2013:379-87
Cohen, Aaron M; Ambert, Kyle; McDonagh, Marian (2012) Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Med Inform Decis Mak 12:33