Natural language processing (NLP) technology is now widespread (e.g. Google Translate) and has several important applications in biomedical research. We propose a new target for NLP: extraction of scientific questions stated in publications. A system that automatically captures and organizes scientific questions from across the biomedical literature could have a wide range of significant impacts, as attested to in our diverse collection of support letters from researchers, journal editors, educators and scientific foundations. Prior work focused on making binary (or probabilistic) assessments of whether a text is hedged or uncertain, with the goal of downgrading such statements in information extraction tasks?not computationally capturing what the uncertainty is about. In contrast, we propose an ambitious plan to identify, represent, integrate and reason about the content of scientific questions, and to demonstrate how this approach can be used to address two important new use cases in biomedical research: contextualizing experimental results and enhancing literature awareness. Contextualizing results is the task of linking elements of genome-scale results to open questions across all of biomedical research. Literature awareness is the ability to understand important characteristics of large, dynamic collections of research publications as a whole. We propose to produce rich computational representations of the dynamic evolution of research questions, and to prototype textual and visual interfaces to help students and researchers explore and develop a detailed understanding of key open scientific questions in any area of biomedical research.

Public Health Relevance

The scientific literature is full of statements of important unsolved questions. By using artificial intelligence systems to identify and categorize these questions, the proposed work would help other researchers discover when their findings might address an important question in another scientific area. This work would also make it easier for students, journal editors, conference organizers and others understand where science is headed by tracking the evolution of scientific questions.

Agency
National Institute of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Research Project (R01)
Project #
1R01LM013400-01A1
Application #
10069773
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Vanbiervliet, Alan
Project Start
2020-08-01
Project End
2024-07-31
Budget Start
2020-08-01
Budget End
2021-07-31
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
University of Colorado Denver
Department
Pharmacology
Type
Schools of Medicine
DUNS #
041096314
City
Aurora
State
CO
Country
United States
Zip Code
80045