Finding relevant information quickly is integral to effective and efficient decision making, and it becomes increasingly difficult as the scale and heterogeneity of data continue to grow. Question answering (QA) systems, which aim to find precise answers to users' natural language questions, have shown great potential to address this problem. However, state-of-the-art QA systems still largely fall short in three scenarios: (1) when questions are ambiguous and/or complex (e.g., involving multiple relations and operators), (2) when answering a question requires background knowledge that is not readily available in the data, and (3) when users need to understand the system’s answering process in order to judge its trustworthiness. Such scenarios are prevalent in real application domains of QA (such as healthcare, finance, and science) and must be addressed to build practical systems. This project aims to develop a new QA model that interacts with users to resolve ambiguity and uncertainty during the answering process. A central challenge is deciding when requesting feedback from the user is necessary, so that the model achieves the best trade-off between answer quality and interaction cost. The project further aims to improve the QA model’s transparency by decomposing a complex question into several intermediate sub-questions and allowing users to validate them. The expected results can thus contribute to future human-technology partnership by enabling QA models to be more interactive, more transparent, and hence more trustworthy. The proposed QA model will be tested in a clinical domain, where doctors often ask questions about a patient and look for answers in the patient's clinical notes in Electronic Medical Records (EMRs). Such a QA model can enable doctors to query EMRs effectively and efficiently and to gather relevant evidence for critical decision making. The project plans to engage high school students and undergraduates, especially from underrepresented groups, and prepare them for future education and employment opportunities.

This project will contribute a new, learnable interactive QA model that detects ambiguities and uncertainties during the answering process and interacts with users in a natural fashion to seek clarification. Moreover, the QA model will learn from such interactions to simultaneously improve answer quality and reduce human intervention over time, using frameworks based on imitation learning and reinforcement learning. This project will further advance the QA model with a novel question decomposition component, which decomposes a compositional question into simpler sub-questions and enhances the transparency of the answering procedure by allowing users to validate the sub-questions (i.e., to confirm or correct them). To train the QA model effectively with limited human cost (for providing feedback or training data), the team will explore new learning strategies such as user simulators and weak supervision mechanisms. When applying the QA model to the clinical domain, this project will develop novel solutions to domain-specific challenges, such as how to incorporate background biomedical knowledge into a general QA model and how to create high-quality clinical QA datasets at low cost. The team will collaborate closely with physicians on model evaluation and actively seek technology transfer opportunities. All datasets, software, and demos will be publicly accessible via the investigator’s website. Research findings will be disseminated in computer science and medical informatics venues and will be integrated into existing and new courses.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 1942980
Program Officer: Hector Munoz-Avila
Budget Start: 2020-06-01
Budget End: 2025-05-31
Fiscal Year: 2019
Total Cost: $93,657
Name: Ohio State University
City: Columbus
State: OH
Country: United States
Zip Code: 43210