This project aims to give end users some ability to debug programs that were written by a machine instead of a person, especially when those users are not expert programmers. This is the problem faced by users of a new sort of program, namely, one generated by a machine learning system. For example, intelligent user interfaces, categorizers of email and web sites, and recommender systems use machine learning to learn how to behave. This learned set of behaviors is a program. Learned programs do not come into existence until the learning environment has left the hands of the machine learning specialist, because they learn from the user's ongoing data. Thus, when these programs make a mistake, the only one present to debug them is the user. Giving end users the ability to debug such programs can improve the speed and accuracy of these systems.

Specifically, the project envisions a fine-grained, iterative, interactive debugging process. First, a user notices an erroneous classification (with the system's help, based on reasoning about its own competence), such as an email message that might be misfiled. Second, the user asks for an explanation. Third, using the system's explanation, the user provides reasoning constraints, declaring, for example, that "today" is not an important word, and that anything from the company president should go into the "company" folder. The learned program reevaluates its competence models and redoes its reasoning, giving the user an opportunity to see the result of the change immediately. The loop then begins again; a sketch of this constraint-driven loop appears below. Thus, the goals of this project are the following:
1. To help users identify reasoning problems, and to provide explanations of the behavior of machine-learned programs suitable for end users.
2. To elicit rich feedback from the user, incorporating it into the reasoning of the learned program.
3. To improve the speed and accuracy of machine learning by integrating this rich feedback into learning.
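To make this loop concrete, the following is a minimal, hypothetical Python sketch (not the project's actual implementation) of a toy email classifier that accepts the two kinds of constraints mentioned above, "this word is unimportant" and "mail from this sender goes to this folder," and re-predicts immediately so the user can see the effect of the change. The class, method, and folder names are illustrative assumptions.

```python
# Hypothetical sketch: a toy bag-of-words email classifier that accepts
# end-user reasoning constraints and re-predicts immediately.
# (Illustrative only; not the system developed in this project.)
from collections import defaultdict

class ConstrainedClassifier:
    def __init__(self):
        # folder -> word -> weight, learned from labeled examples
        self.word_weights = defaultdict(lambda: defaultdict(float))
        self.ignored_words = set()   # user constraint: these words carry no weight
        self.sender_rules = {}       # user constraint: sender -> folder

    def train(self, examples):
        """examples: iterable of (words, folder) pairs; simple count-based weights."""
        for words, folder in examples:
            for w in words:
                self.word_weights[folder][w] += 1.0

    def ignore_word(self, word):
        """User feedback: 'this word is not important'."""
        self.ignored_words.add(word)

    def add_sender_rule(self, sender, folder):
        """User feedback: 'anything from this sender goes to this folder'."""
        self.sender_rules[sender] = folder

    def predict(self, words, sender=None):
        if sender in self.sender_rules:   # a hard user rule wins outright
            return self.sender_rules[sender]
        scores = {
            folder: sum(weights[w] for w in words if w not in self.ignored_words)
            for folder, weights in self.word_weights.items()
        }
        return max(scores, key=scores.get)

# One iteration of the debugging loop: notice a misfiling, constrain, re-check.
clf = ConstrainedClassifier()
clf.train([(["meeting", "today", "budget"], "company"),
           (["sale", "today", "today", "discount"], "shopping")])
message = ["budget", "today", "today"]
print(clf.predict(message))            # -> "shopping": misfiled because "today" dominates
clf.ignore_word("today")               # user: "today" is not an important word
print(clf.predict(message))            # -> "company": the user sees the fix immediately
clf.add_sender_rule("president@example.com", "company")
print(clf.predict(["lunch"], sender="president@example.com"))  # -> "company" via the rule
```

In a real learned program the constraints would feed back into the learning algorithm itself rather than simply masking features or overriding predictions, but the interaction pattern is the same: constrain, re-run, inspect.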

In addition to the potential speed and accuracy improvement in the machine learner, users may become more productive and make fewer errors. Providing disclosure of the learned programs' reasoning engenders trust, and with it, increased willingness to use the system. Thus, this project has the potential to make significant advances in the user acceptance of machine learning in a variety of new, real-world applications. Combining human constraints and guidance with statistical learning could enable highly accurate learning from small data sets, which is critical to creating successful intelligent user interfaces. The project will also result in learning systems whose data sources and input features are easy to change and whose behavior is easy to control. In combining human-computer interaction principles with machine learning, this project opens opportunities for novel perspectives, especially in the realm of interdisciplinary education. Graduate students will be trained in this blended research area, and aspects of it will be incorporated in classes in both human-computer interaction and machine learning, and in other educational experiences for undergraduates and high school students.

Project Report

Our findings and results have shown that it is feasible for end users to interactively debug machine learning systems and intelligent agents. We developed, prototyped, and empirically evaluated several approaches that provide this capability, which we describe next in the context of each objective.

Objective 1: To improve user acceptance of machine learning by providing explanations of the behavior of intelligent agents (based on machine learning) that are suitable for end users untrained in computer science.
Key Results: We developed a technology called WYSIWYT/ML that allows end users to quickly assess how much trust they should place in a particular intelligent agent. Our empirical investigation showed that, by displaying visualizations about each test instance, WYSIWYT/ML enables ordinary users to identify mistakes made by a machine learning assistant: in only 10 minutes, users without computer science backgrounds judged enough instances to adequately cover over 1,000 cases. We also compared three strategies for selecting test instances to help users quickly assess their intelligent agents. We found that selecting instances in which the agent was "least confident" worked well overall, but that algorithms based on "unusual instances" and "least-common features" were better able to reveal "surprise" mistakes (a brief sketch of confidence-based selection appears at the end of this report). We also developed a crowdsourced variant of WYSIWYT/ML to investigate whether using a very small "crowd" of end users (mini-crowdsourcing) to assess a machine-learning assistant is useful from a cost/benefit perspective. Mini-crowds are relevant when an intelligent agent's task is to serve a small group, such as a family or a team at work. Our empirical investigation showed that the mini-crowd supplied many benefits beyond the obvious decrease in individual workload: a crowd of as few as six users found more errors, tested more of the agent's logic, and introduced enough redundancy to reduce the crowd's own mistakes.

Objective 2: To help ordinary users identify reasoning problems in their intelligent agents, and then allow the users to provide rich feedback that can be incorporated into the reasoning of the agent.
Key Results: In identifying reasoning problems, users compare an intelligent agent's reasoning against a mental model they have formed of how the reasoning is, or should be, done. They form such mental models from their own prior experiences, their assumptions, and what they see the agent do. Our empirical investigations showed that a user's mental model has a strong effect on the user's ability to learn about the agent and to provide feedback useful enough to help the agent succeed. Building on that result, we then investigated how an agent should explain its reasoning to ordinary users. We created a range of explanation-fidelity variants along two dimensions: soundness (the extent to which an explanation is free of untruths, which usually creep in through oversimplification) and completeness (the extent to which the explanation covers the reasoning factors the agent actually uses). We found that explanations with high soundness and high completeness led to improved trust in and understanding of the agent, and that, surprisingly, users were willing to put in the time and effort to process these higher-fidelity explanations in order to improve their agents.

Objective 3: To enable the user to improve the speed and/or accuracy of their intelligent agent's machine learning.
Key Results: We developed ways for users to modify the explanations themselves (not just supply example answers) in order to tell the agent how it should change its reasoning. Empirical evaluations of our algorithms and user interface prototypes showed that users can efficiently and effectively guide their agents toward better, more correct behavior. Users do not have to spend much time doing so, and no prior background in computer science is required.

Broader Impacts: This work has been the first to show, from user experience through algorithms, how ordinary end users can be supported in controlling and correcting their own intelligent agents. In so doing, we have devised novel ways for ordinary end users to assess, understand, and provide feedback to their intelligent agents. This grant has also funded female participants, a group underrepresented in computer science, at every level of the research team: high school students, undergraduate students, graduate students, postdoctoral scholars, and faculty members.
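As an illustration of the Objective 1 result on test-instance selection, the following hypothetical Python sketch ranks unjudged instances by the margin between the agent's top two predicted class probabilities, so that a user's limited testing time goes to the predictions the agent is least confident about. The function names and the stand-in probability source are assumptions for illustration; this is not the WYSIWYT/ML implementation.

```python
# Hypothetical sketch of the "least confident" test-selection strategy:
# rank instances by the margin between the two most probable classes,
# so the user judges the shakiest predictions first.
# (Illustrative only; not the WYSIWYT/ML implementation.)

def confidence_margin(probabilities):
    """Margin between the two most probable classes; a small margin means low confidence."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1] if len(top_two) > 1 else top_two[0]

def prioritize_for_testing(instances, predict_proba, budget=10):
    """Return up to `budget` instances the agent is least confident about.

    instances:     unjudged items (e.g., email messages)
    predict_proba: function mapping an instance to its list of class probabilities
    """
    ranked = sorted(instances, key=lambda x: confidence_margin(predict_proba(x)))
    return ranked[:budget]

# Toy usage with a stand-in probability table in place of a real agent.
emails = ["budget meeting today", "huge discount sale", "team offsite plans"]
fake_proba = {"budget meeting today": [0.55, 0.45],   # low margin -> judge first
              "huge discount sale":   [0.95, 0.05],
              "team offsite plans":   [0.70, 0.30]}
for msg in prioritize_for_testing(emails, fake_proba.get, budget=2):
    print(msg)   # prints "budget meeting today", then "team offsite plans"
```

The "unusual instances" and "least-common features" strategies mentioned above would replace the margin-based sort key with a rarity-based one, surfacing first the instances whose feature values the agent has rarely seen.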

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0803487
Program Officer: William Bainbridge
Project Start:
Project End:
Budget Start: 2008-10-01
Budget End: 2013-09-30
Support Year:
Fiscal Year: 2008
Total Cost: $929,362
Indirect Cost:
Name: Oregon State University
Department:
Type:
DUNS #:
City: Corvallis
State: OR
Country: United States
Zip Code: 97331