Software engineering emerged out of a need to better address the real-world development of software, transforming the practice of software development from an art into a craft, and eventually into an engineering discipline. In contrast, the effective application of statistical machine learning algorithms and techniques currently remains an art, based primarily on expert intuition and experience. This project is an integrated program of research and education that targets how researchers, students, and practitioners apply statistical machine learning algorithms and techniques as tools for software development. The research focuses on the design of human-centered computing through three activities: investigating the needs of people applying statistical machine learning as a tool for software development, creating new methods and tools that support the development and deployment of applications that use statistical machine learning, and evaluating how well these new methods and tools support the complex and exploratory task of applying statistical machine learning. A newly proposed toolset has two primary foci: (1) explicit support for history and experimentation in an iterative and exploratory process, and (2) new opportunities for mixed-initiative tools to aid that process. By examining the application of statistical machine learning as a craft, this research helps enable the broad potential impact of statistical machine learning as a tool for software development in human-centered computing research and applications.

Project Report

The overall focus of the project is: (1) investigating the needs of people applying statistical machine learning in the course of their software development, (2) creating new methods and tools that support the development and deployment of applications that use statistical machine learning, and (3) evaluating how well these new methods and tools support the complex and exploratory task of applying statistical machine learning. We have pursued these high-level goals in several contexts, and our major findings correspond to those individual projects. Our work with Kylin shows the synergistic potential of integrating machine learning with human feedback at large scale. Our work with Gestalt shows that integrated support for implementation and analysis is critical to general-purpose tools for machine learning applications. Our work with Panoramic shows methods for presenting usable representations of low-level, state-of-the-art statistical methods. Our work with CueFlik distills general strategies for the effective design of example-based, end-user interactive machine learning applications. Our work with Prospect demonstrates a new and promising approach to using multiple classifiers to help developers build insight into their data. Our work with OASIS revealed important abstractions for enabling end-user intervention and interaction with modern object recognition systems. Our work with ReGroup demonstrated another important application of end-user interactive machine learning, as well as several novel methods for incorporating end-user feedback. Our work with BeatBox has produced initial results on the added complexity of end-user specification of multiple classes. Our work with Hindsight examines the role of automated experimentation support in a development environment. Together, this research demonstrates some of the possibilities enabled by work at the intersection of statistical machine learning and human-computer interaction.
This includes our development of new methods and tools supporting software developers' adoption of statistical machine learning, as well as our demonstrations of new applications of machine learning in human-centered computing. This award directly and indirectly contributed to the training of a number of graduate and undergraduate students, who gained extensive experience with human-computer interaction research, statistical machine learning, software development, experimental design, and statistical analysis. The award also provided primary support for two students who have completed their doctoral degrees. Our CueFlik research directly informed Microsoft’s December 2008 release of a ‘Show Similar Images’ feature in Live Image Search (now Bing Image Search). Google released a similar feature in April 2009. One of the graduate students on the project worked with the Google+ team during Winter 2012 to help transfer some of the ideas from our ReGroup research. We expect that one graduate's position at Microsoft Research and another graduate's position at Google will continue to provide direct industrial impact for the ideas developed under this award.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Type
Standard Grant (Standard)
Application #
0812590
Program Officer
Ephraim P. Glinert
Budget Start
2008-09-01
Budget End
2012-08-31
Fiscal Year
2008
Total Cost
$505,527
Name
University of Washington
City
Seattle
State
WA
Country
United States
Zip Code
98195