When confronted with large data sets, viewers use summaries to gain an overview of the data. For collections of complex objects, such as documents, medical records, or even chairs, standard statistical summaries either do not apply or are difficult for viewers to interpret. This project will develop the use of subset summaries: providing overviews of large collections by selecting a small number of data items and presenting them to the viewer. While the basic idea is straightforward, little is understood about how to use summarization by subset effectively. The field lacks knowledge of how people interpret subsets and use them to accomplish overview tasks. There are no guidelines on how to choose the design of subsets, i.e., the choice of elements and their presentation. The premise of this project is that by developing a better understanding of how viewers interpret subsets, visualization creators can choose designs that make subsets more effective tools for summarization. The project has the potential for broad impact in two ways: by providing general methods for data presentation that can be used across a wide variety of domains, and through its plans for education and outreach, which aim to promote visualization proficiency and encourage a broad range of students to consider data topics.

This project develops subsetting as a summarization method for overview. The work will interleave three research threads. First, the project will develop the foundations of subset summarization, adding to our understanding of how people interpret subsets to perform overview tasks. The project will use empirical studies to connect design choices to task performance, which will allow us to refine theories and choose methods. Second, the project will develop a library of methods for using subsetting as a summarization method. This will include algorithms for selecting appropriate subsets, display methods to present the subsets in tabular form, and interaction techniques to move through the data set by changing the visualization subset. Development will consider approaches to embedding complex objects into metric spaces, sampling approaches for selection in these spaces, and tabular presentation and selection mechanisms to provide better user experiences. Third, the project will develop practical implementations of the methods in general toolkits and provide guidelines that advise practitioners on how to employ subset selection in their systems. The project will disseminate its results through publications, guideline documents, open source toolkits, and example applications.
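
To make the idea of sampling a subset in a metric space concrete, the following is a minimal sketch of one well-known selection strategy, greedy farthest-point sampling. It is an illustration only and is not taken from the project: the function name `farthest_point_subset`, the use of Euclidean distance, and the assumption that each object has already been embedded as a numeric vector are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def farthest_point_subset(
    points: Sequence[Sequence[float]], k: int, start: int = 0
) -> List[int]:
    """Return indices of k points chosen by greedy farthest-point sampling.

    Assumption (not from the abstract): objects are already embedded as
    equal-length numeric vectors and compared with Euclidean distance.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(chosen) < k:
        # Pick the point that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Update nearest-chosen distances to account for the new pick.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Five embedded items forming three loose groups; ask for three representatives.
items = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = farthest_point_subset(items, k=3)
```

This strategy favors coverage: each pick is as far as possible from the items already shown, so the subset tends to span the collection rather than reflect its density. Whether coverage or density-proportional sampling better supports a given overview task is the kind of design question the empirical studies described above are meant to inform.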

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 2007436
Program Officer: Sylvia Spengler
Budget Start: 2020-10-01
Budget End: 2023-09-30
Fiscal Year: 2020
Total Cost: $318,545

Name: University of Wisconsin Madison
City: Madison
State: WI
Country: United States
Zip Code: 53715