Charts and graphs are ubiquitous forms of data representation, appearing in scientific papers, textbooks, reports, news articles and webpages. These visualizations leverage human visual processing to efficiently convey large amounts of quantitative information, and to illustrate trends and differences in the data. But while people can easily interpret data from charts and graphs, machines cannot directly access this data. Today, a vast trove of information is locked inside data visualizations. This proposal will develop techniques for extracting data and structure from such visualizations and thereby enable further analysis and reuse of the data, as well as new forms of indexing across the collection of existing charts and graphs. Some of the applications will be specifically designed to improve the accessibility of visualizations for visually impaired users. The tools will provide a novel computational infrastructure for knowledge integration and sharing and impact a broad range of users including scientists, journalists, economists, social scientists, and educators.

Specifically, this proposal addresses three main goals. First, it develops computational models for interpreting visualizations to extract the underlying data, graphical marks, and mappings that relate the data to mark attributes. The approach will be informed by recent work on human perception and cognition of visualizations. The aim is to build generalized computational models that can accurately extract data from visualizations and also mimic the way people decode information from visualizations. Second, it supports development of a suite of applications that enable analysis and repurposing of visualizations and data. Third, it applies automated visualization interpretation techniques at Internet scale and develops a search engine that indexes visualizations based on their underlying data and graphical structure. The search engine will accelerate data-driven analysis and discovery by facilitating browsing and retrieval of data that is currently locked in computationally inaccessible visualizations. The project website will include information on the project and provide access to resulting publications, software and datasets.

Agency
National Science Foundation (NSF)
Institute
Division of Information and Intelligent Systems (IIS)
Application #
1714647
Program Officer
Hector Munoz-Avila
Project Start
Project End
Budget Start
2017-09-01
Budget End
2021-08-31
Support Year
Fiscal Year
2017
Total Cost
$499,419
Indirect Cost
Name
Stanford University
Department
Type
DUNS #
City
Stanford
State
CA
Country
United States
Zip Code
94305