Since the publication of the landmark NSF report "Visualization in Scientific Computing" in 1987, computer-aided visualization has been recognized as one of the most potent tool sets for scientific discovery. However, discoveries based on data displays are often criticized because they are not secured by statistical inference. The team of researchers from Iowa State University, Rice University and the University of Pennsylvania is addressing exactly this issue by bringing the rigors of statistical inference to visual data exploration. Statistical inference for plots is cast as a comparison of a plot of the actual data with plots of null data simulated under a null hypothesis. If the actual plot stands out from a background of "null plots", this amounts to a rejection of the null hypothesis. Executing this idea leads to rigorous protocols that can confer proper statistical significance on visual discoveries. Tools of mathematical statistics are employed to reduce composite null hypotheses to single reference distributions: conditioning on a minimal sufficient statistic, bootstrap plug-ins, and posterior predictive sampling. The protocols also have the potential to shift the perception of exploration-based findings in the scientific communities and dramatically increase the impact that these findings are allowed to have. The testing protocols will be made accessible through an implementation in the open-source R language.
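The comparison of an actual plot against "null plots" can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the project's R implementation. The function names `make_lineup` and `lineup_p_value` are hypothetical, and the permutation null (shuffling y to simulate "x and y are unrelated") is an assumption chosen for simplicity rather than one of the specific reduction methods listed above. The p-value calculation further assumes that observers choose independently and that, under the null hypothesis, each picks any of the m panels with equal probability.

```python
import math
import random


def make_lineup(x, y, m=20, seed=None):
    """Return (panels, true_position) for a lineup of m datasets.

    One panel holds the actual data; the other m - 1 hold null data
    generated by permuting y (assumed null: x and y are unrelated).
    """
    rng = random.Random(seed)
    true_position = rng.randrange(m)  # hide the real data at a random slot
    panels = []
    for i in range(m):
        if i == true_position:
            panels.append((list(x), list(y)))
        else:
            y_null = list(y)
            rng.shuffle(y_null)  # break any x-y association
            panels.append((list(x), y_null))
    return panels, true_position


def lineup_p_value(n_observers, n_correct, m=20):
    """Chance that at least n_correct of n_observers pick the real panel.

    Assumes independent observers, each picking the real panel with
    probability 1/m under the null hypothesis (a binomial tail).
    """
    p = 1.0 / m
    return sum(
        math.comb(n_observers, k) * p**k * (1 - p) ** (n_observers - k)
        for k in range(n_correct, n_observers + 1)
    )
```

Each panel would then be drawn with the same plot type and shown to an observer who does not know `true_position`. Under these assumptions, a single observer who picks the real panel out of m = 20 corresponds to a p-value of 1/20 = 0.05.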

Data graphics are an essential part of communicating information. But how reliable is the information that we gather from them? The investigators will develop a rigorous framework for visual inference modeled after formal statistical testing. This framework allows the reader of a graphic to determine whether structure is real or spurious (is that a man in the moon, or just some rocks?). These protocols have the potential to shift the perception of exploration-based findings in the scientific community and dramatically increase the impact of exploratory work. Some aspects of the protocols are so intuitive that they can be used with general audiences and integrated into the teaching of introductory statistics from grade school to college.

Project Report

Data visualization has been part of statistical practice for a long time, but it has suffered from a stigma: discoveries based on data displays are suspect because they are not secured by statistical inference. In the eyes of the statistics community, visual data exploration and statistical inference are incompatible. The goal of this project was to bring some of the rigors of statistical inference to visual data exploration. We published a paper at InfoVis 2010, "Graphical inference for infovis", and received a best paper award. As of 2014, this work has been cited over 20 times (http://scholar.google.com/citations?view_op=view_citation&hl=en&user=YA43PbsAAAAJ&citation_for_view=YA43PbsAAAAJ:ufrVoPGSRksC), indicating its impact on the field. The project has contributed to the intellectual development of three undergraduate students. These students have developed their understanding of statistical inference and data visualization, as well as their R programming skills. We gained an improved understanding of how three different visualisations (histogram, rootogram and hanging rootogram) display densities. We built a prototype infrastructure to perform experiments using Amazon's Mechanical Turk. The difficulty of getting funding for this work, the way in which the NSF handled budget cuts (i.e. smallest impact on most senior co-PI, largest impact on most junior co-PI), the amount of busy-work required for the sum of money awarded, and the general hostility towards visualisation research were contributing factors to my departure from academia.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1007877
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2010-09-15
Budget End: 2013-08-31
Support Year:
Fiscal Year: 2010
Total Cost: $59,808
Indirect Cost:
Name: Rice University
Department:
Type:
DUNS #:
City: Houston
State: TX
Country: United States
Zip Code: 77005