All fruitful scientific and statistical analyses require assumptions. Some assumptions rightfully reflect past experience, present consensus, or future speculation. Others are imposed solely because of limitations of the investigation methods. In practice, useful information often comes in a vague, "low-resolution" form, like a blurred picture, both literally and figuratively. To date, statistical models have largely relied on overly precise model structures, built on a mix of sound scientific knowledge and less verifiable assumptions. As models grow larger to accommodate the ever-growing volume and variety of data, statistical inference faces a pressing need to express all types of low-resolution knowledge accurately and honestly. Without adequate tools for handling vague information, investigators are forced to concoct high-resolution assumptions that can be neither trusted nor invalidated in meaningful ways, the culprit in the ongoing crisis of irreplicable research. This project aims to provide scientists and statisticians with both a theoretical framework and practical methods for tackling this challenge without abandoning familiar probabilistic rules and tools, thereby strengthening efforts to reduce irreplicable scientific findings.

The need to reduce unwanted assumptions in scientific and statistical studies has led to an extensive literature on imprecise probability (IP) or, more broadly, soft methods in probability and statistics (SMPS). To date, both have received little attention from the statistics community, which generally equivocates on anything that does not obey precise probabilistic rules. This project demonstrates that both precise probability and hard statistical principles have much to offer for studying IP and SMPS. The fundamental realization is that, once one goes beyond precise probabilities, the learning rules by which we update the imprecise model must become the vehicle for implicit assumptions, which explains some of the paradoxes and puzzles that arise in IP and SMPS. With a clearer understanding of what IP/SMPS can and cannot do, the proposed research contributes in theoretical and practical ways to ensuring and enhancing the replicability of scientific studies that rely on probabilistic reasoning and statistical analysis. The initial idea for this project stemmed from the PI's realization that, in handling low-resolution information, the well-accepted Heitjan-Rubin framework for data coarsening in the missing-data literature induces essentially the same mathematical structure as the Dempster-Shafer theory of belief functions. Consequently, belief functions can be understood and studied using ordinary probability.
The proposed research explores this link, along with its extensions and variations, and aims to provide (1) a precise probabilistic formulation of belief functions, which offers both insights and questions for the Dempster-Shafer theory, especially Dempster's Rule of Combination; (2) a detailed comparison and contrast of three learning rules for updating and propagating low-resolution information, especially with respect to the phenomena of dilation, contraction, and sure loss; and (3) an exploration of the design and implementation of efficient MCMC-type algorithms for the learning rules of low-resolution inference, in parallel to MCMC for Bayesian inference. The overarching goal of the proposed research is to enhance scientists' and statisticians' toolkit for conducting more objective inference and data analysis.
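For readers unfamiliar with belief functions, the following is a minimal sketch of the textbook form of Dempster's Rule of Combination on a small finite frame. It is not the project's probabilistic formulation, and it does not cover the other learning rules mentioned above. The function names (`combine`, `belief`, `plausibility`), the frame {a, b, c}, and the mass values are all illustrative assumptions.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (a focal set)
    to its mass; masses are assumed to be nonnegative and sum to 1.
    """
    raw = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            raw[inter] = raw.get(inter, 0.0) + w1 * w2
        else:
            # Mass that lands on the empty set measures conflict.
            conflict += w1 * w2
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in raw.items()}


def belief(m, event):
    """Lower probability: total mass of focal sets contained in the event."""
    return sum(w for s, w in m.items() if s <= event)


def plausibility(m, event):
    """Upper probability: total mass of focal sets intersecting the event."""
    return sum(w for s, w in m.items() if s & event)


# Illustrative (made-up) evidence on the frame {a, b, c}.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frame: 0.4}
m2 = {frozenset({"b"}): 0.5, frame: 0.5}

combined = combine(m1, m2)
event = frozenset({"a"})
print(belief(combined, event), plausibility(combined, event))
```

The gap between `belief` and `plausibility` for an event is one way to read the "low-resolution" character of the evidence: the wider the interval, the less the combined evidence pins down the probability of that event.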

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Type: Standard Grant (Standard)
Application #: 1812063
Program Officer: Gabor Szekely
Budget Start: 2018-07-01
Budget End: 2021-06-30
Fiscal Year: 2018
Total Cost: $199,928
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138