Discrete structures such as networks, clusters, and rankings arise in many real-life applications, including brain networks, protein clusters, and portfolio selection. This project will develop a unified statistical framework for inferring these unknown discrete structures from large-scale datasets in biology, neuroscience, and finance. The replicability crisis, which questions whether statistical inference for scientific discoveries can be trusted, has drawn attention from both the scientific community and the public. A new field, "combinatorial inference," developed by the PI aims to fill the gap left by existing inferential methods, which are designed for continuous quantities and do not provide valid reproducibility assessment for discrete ones. This project will also train the next generation of data scientists to acquire the statistical and computational skills needed for large-scale scientific data analysis.

This project aims to develop a unified inference framework to assess uncertainty (e.g., constructing confidence intervals, testing hypotheses, and controlling false discovery rates) when inferring discrete structures, including networks, hypergraphs, clusterings, and rankings. Three types of problems will be considered: (1) Testing the hypothesis that the discrete structure has a certain combinatorial property (e.g., the network is connected, or the graph is triangle-free); (2) Constructing confidence sets that cover discrete quantities at a given confidence level (e.g., the maximum degree of a graph, the set of elements belonging to the same cluster, or the top-k ranked items); (3) Screening discrete structures of interest (e.g., the cycles, hubs, and cliques in a graph) with false discovery rate control. Classical inference mainly focuses on testing hypotheses and constructing confidence intervals for continuous parameters, where analytical methods and theory can be directly applied. In contrast, existing research on discrete structures in statistical models mainly focuses on estimation and lacks systematic inferential methods for uncertainty assessment. This project seeks to advance the frontiers of modern statistical inference in four directions: (a) Methodology: Developing efficient methods for hypothesis testing, confidence interval construction, and false discovery rate control; (b) Theory: Developing new probabilistic tools, including non-asymptotic concentration inequalities, limiting theory, and universality phenomena for combinatorial random quantities; (c) Computation: Developing computationally efficient algorithms to implement the combinatorial inferential methods for large-scale statistical models; and (d) Fundamental limits: Developing new combinatorial information-theoretic lower bounds to justify the optimality of the proposed inferential methods.
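To make the flavor of problem type (1) concrete, the following is a minimal illustrative sketch, not the methodology proposed in this project: it tests the combinatorial hypothesis "the dependence graph is connected" for a hypothetical Gaussian graphical model by thresholding estimated partial correlations and using a naive nonparametric bootstrap of the connectivity indicator as a crude stability proxy. The threshold value, the bootstrap scheme, and all function names here are assumptions chosen for illustration only.

```python
# Illustrative sketch only (not the project's actual methodology).
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

def estimated_graph(X, threshold=0.25):
    """Graph obtained by thresholding absolute sample partial correlations."""
    prec = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    partial_corr = -prec / np.outer(d, d)
    np.fill_diagonal(partial_corr, 0.0)
    A = (np.abs(partial_corr) > threshold).astype(int)
    return nx.from_numpy_array(A)

def connectivity_statistic(X, threshold=0.25):
    """1 if the thresholded graph is connected, else 0."""
    return int(nx.is_connected(estimated_graph(X, threshold)))

# Simulated data from a chain-structured (hence connected) Gaussian model.
p, n = 10, 500
prec = np.eye(p) + np.diag(0.4 * np.ones(p - 1), 1) + np.diag(0.4 * np.ones(p - 1), -1)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec), size=n)

# Bootstrap the connectivity indicator: the fraction of resamples whose
# estimated graph is connected gauges how stable the combinatorial
# conclusion is (a crude proxy for a formal combinatorial test).
B = 200
stats = [connectivity_statistic(X[rng.integers(0, n, n)]) for _ in range(B)]
print("observed:", connectivity_statistic(X), "bootstrap frequency:", np.mean(stats))
```

A formal combinatorial inference procedure would replace the ad hoc threshold and bootstrap frequency with a calibrated test statistic and a valid significance guarantee, which is precisely the kind of guarantee this project aims to develop.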

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1916211
Program Officer: Gabor Szekely
Budget Start: 2019-08-15
Budget End: 2022-07-31
Fiscal Year: 2019
Total Cost: $97,046
Name: Harvard University
City: Cambridge
State: MA
Country: United States
Zip Code: 02138