Large-scale sequential testing and estimation is now a daily task in the tech industry, with large internet companies running hundreds or thousands of experiments (sometimes called A/B tests) per week to understand their customers' preferences and to improve product performance and user experience. Such experiments are inherently sequential: visitors arrive in a stream and outcomes are typically observed quickly relative to the duration of the test. These experiments are loosely planned: there are few hurdles to starting a new test and little oversight on how tests are run, unlike clinical trials, which are heavily regulated and formally planned, with statisticians involved from the start. The experiments are also continuously monitored, with adaptive choices made about whether to stop early and draw conclusions, or to collect more data. In other words, sample sizes and budgets are rarely set in stone in advance, and there is plenty of flexibility in the hands of the data scientist running the experiment. Such situations are common in the sciences as well: telescopes collect astronomical data sequentially (perhaps testing for the presence of black holes or estimating the sizes of galaxies), and psychologists collect human-subject data sequentially (analyzing effect sizes along the way). However, a major drawback of such flexible, loosely planned, sequential experimentation with fluid decision making is that it is very nontrivial to provide correct inferential guarantees, either along the way or when the experiment is terminated. Traditional confidence intervals and p-values, the bread and butter of classical statistics, are designed for a fixed sample size and can only be used once, at that predetermined time. Using standard CIs repeatedly at different sample sizes, or after adaptively stopping, without any correction to account for the multiple intervals constructed, completely invalidates their guarantees, leading to an increase in erroneous conclusions. The graduate student support will be used for research on sequential analysis and concentration inequalities.
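To illustrate this last point, the short simulation below (a minimal sketch under assumed conditions, not an analysis from the proposal) repeatedly applies a fixed-sample 95% confidence interval for a normal mean to an accumulating data stream and counts how often at least one interval excludes the true value; the sample size, error level, and NumPy-based implementation are all illustrative assumptions.

```python
# Illustrative sketch (assumptions, not from the abstract): i.i.d. standard normal
# observations with true mean 0 and known variance 1, peeking after every observation.
import numpy as np

rng = np.random.default_rng(0)
n_max, n_runs, z = 1000, 2000, 1.96  # looks per run; 1.96 gives a nominal 95% interval

false_exclusions = 0
for _ in range(n_runs):
    x = rng.standard_normal(n_max)
    n = np.arange(1, n_max + 1)
    running_means = np.cumsum(x) / n
    half_width = z / np.sqrt(n)  # fixed-sample CI half-width with known variance 1
    # Did any of the n_max intervals exclude the true mean 0?
    if np.any(np.abs(running_means) > half_width):
        false_exclusions += 1

print(f"Fraction of runs where some interval excluded the truth: "
      f"{false_exclusions / n_runs:.2f}  (nominal level: 0.05)")
```

Under this kind of continuous monitoring, the realized error rate typically lands far above the nominal 5%, which is exactly the invalidation described above.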
The PI proposes to revisit a classical notion called a "confidence sequence," introduced by Darling and Robbins (1967), which is a (potentially infinite) sequence of confidence intervals that is, with high probability, simultaneously valid over all times. Due to this simultaneous guarantee, an analyst may keep peeking at the data and the constructed confidence sequence, adaptively choosing to stop collecting data or to collect more, and still have correct inferential guarantees throughout the process, including when it stops. Confidence sequences can also be converted to always-valid p-values, which remain valid at arbitrary stopping times. Using modern martingale techniques, we have recently been able to generalize prior constructions of confidence sequences to several novel nonparametric settings, yielding both the tightest known closed-form CS expressions and the sharpest numerical methods in practice. This project seeks to extend the scope of the above advances both theoretically and practically. Examples of extensions the PI wishes to pursue include designing new confidence sequences for vector-valued means, as well as fully empirical bounds that do not depend on unknown parameters. We will also explore applications of these bounds to sequential testing and estimation tasks.
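To make the "simultaneously valid" property concrete (the notation below is illustrative and not taken from the abstract), a confidence sequence $(\mathrm{CI}_t)_{t \ge 1}$ for an unknown parameter $\theta$ at error level $\alpha$ satisfies

\[
\Pr\bigl(\theta \in \mathrm{CI}_t \ \text{for all } t \ge 1\bigr) \;\ge\; 1 - \alpha,
\]

whereas a traditional fixed-sample interval controls this coverage probability only at a single, pre-specified time $t$; because the guarantee above holds uniformly over time, it also holds at any data-dependent stopping time.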
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.