It is common in the technology and pharmaceutical industries to test a large sequence of hypotheses over time. As an example in the latter case, suppose a lab is trying to develop a cure for a disease like Alzheimer's. This is a complex disease for which a single cure that works for everyone is unlikely to be found. It is much more likely that research on such drugs will continue for years, if not decades, and every few months a new drug may be tested for its efficacy in a clinical trial. When we are testing whether a particular drug is any better than a placebo, we have no idea how many more drugs (hypotheses) we will test in the future, but we do know the results of the earlier tests. This is the setup considered by online multiple hypothesis testing, the topic of this project: a large sequence of hypotheses is tested over time in an online fashion, and we would like to ensure that there are not too many false discoveries in this process just due to chance. A false discovery results not just in false hopes, but in millions of dollars wasted on follow-up clinical trials, and possibly worse outcomes for patients. This project aims to develop novel methodology to test such a sequence of hypotheses so that certain common error metrics are controlled at any time. The training component for undergraduate and graduate students will give new researchers an interdisciplinary education through planned cross-disciplinary tutorials and workshops, along with outreach to K-12 students.

The methodology in offline multiple testing is rich, with a plethora of methods that control a wide variety of error metrics, and the PI has recently contributed significantly to this literature. In contrast, the online multiple testing literature is less developed. This grant takes a holistic and comprehensive approach that will result in new methods for a whole spectrum of error metrics: global null testing, familywise error rate, false discovery rate, false coverage rate, and simultaneous control of the false discovery proportion. The PI already has preliminary work on some of these fronts. We will also develop a public software package in R, along with associated documentation, to make these methods easier to adopt and apply. All methods will be accompanied by rigorous theoretical guarantees, as would be desirable in the aforementioned pharmaceutical application.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1945266
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2020-07-01
Budget End: 2025-06-30
Support Year:
Fiscal Year: 2019
Total Cost: $102,483
Indirect Cost:
Name: Carnegie-Mellon University
Department:
Type:
DUNS #:
City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213