The project develops new theory and methodology for sequential multiple comparisons. It aims to develop cost-minimizing methods and supporting theory for conducting multiple statistical inferences sequentially. This includes testing multiple hypotheses, constructing sequences of simultaneous confidence sets, detecting changes in multiple channels, and making other sequential statistical decisions involving multiple parameters or multiple measurements. The study extends recently obtained step-up and step-down procedures for multiple comparisons to sequential designs. It searches for optimal stopping rules that minimize the expected cost of the experiment while controlling the false positive and false negative rates. The new methodology combines the flexibility and cost-optimization of sequential procedures with the ability of modern multiple comparison methods to control the familywise error rate and power. The proposed sequences of simultaneous confidence sets generalize the idea of repeated confidence intervals to the case of multiple parameters and achieve the desired overall confidence level. The new multiple hypothesis testing methodology is used to derive sequential change-point detection algorithms sensitive to a change in any one or several parameters.
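To make the idea of a stopping rule that controls both error rates concrete, here is a minimal sketch of a simple baseline, not the step-up or step-down procedures developed in the project. It runs one Wald sequential probability ratio test per data stream and splits the error levels equally across the streams in the Bonferroni fashion. The function name `bonferroni_sprt`, the normal model with known variance, and the conservative boundaries log(d/alpha) and log(beta/d) are assumptions made for illustration.

```python
import math


def bonferroni_sprt(streams, theta0, theta1, alpha, beta, sigma=1.0):
    """Run one Wald SPRT per stream with Bonferroni-split error levels.

    Illustrative assumptions (not taken from the project):
    each stream is N(theta, sigma^2) with known sigma, and stream j tests
    H0: theta = theta0 against H1: theta = theta1. Each test uses levels
    alpha/d and beta/d, so the union bound keeps both familywise error
    rates at or below alpha and beta respectively.
    """
    d = len(streams)
    upper = math.log(d / alpha)  # reject H0 once the log-likelihood ratio reaches this
    lower = math.log(beta / d)   # accept H0 once the log-likelihood ratio falls to this
    results = []
    for xs in streams:
        llr = 0.0
        decision = "undecided"  # stays undecided if the data run out first
        n_used = 0
        for x in xs:
            n_used += 1
            # Log-likelihood ratio increment of N(theta1, sigma^2) versus N(theta0, sigma^2)
            llr += (theta1 - theta0) * (x - (theta0 + theta1) / 2.0) / sigma ** 2
            if llr >= upper:
                decision = "reject H0"
                break
            if llr <= lower:
                decision = "accept H0"
                break
        results.append((decision, n_used))
    return upper, lower, results


if __name__ == "__main__":
    # Two artificial streams: one sitting at theta1, one sitting at theta0.
    demo = [[1.0] * 50, [0.0] * 50]
    print(bonferroni_sprt(demo, theta0=0.0, theta1=1.0, alpha=0.05, beta=0.10))
```

Each stream stops at its own first boundary crossing, so the number of observations used varies from stream to stream. Stepwise procedures of the kind described here are designed to do better than this equal split.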

Deliverables of the project include a sound statistical methodology for designing multiple comparison experiments at the minimum expected cost. One of the main applications is in sequential clinical trials that are conducted to answer multiple questions, for example, about the efficacy and safety of the tested treatment. Cost-optimization of such medical studies ultimately reduces the cost of health care. The new change-point detection procedures allow simultaneous tracking of changes in multiple parameters, which is used for the timely discovery of epidemic and pre-epidemic patterns and bioterrorist attacks. While controlling the rate of false alarms, the proposed change-point detection schemes aim to minimize the expected detection delay, ensuring prompt reaction to unexpected changes. Their application sheds light on a number of global questions. Is the economy (welfare, climate, environment) changing? In what way and in what direction is it changing? When did the change begin? Does the change continue, or has the process stabilized? The proposed sequential statistical tools address these and other important questions that involve multiple statistical comparisons.

Project Report

The project resulted in a cost-efficient methodology and supporting theory for multiple sequential testing and other multiple statistical inferences on sequentially collected data. Procedures were designed for a general sequence of observed random vectors whose components, dependent or independent data streams, are parameterized by one or several parameters. Efficient statistical tools were developed to conduct simultaneous inferences about these parameters, satisfying desired accuracy conditions and optimizing the expected cost of the whole experiment under these constraints. The main outcome of the project is the development of stepwise sequential procedures for testing multiple hypotheses that control both the Type I and Type II familywise error rates (FWER), that is, the probabilities of at least one Type I error and of at least one Type II error. The stepwise design of the developed methods reduced the expected sample size compared with the commonly used Bonferroni procedures. Further, the new schemes showed a 15% to 30% reduction of the sample size versus non-sequential procedures with the same error rates. Applied in clinical research, this approach translates into a reduced cost of medical treatments and, consequently, a reduced cost of health care. Other applications of the new methods are anticipated in quality and process control, acceptance sampling, and genetics.

To address the very large number of simultaneous tests that often occurs in genomics, epidemiology, security, computer science, communications, and other areas, the stepwise methodology was developed further and resulted in sequential procedures controlling the generalized FWER, i.e. the probabilities of at least k Type I errors and of at least m Type II errors, at the given levels α and β. The new methods are applied to testing multiple efficacy and safety endpoints, DNA and protein sequence analysis, and so on.
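The error criteria described above can be written out explicitly. The symbols V (the number of Type I errors, i.e. true null hypotheses rejected) and W (the number of Type II errors, i.e. false null hypotheses accepted) are notation introduced here for illustration and do not come from the report.

```latex
% Classical familywise error rates
\[
  \mathrm{FWER}_{I} = P(V \ge 1), \qquad \mathrm{FWER}_{II} = P(W \ge 1).
\]
% Generalized familywise error rates, controlled at levels alpha and beta
\[
  P(V \ge k) \le \alpha, \qquad P(W \ge m) \le \beta .
\]
% Monotonicity: the generalized criterion is no harder to satisfy
\[
  \{V \ge k\} \subseteq \{V \ge 1\}
  \;\Longrightarrow\;
  P(V \ge k) \le P(V \ge 1) \qquad (k \ge 1).
\]
```

With k = m = 1 the generalized criteria reduce to the classical ones. For larger k and m, any procedure that controls the classical FWER at level α also controls the generalized rate at the same level, and the same holds for the Type II side.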

Controlling k-FWER at a given level is a weaker constraint than controlling the standard FWER, and therefore it can be satisfied with a smaller sample. The new results showed that this generalization of the classical notion of FWER substantially reduces the expected sample size of multiple testing procedures.

Special methods were developed for multistage and truncated group sequential experiments. These procedures are designed to control both the Type I and Type II FWER at the specified levels α and β, complete the entire experiment within at most K stages, and minimize the total expected cost under these constraints. They are particularly attractive for group sequential clinical trials.

Asymptotically optimal rules were obtained for a number of situations. That is, the best rate of the expected sample size was derived, and stopping boundaries were calculated that achieve this best asymptotic rate under the constraints on the Type I and Type II FWER. Among other results, it was proved that in a battery of tests, the optimal Type I and Type II error allocation distributes all the error probabilities among the most difficult tests, where difficulty is determined by the closeness between the null and alternative parameters.

The developed sequential methodology for multiple hypothesis testing was applied to sequential multi-channel change-point detection. It was assumed that a number of "sensors" simultaneously collect and report data. These can be public health surveillance systems, border patrol units monitoring cross-border traffic, or simply smoke detectors in different locations. When a significant event occurs, the distribution of data in one or several sequences changes, and the goal is to detect such a change as soon as possible after it occurs, subject to a constraint on the rate of false alarms. This problem is well studied in the case of one sensor, but it lacked adequate solutions when multiple sensors were involved.
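The multi-sensor setting just described can be illustrated with a minimal sketch. It is not the procedure developed in the project: it simply runs the classical one-sensor CUSUM statistic in every channel and raises an alarm as soon as any channel crosses a threshold. The function name `multichannel_cusum`, the normal model with a known mean shift from `mu0` to `mu1`, and the common `threshold` are assumptions chosen for illustration.

```python
def multichannel_cusum(observations, mu0, mu1, threshold, sigma=1.0):
    """Monitor several channels with parallel CUSUM statistics.

    Illustrative assumptions (not taken from the project):
    `observations` is a list of time steps, each holding one reading per
    channel; readings are N(mu0, sigma^2) before a change and
    N(mu1, sigma^2) after it.

    Returns (alarm_time, flagged_channels), with time counted from 1,
    or (None, []) if no channel ever reaches the threshold.
    """
    if not observations:
        return None, []
    n_channels = len(observations[0])
    stats = [0.0] * n_channels  # one CUSUM statistic per channel
    for t, row in enumerate(observations, start=1):
        for j, x in enumerate(row):
            # Log-likelihood ratio increment of post-change versus pre-change model
            inc = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
            # CUSUM recursion: accumulate evidence, reset at zero
            stats[j] = max(0.0, stats[j] + inc)
        flagged = [j for j, w in enumerate(stats) if w >= threshold]
        if flagged:
            return t, flagged
    return None, []


if __name__ == "__main__":
    # Three channels; the middle one shifts from 0 to 1 after time step 5.
    data = [[0.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0]] * 10
    print(multichannel_cusum(data, mu0=0.0, mu1=1.0, threshold=2.0))
```

A larger threshold lowers the false alarm rate at the price of a longer detection delay. Choosing it jointly across many channels is the kind of multiplicity question the multi-sensor setting raises.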

By analogy with hypothesis testing, the new change-point detection tools control the probability of a false alarm (the analogue of Type I FWER) and the probability of missing a change point (Type II error), and they minimize the mean delay (sampling cost) under these constraints. As a practical tool, a formula was derived for the minimum number of sensors that attains the desired level of sensitivity under the given rate of false alarms.

The proposed methodologies are ready for use by researchers, doctors, clinicians, managers, quality control engineers, and other professionals who conduct sequential studies that answer multiple questions with the required confidence at the minimum expected cost. Particular applications have been elaborated for the design and analysis of drug dependence clinical trials, identification of defects in integrated circuits, and early detection of epidemic trends.

The project involved three doctoral students, of whom two are graduating within the next year and one already holds a tenure-track faculty position in Statistics. The PI and his students presented their research to mixed audiences at multiple conventions. To disseminate the new results among interested researchers and practitioners, the PI organized special invited sessions on related topics at the 2011 and 2013 International Workshops on Sequential Methodologies and the 2014 International Conference on Ordered Data Analysis, Models, and Health Research Methods.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1007775
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2010-06-01
Budget End: 2014-05-31
Support Year:
Fiscal Year: 2010
Total Cost: $200,000
Indirect Cost:
Name: University of Texas at Dallas
Department:
Type:
DUNS #:
City: Richardson
State: TX
Country: United States
Zip Code: 75080