This project develops statistical theory and methodology in four areas: recurrent events, which occur in many scientific fields; complex systems whose dynamics are driven by the failure history of their components, such as load-sharing systems and networks; the analysis of high-dimensional (HD) data sets, such as microarray data sets in biology and medicine; and classification and prediction problems with HD predictors and many model classes. The theory and methodology will draw on modern survival analysis techniques; coherent structure theory from reliability and engineering; Markov random fields and Gibbs measures; and the Neyman-Pearson, decision-theoretic, and Bayesian paradigms. The project goals are to obtain asymptotic properties of semiparametric estimators in the general class of dynamic recurrent event models of Pena and Hollander; to develop probabilistic models for complex systems that incorporate component dependencies arising from the load-sharing structure of the system; to devise and implement optimal decision-theoretic and Bayesian methodologies for multiple decision-making, classification, and prediction with HD data; and to train students and junior faculty in statistical research.
The project results are expected to have high impact: the class of dynamic recurrent event models considered is more realistic for complex systems, and the results for HD data will provide more efficient methodology for discovering relevant 'genes.' Consequently, researchers and practitioners working with highly complex engineering systems will be better able to predict system or component failures, leading to improved maintenance policies and a decreased risk of catastrophic system failures, while those working with HD data will improve the discovery of important genes, with positive implications for chronic disease management and control. Finally, training graduate students and junior faculty in statistical research will benefit science, since they are society's future researchers.
This NSF-supported research project dealt with three major topics: multiple testing and decision-making; dynamic models for event-time analysis; and load-sharing models. A result on the control of the family-wise error rate (FWER) and the false discovery rate (FDR) for a general class of multiple decision functions was obtained. The well-known Benjamini-Hochberg FDR-controlling procedure is a special case of this general result. The result opens the possibility of optimal multiple decision functions that take into account the receiver operating characteristic (ROC) curves of the individual decision functions. Compound p-value statistics, which are highly relevant in multiple testing and decision-making, were also introduced. Dynamic models, which are also relevant to load-sharing models arising in engineering and biomedical settings, were studied. These models capture how complex systems evolve over time, in the sense that the failure intensities depend on the history of the system. Asymptotic properties of estimators for a general class of dynamic models were obtained. The results have the potential to be extended to settings with complicated recurrent events (for instance, competing recurrent risks) and to complex load-sharing systems.
This NSF grant was also extremely beneficial to the graduate students who were supported under this project. One completed his PhD degree and is now a post-doctoral fellow at Emory University, and others are continuing work on their dissertations. The new PhD, who was supported for three years by this NSF grant, worked on Bayesian inference for complex systems and on load-sharing models. In essence, the grant helped meet human resource needs in the statistical sciences by developing future researchers in the mathematical and statistical sciences.
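For readers unfamiliar with the Benjamini-Hochberg procedure named above as a special case of the general result, the following is a minimal sketch of the standard textbook step-up rule. It is not the general class of multiple decision functions developed in the project. The function name `benjamini_hochberg` and the parameter `q` (the target FDR level) are illustrative choices, and the usual FDR guarantee assumes independent (or positively dependent) p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Textbook Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding null hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject the hypotheses with the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.04, 0.001, 0.5, 0.008]
    print(benjamini_hochberg(example, q=0.05))
```

Because the rule is step-up, a hypothesis whose own p-value exceeds its threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.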
This NSF-supported project also led to several publications, whose results will be important in advancing models and inference methods for complex systems. These results are a significant contribution to the infrastructure of science, since they could have wide-ranging utility in other scientific areas, not only the statistical sciences.