The project focuses on the development of methods of inference for the analysis of time series and random fields that do not rely on unrealistic or unverifiable model assumptions. In particular, the investigator and his colleagues are working on: (a) extending the range of applicability of the AR-sieve bootstrap beyond the setting of linear time series; (b) devising a new Time-Frequency bootstrap procedure in which bootstrap pseudo-series are generated in the time domain while the resampling itself takes place in the frequency domain; (c) devising a residual bootstrap scheme with larger resample size to be used for improved density estimation from time series data; (d) constructing an automatic method for the efficient aggregation of spectral density estimators; (e) testing for the support of a density, as well as testing for overdifferencing and estimating the spectral density at a vanishing point; (f) devising an improved block bootstrap procedure to handle time series that are periodically or almost periodically correlated; (g) resampling and inference for locally stationary time series and inhomogeneous (but locally homogeneous) marked point processes; and (h) investigating different aspects of resampling with functional data, including the difficult problem of appropriately studentizing a functional statistic.
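To fix ideas about the kind of resampling procedure that items such as (f) refine, the following is a minimal sketch of the classical moving-block bootstrap for a stationary time series. It is an illustration only: the block length, the AR(1) toy data, and the sample mean as target statistic are arbitrary choices, and the project's refinements for (almost) periodically correlated series are not implemented here.

```python
import numpy as np

def moving_block_bootstrap(x, block_length, rng=None):
    """Build one bootstrap pseudo-series by concatenating randomly chosen
    overlapping blocks of the observed series x.

    Classical moving-block bootstrap only; refinements for (almost)
    periodically correlated series are not shown.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    n_blocks = int(np.ceil(n / block_length))
    # Random starting points of the blocks (overlapping blocks allowed)
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [x[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]

# Toy usage: bootstrap standard error of the sample mean of a simulated AR(1) series
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()

boot_means = [moving_block_bootstrap(x, block_length=20, rng=b).mean()
              for b in range(1000)]
print("block-bootstrap s.e. of the mean:", np.std(boot_means))
```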
Ever since the fundamental recognition of the potential role of the computer in modern statistics, the bootstrap and other computer-intensive statistical methods have been developed extensively for inference with independent data. Such methods are even more important in the context of dependent data, where the distribution theory for estimators and test statistics may be difficult or impractical to obtain. Furthermore, the recent information explosion has resulted in data sets of unprecedented size that call for flexible, nonparametric, computer-intensive methods of data analysis. Time series analysis in particular is vital in many diverse scientific disciplines, e.g., economics, engineering, acoustics, geostatistics, biostatistics, medicine, ecology, forestry, seismology, and meteorology. As a consequence of the project's development of efficient and robust methods for the statistical analysis of dependent data, more accurate and reliable inferences may be drawn from data sets of practical import, resulting in appreciable benefits to society. Examples include data from meteorology/atmospheric science (e.g., climate data), economics (e.g., stock market returns), medicine (e.g., EEG data), and bioinformatics (e.g., genomic data).
Starting in the latter part of the 20th century, statisticians have been gradually moving away from parametric models, which often rely on restrictive and/or unreliable assumptions, and toward more flexible nonparametric models. Bootstrap methods---also known as `resampling'---have been instrumental in that respect, since they provide practitioners with a general way to conduct statistical inference in the nonparametric context (e.g., hypothesis tests and confidence intervals), thus effectively replacing R.A. Fisher's Maximum Likelihood inference, which is only valid under a narrowly specified parametric model. The computer-intensive methodology of Model-Free prediction was developed under this project and is expected to have an impact on the field, as it pushes the nonparametric envelope further. In the important setting of nonparametric regression, the Model-Free paradigm shows that an additive model is not needed in order to conduct statistical inference, i.e., estimation, prediction, confidence intervals, etc. Hence, the practitioner can proceed with nonparametric inference in great generality without worrying about the validity of an assumed additive model (and without any apparent loss of accuracy). Until now, practitioners have gone to great lengths (via transformations and other means) to secure an additive model for the problem at hand---but some settings may defy such efforts. By contrast, preliminary transformations and data preprocessing---a mainstay of statistical practice for over a hundred years---are rendered unnecessary under the Model-Free paradigm.

Another important class of problems involves time series data, i.e., observations obtained over time. Examples include data from meteorology/atmospheric science (e.g., climate data), economics (e.g., stock market returns), and biostatistics and bioinformatics (e.g., fMRI data). In this context, the PI and other researchers have been devising different resampling algorithms with the purpose of capturing the dependence/correlation that this type of data invariably exhibits. One such new development is the linear process bootstrap proposed by the PI and collaborators in 2010. An older method is the autoregressive (AR) sieve bootstrap proposed 25 years ago. Both of these methods had been thought to be valid only for linear time series. Nevertheless, many time series of interest are well known to be nonlinear; a prime example is financial returns data. In a breakthrough paper of 2011, the PI and co-authors were able to extend the applicability of the autoregressive sieve bootstrap and related methods to nonlinear time series; a similar extension of validity was also shown to hold for the linear process bootstrap. In addition, the PI and collaborators developed resampling methods that are applicable in other useful, albeit complicated, settings involving correlated data. One such setting concerns time series that exhibit seasonal variation, e.g., annual variation due to the seasons, or daily variation; for example, consider a time series of a city's electricity demand measured monthly (seasonal variation) or hourly (daily variation). Another interesting setting concerns measurements that are not obtained at regular (e.g., daily) time intervals; the setting of irregularly observed dependent data comes under the general heading of a `marked point process'.
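Since the autoregressive sieve bootstrap is discussed repeatedly above, here is a minimal sketch of its basic (linear-process) form: fit an AR(p) approximation to the series, resample the centered residuals i.i.d., and rebuild pseudo-series from the fitted recursion. The least-squares fit, the fixed order p, and the burn-in length are simplifying choices for illustration; the project's extension to nonlinear time series and the linear process bootstrap are not shown.

```python
import numpy as np

def ar_sieve_bootstrap(x, p, rng=None):
    """Generate one AR-sieve bootstrap pseudo-series of the same length as x.

    Steps: (1) fit an AR(p) approximation to the centered series by least
    squares; (2) resample the centered residuals i.i.d. with replacement;
    (3) rebuild a series via the fitted AR recursion, discarding a burn-in.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    xc = x - x.mean()
    # Regress xc[t] on its first p lags (column j-1 holds the lag-j values)
    Y = xc[p:]
    X = np.column_stack([xc[p - j:n - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ phi
    resid -= resid.mean()
    # Drive the fitted AR recursion with i.i.d. resampled residuals
    burn = 100
    e = rng.choice(resid, size=n + burn, replace=True)
    xb = np.zeros(n + burn)
    for t in range(p, n + burn):
        xb[t] = phi @ xb[t - p:t][::-1] + e[t]
    return xb[burn:] + x.mean()

# Toy usage: one pseudo-series built from an AR(5) sieve fit to simulated AR(1) data
rng = np.random.default_rng(1)
n = 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
x_star = ar_sieve_bootstrap(x, p=5, rng=2)
print(x_star[:5])
```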
The PI and co-authors have recently proposed resampling methods for marked point processes that may also exhibit some degree of nonstationarity. The recent work on bootstrap methods for complicated dependent data (nonlinear/nonstationary time series, marked point processes, etc.) has rekindled the interest of practitioners in the subject. The PI recently co-organized a workshop on Bootstrap Methods for Time Series that took place in Copenhagen, September 8-10, 2013. In addition, the PI has agreed to serve as Guest Editor of the Journal of Time Series Analysis for a Special Issue devoted to Bootstrap Methods for Time Series; the Special Issue should be completed by late 2014. Finally, the project allowed for the training and involvement of graduate and undergraduate students through regular coursework, independent study, and research projects; over half a dozen students were involved in parts of this project during its 3.5-year duration.