The research concerns novel statistical theory, models and methods for structured dynamic and spatio-dynamic multivariate processes. The scope includes theoretical developments of new classes of stochastic process models for dynamic covariance structures in multi- and matrix-variate time series, including novel classes of stationary Markov processes for multivariate volatility modeling. Theoretical and applied developments include new approaches to sparsity modeling for increasingly high-dimensional, time-varying parameter stochastic systems, with application to dynamic regression, time-varying vector auto-regression, dynamic factor models and covariance volatility models. Additional research focuses on new classes of spatial lattice and spatially-varying random field models, coupled with time series processes to define flexible spatio-temporal models for increasingly high-resolution lattice data observed through time. The investigator develops Bayesian simulation-based statistical computation, including GPU-based parallelized algorithms, for model implementations, and pursues cross-disciplinary applications in financial time series as well as studies in atmospheric and biomedical sciences.

Faced with increasingly high-dimensional data sets generated in studies of temporal and spatial systems, statistical science research aims to substantially advance the ability to represent, analyze and use mathematical models of increasing dimension, realism and complexity. The investigator develops mathematical and statistical modeling theory and associated simulation-based computational methods for a range of contexts, motivated in part by collaborative cross-disciplinary applications in areas of finance, atmospheric science and the neurosciences. Innovations in statistical research include: (i) new and improved models for describing and predicting change over time in the complex patterns of relationships among several or many time series, such as financial indicators, or nano-technology based recordings of neural signals in brain imaging; (ii) new theory and methods for inducing sparsity (i.e., controlling complexity) of applied stochastic models, to enable scaling to increasingly high-dimensional problems, such as those arising in high-resolution satellite imaging in atmospheric studies as well as in large-scale financial time series; (iii) innovations in simulation techniques for statistical computing, including parallel desktop computing, to advance the ability to fit, explore and use models of increasing scale and complexity with increasingly large data sets. With cross-disciplinary collaborators and students, the research advances core mathematical and statistical modeling theory and technology, and contributes new, refined and relevant approaches to modeling and data analysis in several specific applied contexts, as well as generating methods for broader use.

Project Report

Problems of formal, accurate and reliable statistical analysis of increasingly high-dimensional, complex data measured or observed over time require new quantitative methods and mathematical models, as well as computational techniques that can operate fast and effectively. This research has addressed these general issues in a number of specific contexts, developing new and refined statistical methods, based on new concepts and coupled with advanced computational ideas including parallel computing, that are of broad use for future studies in many fields. Specific applied problems in areas including macro-economics, finance, atmospheric sciences and neuroscience have defined case studies that showcase the abilities of the new statistical models and methods, and that provide evidence of advances in understanding of the applied science problems based on these new, refined statistical analyses of time series data sets. A core theme in this research has been the concept of statistical sparsity: that is, mathematical models and accompanying statistical algorithms for complex data sets that are able to scale to the kinds of data sets we are increasingly faced with. Building on several such new concepts, this research has defined new models that are able to automatically reduce or expand model complexity adaptively, and dynamically in time, as data evolve. Other applications of sparsity concepts in dynamic time series environments have made it possible to build large models of very many time series measured simultaneously, by decoupling their analyses to run in parallel and then using novel statistical concepts to properly recouple them for formal inference, forecasting and input to decision analyses.
Additional themes in this research include extensions of such modeling ideas to dynamic processes observed over spatial regions, where sparsity modeling of complex and increasingly high-dimensional data sets is critical. Finally, the research has linked into high-dimensional data analysis in other contexts where time is not an issue, but scale and complexity of data are central. Collaborations with disciplinary scientists and junior researchers at all levels (students and postdoctoral researchers) have been central in motivating new theory and methods research. Significant applications and advances have been made in several areas: in applied forecasting in macroeconomic policy time series studies; in financial time series and portfolio decision analysis; in dynamic network analysis in studies of brain connectivity data from complex neuroscience time series; in cellular systems biology based on high-resolution spatial-temporal imaging data sets; in atmospheric chemistry studies integrating physical computer model simulation-forecasts of low-level atmospheric CO fluxes from ground-level sources with high-resolution satellite imagery data sets; and others. Graduate students and postdoctoral researchers engaged in this research, directly and through collaborators, have gained experience both in statistical disciplinary research methods and in collaborations across disciplinary boundaries. Aspects of the research have had an impact, both formally (through course material, examples, etc.) and informally, on undergraduate and graduate teaching on topics related to multivariate statistical modeling, high-dimensional data, and dynamic/time series areas. They have also had a broader impact through multiple short courses presented by the PI over the course of the research, as well as through many oral presentations, publications, and web-based dissemination that includes software for all new models and methods.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1106516
Program Officer: Gabor J. Szekely
Project Start:
Project End:
Budget Start: 2011-07-01
Budget End: 2014-06-30
Support Year:
Fiscal Year: 2011
Total Cost: $400,000
Indirect Cost:
Name: Duke University
Department:
Type:
DUNS #:
City: Durham
State: NC
Country: United States
Zip Code: 27705