The advent of high-dimensional time series from neuroscience, including EEG/MEG, fMRI, and spike train data, has sparked new interest in the analysis of multivariate time series, particularly for deciphering the dynamics of brain connectivity networks. Despite significant recent progress, the vast majority of existing approaches for analyzing high-dimensional time series focus on real-valued time series from Gaussian noise and perturbation models. However, emerging applications in neuroscience involve discrete-valued time series, such as point processes and categorical observations. This project aims to develop flexible and scalable statistical machine learning methods and efficient software tools for inferring brain connectivity networks from discrete-valued high-dimensional time series data in neuroscience.
Large-scale brain connectivity networks often involve complex nonlinear and multi-scale interactions that are usually unknown in practice. Applications of parametric models in such settings may not provide an accurate window into the brain's dynamics, especially if the model assumptions are violated. This research bridges this gap by developing scalable statistical machine learning methods and theory for flexible nonparametric analysis of high-dimensional discrete-valued time series. In particular, this project will develop (i) clustering and variable screening methods for high-dimensional point processes, (ii) an efficient and general nonparametric estimation framework for network discovery from a general class of point processes, and (iii) a novel regularized estimation framework with provable identifiability guarantees for network reconstruction from high-dimensional categorical time series. Theoretical properties of these methods will be investigated, and efficient open-source software tools will be developed to facilitate the application of the methods by the scientific community. Together, these tools provide a comprehensive framework for the analysis of high-dimensional discrete-valued time series arising in various neuroscience applications, and will advance the current state of statistical machine learning methods for the analysis of high-dimensional time series. The PIs also plan to release the software as open source and to build a user community around it by ensuring that interested researchers are able to contribute to its codebase, which will support wider growth of the project. This aspect is of special interest to the software cluster in the Office of Advanced Cyberinfrastructure, which has provided co-funding for this award.