The emerging era of big data has brought new and distinctive challenges in both research and training in Statistics. For the new types of statistical problems researchers now aim to solve, the size of available data has grown immensely in many cases, and the nature of the data has changed no less dramatically. Statisticians now work routinely with data that combine many different kinds of observations, from genetic data to brain images to smartphone data. This creates a need for new training approaches that are closely integrated with current research directions, so that PhD students and postdocs are prepared to take on new challenges as they become independent researchers. It also creates an opportunity to recruit undergraduates into the field, increasing and diversifying the domestic STEM workforce. This project will train undergraduate and graduate students and postdocs in modern techniques for dynamic big data with complex structures and in modern teaching methods for statistics, and will provide mentoring on all aspects of professional development.
This project brings together three interlinked research streams: (1) statistical network analysis, (2) inference for dynamic systems, and (3) sequential decision making. It will contribute to each of these areas by developing (1) realistic models for network community detection, link prediction, and dynamically evolving networks, along with tools that use network connections to improve prediction of outcomes of interest on network-linked data; (2) practical algorithms with provably good properties for fitting complex partially observed Markov process models, with an emphasis on scalability; and (3) sequential decision making algorithms based on reinforcement learning, with the goal of achieving excellent prediction performance and discovering interpretable decision variables. Each research stream will offer a short intensive graduate course and a regular interdisciplinary student workshop. Equally important, the streams will collaborate on topics that cut across these areas, such as inference for dynamically evolving networks, or the role of social connections in predicting behavior and their impact on sequential decision making. Training undergraduates, PhD students, and postdocs in topics at the cutting edge of modern statistics will help supply much-needed statisticians and data scientists to both academia and industry, increasing and diversifying the STEM workforce. All three research streams have broad applications to areas beyond Statistics, such as neuroimaging, infectious disease transmission, and mobile health interventions. The project is thus expected to have a wide-ranging impact on how domain scientists approach the problems that statisticians study.