The emerging era of big data has brought new challenges in both research and training in Statistics. For the new types of statistical problems researchers now aim to solve, the size of available data has in many cases grown immensely, and the nature of the data has changed no less dramatically. Statisticians now work routinely with data that combine many different kinds of observations, from genetic data to brain images to smartphone data. This creates a need for new training approaches that are closely integrated with current research directions, so that PhD students and postdocs are prepared to take on new challenges as they become independent researchers. It also creates an opportunity to recruit undergraduates into the field, increasing and diversifying the domestic STEM workforce. This project will train undergraduate students, graduate students, and postdocs in modern techniques for dynamic big data with complex structures and in modern teaching methods for statistics, and will provide mentoring on all aspects of professional development.

This project brings together three interlinked research streams: (1) statistical network analysis, (2) inference for dynamic systems, and (3) sequential decision making. It will contribute to each of these areas by developing (1) realistic models for network community detection, link prediction, and dynamically evolving networks, along with tools that use network connections to improve prediction of outcomes of interest on network-linked data; (2) practical algorithms with provably good properties for fitting complex partially observed Markov process models, with an emphasis on scalability; and (3) sequential decision making algorithms based on reinforcement learning, with the goal of achieving excellent prediction performance and discovering interpretable decision variables. Each research stream will offer a short intensive graduate course and a regular interdisciplinary student workshop. Equally important, the streams will collaborate on topics that cut across these areas, such as inference for dynamically evolving networks, or the role of social connections in predicting behavior and the impact of those connections on sequential decision making. Training undergraduates, PhD students, and postdocs in topics at the cutting edge of modern statistics will help supply much-needed statisticians and data scientists to both academia and industry, increasing and diversifying the STEM workforce. All three research streams have broad applications to areas beyond Statistics, such as neuroimaging, infectious disease transmission, and mobile health interventions. The project is thus expected to have a wide-ranging impact on how domain scientists approach the problems that statisticians study.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1646108
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2017-09-01
Budget End: 2022-08-31
Support Year:
Fiscal Year: 2016
Total Cost: $2,000,000
Indirect Cost:
Name: Regents of the University of Michigan - Ann Arbor
Department:
Type:
DUNS #:
City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109