This project creates an institute at Brown University that brings together the disciplines of mathematics, statistics, and theoretical computer science to define and refine the foundational landscape of the emerging area of data science. The institute sponsors focused activities, organized by small groups of researchers, that cut across disciplinary boundaries. It connects with and informs a broad range of undergraduate and graduate students working in a wide variety of domains, from neuroscience and genomics to climate modeling and public policy. Theoretical developments can improve diagnostic imaging and tumor classification, yield better models of neural structure, and even inform findings regarding food stamps and recidivism in Rhode Island.
The mission of the institute is to foster the development and principled application of theory and methods for big data that discover, refine, and validate the underlying theoretical models governing a system or data-generating process; these models in turn improve predictions of new outcomes. Scientific projects in "causal and model-based inference," "data analysis on massive networks," and "geometric and topological methods to analyze and visualize complex data" drive home the role of the model, and its continuous refinement, in data analysis. Rather than seeking better "black boxes" for analysis, the institute will emphasize the role of the "investigator-in-the-loop," who interrogates the entire data pipeline in search of theoretical improvements and their implications. It connects to the Brown Data Science Initiative and the Institute for Computational and Experimental Research in Mathematics (ICERM). Funds for the project come from CISE Computing and Communications Foundations, the MPS Division of Mathematical Sciences, Growing Convergence Research, and EPSCoR. (Convergence can be characterized as the deep integration of knowledge, techniques, and expertise from multiple fields to form new and expanded frameworks for addressing scientific and societal challenges and opportunities. This project promotes Convergence by bringing together communities representing many disciplines, including mathematics, statistics, and theoretical computer science, and by engaging communities that apply data science to practical research problems.)