The researchers propose to create a data science center for improved decision-making that combines expertise from computer science, information science, mathematics, operations research, and statistics. Their goal is to pursue basic research that contributes to the theoretical foundations of data science. The chosen research topics have applications that can benefit society as a whole, and they integrate the perspectives of the disciplines the project brings together. The five proposed research directions are Privacy and Fairness, Learning on Social Graphs, Learning to Intervene, Uncertainty Quantification, and Deep Learning. The Center aims to advance knowledge in these areas and to broaden the range of disciplines and perspectives that contribute to these challenging problems. The researchers plan to engage the community beyond Cornell through online seminars, workshops, and student conferences.

The research findings will provide an urgently needed foundation for data science in several topic areas of importance to society. Because the center sits at the intersection of multiple disciplines, its intellectual merit spans all of them, and its findings may translate into new algorithms and approaches in each.

The research focus spans five core areas.

1. Privacy and Fairness. As data science becomes pervasive across society and is increasingly used to aid decision-making in sensitive domains, it becomes crucial to protect individuals by guaranteeing privacy and fairness. The investigators propose to research the theoretical foundations for providing such guarantees and to identify their inherent limitations.

2. Learning on Social Graphs. Many of the fundamental questions in applying data science to the interactions between individuals and larger social systems involve the social networks that underpin the connections between individuals. The researchers will develop new techniques for understanding both the structure of these networks and the processes that take place within them.

3. Learning to Intervene. Data-driven approaches to learning good interventions (including policies, recommendations, and treatments) inspire challenging questions about the foundations of sequential experimental design, counterfactual reasoning, and causal inference.

4. Uncertainty Quantification. Quantifying the uncertainty of specific predictions or conclusions is a key need in data science, especially when they inform decisions with potential consequences for human subjects. The researchers will develop statistical tools and theoretical guarantees to assess the uncertainty of predictions made by popular data science algorithms.

5. Deep Learning. Deep learning algorithms have made impressive advances in practical settings. Although their basic building blocks are well understood, it remains unclear what they learn and why they generalize so well. There are indications that they may learn data manifolds and that the choice of optimization algorithm influences generalization.

Advancing our theoretical understanding of these phenomena requires combined efforts from optimization, statistics, and mathematics, and could yield insights for all aspects of data science.

Funds for the project come from CISE Computing and Communications Foundations, MPS Division of Mathematical Sciences, MPS Office of Multidisciplinary Activities, and Growing Convergence Research. (Convergence can be characterized as the deep integration of knowledge, techniques, and expertise from multiple fields to form new and expanded frameworks for addressing scientific and societal challenges and opportunities. This project promotes Convergence by bringing together communities representing many disciplines, including mathematics, statistics, and theoretical computer science, and by engaging communities that apply data science to practical research problems.)

Budget Start: 2017-10-01
Budget End: 2021-09-30
Fiscal Year: 2017
Total Cost: $1,496,655
Institution: Cornell University
City: Ithaca
State: NY
Country: United States
Zip Code: 14850