The growing field of data science promises to bring many benefits to society: personalized knowledge and services; improved healthcare; improved decision-making at the individual, organizational, national, and international levels; a safer and possibly fairer society; and many others. Realizing these promises, however, depends critically on building the right foundational principles for the field. This project establishes an NSF TRIPODS Institute, called the Penn Institute for Foundations of Data Science (PIFODS), at the University of Pennsylvania. Its goal is to bring together scientists and ideas from multiple disciplines, including computer science, electrical engineering, statistics, and mathematics, to collectively develop long-lasting principles that can serve data science for decades to come. The main activities of the Institute will include transdisciplinary research, education and training, engagement with the broader research community through invited seminars and workshops, and engagement with applied scientists and practitioners.

The PIFODS team seeks to develop principles for five thrusts: complex learning tasks; efficient optimization (convex, non-convex, and submodular); streaming, distributed, and massively parallel data analysis; privacy-preserving and fairness-preserving data analysis; and reproducible data analysis. Each thrust addresses an important foundational need in data science. These needs include designing learning algorithms with stronger performance guarantees; developing principles for optimization in adaptive settings; building a fundamental understanding of the tradeoffs among modern computational resources in data science; and developing data science algorithms that guarantee meaningful notions of privacy, fairness, and reproducibility. Each thrust requires interaction among several of the TRIPODS disciplines, and several of the thrusts also interact naturally with one another. On the education and training side, the PIFODS team has already introduced several new transdisciplinary courses related to data science that aim to develop a common language across disciplines. Under the aegis of the Institute, the team will continue to develop and refine these courses and will use feedback from them to inform the university's emerging transdisciplinary data science curriculum. On the applications side, the PIFODS team will actively engage with applied scientists and practitioners of data science, including members of the broader university community and selected industry practitioners. These engagements will help identify possible additional research thrusts and help solve important data-driven problems in society.

This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Computer and Communication Foundations (CCF)
Program Officer: Tracy Kimbrel
University of Pennsylvania
United States