This research project will develop a collaborative data science platform for computational social science called the Data Science Foundry. The collection and management of large-scale data currently is a relatively unstructured process, with data-processing decisions being made in an ad hoc fashion. Society has started to rely on data-driven science to address policy-related questions, however. The development of a collaborative platform that provides structure will allow social scientists to collaborate and validate each other's studies. This project has the potential to transform how studies are designed and how data will be processed. The collaborative platform will result in a higher level of trust in the studies conducted via the collaborative curation of study design, procedures, and validation. The collaborative platform also will increase the number of studies that can be done in a short span of time. The platform will be developed as open-source, thereby facilitating interactions with the community and enabling different institutions to install the program.

This project will develop a collaborative platform that social scientists can use to collaborate and validate each other's studies. The investigative team will attempt to identify the best possible collaborative model for data-driven social science, determine how automation can most enhance the studies, and develop explicit and implicit mechanisms to establish trust in end-to-end data processing pipelines and the results they generate. To aid in the platform's development, the research team will focus on the prediction of outcomes from surveys, a specific yet widely applicable type of problem within computational social science. This class of problems involves much subjective assessment during the feature engineering state as well as copious interpretation during the data transformation stage. These unique challenges will benefit both from a collaborative workflow and from mechanisms that enable trust in the eventual results. The project will bring together three distinct teams to develop this platform: computer scientists to develop abstractions, APIs and systems; statisticians to help with methods and study design; and social scientists to help define the problems and workflow and to provide user feedback.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Advanced CyberInfrastructure (ACI)
Standard Grant (Standard)
Application #
Program Officer
Cheryl Eavey
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Massachusetts Institute of Technology
United States
Zip Code