Data Science continues to have a transformative impact on Science and Engineering, and on society at large, by enabling evidence-based decision making, reducing costs and errors, and improving objectivity. The techniques and technologies of data science also have enormous potential for harm if they reinforce inequity or leak private information. As a result, sensitive datasets in the public and private sector are restricted from research use, slowing progress in those areas that have the most to gain: human services in the public sector. Furthermore, the misuse of data science techniques and technologies will disproportionately harm underrepresented groups across race, gender, physical ability, sexual orientation, education, and more. These data equity issues are pervasive, and represent an existential risk for the use of data-driven methods in science and engineering. This project will establish a Framework for Integrative Data Equity Systems (FIDES): an Institute for the study of systems that enable research on sensitive data while preventing misuse and misinterpretation.
FIDES will enable interdisciplinary community convergence around data equity systems, with an initial study in critical domains such as mobility, housing, education, economic indicators, and government transparency, leading to the development of a novel data analytics infrastructure that supports responsibility in integrative data science. Towards this goal, the project will address several technically challenging problems: (1) To be able to use data from multiple sources, risks related to privacy, bias, and the potential for misuse must be addressed. This project will develop principled methods for dataset processing to overcome these concerns. (2) Individual datasets are difficult to integrate for use in advanced multi-layer network models. This project considers methods to create pre-trained tensors over large collections of spatially and temporally coherent datasets, making them easier to incorporate while controlling for fairness and equity. (3) Any dataset or model must be equipped with sufficient information to determine fitness for use, communicate limitations, and describe underlying assumptions. This project will develop tools and techniques to produce "nutritional labels" for data and models, formalizing and standardizing ad hoc metadata approaches to provenance, specialized for equity issues. In addition to supporting methodological innovation in data science, the Institute will become a focal point for sharing expertise in data equity systems. It will do so by establishing interfaces for interaction between data science and domain experts to promote expertise development and sharing of best practices, and by consistently supporting efforts on diversity and equity.
This project is part of the National Science Foundation's Harnessing the Data Revolution Big Idea activity. The effort is jointly funded by the Office of Advanced Cyberinfrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.