Data-driven discoveries are permeating the critical fabric of society. Unreliable discoveries lead to decisions that can have far-reaching and catastrophic consequences for society, defense, and the individual. Thus, the dependability of the data-science lifecycles that produce discoveries and decisions is a critical issue that requires a new holistic view and formal foundations. This project will establish the Dependable Data Driven Discovery (D4) Institute at Iowa State University, which will advance foundational research on ensuring that data-driven discoveries are of high quality. The activities of the D4 Institute will have a transformative impact on the dependability of data-science lifecycles. First, the problem definition itself will have a significant impact by helping future innovations beyond academia. While the notion of dependability is well studied in the computer-systems literature, challenges in data science push beyond the boundary of existing knowledge. The institute's work will define D4 and increase data science's benefit to society by providing a transformative theory of D4. Second, the institute will facilitate the development of a shared vocabulary, which will encourage experts across TRIPODS disciplines and domain experts to collaborate on common goals and challenges. Third, the institute will set research directions for D4 by providing funding for foundational research, which will have its own set of impacts. Fourth, the institute will facilitate transdisciplinary training of a diverse cadre of data scientists through activities such as the Midwest Big Data Summer School and the D4 workshop.

The project will advance the theoretical foundations of data science by fostering foundational research with three aims: to understand the risks to the dependability of data-science lifecycles, to formalize a rigorous mathematical basis for measures of dependability of data-science lifecycles, and to identify mechanisms for creating dependable data-science lifecycles. The project defines a risk to be a cause that can lead to failures in data-driven discovery, and it refers to the processes that plan for, acquire, manage, analyze, and infer from data collectively as the data-science lifecycle. For instance, a computationally expensive inference procedure can deliver information too late to a human operator facing a deadline (complexity as a risk); if the data-science lifecycle provides a recommendation without an accompanying uncertainty measure, a human operator has no means to determine whether to trust the recommendation (uncertainty as a risk). Compared to recent works that have focused on fairness, accountability, and trustworthiness issues for machine learning algorithms, this project will take a holistic perspective and consider the entire data-science lifecycle. In Phase I of the project, the investigators will focus on four measures: complexity, resource constraints, uncertainty, and data freshness. Developing a framework to study these measures will prepare the investigators to scale up their activities to other measures in Phase II and to address larger portions of the data-science lifecycle. The study of each measure raises foundational challenges that will require expertise from multiple TRIPODS disciplines to address.

This project is jointly funded by HDR TRIPODS and the Established Program to Stimulate Competitive Research (EPSCoR).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Computing and Communication Foundations (CCF)
Application #: 1934884
Program Officer: Tracy Kimbrel
Budget Start: 2019-10-01
Budget End: 2022-09-30
Fiscal Year: 2019
Total Cost: $1,031,999
Name: Iowa State University
City: Ames
State: IA
Country: United States
Zip Code: 50011