Science faces significant challenges in the replicability of studies. These challenges affect prediction models used in a broad spectrum of business, scientific, and social activities. The investigators have identified underutilized opportunities to make most prediction modeling techniques more likely to produce replicable results by training them on multiple studies and rewarding good replicability during this training phase. Recent work indicates that this novel and general strategy provides insight into the replicability of predictions, and is a promising avenue for systematic improvement. As many areas of science and technology become data-rich, multiple datasets are more commonly available for training, and it is increasingly important that they be considered simultaneously and used systematically to improve replicability. Steps towards more easily replicable predictions would increase public confidence in the scientific process, facilitate dissemination of results, and strengthen public engagement with science and technology.

The goal of this project is to make progress in the area of cross-study replication of predictions. The investigators have identified two fundamental and underutilized opportunities: 1) to train on multiple studies; 2) to leverage ensembles of prediction models, each trained on one, or a subset, of the studies. The combination of these two elements can be used to design robust prediction algorithms that are trained to incorporate replicability across different contexts and populations. In this project, the investigators propose to implement and evaluate specific prediction techniques within this paradigm; to investigate their statistical properties theoretically and empirically; to compare them to existing alternative multi-study statistical methods; and to build free, open-source software to implement the successful strategies.
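As a rough illustration of the two elements above (training on multiple studies, and combining models that are each trained on a single study), the following is a minimal sketch and not the investigators' actual method. It assumes simulated one-feature studies, a simple least-squares learner per study, and an inverse-error weighting rule that rewards models whose predictions hold up on the other studies. All function names (`fit_line`, `cross_study_weights`, `ensemble_predict`, `simulate_study`) and the simulated slopes are hypothetical choices made for this example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x on a single study; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on one study."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_study_weights(studies, models):
    """Weight each single-study model by how well it predicts the OTHER studies.

    Assumption for this sketch: weights are proportional to the inverse of the
    average out-of-study error, then normalized to sum to 1.
    """
    raw = []
    for k, model in enumerate(models):
        errs = [mse(model, xs, ys)
                for j, (xs, ys) in enumerate(studies) if j != k]
        raw.append(1.0 / (sum(errs) / len(errs)))
    total = sum(raw)
    return [r / total for r in raw]

def ensemble_predict(models, weights, x):
    """Weighted average of the single-study predictions at a new point x."""
    return sum(w * (a + b * x) for w, (a, b) in zip(weights, models))

def simulate_study(rng, n, slope, noise_sd):
    """Hypothetical study: y = 1 + slope * x + Gaussian noise."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [1.0 + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

rng = random.Random(0)
# Three studies share a similar slope; the fourth behaves differently.
studies = [simulate_study(rng, 200, s, 1.0) for s in (2.0, 2.1, 1.9, 4.0)]
models = [fit_line(xs, ys) for xs, ys in studies]
weights = cross_study_weights(studies, models)
prediction = ensemble_predict(models, weights, 0.5)
```

In this toy setup the model trained on the atypical fourth study predicts the other studies poorly, so it receives the smallest weight. Other weighting schemes (for example, stacking-style regression of outcomes on the single-study predictions) could be substituted in `cross_study_weights` without changing the rest of the sketch.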

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 1810829
Program Officer: Gabor Szekely
Project Start:
Project End:
Budget Start: 2018-08-01
Budget End: 2021-07-31
Support Year:
Fiscal Year: 2018
Total Cost: $350,000
Indirect Cost:
Name: Dana-Farber Cancer Institute
Department:
Type:
DUNS #:
City: Boston
State: MA
Country: United States
Zip Code: 02215