A massive study of data science to address the scientific reproducibility crisis

Leek, Jeffrey

Abstract

There is a crisis of reproducibility and replicability of scientific results. This crisis is an increasing source of concern in both the scientific and popular press. The crisis is so acute that the United States Congress is currently investigating the reproducibility of the scientific process. At the heart of the crisis is a shortage of data analytic skill throughout the scientific enterprise. There is an emerging consensus that the best way to address the crisis is to increase data analytic training, particularly around reproducibility and replicability. In this application we (1) propose the first formal statistical model for reproducibility and replicability and then use data and experiments from the largest massive online open program in data science in the world to (2) perform randomized studies to improve our knowledge about which statistical methods and protocols lead to increased reproducibility and replicability in the hands of average users and (3) analyze learner, course, and content characteristics that increase learner success and throughput, in order to increase the number of trained data analysts worldwide. To accomplish goals (2) and (3) we will use the largest and highest-throughput data science program in the world: the Johns Hopkins Data Science Specialization. This specialization, developed by the investigators of this project, consists of nine courses that are offered every month. Since the launch of this program in April 2014, these classes have seen more than two million enrollments, and nearly all of the learners' experiences have been recorded as data. Furthermore, the MOOC platform for this series permits random assignment of quiz questions and content. We will disseminate our results through open source software, analysis protocols, our popular blog, and the Data Science Specialization to maximally improve data science training and reduce the scientific replication and reproducibility problem.
The size of this program means that by increasing the quality of the program and the number of students who complete it by even a small percentage, we can affect global data analytic behavior.

Public Health Relevance

Many scientific results cannot be replicated or reproduced. One reason for this crisis is a shortage in the quantity and quality of trained data analysts across all medical and scientific areas. We propose to define a formal statistical model for reproducibility and replicability, then use the world's largest data science program to identify statistical methods and data analyst characteristics that improve scientific reproducibility and replication.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM115440-01A1
Application #: 9100338
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Marcus, Stephen

Project Start: 2016-04-01
Project End: 2020-03-31
Budget Start: 2016-04-01
Budget End: 2017-03-31
Support Year: 1
Fiscal Year: 2016

Institution

Name: Johns Hopkins University
Department: Biostatistics & Other Math Sci
Type: Schools of Public Health
DUNS #: 001910777

City: Baltimore
State: MD
Country: United States
Zip Code: 21205

Related projects


NIH 2019 R01 GM	A massive study of data science to address the scientific reproducibility crisis Leek, Jeffrey T. / Johns Hopkins University
NIH 2018 R01 GM	A massive study of data science to address the scientific reproducibility crisis Leek, Jeffrey T. / Johns Hopkins University
NIH 2017 R01 GM	A massive study of data science to address the scientific reproducibility crisis Leek, Jeffrey T. / Johns Hopkins University	$328,050
NIH 2016 R01 GM	A massive study of data science to address the scientific reproducibility crisis Leek, Jeffrey T. / Johns Hopkins University

Publications

Patil, Prasad; Peng, Roger D; Leek, Jeffrey T (2016) What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science. Perspect Psychol Sci 11:539-44
