Big data such as those arising from sequencing, imaging, genomics and other emerging technologies are playing a critical role in modern biology and medicine. The generation of hypotheses about biological processes and disease mechanisms is now increasingly being driven by the production and analysis of large and complex datasets. Advanced computational methods have been developed for the robust analysis of these datasets, and the growth in number and sophistication of these methods has closely tracked the growth in volume and complexity of biomedical data. In such a crowded environment of diverse computational methods and data, it is difficult to judge how generalizable the performance of these methods is from one setting to another. Crowdsourcing-based scientific competitions, or challenges, have now become popular mechanisms for the rigorous, blinded and unbiased evaluation of the performance of these methods and the identification of best-performing methods for biomedical problems. However, despite the benefits of these challenges to the biomedical research enterprise, the impact of their findings has been remarkably limited in laboratory and clinical settings. This is likely due to two important aspects of current challenges: (i) their over-emphasis on identifying the best solutions rather than tryig to comprehensively assimilate the knowledge embedded in all the submitted solutions, and (ii) the absence of a stable channel of communication and collaboration between problem and solution providers due to a lack of sufficient incentives to do so.
The aim of this project is to boost the translational impact of scientific challenges through a combination of novel machine learning methods, development of novel scalable software and unique collaborations with disease experts to ensure the effective translation of knowledge accrued in challenges to real clinical settings and practice. These novel methods and software are designed to effectively assimilate the knowledge embedded in all the submissions to challenges into ensemble solutions. In a first of its kind effort, the ensemble solutions derived from disease-focused challenges under the DREAM project will be brought directly to scientists and clinicians that are experts in these disease areas. Initial effort in this project will focus on active DREAM challenges aiming at the accurate prediction of drug response and clinical outcomes respectively in Rheumatoid Arthritis (RA) and Acute Myeloid Leukemia (AML). Both these diseases are difficult to treat and thus they pose major medical and public health concerns. In collaboration with RA and AML experts, the ensemble solutions learnt in these challenges will be validated in independent patient cohorts and carefully designed clinical studies. This second-level validation is essential to judge the clinical applicability of any method, but is rarely done As the methodology is general, similar efforts will be made for other diseases in later stages of the project. Overall, using a smart combination of crowdsourcing-based challenges and computational methods and software, we aim to demonstrate a unique pathway for studying and treating disease by truly leveraging the wisdom of the crowds.

Public Health Relevance

Crowdsourcing-based scientific competitions, or challenges, have become a popular mechanism to identify innovative solutions to complex biomedical problems. However, the collective effort of all the challenge participants has been under utilized, and the overall impact on actual clinical and laboratory practice has been remarkably limited. Using novel computational methods and novel 'big data'-friendly software implementation, we plan to demonstrate how biomedical challenges, combined with our approach, can influence clinical practice in Acute Myeloid Leukemia and Rheumatoid Arthritis, as well as rigorously validate our approach.

National Institute of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Project #
Application #
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
Project End
Budget Start
Budget End
Support Year
Fiscal Year
Total Cost
Indirect Cost
Icahn School of Medicine at Mount Sinai
Schools of Medicine
New York
United States
Zip Code
El Naqa, Issam; Pandey, Gaurav; Aerts, Hugo et al. (2018) Radiation Therapy Outcomes Models in the Era of Radiomics and Radiogenomics: Uncertainties and Validation. Int J Radiat Oncol Biol Phys 102:1070-1073
Pandey, Gaurav; Pandey, Om P; Rogers, Angela J et al. (2018) A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data. Sci Rep 8:8826
Wang, Linhua; Law, Jeffrey; Kale, Shiv D et al. (2018) Large-scale protein function prediction using heterogeneous ensembles. F1000Res 7:
Fourati, Slim; Talla, Aarthi; Mahmoudian, Mehrad et al. (2018) A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection. Nat Commun 9:4418
Carcamo-Orive, Ivan; Hoffman, Gabriel E; Cundiff, Paige et al. (2017) Analysis of Transcriptional Variability in a Large Human iPSC Library Reveals Genetic and Non-genetic Determinants of Heterogeneity. Cell Stem Cell 20:518-532.e9
Stingone, Jeanette A; Pandey, Om P; Claudio, Luz et al. (2017) Using machine learning to identify air pollution exposure profiles associated with early cognitive skills among U.S. children. Environ Pollut 230:730-740
Sieberts, Solveig K; Zhu, Fan; García-García, Javier et al. (2016) Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis. Nat Commun 7:12460
Evrard, Solene M; Lecce, Laura; Michelis, Katherine C et al. (2016) Endothelial to mesenchymal transition is common in atherosclerotic lesions and is associated with plaque instability. Nat Commun 7:11853
Margolies, Laurie R; Pandey, Gaurav; Horowitz, Eliot R et al. (2016) Breast Imaging in the Era of Big Data: Structured Reporting and Data Mining. AJR Am J Roentgenol 206:259-64

Showing the most recent 10 out of 12 publications