Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine

Hadley, Dexter

Abstract

The NIH and other agencies are funding high-throughput genomics ('omics) experiments that deposit digital samples of data into the public domain at breakneck speed. These high-quality data measure the 'omics of diseases, drugs, cell lines, model organisms, etc. across the complete gamut of experimental factors and conditions. The importance of these digital samples is further illustrated in linked peer-reviewed publications that demonstrate their scientific value. However, metadata for digital samples is recorded as free text, without the biocuration necessary for in-depth downstream scientific inquiry. Deep learning is a revolutionary machine intelligence paradigm that allows an algorithm to program itself, thereby removing the need to explicitly specify rules or logic. Whereas physicians and scientists once needed to understand a problem before programming computers to solve it, deep learning algorithms optimally tune themselves to solve problems. Given enough example data to train on, deep learning machine intelligence outperforms humans on a variety of tasks. Today, deep learning delivers state-of-the-art performance for image classification and, most importantly for this proposal, for natural language processing. This proposal is about engineering Crowd Assisted Deep Learning (CrADLe) machine intelligence to rapidly scale the digital curation of public digital samples. We will first use our NIH BD2K-funded Search Tag Analyze Resource for Gene Expression Omnibus (STARGEO.org) to crowd-source human annotation of open digital samples. We will then develop and train deep learning algorithms for STARGEO digital curation by learning from the free-text metadata associated with each digital sample. Given the ongoing deluge of biomedical data in the public domain, CrADLe may be the only way to scale digital curation towards a precision medicine ideal.
Finally, we will demonstrate the biological utility of CrADLe for digital curation with two large-scale and independent molecular datasets: 1) The Cancer Genome Atlas (TCGA), and 2) the Accelerating Medicines Partnership-Alzheimer's Disease (AMP-AD). We posit that CrADLe digital curation of open samples will augment these two distinct disease projects with a host of big data to fuel the discovery of potential biomarkers and gene targets. Therefore, successful funding and completion of this work may greatly reduce the burden of disease on patients by enhancing the efficiency and effectiveness of digital curation for biomedical big data.
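
The workflow described in the abstract (crowd annotation of samples, then a model trained on the free-text metadata) can be illustrated with a minimal sketch. It is not the project's implementation: a small bag-of-words logistic regression stands in for the deep learning model, and the sample texts, the "case"/"control" labels, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def majority_label(votes):
    """Collapse several crowd annotations for one sample into a single label."""
    return Counter(votes).most_common(1)[0][0]

def tokenize(text):
    """Lowercase free-text metadata and split it into word tokens."""
    return text.lower().replace(":", " ").replace(",", " ").split()

def train(samples, epochs=200, lr=0.5):
    """Fit a bag-of-words logistic regression with stochastic gradient descent.

    This simple linear model is a stand-in for a deep network (assumption).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = set(tokenize(text))
            z = bias + sum(weights.get(t, 0.0) for t in tokens)
            p = 1.0 / (1.0 + math.exp(-z))
            err = label - p  # gradient of the log-likelihood w.r.t. z
            bias += lr * err
            for t in tokens:
                weights[t] = weights.get(t, 0.0) + lr * err
    return weights, bias

def predict(model, text):
    """Return the estimated probability that a sample is a 'case'."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in set(tokenize(text)))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical crowd annotations: (free-text metadata, votes from three curators).
crowd = [
    ("tissue: breast tumor, stage II", ["case", "case", "control"]),
    ("tissue: normal breast, adjacent", ["control", "control", "control"]),
    ("disease state: carcinoma, biopsy", ["case", "case", "case"]),
    ("disease state: healthy donor", ["control", "control", "case"]),
]

# Step 1: aggregate crowd votes; step 2: train on the aggregated labels.
training = [(text, 1 if majority_label(v) == "case" else 0) for text, v in crowd]
model = train(training)

# Step 3: annotate a new, uncurated sample from its free text alone.
print(predict(model, "tissue: lung tumor"))
```

Majority voting is only one way to reconcile disagreeing curators; it is used here because it is the simplest to state. A real system would need far more annotated samples than this toy set.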

Public Health Relevance

This proposal is about engineering Crowd Assisted Deep Learning (CrADLe) machine intelligence to rapidly scale the digital curation of public digital samples and to directly translate these 'omics data into useful biological inference. We will first use our NIH BD2K-funded Search Tag Analyze Resource for Gene Expression Omnibus (STARGEO.org) to crowd-source human annotation of open digital samples. On these annotations we will develop and train deep learning algorithms for STARGEO digital curation of free-text sample-level metadata. Given the ongoing deluge of biomedical data in the public domain, CrADLe may be the only way to scale digital curation towards a precision medicine ideal.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Library of Medicine (NLM)
Type: Research Project--Cooperative Agreements (U01)
Project #: 1U01LM012675-01
Application #: 9403171
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Florance, Valerie

Project Start: 2017-08-01
Project End: 2021-07-31
Budget Start: 2017-08-01
Budget End: 2018-07-31
Support Year: 1
Fiscal Year: 2017

Institution

Name: University of California San Francisco
Department: Internal Medicine/Medicine
Type: Schools of Medicine
DUNS #: 094878337

City: San Francisco
State: CA
Country: United States
Zip Code: 94118

Related projects


NIH 2020 U01 LM	Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine Hadley, Dexter D. / University of Central Florida
NIH 2019 U01 LM	Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine Hadley, Dexter D. / University of California San Francisco
NIH 2019 U01 LM	Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine Hadley, Dexter D. / University of Central Florida
NIH 2018 U01 LM	Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine Hadley, Dexter D. / University of California San Francisco
NIH 2017 U01 LM	Crowd-Assisted Deep Learning (CrADLe) Digital Curation to Translate Big Data into Precision Medicine Hadley, Dexter D. / University of California San Francisco

Publications

Hadley, Dexter; Pan, James; El-Sayed, Osama et al. (2017) Precision annotation of digital samples in NCBI's gene expression omnibus. Sci Data 4:170125

Himmelstein, Daniel Scott; Lizee, Antoine; Hessler, Christine et al. (2017) Systematic integration of biomedical knowledge prioritizes drugs for repurposing. Elife 6:
