Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data

Linderman, George

Abstract

Single cell RNA-sequencing (scRNA-seq) is revolutionizing the study of gene expression. In contrast to bulk RNA-sequencing, where the average expression of all cells in a sample is measured, scRNA-seq allows researchers to measure gene expression in each cell individually. This new technology has profound implications for both basic and clinical research but also presents unique analytic challenges. Among them is the problem of scale: scRNA-seq datasets are growing exponentially in size, with recently developed droplet-based technologies already profiling over 1 million cells in a single experiment. When applied to data of this scale, the most common methods for dimensionality reduction and visualization of scRNA-seq data, principal component analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), require many hours of processing on servers with large amounts of memory. Furthermore, scRNA-seq technologies attempt to measure the extremely small amount of RNA in individual cells, resulting in a phenomenon called "dropout," in which a gene is expressed but not detected and hence incorrectly measured as being unexpressed. In this fellowship, specifically tailored and highly scalable analysis methods for scRNA-seq data will be developed: 1) An ultra-fast, out-of-core implementation of randomized PCA that allows anyone with a standard laptop to perform PCA of even the largest datasets. 2) An improved implementation of t-SNE that incorporates recent theoretical results and uses a numerical approximation technique, the fast multipole method, to dramatically accelerate its runtime. 3) A method for imputing "dropped out" gene expression using recent results from the theory of low-rank matrix completion. In summary, this research will provide practical tools for analysis and visualization of scRNA-seq data.
The fellowship also includes a training plan with valuable learning experiences for the applicant's development as a physician-scientist who can apply computational and mathematical methods to solving biomedical problems.
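
To make the first aim of the abstract concrete, the following is a minimal in-memory sketch of randomized PCA, in the style of the standard random-projection-plus-power-iteration scheme. It is not the fellowship's implementation: the function name `randomized_pca` and the parameters `oversample` and `n_iter` are assumptions chosen for illustration, and the sketch holds the whole matrix in memory rather than working out-of-core as the abstract proposes.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k principal components of X (cells x genes).

    Illustrative sketch only: names and defaults are assumptions,
    and the data is held fully in memory (not out-of-core).
    """
    rng = np.random.default_rng(seed)

    # Center each gene (column) so the SVD corresponds to PCA.
    mean = X.mean(axis=0)
    A = X - mean
    n, d = A.shape

    # Sketch size: slightly larger than k to improve accuracy.
    l = min(k + oversample, n, d)

    # Project onto a random subspace to capture the dominant column space.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((d, l)))

    # Power iterations sharpen the subspace when singular values decay slowly;
    # re-orthonormalizing at each step keeps the computation numerically stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Exact SVD of the small (l x d) projected matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    # Left singular vectors, singular values, principal axes, column means.
    return U[:, :k], s[:k], Vt[:k], mean
```

In this sketch the principal component scores of each cell would be `U * s`, and the cost is dominated by a handful of passes over the data matrix, which is the property an out-of-core variant would exploit by streaming the matrix from disk in blocks.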

Public Health Relevance

Single-cell RNA-sequencing (scRNA-seq) is a powerful new technology that is providing key insights into both human physiology and disease. For this technology to reach its full potential, specially designed computational methods are required for analysis and visualization of increasingly large scRNA-seq datasets. This fellowship will fill the current gap in computational methods by developing ultra-fast implementations of tools for analysis and visualization of these and other genomic data.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Individual Predoctoral NRSA for M.D./Ph.D. Fellowships (ADAMHA) (F30)
Project #: 1F30HG010102-01
Application #: 9539023
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Gatlin, Christine L

Project Start: 2018-03-01
Project End: 2021-02-28
Budget Start: 2018-03-01
Budget End: 2019-02-28
Support Year: 1
Fiscal Year: 2018
Total Cost
Indirect Cost

Institution

Name: Yale University
Department
Type: Graduate Schools
DUNS #: 043207562

City: New Haven
State: CT
Country: United States
Zip Code

Related projects


NIH 2020 F30 HG	Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data Linderman, George / Yale University
NIH 2019 F30 HG	Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data Linderman, George / Yale University
NIH 2018 F30 HG	Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Single Cell RNA-Sequencing Data Linderman, George / Yale University

Publications

Li, Huamin; Linderman, George C; Szlam, Arthur et al. (2017) Algorithm 971: An Implementation of a Randomized Algorithm for Principal Component Analysis. ACM Trans Math Softw 43:
