Tensor Array Methods for RNA-Seq Analysis

Li, Gen

Abstract

RNA sequencing (RNA-Seq) analysis provides a critical means to understand gene function. High-throughput RNA-Seq data are frequently measured under multiple conditions from the same set of samples. For example, in the NIH Common Fund's Genotype-Tissue Expression (GTEx) project, samples from different tissues are collected from each post-mortem donor for sequencing. In another study, on ultraviolet (UV) radiation, skin keratinocytes from the same set of subjects are exposed to different radiation doses and durations before sequencing. Such common-sample, multi-condition RNA-Seq data share information across both samples and conditions, and they have the potential to provide key insights into gene function. However, despite substantial efforts to collect such data, there is a lack of analytical methods and computational tools that realize their potential. Important tasks such as missing-data imputation, functional gene module identification and association analysis remain unaddressed. In this proposal, we will build an innovative and powerful paradigm for analyzing multi-condition RNA-Seq data and thereby improve our understanding of gene function. To leverage information across conditions, samples and genes simultaneously, we propose to model RNA-Seq data as multi-way tensor arrays, and we will develop novel tensor methods and theory that are appropriate for read count data. In particular, our first aim is to extend tensor completion methods to impute block-wise missing RNA-Seq data. By modeling unobserved samples as missing blocks in a tensor, we will aggregate information along the different modes (subjects, conditions, genes) to impute the missing values.
The second aim develops flexible tensor co-clustering methods, which simultaneously cluster genes, samples and conditions, to identify co-expressed gene modules.
The third aim is to build new tensor response regression models that associate gene modules with genotypes and covariates, providing insights into genetic regulation such as expression quantitative trait loci (eQTL). Finally, in the fourth aim, we will develop scalable statistical software that implements the proposed methods and makes them more broadly applicable. We will apply the methods to the GTEx multi-tissue data and the UV multi-condition data to gain novel insights into gene expression and regulation. The proposed research will likely transform how we analyze multi-condition RNA-Seq data and enhance our understanding of human genomics and its relation to public health.
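The idea behind the first aim can be illustrated with a small sketch. It arranges data as a subjects × conditions × genes tensor, removes whole gene fibers for some (subject, condition) pairs to mimic block-wise missing samples, and imputes them with a low-rank CP decomposition fitted by alternating least squares, refilling missing entries from the current fit (an EM-style scheme). This is an illustration under stated assumptions, not the proposal's method. It uses a squared-error loss on a synthetic, exactly low-rank tensor standing in for log-transformed expression, rather than a model for raw read counts. The names `cp_reconstruct` and `cp_em_impute`, the rank, and the data sizes are hypothetical choices.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # Rebuild the full tensor from CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_em_impute(X, mask, rank=2, n_iter=300, seed=0):
    """Hypothetical helper: fill entries where mask is False using an EM-style CP-ALS fit.

    Assumes a Gaussian (squared-error) loss, e.g. on log-scale expression,
    not a count model. Entries of X outside the mask are never read.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    # Initialize missing entries with the mean of the observed entries
    Y = np.where(mask, X, X[mask].mean())
    for _ in range(n_iter):
        # One ALS sweep: least-squares update of each factor with the other two fixed
        A = np.einsum("ijk,jr,kr->ir", Y, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", Y, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", Y, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        # Keep observed values and refill the missing ones from the current low-rank fit
        Y = np.where(mask, X, cp_reconstruct(A, B, C))
    return Y

# Synthetic toy data (hypothetical): 20 subjects x 4 conditions x 30 genes, exactly rank 2
rng = np.random.default_rng(1)
I, J, K = 20, 4, 30
X_true = cp_reconstruct(rng.standard_normal((I, 2)),
                        rng.standard_normal((J, 2)),
                        rng.standard_normal((K, 2)))

# Block-wise missingness: subjects 0-11 each lack one whole condition (every gene)
mask = np.ones((I, J, K), dtype=bool)
for i in range(12):
    mask[i, i % J, :] = False
X_obs = np.where(mask, X_true, np.nan)  # unobserved samples stored as NaN

X_hat = cp_em_impute(X_obs, mask, rank=2)

# Baseline: per-(condition, gene) mean over the subjects observed in that condition
baseline = np.broadcast_to(np.nanmean(X_obs, axis=0), (I, J, K))
miss = ~mask
rmse_cp = np.sqrt(np.mean((X_hat[miss] - X_true[miss]) ** 2))
rmse_mean = np.sqrt(np.mean((baseline[miss] - X_true[miss]) ** 2))
print(f"CP imputation RMSE: {rmse_cp:.4f}; mean imputation RMSE: {rmse_mean:.4f}")
```

Because each missing fiber shares its subject factor with that subject's other conditions and its condition factor with the other subjects, the low-rank fit can borrow strength across modes in a way that a per-condition mean cannot. A count-appropriate version would replace the squared-error loss with, for example, a Poisson or negative binomial likelihood.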

Public Health Relevance

High-throughput RNA-Seq data collected under multiple conditions (e.g., tissues, experimental conditions, time points) from the same set of subjects provide an ideal resource for studying gene function and regulation. We propose to develop novel statistical methods and computational tools to maximize the utilization of these data and provide critical new insights into human genomics and its relation to public health.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 7R01HG010731-02
Application #: 10214847
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Pillai, Ajay

Project Start: 2020-08-31
Project End: 2025-02-28
Budget Start: 2020-08-31
Budget End: 2021-02-28
Support Year: 2
Fiscal Year: 2020

Institution

Name: University of Michigan Ann Arbor
DUNS #: 073133571

City: Ann Arbor
State: MI
Country: United States
Zip Code: 48109

Related projects

NIH 2021 R01 HG: Tensor Array Methods for RNA-Seq Analysis (Li, Gen / University of Michigan Ann Arbor)
NIH 2020 R01 HG: Tensor Array Methods for RNA-Seq Analysis (Li, Gen / Columbia University (N.Y.))
NIH 2020 R01 HG: Tensor Array Methods for RNA-Seq Analysis (Li, Gen / University of Michigan Ann Arbor)
