Statistical Methods for RNA-seq Data Analysis

Sun, Wei

Abstract

Traditional RNA-seq studies collect RNA-seq data from bulk samples (bulk RNA-seq) and thus aggregate the signals from multiple cell types. Gene expression variation across samples may be due to differences in cell type composition or in cell type-specific gene expression, and bulk RNA-seq data cannot distinguish these two factors. In fact, cell type-specific signals may be masked or even misrepresented by bulk RNA-seq data. Single-cell RNA sequencing (scRNA-seq) may overcome some of the limitations of bulk RNA-seq. However, in the foreseeable future, it cannot be applied to large cohorts due to cost and logistical barriers. In this R01 proposal, we propose new statistical/computational methods to study cell type composition or cell type-specific gene expression using bulk RNA-seq data, scRNA-seq data, or both. This approach can effectively exploit the large amount of existing bulk RNA-seq data, and it can bring paradigm-shifting changes to many fields, for example, by identifying cell types associated with a disease trait or defining new biomarkers using cell type-specific gene expression. We plan to achieve the following three specific aims.
In Aim 1, we propose novel methods for cell type-specific differential expression analysis as well as methods to assess the association between cell type composition and covariates of interest.
In Aim 2, we focus on the association between cell type-specific gene expression and germline genetic variants, i.e., studying cell type-specific gene expression quantitative trait loci (eQTLs).
In Aim 3, we study the association between somatic mutations and cell type composition or cell type-specific gene expression.

Public Health Relevance

We propose to develop statistical methods and software packages to study RNA sequencing data collected from bulk tissue samples and/or single cells. Our project will break new ground in the study of cell type composition as well as cell type-specific gene expression, which can have a significant impact on many fields, for example, by identifying cell types associated with a certain disease trait or cell type-specific biomarkers for treatment or prognosis.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM105785-07
Application #: 9851407
Study Section: Biostatistical Methods and Research Design Study Section (BMRD)
Program Officer: Krasnewich, Donna M

Project Start: 2014-05-15
Project End: 2022-12-31
Budget Start: 2020-01-01
Budget End: 2020-12-31
Support Year: 7
Fiscal Year: 2020

Institution

Name: Fred Hutchinson Cancer Research Center
DUNS #: 078200995

City: Seattle
State: WA
Country: United States
Zip Code: 98109

Related projects


NIH 2021 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center
NIH 2020 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center
NIH 2019 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center
NIH 2017 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center
NIH 2016 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center
NIH 2015 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / University of North Carolina Chapel Hill	$365,881
NIH 2015 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / Fred Hutchinson Cancer Research Center	$254,443
NIH 2014 R01 GM	Statistical Methods for RNA-seq Data Analysis Sun, Wei / University of North Carolina Chapel Hill

Publications

Liu, Yanyan; Xiong, Sican; Sun, Wei et al. (2018) Joint Analysis of Strain and Parent-of-Origin Effects for Recombinant Inbred Intercrosses Generated from Multiparent Populations with the Collaborative Cross as an Example. G3 (Bethesda) 8:599-605

Sun, Wei; Bunn, Paul; Jin, Chong et al. (2018) The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Res 46:3009-3018

He, Qianchuan; Liu, Yang; Sun, Wei (2018) Statistical analysis of non-coding RNA data. Cancer Lett 417:161-167

Liu, Yang; He, Qianchuan; Sun, Wei (2018) Association analysis using somatic mutations. PLoS Genet 14:e1007746

Kirk, Jessime M; Kim, Susan O; Inoue, Kaoru et al. (2018) Functional classification of long non-coding RNAs by k-mer content. Nat Genet 50:1474-1482

Chen, Ting-Huei; Sun, Wei (2017) Prediction of cancer drug sensitivity using high-dimensional omic features. Biostatistics 18:1-14

Zhang, Yiwen; Zhou, Hua; Zhou, Jin et al. (2017) Regression Models For Multivariate Count Data. J Comput Graph Stat 26:1-13

Zhou, Hua; Blangero, John; Dyer, Thomas D et al. (2017) Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data. Genet Epidemiol 41:174-186

Hu, Yi-Juan; Liao, Peizhou; Johnston, H Richard et al. (2016) Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls. PLoS Genet 12:e1006040

Rashid, Naim U; Sun, Wei; Ibrahim, Joseph G (2016) A Statistical Model to Assess (Allele-Specific) Associations between Gene Expression and Epigenetic Features Using Sequencing Data. Ann Appl Stat 10:2254-2273

Showing the most recent 10 out of 22 publications
