Robust Identification and accurate quantification of RNA transcripts on a system wide scale

Li, Jingyi

Abstract

Next-generation Illumina RNA sequencing (RNA-seq) is by far the most widely used assay for investigating animal transcriptomes, and numerous public RNA-seq data sets have been generated for various biological conditions in multiple species. However, several barriers remain to using short RNA-seq reads to accurately identify the splicing structures and quantify the abundances of full-length RNA transcripts. In this proposal, we will develop a series of novel statistical and computational methods to improve the robustness of transcript identification and the accuracy of transcript quantification from Illumina RNA-seq data.
(Aim 1) We will develop a novel screening method to construct transcript candidates by first detecting sparse splicing structures from multiple RNA-seq data sets for a given biological condition. These transcript candidates will significantly reduce the search space of downstream transcript identification methods and hence improve their precision.
(Aim 2) We will develop a robust transcript identification method to identify novel transcripts in a conservative manner from RNA-seq data given existing annotations. Our method will be based on statistical model selection under the Neyman-Pearson paradigm, which will allow users to control the false positive rate of our identified novel transcripts under any given threshold with high probability.
(Aim 3) We will develop an accurate transcript quantification method to effectively leverage multiple RNA-seq data sets and to simultaneously assess the data quality based on low-throughput gold standards and cross-data similarities. All of these methods will first be used to study transcripts in mouse macrophages, for which gold-standard qPCR and full-length cDNA sequences will be generated for training and method validation. The methods will then be tested more broadly in other biological systems where suitable gold-standard data are available. Our methods and software will significantly facilitate the use of Illumina RNA-seq data for gene expression studies at the transcript level, increase reproducibility of scientific discoveries from transcriptomic studies, and improve our understanding of gene expression mechanisms in various biological conditions.
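
The guarantee described in Aim 2, a false positive rate kept under a chosen threshold with high probability, can be illustrated with a small sketch. The code below is not the proposal's transcript identification method; it shows a generic order-statistic thresholding rule from the Neyman-Pearson classification literature, under the assumption that each candidate receives a numeric score and that a held-out set of scores from known negatives is available. The function names and the parameters `alpha` (target false positive rate) and `delta` (tolerated probability of exceeding it) are hypothetical choices for illustration.

```python
import math


def np_threshold_rank(n, alpha, delta):
    """Return the smallest 1-indexed rank k such that thresholding at the
    k-th smallest of n held-out negative scores keeps the false positive
    rate at or below alpha with probability at least 1 - delta.

    Returns None when n is too small for any rank to meet the guarantee.
    """
    for k in range(1, n + 1):
        # Upper bound on P(false positive rate > alpha) when the k-th
        # order statistic is the threshold: a binomial upper-tail sum.
        violation = sum(
            math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation <= delta:
            return k
    return None


def np_threshold(negative_scores, alpha, delta):
    """Pick a score threshold from held-out negative scores (assumed data).

    A candidate would be called positive only if its score is strictly
    greater than the returned threshold.
    """
    ordered = sorted(negative_scores)
    k = np_threshold_rank(len(ordered), alpha, delta)
    if k is None:
        raise ValueError("too few held-out negative scores for this alpha and delta")
    return ordered[k - 1]


if __name__ == "__main__":
    # Toy held-out negative scores: the integers 0..99.
    scores = list(range(100))
    print(np_threshold_rank(len(scores), alpha=0.05, delta=0.05))
    print(np_threshold(scores, alpha=0.05, delta=0.05))
```

The rule is deliberately conservative: with few held-out negatives, no rank satisfies the bound, so the sketch raises an error rather than return a threshold it cannot back with the stated guarantee.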

Public Health Relevance

This project will create a set of computational methods to improve the robustness and accuracy of detecting and quantifying RNA molecules from next-generation RNA sequencing data. Those methods will serve as useful tools for investigating gene expression changes in different biological conditions on a finer scale at the transcript level. We will distribute the methods in open-source software packages to benefit the scientific and biomedical communities.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM120507-02
Application #: 9332408
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy

Project Start: 2016-09-01
Project End: 2021-05-31
Budget Start: 2017-06-01
Budget End: 2018-05-31
Support Year: 2
Fiscal Year: 2017
Total Cost: $301,125
Indirect Cost: $98,625

Institution

Name: University of California Los Angeles
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 092530369

City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Related projects


NIH 2020 R01 GM	Robust Identification and accurate quantification of RNA transcripts on a system wide scale Li, Jingyi / University of California Los Angeles
NIH 2019 R01 GM	Robust Identification and accurate quantification of RNA transcripts on a system wide scale Li, Jingyi / University of California Los Angeles
NIH 2018 R01 GM	Robust Identification and accurate quantification of RNA transcripts on a system wide scale Li, Jingyi / University of California Los Angeles
NIH 2017 R01 GM	Robust Identification and accurate quantification of RNA transcripts on a system wide scale Li, Jingyi / University of California Los Angeles	$301,125
NIH 2016 R01 GM	Robust Identification and accurate quantification of RNA transcripts on a system wide scale Li, Jingyi / University of California Los Angeles

Publications

Burke, Jordan E; Longhurst, Adam D; Merkurjev, Daria et al. (2018) Spliceosome Profiling Visualizes Operations of a Dynamic RNP at Nucleotide Resolution. Cell 173:1014-1030.e17

Tong, Xin; Feng, Yang; Li, Jingyi Jessica (2018) Neyman-Pearson classification algorithms and NP receiver operating characteristics. Sci Adv 4:eaao1659

Li, Wei Vivian; Li, Jingyi Jessica (2018) An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat Commun 9:997

Li, Wei Vivian; Zhao, Anqi; Zhang, Shihua et al. (2018) MSIQ: Joint modeling of multiple RNA-seq samples for accurate isoform quantification. Ann Appl Stat 12:510-539

Yang, Yang; Yang, Yu-Cheng T; Yuan, Jiapei et al. (2017) Large-scale mapping of mammalian transcriptomes identifies conserved genes associated with different cell states. Nucleic Acids Res 45:1657-1672

Gao, Ruiqi; Li, Jingyi Jessica (2017) Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons. BMC Genomics 18:234

Li, Jingyi Jessica; Chew, Guo-Liang; Biggin, Mark D (2017) Quantitating translational control: mRNA abundance-dependent and independent contributions and the mRNA sequences that specify them. Nucleic Acids Res 45:11821-11836

Li, Wei Vivian; Chen, Yiling; Li, Jingyi Jessica (2017) TROM: A Testing-Based Method for Finding Transcriptomic Similarity of Biological Samples. Stat Biosci 9:105-136
