Computational genome-wide RNA profiling using next-generation sequencing

Wang, Li-San

Abstract

Recent evidence has shown that non-coding RNAs are ubiquitous in the cell and that their functions and structure vary to a greater extent than previously imagined. Multiple new RNA classes have been implicated in many diseases, and understanding how these RNAs work is a critical need. While exciting discoveries are accumulating, our functional knowledge of these new RNAs remains limited. Here we propose to couple a new high-throughput RNA duplex sequencing technology with new computational methods to economically study novel functional non-coding RNAs at a genomic scale. We propose to develop two computational methodologies to characterize putative newly found non-coding RNAs on the genomic scale. First, we will develop a maximum likelihood approach that estimates RNA secondary structure using RNA-seq assays that preferentially sequence single- or double-stranded nucleotides. Second, we will develop a machine-learning framework that predicts the functional category of novel non-coding RNAs using length and structure features of known RNAs. These structural and functional predictions will be validated by comparative genomics and experimentation. We will develop databases and analysis software, and investigate the genomes of human and five model organisms. In total, our findings will yield tremendous insights into non-coding RNA biology and will substantially impact continued study of these important molecules.
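As a rough illustration of the first aim, the sketch below estimates, for each nucleotide, the fraction of depth-normalized reads that come from a double-stranded library rather than a single-stranded one. This is not the project's algorithm, which the abstract does not specify. The per-position binomial model, the function names (paired_fraction, structure_profile), the pseudocount parameter, and the toy counts are all assumptions made for illustration. With equal library sizes and no pseudocount, the estimate reduces to the binomial maximum likelihood estimate d / (d + s).

```python
from typing import List


def paired_fraction(ds: int, ss: int, ds_total: int, ss_total: int,
                    pseudocount: float = 0.0) -> float:
    """Estimate the probability that one nucleotide is base-paired.

    Hypothetical model: reads covering the position are normalized by
    library depth, and the paired probability is the share of the
    normalized signal that comes from the double-stranded library.
    """
    if ds_total <= 0 or ss_total <= 0:
        raise ValueError("library totals must be positive")
    ds_rate = (ds + pseudocount) / ds_total
    ss_rate = (ss + pseudocount) / ss_total
    denom = ds_rate + ss_rate
    if denom == 0:
        # No reads from either library: return an uninformative value.
        return 0.5
    return ds_rate / denom


def structure_profile(ds_counts: List[int], ss_counts: List[int],
                      pseudocount: float = 0.0) -> List[float]:
    """Apply paired_fraction to every position of one transcript."""
    if len(ds_counts) != len(ss_counts):
        raise ValueError("count vectors must have the same length")
    ds_total = sum(ds_counts)
    ss_total = sum(ss_counts)
    return [
        paired_fraction(d, s, ds_total, ss_total, pseudocount)
        for d, s in zip(ds_counts, ss_counts)
    ]


if __name__ == "__main__":
    # Toy per-nucleotide read counts (invented for illustration).
    ds_counts = [8, 2, 0, 5]
    ss_counts = [2, 8, 0, 5]
    print(structure_profile(ds_counts, ss_counts))
```

Values near 1 suggest a paired nucleotide and values near 0 an unpaired one. A fuller method would also have to account for sequence bias, mappability, and dependence between neighboring positions, none of which this sketch models.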

Public Health Relevance

We propose to develop computational methods to study novel non-coding RNA transcripts by leveraging a new duplex RNA sequencing technique. Our first objective is to develop a maximum likelihood algorithm that estimates secondary structure using double-stranded or single-stranded RNA sequencing. We will also develop a machine-learning framework that predicts the functional category of novel non-coding RNAs using length and structure features from RNA-seq experiments. These methods will be used to annotate all RNA transcripts using experimental data from human and five model organisms.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM099962-02
Application #: 8545184
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul

Project Start: 2012-09-14
Project End: 2017-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $301,080
Indirect Cost: $112,905

Institution

Name: University of Pennsylvania
Department: Pathology
Type: Schools of Medicine
DUNS #: 042250712

City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Related projects


NIH 2016 R01 GM	Computational genome-wide RNA profiling using next-generation sequencing Wang, Li-San / University of Pennsylvania
NIH 2015 R01 GM	Computational genome-wide RNA profiling using next-generation sequencing Wang, Li-San / University of Pennsylvania
NIH 2014 R01 GM	Computational genome-wide RNA profiling using next-generation sequencing Wang, Li-San / University of Pennsylvania	$312,000
NIH 2013 R01 GM	Computational genome-wide RNA profiling using next-generation sequencing Wang, Li-San / University of Pennsylvania	$301,080
NIH 2012 R01 GM	Computational genome-wide RNA profiling using next-generation sequencing Wang, Li-San / University of Pennsylvania	$312,000

Publications

Amlie-Wolf, Alexandre; Tang, Mitchell; Mlynarski, Elisabeth E et al. (2018) INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Res 46:8740-8753

Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin et al. (2018) SPAR: small RNA-seq portal for analysis of sequencing experiments. Nucleic Acids Res 46:W36-W42

Hafez, Dina; Karabacak, Aslihan; Krueger, Sabrina et al. (2017) McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes. Genome Biol 18:199

Berkowitz, Nathan D; Silverman, Ian M; Childress, Daniel M et al. (2016) A comprehensive database of high-throughput sequencing-based RNA secondary structure probing data (Structure Surfer). BMC Bioinformatics 17:215

Leung, Yuk Yee; Kuksa, Pavel P; Amlie-Wolf, Alexandre et al. (2016) DASHR: database of small human noncoding RNAs. Nucleic Acids Res 44:D216-22

Vandivier, Lee E; Campos, Rafael; Kuksa, Pavel P et al. (2015) Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. Plant Cell 27:3024-37

Hwang, Yih-Chii; Lin, Chiao-Feng; Valladares, Otto et al. (2015) HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics 31:1290-2

Mirarab, Siavash; Nguyen, Nam; Guo, Sheng et al. (2015) PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. J Comput Biol 22:377-86

Amlie-Wolf, Alexandre; Ryvkin, Paul; Tong, Rui et al. (2015) Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin. PLoS One 10:e0141836

Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H et al. (2014) Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs. Methods 67:28-35

Showing the most recent 10 out of 14 publications
