Recent evidence has shown that non-coding RNAs are ubiquitous in the cell and that their functions and structures vary to a greater extent than previously imagined. Multiple new RNA classes have been implicated in many diseases, and understanding how these RNAs work is a critical need. While exciting discoveries are accumulating, our functional knowledge of these new RNAs remains limited. Here we propose to couple a new high-throughput RNA duplex sequencing technology with new computational methods to economically study novel functional non-coding RNAs at a genomic scale. We will develop two computational methodologies to characterize putative, newly found non-coding RNAs. First, we will develop a maximum likelihood approach that estimates RNA secondary structure using RNA-seq assays that preferentially sequence single- or double-stranded nucleotides. Second, we will develop a machine-learning framework that predicts the functional category of novel non-coding RNAs using length and structure features of known RNAs. These structural and functional predictions will be validated by comparative genomics and experimentation. We will develop databases and analysis software, and investigate the genomes of human and five model organisms. In total, our findings will yield new insights into non-coding RNA biology and will substantially impact continued study of these important molecules.
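
The abstract does not specify the likelihood model, so the following is only an illustrative sketch of what a maximum likelihood estimate from paired single- and double-stranded assays could look like. It assumes, purely for illustration, that each read covering a nucleotide comes from the double-stranded library with probability p (the chance that the nucleotide is paired), independently of other reads, so the counts follow a binomial distribution. The function names and the binomial model are hypothetical and are not taken from the project.

```python
import math


def binomial_log_likelihood(p, ds_count, ss_count):
    """Log-likelihood of pairing probability p (0 < p < 1), up to a constant.

    Assumed model (illustrative only): each of the ds_count + ss_count reads
    covering a nucleotide is double-stranded with probability p.
    """
    return ds_count * math.log(p) + ss_count * math.log(1.0 - p)


def paired_probability_mle(ds_counts, ss_counts):
    """Per-nucleotide maximum likelihood estimate of the pairing probability.

    Under the assumed binomial model the estimate has the closed form
    ds / (ds + ss). Positions without any coverage get NaN because the
    data carry no information there.
    """
    if len(ds_counts) != len(ss_counts):
        raise ValueError("count vectors must have the same length")
    estimates = []
    for ds, ss in zip(ds_counts, ss_counts):
        total = ds + ss
        estimates.append(ds / total if total > 0 else math.nan)
    return estimates


# Toy read counts for a four-nucleotide transcript (made-up numbers).
ds_reads = [8, 0, 3, 0]
ss_reads = [2, 5, 3, 0]
estimates = paired_probability_mle(ds_reads, ss_reads)
```

A real analysis would also need to normalize for library size and sequence bias, and would combine these per-nucleotide estimates with a folding model; none of that is attempted here.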

Public Health Relevance

We propose to develop computational methods to study novel non-coding RNA transcripts by leveraging a new duplex RNA sequencing technique. Our first objective is to develop a maximum likelihood algorithm that estimates RNA secondary structure from sequencing assays that preferentially capture double-stranded or single-stranded RNA. We will also develop a machine-learning framework that predicts the functional category of novel non-coding RNAs using length and structure features from RNA-seq experiments. These methods will be used to annotate all RNA transcripts using experimental data from human and five model organisms.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM099962-02
Application #: 8545184
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Brazhnik, Paul
Project Start: 2012-09-14
Project End: 2017-05-31
Budget Start: 2013-06-01
Budget End: 2014-05-31
Support Year: 2
Fiscal Year: 2013
Total Cost: $301,080
Indirect Cost: $112,905

Name: University of Pennsylvania
Department: Pathology
Type: Schools of Medicine
DUNS #: 042250712
City: Philadelphia
State: PA
Country: United States
Zip Code: 19104

Amlie-Wolf, Alexandre; Tang, Mitchell; Mlynarski, Elisabeth E et al. (2018) INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Res 46:8740-8753
Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin et al. (2018) SPAR: small RNA-seq portal for analysis of sequencing experiments. Nucleic Acids Res 46:W36-W42
Hafez, Dina; Karabacak, Aslihan; Krueger, Sabrina et al. (2017) McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes. Genome Biol 18:199
Leung, Yuk Yee; Kuksa, Pavel P; Amlie-Wolf, Alexandre et al. (2016) DASHR: database of small human noncoding RNAs. Nucleic Acids Res 44:D216-22
Berkowitz, Nathan D; Silverman, Ian M; Childress, Daniel M et al. (2016) A comprehensive database of high-throughput sequencing-based RNA secondary structure probing data (Structure Surfer). BMC Bioinformatics 17:215
Amlie-Wolf, Alexandre; Ryvkin, Paul; Tong, Rui et al. (2015) Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin. PLoS One 10:e0141836
Vandivier, Lee E; Campos, Rafael; Kuksa, Pavel P et al. (2015) Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. Plant Cell 27:3024-37
Hwang, Yih-Chii; Lin, Chiao-Feng; Valladares, Otto et al. (2015) HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics 31:1290-2
Mirarab, Siavash; Nguyen, Nam; Guo, Sheng et al. (2015) PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences. J Comput Biol 22:377-86
Ryvkin, Paul; Leung, Yuk Yee; Ungar, Lyle H et al. (2014) Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs. Methods 67:28-35

Showing the most recent 10 out of 14 publications