Aberrant RNA base modifications have been correlated with the development of major diseases including breast cancer, type-2 diabetes, obesity, and neurological disorders, each affecting millions of Americans. However, these modifications are undetectable by current high-throughput RNA sequencing technologies, which do not directly sequence RNAs, but instead sequence cDNAs that only contain the four canonical deoxynucleotides. Other tools to sequence nucleobase modifications in RNA are usually tailored for a single specific modified nucleotide and cannot provide single-base-resolution spatial information for modifications. Thus, very few of the over 160 identified RNA modifications have been studied. To better understand RNA with its rich modifications, we have been developing a mass spectrometry (MS)-based 2-dimensional hydrophobic end-labeling sequencing strategy (2-D HELS MS Seq) as: 1) a de novo and accurate method to directly sequence RNA and 2) a general method to sequence all base modifications in any RNA type at single-base resolution. The method can currently sequence purified or mixed samples of short synthetic RNAs and simultaneously identify, locate, and quantify the frequency of a specific modification in a population. In this proposal, we focus on improving read length, throughput, and sensitivity to sequence rare RNA modifications, quantify post-transcriptional base modifications, and detect active isoforms of mixed cellular RNA samples. We propose to (a) de novo MS sequence specific and total cellular tRNA (<100 nt) as proof-of-concept examples (Aim 1), (b) de novo sequence complex endogenous RNA samples (up to 100 strands, 950 nt per run) (Aim 2), and (c) quantify genome-wide post-transcriptional RNA modifications in metabolic disease models (Aim 3).
This project is highly significant as successful accomplishment of the proposed work will 1) bring the power of MS-based laddering technology to RNA, thus providing a method, comparable to the analysis of peptide modifications in proteomics, that can reveal the identity and position of various RNA modifications, 2) allow direct and de novo RNA sequencing without cDNA synthesis, and 3) allow accurate reading of multiple base modifications at single-nucleotide resolution in one experiment without prior knowledge of sequences and modifications, helping to address a long-standing unmet need in the broad field of epitranscriptomics. Our tool will promote better understanding of the functions of post-transcriptional modifications and isoforms, including their correlations to human diseases; we will develop the method into a gold standard for verifying other techniques for sequencing and annotating genome-wide base modifications, thereby helping to build more accurate and inclusive reference epitranscriptomic databases.

Public Health Relevance

Although more than 160 RNA modifications have been identified so far, we know the function of only a few of these, mainly due to technological limitations. The goal of the proposed research is to bring the power of mass laddering technology from proteomics to RNA in order to develop a direct RNA sequencing tool that will allow accurate determination of the presence, type, quantity, and location of multiple RNA modifications at single-base resolution, all in a single analysis, as a critical step toward elucidating their possible impact on human health.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
High Priority, Short Term Project Award (R56)
Project #
1R56HG011099-01
Application #
10217648
Study Section
Special Emphasis Panel (ZHG1)
Program Officer
Smith, Michael
Project Start
2020-09-03
Project End
2021-08-31
Budget Start
2020-09-03
Budget End
2021-08-31
Support Year
1
Fiscal Year
2020
Name
New York Institute of Technology
Department
Other Basic Sciences
Type
Schools of Arts and Sciences
DUNS #
050594019
City
Old Westbury
State
NY
Country
United States
Zip Code
11568