A basic question in cell biology is how and when genes are expressed, and which active switches control those processes. The first step of gene expression is transcription, the production of an RNA molecule from genomic DNA. As instruments that can detect the original RNA molecules from cells become available, it is becoming possible to identify sites where RNA bases have been chemically modified after their initial transcription. This is important because some of these post-transcriptional modifications play a role in how the expressed RNA is translated into protein. Little is known as yet about the molecular players involved in the many steps that govern expression patterns, including the localization, splicing, stability and folded structure of the RNA. This project aims to detect, identify and quantify modifications on RNA molecules as measured on the Oxford Nanopore platform, a required first step in understanding those biological functions. Gold-standard calibration sets of synthetic oligonucleotides will be designed, produced and tested as part of the experimental design, and new algorithms and software will provide single-nucleotide resolution of the type and location of robustly detected modifications in natural transcripts in yeast and human data sets.

The lack of efficient high-throughput detection methods has hampered the emerging field of epitranscriptomics, which studies how chemical modifications of RNA bases modulate the biological function and structure of RNA molecules. The overarching research goal of this project is to develop computational methods to map RNA modification sites for 5-methyl cytosine (5mC), 1-methyl adenosine (m1A) and methylation of the backbone of the RNA nucleotides (Nm) at single-nucleotide resolution. Experiments will employ synthetic calibration oligonucleotides, and newly developed algorithms will be used to probe natural yeast and human transcripts, using long-read direct RNA sequencing data from Oxford Nanopore sequencing technology. The project will complement current transcriptomic reference maps of these modification events with the additional data needed to train computational methods, drawn from gold-standard calibration sets composed of synthetic RNA oligonucleotides. The resulting Oxford Nanopore signatures of modification sites will be analyzed using deep learning for signal analysis and statistical methods for robustness in precision and accuracy. All resulting methods, databases and maps of RNA modification types across species will be made publicly available from the project web site. The research program involves a team whose expertise spans several domains of science, including engineering, bioinformatics, genomics and computational science, providing an excellent environment and experience for developing a new generation of interdisciplinary scientists. Data, code and other infrastructure resources will be reported at www.iupui.edu/~jangalab/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 1940422
Program Officer: Jean Gao
Project Start:
Project End:
Budget Start: 2019-09-15
Budget End: 2021-08-31
Support Year:
Fiscal Year: 2019
Total Cost: $299,698
Indirect Cost:
Name: Indiana University
Department:
Type:
DUNS #:
City: Bloomington
State: IN
Country: United States
Zip Code: 47401