RNA is an essential molecule for life, serving many roles in the functioning of cells. Chemical modifications to RNA can affect its rate of turnover. In plants, little is known about the chemical diversity of these modifications and the biological contexts in which they occur, because they are difficult to identify biochemically at the cellular level. This project aims to use computational approaches to systematically identify RNA modifications by reanalyzing ~1 petabyte of publicly available data from a broad diversity of model and agriculturally important plant species. The project will identify and classify RNA modifications based on where they are applied, the biological conditions under which they occur (e.g., during plant development or when the plant experiences stress), and the degree to which each modification is conserved between plant species. The project is also devoted to training the next generation of plant scientists at the interface between computing and biology. Undergraduate students will play an integral role in the identification and biological validation of RNA modifications. Altogether, this project will provide insight into the biological significance of RNA modifications using novel applications of data in public repositories.
RNA chemical modifications are diverse, occur on all classes of RNA molecules, and are physiologically relevant. However, RNA modifications have not yet been studied in depth in plants. This gap in knowledge is due in large part to the cost and technical difficulty of the biochemical assays used to measure the abundance of specific RNA modifications. In light of these difficulties, in silico methods have been developed that facilitate high-throughput identification and prediction of these chemical additions. This proposal aims to address challenges in identifying modifications and placing them into a biological context by: 1) developing an exhaustive, annotated plant epitranscriptomic resource of over 47 unique modifications using approximately 1 petabase of publicly available RNA-seq data, and 2) providing a biological and evolutionary context for each of these modifications and the RNAs they are found to modify. To process the wealth of publicly available RNA-seq data and present the resulting information in a manner that will drive hypothesis generation, this project will develop novel computational workflows and data visualization tools.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.