All cells make proteins by using molecular machines called ribosomes, which read a messenger RNA template and "translate" the RNA code into the protein code. Cells need to make the right proteins, at the right time, in the right quantities, and so this process is carefully controlled by signals that are also encoded in the RNA. These signals are complex and only just beginning to be understood because there are thousands of different RNA sequences in a cell and each is hundreds to thousands of nucleotides ("letters") long. Recent advances in DNA and RNA sequencing technology mean that we can now measure all the parts of RNA that are translated into protein, and how much each is translated, by using a technique called ribosome profiling. Although this technique is amazing, it is not perfect, and statistical tools are needed to separate the interesting biological signals in the data from unwanted biases of the experimental measurement. These tools need to be implemented in usable and reliable software so that all scientists studying protein synthesis can get the maximum possible information from ribosome profiling data, which is expensive and time-consuming to collect. The RiboViz software suite, which is open source and free to use by anyone in the world, already takes raw data from sequencing machines and puts it through a series of processing steps. RiboViz estimates how much each part of RNA is translated, and how the amount of translation is controlled by the code of that RNA. RiboViz produces tables, figures and graphs that are accessible online, so it is useful for both experts and non-experts. This kind of data sharing makes science more reproducible and more accessible.
This project will accelerate understanding of the mechanism and regulation of protein synthesis by extending the RiboViz open-source computational pipeline to extract biological insight from high-throughput data measuring protein synthesis. The goal is to further develop the RiboViz open-source software pipeline (https://github.com/shahpr/RiboViz) for accessible, reliable, reproducible, rigorous and bias-aware analysis and visualization of ribosome profiling data. Specific aims are to refactor RiboViz following best practices for scientific computing, by writing tests akin to experimental controls for each step of the setup, processing and analysis, and by containerization of the pipeline to enable running on different computers with full control of software dependencies; to develop likelihood-based statistical methods for quantification of differential translation of open reading frames and codons while correcting for sequence-level bias, building on best practices in differential RNA abundance analysis, and to implement these analysis and visualization tools within RiboViz; and to generate standardized ribosome profiling datasets by re-analyzing published datasets for all eukaryotes to quantify rigorously how codon usage and other sequence features predict protein synthesis. The improved RiboViz pipeline will accelerate studies of translation regulation and produce tested and rigorous tools that will be disseminated as an open-source resource to the entire community studying translation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.