The Human Genome Project led to intense efforts to profile gene expression globally to understand the differences between normal and disease states, tissue types, or developmental stages. The ultimate product of gene expression is protein, but researchers have largely been limited to measuring mRNA production as a proxy for protein. However, the translation of mRNA into protein can be highly regulated. Global measurements of protein production thus have enormous potential for understanding human physiology and fulfilling the promise of the Human Genome Project to reveal how the genome encodes normal and disease states. Ribosome profiling is a powerful new technique that measures translation genome-wide by sequencing and counting the fragments of mRNA protected by translating ribosomes. This method has quickly been adopted by many labs, but at present the data analysis requires computational expertise, and the analyses so far have used ad hoc methods without statistical rigor. This project aims to characterize the statistical properties of ribosome profiling reads, design methods to account for biases in fragment counts, and incorporate these into a maximum likelihood framework for estimating the translation of each human transcript. The first aim will generate high-quality, high-coverage data from human ribosome profiling experiments. It will then use these data to develop maximum likelihood models for fragment positions, the effect of ligation bias on fragment recovery, and the proper proportional assignment of reads between alternative splice forms of a gene.
The second aim will combine the models of aim 1 into a piece of software that uses expectation maximization to generate genome-wide estimates of ribosome occupancy per codon and overall estimates of translation per transcript. This software will be designed to fit into existing RNA-seq analysis pipelines. The proposed work will provide an important and much-needed tool for genome-wide measurement of gene expression. By providing a well-designed pipeline for ribosome profiling data, we will put this method within reach of many research groups and increase its accessibility for understanding gene regulation in a wide range of medically relevant conditions.
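The proportional assignment of reads between alternative splice forms can be illustrated with a minimal expectation maximization sketch. This is an illustration under simplifying assumptions, not the project's software. It assumes that each footprint is simply compatible or incompatible with each isoform, and that footprint positions are uniform along each coding region. Under that assumption, a compatible read's likelihood under isoform t is 1/L_t. The function name `em_isoform_fractions` and its inputs are hypothetical. The sketch also omits the positional and ligation bias models described in aim 1.

```python
import numpy as np


def em_isoform_fractions(compat, lengths, n_iter=500, tol=1e-10):
    """Hypothetical EM sketch: estimate the fraction of footprints from each isoform.

    compat  : (n_reads, n_isoforms) 0/1 matrix; compat[r, t] = 1 if read r
              is consistent with isoform t. Every read must be compatible
              with at least one isoform.
    lengths : effective coding length (e.g. in codons) of each isoform.
    Assumes uniform footprint positions, so P(read | isoform t) = 1 / L_t.
    """
    compat = np.asarray(compat, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    n_reads, n_iso = compat.shape
    theta = np.full(n_iso, 1.0 / n_iso)  # start from equal read fractions

    for _ in range(n_iter):
        # E-step: responsibility of isoform t for read r, proportional to
        # theta_t / L_t among the isoforms the read is compatible with.
        weights = compat * (theta / lengths)
        resp = weights / weights.sum(axis=1, keepdims=True)

        # M-step: new read fractions are the average responsibilities.
        new_theta = resp.sum(axis=0) / n_reads
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta


# Two isoforms of equal length: 30 reads unique to the first,
# 10 unique to the second, and 40 shared by both.
compat = np.array([[1, 0]] * 30 + [[0, 1]] * 10 + [[1, 1]] * 40)
theta = em_isoform_fractions(compat, lengths=[100, 100])
density = theta / np.array([100, 100])  # relative per-codon ribosome density
print(theta, density)
```

In this toy example the shared reads are split in proportion to the unique evidence, so the estimate converges to about 0.75 and 0.25. Dividing each fraction by its isoform length gives a relative per-codon density, which loosely corresponds to the occupancy estimates described above.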
Production of proteins from genes is tightly regulated in different tissues and at different developmental stages, and dramatically misregulated in cancers and other diseases. A new method, ribosome profiling, allows global measurement of protein production. This project will create analysis tools for ribosome profiling data, allowing researchers to measure changes in gene expression that may reveal causes and consequences of disease.