Ribosome profiling technology provides quantitative insights into translational regulation at a genomic scale, a mechanism that plays a crucial role in several important biological processes from embryonic development to carcinogenesis. Despite advances in ribosome profiling data analysis methods, a number of challenges remain to be addressed, including tests with small sample sizes and read count biases due to ribosome stalling. I propose to develop a logistic-regression-based method called "Ribolog" to model ribosome profiling data in which individual sequencing reads are units of observation and translation efficiency is calculated as the odds of observing "RPF" vs. "RNA" reads. The logistic regression model has several distinct advantages over the methods based on negative binomial modeling of RNA-seq and Ribo-seq read counts: (i) It neither assumes equality of mean and variance nor requires estimation of dispersion. (ii) It has much higher statistical power than count-based methods because in this model, statistical sample size equals the number of reads, not the number of replicates. (iii) It works with a single sample per condition (unreplicated datasets); therefore, it is applicable to clinical or single-cell data. (iv) It is easily adaptable for experiments with synthetic spike-in standards. (v) In replicated datasets, it enables empirical significance testing and calculation of novel informative QC measures. (vi) It can accommodate complex experimental designs involving multiple samples and covariates in one model and is not limited to pairwise comparisons. Our preliminary results applying Ribolog to a dataset comprising two non-metastatic and two corresponding metastatic cell lines indicate that this method is indeed highly powerful and 80-90% reproducible among biological replicates. Additionally, we provide modules for stalling bias correction, meta-analysis, model selection, experimental design and quality control.
Combining Ribolog with other analytical methods (some developed previously in our lab), we construct a multiomic framework to integrate Ribo-seq data with RNA-seq, tRNA profiling, genetic variation, miRNA, codon optimality, etc. to identify the driving causes of translation dynamics and contribute to the next generation of multi-layered genotype-to-phenotype maps. The method will be implemented in R and made available to the scientific community as an open-access package. Given my expertise in statistics, my continued training in experimental biology, access to state-of-the-art datasets, and support from multiple labs with expertise in computational and experimental studies of translation and broader genomic topics and technologies, I am uniquely situated to tackle this problem. In addition to providing novel insights into the biology of translational control and benefiting the community, this project will enable me to extend my training in a number of exciting areas that are most relevant to my career development as a successful and independent academic research scientist.
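
To make the odds-based definition of translation efficiency concrete, the following is a minimal sketch for one gene in two conditions. It is not the Ribolog implementation (which the abstract says will be an R package); the function name, the read counts, and the use of a Wald test are illustrative assumptions, and library-size normalization and stalling-bias correction are deliberately left out. It relies on the standard result that a logistic regression with a single binary predictor reduces to a 2x2 table, where the fitted coefficient equals the log odds ratio and its standard error is the square root of the summed reciprocal cell counts.

```python
import math


def log_odds_ratio_test(rpf_a, rna_a, rpf_b, rna_b):
    """Compare translation efficiency (odds of RPF vs. RNA reads) between
    two conditions for a single gene.

    Hypothetical helper for illustration only: it assumes raw, non-zero
    read counts and ignores library-size normalization.
    """
    # Translation efficiency as the odds of observing an RPF read vs. an RNA read.
    te_a = rpf_a / rna_a
    te_b = rpf_b / rna_b

    # With one binary predictor (condition), the logistic regression
    # coefficient equals the log odds ratio of the 2x2 table.
    log_or = math.log(te_b / te_a)

    # Wald standard error: every read is an observation, so the error
    # shrinks as read counts grow, independent of the number of replicates.
    se = math.sqrt(1 / rpf_a + 1 / rna_a + 1 / rpf_b + 1 / rna_b)

    z = log_or / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "te_a": te_a,
        "te_b": te_b,
        "log_odds_ratio": log_or,
        "std_error": se,
        "z": z,
        "p_value": p_value,
    }


# Made-up counts for one gene: condition A vs. condition B.
result = log_odds_ratio_test(rpf_a=200, rna_a=400, rpf_b=300, rna_b=300)
print(result)
```

In this made-up example the odds double from 0.5 to 1.0, so the log odds ratio equals ln 2. Multiplying all four counts by the same factor leaves that estimate unchanged but shrinks the standard error, which is the sense in which the number of reads, rather than the number of replicates, acts as the sample size in a read-level model.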

Public Health Relevance

I intend to develop a novel, powerful statistical framework to infer the significance of differential translation from ribosome profiling data in replicated and unreplicated datasets, with modules for bias correction, reproducibility check, identification of ribosome stalling factors, model selection and meta-analysis. With experimental validation of highly significant hits and integration with other omic data, this project will move towards a multiomic model of human phenotypes (e.g. cancer metastasis) emphasizing post-transcriptional regulation.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Postdoctoral Individual National Research Service Award (F32)
Project #
1F32GM133118-01
Application #
9760832
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Sakalian, Michael
Project Start
2019-08-01
Project End
2020-07-31
Budget Start
2019-08-01
Budget End
2020-07-31
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of California San Francisco
Department
Biochemistry
Type
Schools of Medicine
DUNS #
094878337
City
San Francisco
State
CA
Country
United States
Zip Code
94118