Recent studies based on the ribosome profiling (ribo-seq) technique have revealed an unanticipated, complex translational landscape in metazoans, with extensive translation beyond conventionally annotated translation events. Some of the polypeptides encoded by novel open reading frames (ORFs) and produced by these unannotated translation events have been shown to play important developmental or physiological roles. However, the functions of most unannotated ORFs remain unknown, and the critical first step toward decoding their functions is to systematically catalogue those that undergo active translation. Ribo-seq data arguably provide the best source of information for this task, given the genome-wide coverage and sensitivity of the technique. Variations of the ribo-seq technique differ in the translational inhibitor used. Regular ribo-seq (rRibo-seq) uses cycloheximide (CHX), a translation elongation inhibitor, to freeze all translating ribosomes. In contrast, the translation inhibitors harringtonine and lactimidomycin preferentially capture initiating ribosomes, which enables global mapping of translation initiation sites (TISs) by sequencing (TI-seq). Despite the broad applicability and wide adoption of rRibo-seq and TI-seq, there is no comprehensive, integrated computational platform that enables de novo prediction of novel ORFs from different types of ribo-seq data and supports interactive exploration, visualization and meta-analysis of in-house and publicly available ribo-seq datasets to study unannotated ORFs. To fill this gap, we propose to develop an integrated computational platform to facilitate the study of unannotated ORFs in eukaryotes using different types of ribo-seq data.
This computational platform will have three core components: first, a new computational toolkit that provides a comprehensive informatics solution for both low-level and high-level analysis of data from different types of ribo-seq experiments; second, a computational framework that enables user-friendly interactive exploration and visualization of quality control and analysis results, as well as ribo-seq signal tracks of individual datasets; third, a web data portal that dynamically updates and analyzes published ribo-seq datasets from metazoa, plants and fungi, and allows a general user to perform meta-analysis of unannotated ORFs across different datasets, species and biological contexts. Furthermore, as a biological application, we will combine this computational platform with both large- and small-scale experimental approaches to uncover novel ORFs whose expression is regulated by estrogen and that are important for estrogen-dependent cell proliferation or survival, and to dissect the molecular mechanisms underlying their biological function. The study proposed here builds upon strong preliminary data. Given our expertise in computational and experimental biology, and the highly complementary expertise and support provided by our collaborators from UT MD Anderson Cancer Center, Baylor College of Medicine and UT Southwestern, we are ideally situated to tackle this project.

Public Health Relevance

Growing evidence indicates that polypeptides encoded by unannotated ORFs can be key regulatory molecules in developmental and physiological processes. The proposed research will develop a novel computational platform to facilitate the study of unannotated ORFs and will combine it with experimental approaches to systematically identify unannotated ORFs that may have important functions in estrogen signaling. The proposed studies will provide new insight into the function and mechanism of unannotated ORFs and might help to facilitate the development of diagnostics and therapeutics for human diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
1R01GM130838-01A1
Application #
9816361
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Reddy, Michael K
Project Start
2019-09-10
Project End
2024-06-30
Budget Start
2019-09-10
Budget End
2020-06-30
Support Year
1
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Texas MD Anderson Cancer Center
Department
Biostatistics & Other Math Sci
Type
Hospitals
DUNS #
800772139
City
Houston
State
TX
Country
United States
Zip Code
77030