In eukaryotes, one gene can give rise to multiple protein isoforms through various types of alternative pre-mRNA processing (e.g., alternative splicing), contributing significantly to proteome complexity. Differential isoform expression manifests in the pathogenesis of diseases from heart failure to neurodegeneration, as well as in cellular responses to environmental stress including alcohol and oxidative damage. Advances in RNA-seq technology have led to the discovery of many novel alternative isoforms, but their biological impact is often unclear in the absence of protein information. Conversely, shotgun proteomics technology enables large-scale characterization of proteins, but the limitations of "one-gene, one-product" databases restrict its utility in protein isoform identification. Deeper insights into the biology of alternative isoforms require combining the complementary strengths of transcriptomics and proteomics. Accordingly, the integration of technical platforms from mRNA to protein has become an indispensable step in advancing a holistic portrait of gene products. Among the key challenges are the segregation of proteomics and transcriptomics repositories and the disconnect between the respective data analysis pipelines and expertise. Despite recent progress, there is an urgent and unmet need for well-integrated and user-friendly computational platforms that can support everyday biomedical researchers in harnessing diverse data types for multi-omics studies. The central goal of this project is to create a unified platform to decode alternative isoforms from RNA-seq/Ribo-seq data, and to guide shotgun proteomics characterization of protein isoforms. Our approach capitalizes on recent rapid advances in Big Data science, where new frontiers in multi-omics integration now make it possible to traverse heterogeneous computational resources and data types seamlessly.
We will design, construct, and implement an integrative proteotranscriptomics framework (ProteoSeq), which will combine novel analytical models and custom proteomics workflows to coalesce transcriptomics and proteomics data for large-scale characterization of alternative protein isoforms. Our proposal details three data science aims, which will (i) develop methods to infer full-length mRNA and protein isoforms from hybrid (short-read/long-read) RNA-seq and Ribo-seq data; (ii) engineer an integrative platform for users to analyze protein isoforms from proteotranscriptomics data on the cloud; and (iii) validate and accrue protein evidence for alternative isoforms in diverse high-value datasets. Our efforts aim to synergize two currently fragmented omics fields and thereby empower inquiries into the regulation of alternative isoforms in health and disease. We envision that the proposed computational tools will be generalizable to multiple biomedical disciplines and will serve the broad scientific community for routine multi-omics investigations in translational medicine.

Public Health Relevance

Researchers now possess the technology to generate massive amounts of transcriptome sequencing data as well as proteomics data, but lack adequate means to analyze and integrate these Big Data effectively to discern deeper meanings. This application will create innovative computational solutions to integrate and model transcriptomics and proteomics data to systematically examine "alternative isoform expression", a biological phenomenon whereby one gene encodes multiple protein products. By harnessing massive amounts of transcriptomics and proteomics data, this project will facilitate research into the biology of alternative mRNA and protein isoforms, and their roles in the origin and progression of numerous diseases including cancer, heart disease, and neurodegeneration.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM117624-01A1
Application #: 9193233
Study Section: Special Emphasis Panel (ZRG1-BST-T (03)M)
Program Officer: Ravichandran, Veerasamy
Project Start: 2016-09-01
Project End: 2019-08-31
Budget Start: 2016-09-01
Budget End: 2017-08-31
Support Year: 1
Fiscal Year: 2016
Total Cost: $364,980
Indirect Cost: $127,980

Name: University of California Los Angeles
Department: Microbiology/Immun/Virology
Type: Schools of Medicine
DUNS #: 092530369
City: Los Angeles
State: CA
Country: United States
Zip Code: 90095

Park, Eddie; Pan, Zhicheng; Zhang, Zijun et al. (2018) The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am J Hum Genet 102:11-26
Park, Eddie; Guo, Jiguang; Shen, Shihao et al. (2017) Population and allelic variation of A-to-I RNA editing in human transcriptomes. Genome Biol 18:143