RNA-seq is increasingly replacing microarrays for expression profiling. In this proposal, we will promptly address the emerging challenges and opportunities brought by the rapidly accumulating RNA-seq data. We will design novel methods for the integrative analysis of many RNA-seq datasets to study the function and regulation of alternative splicing. Specifically, we propose the following aims: (1) We will develop a novel graph-based pattern mining method to reconstruct an atlas of splicing modules and to identify the associated experimental conditions in human, mouse, fly, and yeast. (2) We will study the coupling between transcription and splicing, two important regulatory processes, by exploiting both the expression and the splicing information provided by RNA-seq data. We will design a novel multi-layer network mining approach to systematically identify coupled transcription-splicing modules. (3) We will predict the functions of alternatively spliced transcripts to establish a high-resolution functional annotation of the human genome. The predicted functions will be incorporated into the Gene Ontology and Ensembl databases to benefit the biological community. (4) We will experimentally validate a subset of the computational predictions made in Aims 1, 2, and 3. (5) We will develop web databases and software to directly benefit the scientific community. Our methods and software will significantly facilitate the re-use of the vast amount of existing RNA-seq data, reduce the need to generate new data, and improve our understanding of gene regulation under a variety of perturbations.

Public Health Relevance

This project will create a set of computational methods to facilitate the re-use of data in rapidly accumulating public RNA-seq repositories. We will generate an atlas of splicing modules specific to diverse diseases and will predict the specific functions of splicing isoforms. We will experimentally validate a subset of the predictions related to cancer. Finally, we will develop software and web servers to directly benefit the biomedical community.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM105431-02
Application #: 8637092
Study Section: Special Emphasis Panel (ZRG1-BST-U (02))
Program Officer: Bender, Michael T
Project Start: 2013-04-01
Project End: 2016-12-31
Budget Start: 2014-01-01
Budget End: 2014-12-31
Support Year: 2
Fiscal Year: 2014
Total Cost: $277,987
Indirect Cost: $91,812

Organization: University of Southern California
Department: Biology
Organization Type: Schools of Arts and Sciences
DUNS #: 072933393
City: Los Angeles
State: CA
Country: United States
Zip Code: 90089

Publications

Lu, Zhi-xiang; Huang, Qin; Park, Juw Won et al. (2015) Transcriptome-wide landscape of pre-mRNA alternative splicing associated with metastatic colonization. Mol Cancer Res 13:305-18
Li, Wenyuan; Dai, Chao; Kang, Shuli et al. (2014) Integrative analysis of many RNA-seq datasets to study alternative splicing. Methods 67:313-24
Liu, Chun-Chi; Tseng, Yu-Ting; Li, Wenyuan et al. (2014) DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res 42:W137-46
Li, Wenyuan; Kang, Shuli; Liu, Chun-Chi et al. (2014) High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method. Nucleic Acids Res 42:e39