RNA-seq is increasingly replacing microarrays as the technology of choice for expression profiling. This proposal will address, in a timely manner, the emerging challenges and opportunities brought by the rapidly accumulating RNA-seq data. We will design novel methods for the integrative analysis of many RNA-seq datasets to study the functions and regulation of alternative splicing. Specifically, we propose the following aims: (1) We will develop a novel graph-based pattern mining method to reconstruct an atlas of splicing modules and identify the associated experimental conditions in human, mouse, fly, and yeast. (2) We will study the coupling between transcription and splicing, two important regulatory processes, by exploiting both the expression and the splicing information provided by RNA-seq data. We will design a novel multi-layer network mining approach to systematically identify coupled transcription-splicing modules. (3) We will predict the functions of alternatively spliced transcripts to establish a high-resolution functional annotation of the human genome. The predicted functions will be incorporated into the Gene Ontology and Ensembl databases to benefit the biological community. (4) We will experimentally validate a subset of the computational predictions made in Aims 1, 2, and 3. (5) We will develop web databases and software to directly benefit the scientific community. Our methods and software will significantly facilitate the re-use of the vast amount of existing RNA-seq data, reduce the need to generate new data, and improve our understanding of gene regulation under a variety of perturbations.

Public Health Relevance

This project will create a set of computational methods to facilitate the re-use of data in the rapidly accumulating public RNA-seq repositories. We will generate an atlas of splicing modules specific to diverse diseases and will predict the specific functions of splicing isoforms. We will experimentally validate a subset of the cancer-related predictions. Finally, we will develop software and web servers to directly benefit the biomedical community.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-U (02))
Program Officer: Bender, Michael T
University of Southern California
Schools of Arts and Sciences
Los Angeles
United States

Publications

Lu, Zhi-xiang; Huang, Qin; Park, Juw Won et al. (2015) Transcriptome-wide landscape of pre-mRNA alternative splicing associated with metastatic colonization. Mol Cancer Res 13:305-18
Li, Wenyuan; Dai, Chao; Kang, Shuli et al. (2014) Integrative analysis of many RNA-seq datasets to study alternative splicing. Methods 67:313-24
Liu, Chun-Chi; Tseng, Yu-Ting; Li, Wenyuan et al. (2014) DiseaseConnect: a comprehensive web server for mechanism-based disease-disease connections. Nucleic Acids Res 42:W137-46
Li, Wenyuan; Kang, Shuli; Liu, Chun-Chi et al. (2014) High-resolution functional annotation of human transcriptome: predicting isoform functions by a novel multiple instance-based label propagation method. Nucleic Acids Res 42:e39