There is a fundamental gap in understanding how the splicing of a group of exons is co-regulated, how the splicing of an individual exon is combinatorially controlled by multiple regulators, and what the general rules of the "splicing code" are. The advent of high-throughput sequencing technologies gives us an unprecedented opportunity to understand coordinated and combinatorial regulation of alternative splicing. However, existing statistical and computational methods still lag behind these technologies. Our long-term goal is to develop statistical and computational methods that reveal the principles of alternative splicing regulation in multicellular eukaryotes and to explore how regulated splicing contributes to phenotypic complexity. The objective of this application is to develop statistically sound methods, with computationally efficient algorithms, for studying alternative splicing and its regulation at both the individual-exon and network levels using deep sequencing data. We will apply the proposed methods to study the self-renewal and differentiation of rat embryonic stem cells.
The specific aims of this proposal are: (1) develop novel statistical methods to accurately quantify and compare transcriptome complexity from RNA-seq data; (2) develop novel statistical tools to identify alternative splicing regulatory elements; (3) develop novel statistical methods to reconstruct splicing regulatory networks; and (4) apply these methods to rat stem cells and develop user-friendly software. Under the first aim, the proposed methods will explicitly address the positional bias inherent in RNA-seq in order to accurately quantify and compare transcriptomes at both the gene level and the transcript-isoform level. Under the second aim, multiple sources of evidence will be integrated to distinguish cis-regulatory elements for alternative splicing from false sites that match the motifs by chance. Under the third aim, an efficient algorithm will be developed to reduce the model search space when reconstructing splicing regulatory networks, and multiple types of genomic data will be combined to infer regulatory relationships. Under the fourth aim, we will characterize rat embryonic stem cell transcriptomes for the first time and infer alternative splicing regulation during their self-renewal and differentiation toward neurons. The proposed methods are innovative: they meet the challenges arising from the analysis of high-throughput sequencing data, and they fully utilize and integrate multiple types of omics data. The proposed research is significant because it is expected to advance our understanding of alternative splicing regulation, especially in rat embryonic stem cells, and to contribute to deciphering the splicing code. Ultimately, this knowledge has the potential to inform the development of preventive and therapeutic interventions for splicing-related diseases and to pave the way for regenerative medicine.

Public Health Relevance

The proposed research is relevant to public health because the proposed statistical and computational methods are expected to lead to discoveries about alternative splicing regulation, especially in rat embryonic stem cells, which should ultimately improve our understanding of cell fate determination and of the pathogenesis of splicing-related diseases. These discoveries will inform regenerative medicine and therapeutic treatment of human diseases. Thus, the proposed research is relevant to the part of NIH's mission that pursues fundamental knowledge that will help prevent and cure human diseases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Lyster, Peter
University of Southern California
Schools of Arts and Sciences
Los Angeles
United States