The goal of this research program is to develop methods and tools for analyzing large, heterogeneous RNA-seq data sets to better understand RNA splicing. The vast majority of human genes are alternatively spliced, and variation in splicing has been shown to be associated with complex disease risk. Despite the widespread adoption of affordable high-throughput sequencing, variation in RNA splicing has remained understudied because of the limitations of short-read sequencing data and the computational challenges of accurate transcript-level quantification of gene expression. We propose to develop methods that improve the detection, quantification, and visualization of complex splicing events. We will further develop methods to identify genetic variants associated with complex splicing variation and to characterize the mechanisms by which splicing variation affects complex traits. Importantly, the variations and mechanisms predicted by our methods will be replicated in independent cohorts and experimentally validated using orthogonal methods. The computational methods and software we develop will be applied both to publicly available data and to data generated by our groups. We propose to leverage not only our expertise but also our existing code base and tools. The tools will support both standalone and cloud-based execution for scaling up analysis, and will integrate with existing tools for downstream analysis.

Public Health Relevance

The proposed research aims to create tools that enable researchers to understand the regulatory mechanisms controlling gene processing at the RNA stage across many human tissues and cell types. These tools will combine many types of experimental data and integrate with other tools to predict changes in gene processing under conditions such as a specific tissue type, disease state, or a person's genetic variations. Immediate applications of this work include identifying harmful mutations in patients, detecting changes in key genes that control the RNA processing of other genes, and finding causes of complex diseases with a highly heritable component.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM128096-02
Application #
9688238
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Ravichandran, Veerasamy
Project Start
2018-05-01
Project End
2022-01-31
Budget Start
2019-02-01
Budget End
2020-01-31
Support Year
2
Fiscal Year
2019
Total Cost
Indirect Cost
Name
University of Pennsylvania
Department
Genetics
Type
Schools of Medicine
DUNS #
042250712
City
Philadelphia
State
PA
Country
United States
Zip Code
19104
Norton, Scott S; Vaquero-Garcia, Jorge; Lahens, Nicholas F et al. (2018) Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates. Bioinformatics 34:1488-1497