The goal of this research program is to develop methods and tools to analyze large heterogeneous RNA-seq data sets to better understand RNA splicing. The vast majority of human genes are alternatively spliced, and variation in splicing has been shown to be associated with complex disease risk. Despite the widespread adoption of affordable high-throughput sequencing, variation in RNA splicing has remained understudied due to the limitations of short-read sequencing data and the computational challenges associated with accurate transcript-level quantification of gene expression. We propose to develop methods to improve the detection, quantification, and visualization of complex splicing events. We will further develop methods to identify genetic variants associated with complex splicing variation and to characterize the mechanisms by which splicing variation affects complex traits. Importantly, the variations and mechanisms predicted by our methods will be replicated in independent cohorts and experimentally validated using orthogonal methods. The computational methods and software we develop will be applied both to publicly available data and to data generated by our groups. We propose to leverage not only our expertise but also our existing code base and tools. The tools will support both standalone and cloud-based execution for scaling up analysis, and will integrate with existing tools for downstream analysis.
The proposed research aims to create tools that enable researchers to understand the regulatory mechanisms controlling gene processing at the RNA stage across many human tissues and cell types. These tools will combine many types of experimental data and integrate with other tools to predict changes in gene processing under conditions such as a specific tissue type, disease state, or a person's genetic variations. Immediate applications of this work include identifying harmful mutations in patients, identifying changes in key genes that control the RNA processing of other genes, and finding causes of complex diseases with a highly heritable component.
Norton, Scott S; Vaquero-Garcia, Jorge; Lahens, Nicholas F et al. (2018) Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates. Bioinformatics 34:1488-1497.