The long-term goal of this project is to map the determinants of the human transcriptome and their effect in neurological disease. Over 90% of human genes are alternatively spliced, with tightly regulated changes in exon inclusion observed across many tissues such as brain and muscle. Splicing is associated with numerous diseases, with an estimated 15 to 50 percent of human disease mutations affecting splice-site selection. Mutations commonly occur in non-coding regions, but disease studies cannot assign them a function. Currently, separate studies measure exon inclusion levels, binding sites of splicing regulators, and genomic variation. To fully exploit these data, there is a need for methods that (a) integrate these data to identify underlying regulatory mechanisms, and (b) predict the consequences of sequence change, especially in regulatory elements in non-coding regions. To address these needs, in Phase 1 of this project we will create a human and mouse model of tissue-dependent splicing, focusing on the central nervous system (CNS). The model will combine the above-mentioned data sources to predict splicing outcome from genomic sequence and assess in silico the effect of single nucleotide variations (SNVs) on splicing. In Phase 2 of the project, we will collaborate with Dr. Kristen Lynch and perform elaborate biochemical experiments to validate novel regulatory mechanisms identified by the Phase 1 model, focusing on the CNS and genes involved in age-related neurological disease. Specifically, we will predict and validate disease-associated targets of two key RNA-binding proteins with CNS/disease function, TDP-43 and QKI. In Phase 3, we will collaborate with Dr. Alice Chen-Plotkin and apply the Phase 1 splicing model and the Phase 2 experimental validation to the study of frontotemporal lobar degeneration (FTLD-TDP), where TDP-43 plays a key role.
First, we will use the disjoint TDP-43 genomic datasets already available to produce a "TDP-43 centered" splicing code model that addresses key questions about TDP-43's function in disease and normal tissues. Regulatory hypotheses from the model will be tested using Phase 2 methods. Next, we will apply the TDP-43 centered code to assess the effect on splicing of genetic variations found in a cohort of 512 FTLD-TDP patients. Genetic variations that are predicted to affect splicing and are enriched in FTLD-TDP patients compared to a cohort of over 1000 controls will be verified using mini-gene reporter assays and/or RNA from matching patients' brain samples available in Dr. Chen-Plotkin's lab. Overall, the research proposed in this grant will create a necessary and unique framework to elucidate the determinants of FTLD-TDP and human transcriptome complexity.

Public Health Relevance

The proposed research aims to discover regulatory mechanisms controlling gene processing at the RNA stage, with a focus on the central nervous system (CNS) and frontotemporal lobar degeneration (FTLD-TDP). It will give researchers new tools to predict changes in gene processing under conditions such as a specific tissue type, disease state, or a person's genetic variations. Immediate applications of this work include improved estimates of disease susceptibility and finding causes of complex diseases with a highly heritable component.

National Institutes of Health (NIH)
National Institute on Aging (NIA)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Petanceska, Suzana
University of Pennsylvania
Schools of Medicine
United States
Norton, Scott S; Vaquero-Garcia, Jorge; Lahens, Nicholas F et al. (2018) Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates. Bioinformatics 34:1488-1497
Black, Kathryn L; Naqvi, Ammar S; Asnani, Mukta et al. (2018) Aberrant splicing in B-cell acute lymphoblastic leukemia. Nucleic Acids Res
Shinde, Mansi Y; Sidoli, Simone; Kulej, Katarzyna et al. (2017) Phosphoproteomics reveals that glycogen synthase kinase-3 phosphorylates multiple splicing factors and is associated with alternative splicing. J Biol Chem 292:18240-18255
Jha, Anupama; Gazzara, Matthew R; Barash, Yoseph (2017) Integrative deep models for alternative splicing. Bioinformatics 33:i274-i282
Gazzara, Matthew R; Mallory, Michael J; Roytenberg, Renat et al. (2017) Ancient antagonism between CELF and RBFOX families tunes mRNA splicing outcomes. Genome Res 27:1360-1370
Green, Christopher J; Gazzara, Matthew R; Barash, Yoseph (2017) MAJIQ-SPEL: Web-tool to interrogate classical and complex splicing variations from RNA-Seq data. Bioinformatics
Brady, Lauren K; Wang, Hejia; Radens, Caleb M et al. (2017) Transcriptome analysis of hypoxic cancer cells uncovers intron retention in EIF2B5 as a mechanism to inhibit translation. PLoS Biol 15:e2002623
Vaquero-Garcia, Jorge; Lalonde, Emilie; Ewens, Kathryn G et al. (2017) PRiMeUM: A Model for Predicting Risk of Metastasis in Uveal Melanoma. Invest Ophthalmol Vis Sci 58:4096-4105
Rohacek, Alex M; Bebee, Thomas W; Tilton, Richard K et al. (2017) ESRP1 Mutations Cause Hearing Loss due to Defects in Alternative Splicing that Disrupt Cochlear Development. Dev Cell 43:318-331.e5
Ehrmann, Ingrid; Gazzara, Matthew R; Pagliarini, Vittoria et al. (2016) A SLM2 Feedback Pathway Controls Cortical Network Activity and Mouse Behavior. Cell Rep 17:3269-3280

Showing the most recent 10 out of 13 publications