The genetic code not only determines the amino acid sequence of proteins but also defines the 'splicing code' of cis- and trans-acting regulatory elements that control pre-mRNA splicing. Single nucleotide variants (SNVs) at key regions in pre-mRNA may disrupt splicing, resulting in disease [1, 2]. Distinguishing SNVs that cause aberrant splicing from those that are benign is important for understanding disease pathogenesis. SNVs at consensus splice sites, located at exon-intron junctions, are known to cause aberrant splicing and contribute to at least 10% of inherited diseases [2]. However, SNVs outside consensus splice sites can still disrupt splicing [3]. Current bioinformatics tools limit analysis to SNVs at or near consensus splice sites and do not generalize to SNVs beyond them [4-7]. In this application, I propose to substantially improve the ability to interpret the consequences of mutations on pre-mRNA splicing. This goal will be achieved by: 1) developing novel features useful in predicting the impact of variation on cis-splicing regulation; 2) training a supervised machine learning algorithm that uses the novel features to predict the impact of SNVs; 3) sharing the algorithm in a publicly available software package; and 4) comparing algorithm predictions to the relationships between SNVs and splicing patterns derived from matched DNA- and RNA-sequencing studies.
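As a rough illustration of aims 1 and 2, the sketch below derives a few per-variant features and fits a small classifier to them. Everything in it is an assumption made for illustration: the three features (proximity to an exon boundary, exonic location, change in the number of GT dinucleotides near the variant) are simple stand-ins and not the novel features the project proposes, logistic regression stands in for the unspecified supervised algorithm, and the sequence, coordinates, and labels are invented toy values rather than real data.

```python
import math

def snv_features(seq, pos, alt, exon_start, exon_end, window=3):
    """Toy features for an SNV at 0-based index `pos` of pre-mRNA `seq`.

    The exon is the half-open interval [exon_start, exon_end).
    All three features are hypothetical stand-ins.
    """
    lo = max(0, pos - window)
    hi = pos + window + 1
    mutated = seq[:pos] + alt + seq[pos + 1:]
    # Distance in nucleotides to the nearest exon boundary coordinate.
    dist = min(abs(pos - exon_start), abs(pos - exon_end))
    return [
        1.0 / (1.0 + dist),                               # proximity to a junction
        1.0 if exon_start <= pos < exon_end else 0.0,     # variant lies in the exon
        float(mutated[lo:hi].count("GT") - seq[lo:hi].count("GT")),  # GT gained/lost
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Predicted probability that the variant disrupts splicing."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Invented toy transcript: 10 nt intron, 20 nt exon, 10 nt intron.
SEQ = "ACGTACGTAC" + "GGCATTCAGGCATTCAGGCA" + "GTAAGTACGT"
EXON_START, EXON_END = 10, 30

# (position, alternate base, label); labels are made up: 1 = disrupts splicing.
TOY_VARIANTS = [
    (30, "A", 1), (29, "G", 1), (10, "T", 1),
    (20, "A", 0), (4, "C", 0), (36, "C", 0), (15, "A", 0),
]

X = [snv_features(SEQ, p, a, EXON_START, EXON_END) for p, a, _ in TOY_VARIANTS]
y = [label for _, _, label in TOY_VARIANTS]
w, b = train_logistic(X, y)

# Score two unseen variants: one next to a junction, one deep in the exon.
p_near = predict(w, b, snv_features(SEQ, 11, "A", EXON_START, EXON_END))
p_far = predict(w, b, snv_features(SEQ, 22, "A", EXON_START, EXON_END))
```

In this toy setup the labels depend only on junction proximity, so the classifier can do no better than the consensus-site tools the abstract criticizes; the point of the proposed features would be to separate variants that such a proximity feature cannot.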

Public Health Relevance

Genetic sequences not only encode the amino acids of proteins but also regulate many critical biological functions, including pre-mRNA splicing. The impact of genetic variation on splicing is not well understood. The goal of this research project is to computationally identify features of variants useful in predicting aberrant splicing, then incorporate the features into a machine learning algorithm and test the utility of the predictions using publicly available sequencing studies.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Predoctoral Individual National Research Service Award (F31)
Project #
1F31HG007804-01A1
Application #
8833507
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Gatlin, Christine L
Project Start
2015-02-01
Project End
2018-01-31
Budget Start
2015-02-01
Budget End
2016-01-31
Support Year
1
Fiscal Year
2015
Total Cost
Indirect Cost
Name
Johns Hopkins University
Department
Biostatistics & Other Math Sci
Type
Biomed Engr/Col Engr/Engr Sta
DUNS #
001910777
City
Baltimore
State
MD
Country
United States
Zip Code
21205
Douville, Christopher; Springer, Simeon; Kinde, Isaac et al. (2018) Detection of aneuploidy in patients with cancer through amplification of long interspersed nucleotide elements (LINEs). Proc Natl Acad Sci U S A 115:1871-1876