Single-nucleotide resolution mapping of allelic protein-RNA interactions and splicing-regulatory variants

Project Summary

A long-standing hypothesis in human genetics is that many genetic variations affect human traits, evolution and predisposition to disease by modulating different steps of gene expression regulation. This hypothesis has gained substantial support from two directions. First, the vast majority of disease-associated SNPs identified so far through genome-wide association studies (GWAS) are located in noncoding regions. Second, genetic variations affecting different steps of gene expression, or expression quantitative trait loci (eQTLs), are widespread and enriched in GWAS signals. However, an important bottleneck in this field is that current analyses mostly rely on "guilt by association", and there remains a lack of effective computational methods and software tools to determine the underlying causative variants affecting the gene expression cascade and high-level traits. This is particularly true for software tools designed for the analysis of variations affecting post-transcriptional regulation. To fill this gap, we will develop statistical models and computational tools to identify causal genetic variants affecting RNA splicing, or splicing-regulatory variants (sRVs), which were recently shown to be prevalent in the human genome.
In Aim 1, we will develop innovative analysis methods to map, at single-nucleotide resolution, protein-RNA interactions with allele-specific binding affinity.
In Aim 2, we will develop an integrative modeling strategy to combine multiple modalities of data, including allelic protein-RNA interactions and splicing QTLs (sQTLs), to pinpoint sRVs with high confidence. To evaluate the effectiveness of the proposed methods, we will apply them to large datasets to map sRVs in normal and diseased human tissues.
In Aim 3, we will develop user-friendly software packages, a web-based interface and detailed documentation to maximize the utility of these tools for the research community. If successful, this study will produce computational tools that enable mapping of causal sRVs with unprecedented precision. These data and software tools will provide a valuable resource to better understand functional protein-RNA interactions, elucidate their relationships to genetic variations in human populations, and identify potential therapeutic targets for genetic diseases.
Alternative splicing is critical for expanding the complexity of genetic information encoded in the mammalian genome. It is a highly regulated process that, when disrupted, can give rise to aberrant transcript variants in human diseases. This study will develop computational methods to identify genetic variations that directly affect splicing, which will not only provide important insights into the mechanisms of splicing regulation but also facilitate the development of potential therapeutic strategies for human diseases.