Single-nucleotide resolution mapping of allelic protein-RNA interactions and splicing-regulatory variants

Project Summary

A long-standing hypothesis in human genetics is that many genetic variants affect human traits, evolution, and predisposition to disease by modulating different steps of gene expression regulation. This hypothesis has gained substantial support from two directions. First, the vast majority of disease-associated SNPs identified so far through genome-wide association studies (GWAS) are located in noncoding regions. Second, genetic variants affecting different steps of gene expression, or expression quantitative trait loci (eQTLs), are widespread and enriched among GWAS signals. However, an important bottleneck in this field is that current analyses rely mostly on "guilt by association." Effective computational methods and software tools are still lacking to determine the causal variants that affect the gene expression cascade and high-level traits. The gap is especially wide for tools that analyze variants affecting post-transcriptional regulation. To fill this gap, we will develop statistical models and computational tools to identify causal genetic variants affecting RNA splicing, or splicing-regulatory variants (sRVs), which have recently been shown to be prevalent in the human genome.
In Aim 1, we will develop innovative analysis methods to map, at single-nucleotide resolution, protein-RNA interactions with allele-specific binding affinity.
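
Aim 1 concerns allele-specific protein-RNA binding. A common starting point for detecting it is to count reads supporting the reference and alternative alleles at a heterozygous SNP. You then test whether that split departs from the expected 50:50 ratio. This is a standard approach, not necessarily the method Aim 1 will develop.

The minimal Python sketch below implements that test as an exact two-sided binomial test, using only the standard library. It rests on these assumptions:

- The counts are assumed to come from crosslinking and immunoprecipitation (CLIP) reads.
- The function name and the `expected_ref_fraction` parameter are hypothetical.
- A real analysis would also need to correct for read-mapping bias toward the reference allele.
- It would also need to handle overdispersion across replicates (for example with a beta-binomial model) and correct for multiple testing.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int,
                             expected_ref_fraction: float = 0.5) -> float:
    """Exact two-sided binomial test for allele-specific binding at one SNP.

    Hypothetical helper: ref_reads and alt_reads are assumed to be counts of
    CLIP reads carrying the reference and alternative allele, respectively.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence of imbalance
    p = expected_ref_fraction
    # Binomial probability of every possible reference-read count 0..n.
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Two-sided p-value: total probability of outcomes no more likely than the
    # observed one; the small tolerance absorbs floating-point ties.
    pval = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, pval)


# Example: 15 reference vs. 5 alternative reads at a heterozygous site.
print(allelic_imbalance_pvalue(15, 5))
```

With 15 reference and 5 alternative reads, the function returns about 0.041. Calling it once per heterozygous SNP in bound regions gives a p-value for each site, which would then need multiple-testing correction.
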
In Aim 2, we will develop an integrative modeling strategy that combines multiple modalities of data, including allelic protein-RNA interactions and splicing QTLs (sQTLs), to pinpoint sRVs with high confidence. To evaluate the effectiveness of the proposed methods, we will apply them to large datasets to map sRVs in normal and diseased human tissues.
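
The form of the Aim 2 integrative model is not specified here. As a deliberately simple stand-in (not the proposed method), the Python sketch below combines two independent lines of evidence for one candidate variant using Fisher's method. The two inputs might be an allelic-binding p-value and an sQTL p-value.

How the method works:

- The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom under the joint null hypothesis.
- For even degrees of freedom, its survival function has a closed form, computed by summing (X/2)^i / i! for i from 0 to k-1 and multiplying by exp(-X/2).

Limitations of this sketch:

- The function name is hypothetical.
- Fisher's method assumes the p-values are independent. That may not hold when the same samples contribute to both analyses.
- A real integrative model would more likely weight or jointly model the data types.

```python
from math import exp, factorial, log


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Combine independent p-values for one candidate variant (Fisher's method).

    Hypothetical helper; assumes the inputs come from independent tests,
    e.g. an allelic protein-RNA binding test and an sQTL association test.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    stat = -2.0 * sum(log(p) for p in pvalues)  # chi-square with 2k df under H0
    half = stat / 2.0
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    return exp(-half) * sum(half**i / factorial(i) for i in range(k))


# Example: modest evidence from each source yields stronger combined evidence.
print(fisher_combined_pvalue([0.05, 0.05]))
```

For two p-values of 0.05, the combined p-value is about 0.017. This shows how two modest but concordant signals can strengthen the case for a candidate sRV.
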
In Aim 3, we will develop user-friendly software packages, a web-based interface, and detailed documentation to maximize the utility of these tools for the research community. If successful, this study will produce computational tools that enable mapping of causal sRVs with unprecedented precision. The resulting data and software tools will provide a valuable resource for several goals:
- better understanding functional protein-RNA interactions;
- elucidating their relationships to genetic variation in human populations;
- identifying potential therapeutic targets for genetic diseases.

Public Health Relevance

Alternative splicing is critical for expanding the complexity of the genetic information encoded in the mammalian genome. It is a highly regulated process that, when disrupted, can give rise to aberrant transcript variants in human diseases. This study will develop computational methods to identify genetic variants that directly affect splicing. These methods will not only provide important insights into the mechanisms of splicing regulation but also facilitate the development of potential therapeutic strategies for human diseases.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Krasnewich, Donna M
Columbia University (N.Y.)
Schools of Medicine
New York
United States
Luo, Weijun; Zhang, Chaolin; Jiang, Yong-Hui et al. (2018) Systematic reconstruction of autism biology from massive genetic mutation profiles. Sci Adv 4:e1701799
Ustianenko, Dmytro; Chiu, Hua-Sheng; Treiber, Thomas et al. (2018) LIN28 Selectively Modulates a Subclass of Let-7 MicroRNAs. Mol Cell 71:271-283.e5
Zhang, Chaolin; Shen, Yufeng (2017) A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes. Hum Mutat 38:204-215