Single-nucleotide resolution mapping of allelic protein-RNA interactions and splicing-regulatory variants

Project Summary

A long-standing hypothesis in human genetics is that many genetic variations affect human traits, evolution and predisposition to disease by modulating different steps of gene expression regulation. This hypothesis has gained substantial support from two directions. First, the vast majority of disease-associated SNPs identified so far through genome-wide association studies (GWAS) are located in noncoding regions. Second, genetic variations affecting different steps of gene expression, or expression quantitative trait loci (eQTLs), are widespread and enriched in GWAS signals. However, an important bottleneck in this field is that current analyses mostly rely on "guilt by association", and there remains a lack of effective computational methods and software tools to determine the underlying causative variants affecting the gene expression cascade and high-level traits. This is particularly true for software tools designed for analysis of variations affecting post-transcriptional regulation. To fill this gap, we will develop statistical models and computational tools to identify causal genetic variants affecting RNA splicing, or splicing-regulatory variants (sRVs), which were recently shown to be prevalent in the human genome.
In Aim 1, we will develop innovative analysis methods to map, at single-nucleotide resolution, protein-RNA interactions with allele-specific binding affinity.
In Aim 2, we will develop an integrative modeling strategy to combine multiple modalities of data, including allelic protein-RNA interactions and splicing QTLs (sQTLs), to pinpoint sRVs with high confidence. To evaluate the effectiveness of the proposed methods, we will apply them to large datasets to map sRVs in normal and diseased human tissues.
In Aim 3, we will develop user-friendly software packages, a web-based interface and detailed documentation to maximize the utility of these tools to the research community. If successful, this study will produce computational tools that enable mapping of causal sRVs with unprecedented precision. These data and software tools will provide a valuable resource to better understand functional protein-RNA interactions, elucidate their relationships to genetic variations in human populations, and identify potential therapeutic targets for genetic diseases.

Public Health Relevance

Alternative splicing is critical for expanding the complexity of genetic information encoded in the mammalian genome. It is a highly regulated process that, when disrupted, can give rise to aberrant transcript variants in human diseases. This study will develop computational methods to identify genetic variations that directly affect splicing, which will not only provide important insights into the mechanisms of splicing regulation, but also facilitate the development of potential therapeutic strategies for human diseases.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM124486-03
Application #
9839618
Study Section
Biodata Management and Analysis Study Section (BDMA)
Program Officer
Krasnewich, Donna M
Project Start
2018-01-01
Project End
2021-12-31
Budget Start
2020-01-01
Budget End
2020-12-31
Support Year
3
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Columbia University (N.Y.)
Department
Biochemistry
Type
Schools of Medicine
DUNS #
621889815
City
New York
State
NY
Country
United States
Zip Code
10032
Luo, Weijun; Zhang, Chaolin; Jiang, Yong-Hui et al. (2018) Systematic reconstruction of autism biology from massive genetic mutation profiles. Sci Adv 4:e1701799
Ustianenko, Dmytro; Chiu, Hua-Sheng; Treiber, Thomas et al. (2018) LIN28 Selectively Modulates a Subclass of Let-7 MicroRNAs. Mol Cell 71:271-283.e5
Zhang, Chaolin; Shen, Yufeng (2017) A Cell Type-Specific Expression Signature Predicts Haploinsufficient Autism-Susceptibility Genes. Hum Mutat 38:204-215