RNA-protein binding is critical to gene regulation, controlling fundamental processes including RNA splicing, translation, localization, and stability. RNA-protein interactions play a role in a wide variety of diseases, including muscular dystrophy, fragile X syndrome, mental retardation, Prader-Willi syndrome, retinitis pigmentosa, spinal muscular atrophy, and cancer. Advances in understanding the underlying mechanisms of RNA-protein interaction therefore have great value for improving human health. Because most studies of gene regulation have focused on DNA-protein interactions, the mechanisms by which RNA-binding proteins (RBPs) act remain enigmatic. However, new high-throughput measurements will soon yield vast amounts of RNA-protein interaction data. These data could dramatically clarify RNA regulation, but they also demand novel analytical approaches. To meet this challenge, we propose novel computational methods to determine the sequences and structures critical to RNA-protein binding, based on hundreds of CLIP-seq and SELEX datasets now being generated for the human ENCODE project. Current methods for modeling RNA-protein interactions have low predictive power, which we hypothesize is due mainly to two issues: (a) they ignore combinatorial binding of multiple elements within each RNA to the protein, and (b) they account for structure only in a superficial manner. We will address these problems by developing innovative methods that use complementary CLIP-seq and SELEX data to determine the different classes of RNA elements binding each RBP, learn the combinatorial logic among classes, and learn the sequences and structures that define each class. We will additionally validate our methods for at least two RBPs using RNA-protein gel shift experiments. We expect that this exploratory study will yield powerful, experimentally validated software tools to determine combinatorial and structural aspects of RNA-protein binding from high-throughput sequencing data.

Public Health Relevance

Gene regulation at the RNA level is central to many human diseases. However, our understanding of RNA regulation, especially the mechanisms by which proteins bind to RNAs, remains rudimentary, even though large datasets on these mechanisms are now being generated. Our combination of computational and experimental approaches will help decipher mechanisms from these datasets and provide new insights into how human disease can be ameliorated by targeting RNA-level processes.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Exploratory/Developmental Grants (R21)
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Gilchrist, Daniel A
Jackson Laboratory
Bar Harbor
United States
Gu, Tongjun; Gatti, Daniel M; Srivastava, Anuj et al. (2016) Genetic Architectures of Quantitative Variation in RNA Editing Pathways. Genetics 202:787-98
Menghi, Francesca; Inaki, Koichiro; Woo, XingYi et al. (2016) The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc Natl Acad Sci U S A 113:E2373-82
Ishimura, Ryuta; Nagy, Gabor; Dotu, Ivan et al. (2016) Activation of GCN2 kinase by ribosome stalling links translation elongation with translation initiation. Elife 5: