RNA-protein binding is critical to gene regulation, controlling fundamental processes including RNA splicing, translation, localization, and stability. RNA-protein interactions play a role in a wide variety of diseases, including muscular dystrophy, fragile X syndrome, mental retardation, Prader-Willi syndrome, retinitis pigmentosa, spinal muscular atrophy, and cancer. Advances in understanding the underlying mechanisms of RNA-protein interaction therefore have great value for improving human health. Because most studies of gene regulation have focused on DNA-protein interactions, the mechanisms by which RNA-binding proteins (RBPs) act remain enigmatic. However, new high-throughput measurements will soon yield vast amounts of RNA-protein interaction data. These data could dramatically clarify RNA regulation, but they also demand novel analytical approaches. To meet this challenge, we propose novel computational methods to determine the sequences and structures critical to RNA-protein binding, based on hundreds of CLIP-seq and SELEX datasets now being generated for the human ENCODE project. Current methods for modeling RNA-protein interactions have low predictive power, which we hypothesize is due mainly to two issues: (a) they ignore combinatorial binding of multiple elements within each RNA to the protein, and (b) they account for structure only in a superficial manner. We will solve these problems by developing innovative methods that use complementary CLIP-seq and SELEX data to: determine the different classes of RNA elements binding each RBP; learn the combinatorial logic among classes; and learn the sequences and structures that define each class. We will additionally validate our methods for at least two RBPs using RNA-protein gel shift experiments. We expect that this exploratory study will yield powerful, experimentally validated software tools to determine combinatorial and structural aspects of RNA-protein binding from high-throughput sequencing data.
Gene regulation at the RNA level is central to many human diseases. However, our understanding of RNA regulation, especially the mechanisms by which proteins bind to RNAs, is rudimentary, even though large datasets on these mechanisms are now being generated. Our combination of computational and experimental approaches will help decipher mechanisms from these datasets and provide new insights into how human disease can be ameliorated by targeting RNA-level processes.