The human genome encodes hundreds of proteins that contain RNA-binding domains, most of which are poorly characterized. Genomic analyses also indicate widespread use of post-transcriptional gene regulation: there is high sequence conservation in 5' and 3' untranslated regions (UTRs), alternative splicing is prevalent, and there are many individual examples of subcellular transcript localization, differential regulation of translation, and regulation of transcript decay, often in a disease-relevant context. A key aspect of understanding human gene regulation will be to map post-transcriptional regulatory networks, and an essential step in mapping these networks is to obtain an accurate description of the RNA-binding activity of every RNA-binding protein. We have developed a method called RNAcompete that measures, in a single binding reaction, the relative preference of an RNA-binding protein for hundreds of thousands of short RNAs (27-35 nt long) designed to encompass a broad range of primary sequences and secondary structures. In addition to being rapid and systematic, RNAcompete produces descriptions of binding activity that are generally superior to conventional motif models. Here, we propose to use RNAcompete to obtain a complete index of RNA-binding activities for all known and predicted human RNA-binding proteins.
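Because a single competitive reaction yields a relative signal for every RNA in the pool, one simple way to summarize a protein's sequence preference is to score each short k-mer by how strongly the probes that contain it are bound. The sketch below is illustrative only and is not the project's analysis pipeline. It assumes probe intensities are already normalized and uses k = 7 by default. Each k-mer is scored as a Z-score of the mean intensity of its probes, relative to the spread of those means across all observed k-mers. The function name `kmer_zscores`, its input format, and the toy pool are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def kmer_zscores(probes, k=7):
    """Score each k-mer by the mean intensity of the probes that contain it.

    probes: list of (sequence, intensity) pairs; intensities are assumed to be
    normalized already (hypothetical input format).
    Returns {kmer: z}, where z compares that k-mer's mean probe intensity with
    the means of all k-mers observed in the pool.
    """
    # For each k-mer, collect the intensity of every probe containing it
    # (a probe counts once per k-mer even if the k-mer occurs twice in it).
    hits = defaultdict(list)
    for seq, intensity in probes:
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in present:
            hits[kmer].append(intensity)

    # Summarize each k-mer by the mean intensity of its probes.
    means = {kmer: mean(vals) for kmer, vals in hits.items()}

    # Standardize the per-k-mer means across all observed k-mers.
    mu = mean(means.values())
    sigma = pstdev(means.values())
    if sigma == 0:
        return {kmer: 0.0 for kmer in means}
    return {kmer: (m - mu) / sigma for kmer, m in means.items()}


if __name__ == "__main__":
    # Toy pool (made up): probes carrying UUUUUUU are bound more strongly.
    pool = [
        ("ACGUUUUUUUGCA", 9.0),
        ("GGUUUUUUUCCAA", 8.5),
        ("ACGAGCAGCAGCA", 1.0),
        ("CCAGGACGAUCGA", 1.2),
    ]
    scores = kmer_zscores(pool, k=7)
    print(round(scores["UUUUUUU"], 2), round(scores["CAGCAGC"], 2))
```

A k-mer that recurs across many strongly bound probes receives a high score. A real analysis would need more robust statistics (for example, trimmed means) and far more probes than this toy pool.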
Our Specific Aims are: (1) Application of the current array-based RNAcompete method to all 294 human RNA-binding proteins and all of their 470 individual RNA-binding domains. (2) Further development of the RNAcompete methodology to create more complex pools and to use next-generation sequencing as the readout, facilitating more detailed analysis of proteins that have multiple RNA-binding domains and, eventually, of complexes of RNA-binding proteins. (3) Creation of a database of RNA-binding profiles, both compiled from the literature and produced by our analyses in Aims 1 and 2. A component of this aim will be to explore models of RNA-binding activity that provide the most accurate predictions of potential binding sites in cellular RNAs. (4) Analysis of the determinants of sequence and structure recognition for the large RRM and KH domain classes. Establishing (or refuting) a mapping between amino acid sequence features of these prevalent RNA-binding domains and the types of RNAs they bind will be important for understanding their function, and for determining how knowledge of binding preferences can be transferred across species and among different proteins.
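Aim 3 concerns models that predict potential binding sites in cellular RNAs. For reference, the conventional motif model that RNAcompete profiles are compared against is typically a position weight matrix (PWM) scanned along a transcript. The sketch below is a generic log-odds PWM scan and is not the model this project proposes to build. It assumes a uniform 0.25 background, a pseudocount of 1, and a made-up three-position count matrix. The function names `log_odds_pwm` and `scan` are hypothetical.

```python
import math

BASES = "ACGU"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts mapping bases in ACGU to observed counts
    (hypothetical input, e.g. tallied from aligned bound sequences).
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of an uppercase RNA sequence; best hits first."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the log-odds contribution of each base at its motif position.
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts favouring "UAG"; illustrative only.
    counts = [
        {"U": 18, "A": 1, "C": 1, "G": 0},
        {"A": 17, "U": 1, "C": 1, "G": 1},
        {"G": 19, "A": 1, "C": 0, "U": 0},
    ]
    pwm = log_odds_pwm(counts)
    for start, score in scan("CCUAGGAUAGC", pwm)[:2]:
        print(start, round(score, 2))
```

A PWM treats positions as independent and ignores RNA secondary structure. These limitations motivate exploring richer models of binding activity.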

Public Health Relevance

Virtually all human genes produce RNA, and many genes are controlled by proteins that bind to that RNA. We propose to use a new method we have developed to obtain a complete index of RNA-protein interactions. This work should ultimately identify regulatory mechanisms that operate in both health and disease.

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 5R01HG005700-02
Application #: 8075668
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Pazin, Michael J
Project Start: 2010-05-26
Project End: 2013-03-31
Budget Start: 2011-04-01
Budget End: 2012-03-31
Support Year: 2
Fiscal Year: 2011
Total Cost: $288,912
Organization: University of Toronto
DUNS #: 259999779
City: Toronto
State: ON
Country: Canada
Zip Code: M5 1-S8
Ray, Debashish; Kazan, Hilal; Cook, Kate B et al. (2013) A compendium of RNA-binding motifs for decoding gene regulation. Nature 499:172-7
Cook, Kate B; Kazan, Hilal; Zuberi, Khalid et al. (2011) RBPDB: a database of RNA-binding specificities. Nucleic Acids Res 39:D301-8