Expression of the full complement of 20,000+ human genes requires splicing of an average of 8-10 introns per mRNA, and most human genes produce multiple distinct mRNA and protein isoforms through alternative splicing. Each of the more than 200,000 introns in the genome contains three specific sequence sites (the donor or 5' splice site, the acceptor or 3' splice site, and the branch point) that are absolutely required because they participate in the chemistry of splicing. The branch point is a specific nucleotide (usually adenosine) that participates in the first catalytic step of splicing, generating the unique "lariat" intron structure that is released in the second step of splicing. Mutation of the branch site frequently results in exon skipping, intron retention, or other perturbation of normal splicing, which can result in production of truncated or aberrant proteins and sometimes leads to disease. However, branch points have been mapped for only several dozen human introns. Here, we propose to develop a technology to map RNA branch points on a large scale, using model organisms to test and optimize the method, followed by application of the optimized procedure to map branch points genome-wide in human and mouse. Our proposal is organized around the following specific aims:
SA1. Develop a protocol for large-scale identification of branch points and associated mapping software, and apply them to model organisms (yeast, fly, or worm).
SA2. Optimize and apply the protocols and software from SA1 to mammalian systems to achieve large-scale identification of branch points in the human and mouse genomes.
We have designed two molecular biology protocols that, when coupled with second-generation sequencing and associated software pipelines, have the potential to identify branch points on a genome-wide scale.
Development of this technology and its application to the worm, fly, human and mouse genomes have the potential to contribute a critical "missing piece" in our understanding of RNA splice codes in these organisms, and will enable improved prediction of mutations or other genetic variations that perturb splicing and gene expression by interfering with branch point function.
This project seeks to develop a technology for genome-wide mapping of RNA branch points, which are genomic features that are required for the proper expression of nearly every human gene. Large-scale mapping of branch points will lead to deeper understanding of the mechanisms involved in gene expression, and will enable improved predictions of mutations and other genetic variations that contribute to human disease by disrupting the function of RNA branch points.