Almost all protein-coding genes in mammals have a split structure consisting of several exons and introns. Intronic sequences are removed from the primary transcript by the nuclear pre-mRNA splicing machinery. Often, alternative splicing (AS) generates several functional variants of one transcript, leading to several protein isoforms from a single gene. We propose a pair hidden Markov model (PHMM) to identify conserved AS events that have so far gone undetected by the standard approach of aligning cDNAs and expressed sequence tags (ESTs) to the genomic sequence. The research in this proposal thus aims to close the gap between comparative gene finding and gene structure identification by cDNA and EST alignment. Even though current EST libraries contain an abundance of sequences, the scope of each library is inherently limited to a particular tissue or developmental stage. We will implement efficient PHMM algorithms and use a PHMM designed for AS detection to align orthologous intron sequences from two species, identifying conserved AS events and intronic regulatory sequences. A few dozen promising candidates will be tested experimentally by RT-PCR. To demonstrate the validity of our PHMM approach, we will concentrate on two test sets: genes that encode splicing factors, and genes from the ENCODE target regions. The proposed research will result in a more complete picture of alternative gene structures in the human genome. Approximately 15% of mutations that cause human disease are associated with splicing defects, and numerous studies have pointed out links between alternative splicing and cancer and neurological diseases. Our analyses will point out possible new relationships between disease genes and candidate regions on the one hand and alternative or aberrant splicing on the other.
Ohler, Uwe; Shomron, Noam; Burge, Christopher B (2005) Recognition of unknown conserved alternatively spliced exons. PLoS Comput Biol 1:113-22