Almost all protein-coding genes of mammalian organisms have a split structure with several exons and introns. Intronic sequences are removed from the primary transcript by the nuclear pre-mRNA splicing machinery. Often, several functional variants of one transcript can be generated by alternative splicing (AS), leading to several protein isoforms of a single gene. We propose a pair hidden Markov model (PHMM) to identify conserved AS events that have so far gone undetected by the standard approach of aligning cDNAs and expressed sequence tags (ESTs) to the genomic sequence. The research in this proposal thus aims at closing the gap between comparative gene finding and gene structure identification by cDNA and EST alignments. Even though current EST libraries contain an abundance of sequences, the scope of each of these libraries is inherently limited to a particular tissue or developmental stage. We will implement efficient PHMM algorithms, including a PHMM that aligns orthologous intron sequences from two species to identify conserved AS events and intronic regulatory sequences. A few dozen promising candidates will be tested experimentally by RT-PCR. To demonstrate the validity of our PHMM approach, we will concentrate on two test sets: genes that encode splicing factors, and genes from the ENCODE target regions. The proposed research will result in a more complete picture of alternative gene structures in the human genome. Approximately 15% of mutations that cause human disease are associated with splicing defects, and numerous studies have pointed out links between alternative splicing and cancer and neurological diseases. Our analyses will point out possible new relationships between disease genes and candidate regions on the one hand and alternative or aberrant splicing on the other.

Agency
National Institutes of Health (NIH)
Institute
National Library of Medicine (NLM)
Type
Small Research Grants (R03)
Project #
1R03LM008536-01
Application #
6852434
Study Section
Biomedical Library and Informatics Review Committee (BLR)
Program Officer
Ye, Jane
Project Start
2005-02-01
Project End
2007-01-31
Budget Start
2005-02-01
Budget End
2006-01-31
Support Year
1
Fiscal Year
2005
Total Cost
$80,000
Indirect Cost
Name
Massachusetts Institute of Technology
Department
Biology
Type
Schools of Arts and Sciences
DUNS #
001425594
City
Cambridge
State
MA
Country
United States
Zip Code
02139
Ohler, Uwe; Shomron, Noam; Burge, Christopher B (2005) Recognition of unknown conserved alternatively spliced exons. PLoS Comput Biol 1:113-22