This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. Reconstruction of early evolutionary events leading to the appearance of spliceosomal introns in the last eukaryotic ancestor. Background: The spliceosomal complex is an intricate machinery comprising over 150 RNA and protein components; it removes the non-coding stretches of sequence that often occur within protein-coding regions. The emergence of the spliceosome in early eukaryotes is of intense interest; the primary questions, how and when it arose, are still under debate. Since most eukaryotic genes are interrupted by multiple introns (the majority of human genes have several introns each, and these are frequently involved in the regulation of gene expression), understanding the early evolution of introns and of the machinery involved in their removal is of great importance. We are investigating a unique set of Sm and lsm proteins, which are basal components of the spliceosomal machinery. These proteins belong to a multi-gene family in eukaryotes but are encoded by single-copy genes in Archaea and Bacteria (which do not have spliceosomal machinery, so the protein performs different function(s) there). Among the questions we are investigating: (1) is it possible that archaeal and bacterial genes gave rise to the eukaryotic multi-gene family; (2) can we connect an archaeal and/or bacterial Sm/lsm gene to specific eukaryotic gene(s); (3) what was the sequence of events during paralogous duplication of Sm genes in early eukaryotes; (4) how fast was the paralogous duplication? Many of these questions require construction of phylogenetic trees.
Reasons for request: We are in the process of constructing phylogenetic trees for a large set of eukaryotic taxonomic families as well as representatives of many prokaryotic taxa. Such trees will give us information about ancestral relationships among eukaryotic genes and, even more importantly, about where prokaryotic Sm/lsm proteins fit with respect to the eukaryotic paralogous set. We are using a sensitive method for constructing phylogenetic trees, Bayesian inference, as implemented in the MrBayes 3 phylogeny package (http://mrbayes.csit.fsu.edu/index.php). The drawback of this method is its notoriously long computational time. We are at a disadvantage because we are dealing with many different organisms and a large multi-gene family; thus our dataset includes over 200 distinct sequences. Additionally, due to the relatively high level of similarity among the sequences, we need to run the program for several million steps, which becomes an impossibly long task on a single machine. Right now each run takes 1-2 months on a Dell Precision 670 machine. MrBayes has a parallel version of the code, which permits a significant speed-up and which we hope to use if granted a TeraGrid allocation.