This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. The Human Genome Project is generating a huge data stream of DNA sequences that require computational tools for analysis. An examination of mRNA sequences in GenBank has revealed that the calculated folding of mRNA sequences is more stable than expected by chance. The minimum folding free energies calculated for native mRNA sequences are more negative than those of mononucleotide-randomized sequences of the same composition. This implies that gene sequences are globally optimized for folding by local selection of nucleotide bases. Genes can then be classified according to whether their calculated folding is more stable than, less stable than, or the same as that of randomized sequences. A survey of thirty oncogene sequences has revealed an even stronger bias toward mRNA secondary structure than is found in typical genes. In addition, a correlation has been found between SAGE yeast expression levels and the mRNA folding bias: highly expressed genes were found to possess a larger folding bias than single-copy mRNAs. This suggests that oncogenes may possess more secondary structure for the purpose of gene regulation or enhancement of mRNA levels. These results will be applied to the analysis of human transcriptome SAGE data, with a focus on oncogenes and genes involved in cancer progression. The proposed work will correlate human SAGE transcriptome expression levels with mRNA folding stability bias.
The work will also aid the understanding of human gene expression and would be of biomedical interest because it would 1) inform antisense gene therapy with respect to mRNA folding stability and 2) provide computational tools to analyze and characterize oncogene mRNA sequences in GenBank and transcriptome SAGE data.
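
The folding bias described above, a native sequence's calculated folding energy compared with that of mononucleotide-randomized sequences of the same composition, can be sketched roughly as follows. This is an illustrative outline only, not the project's actual software. The names `folding_bias`, `toy_energy`, and `nussinov_pairs` are hypothetical, and the energy model is a deliberately crude stand-in (one energy unit per nested base pair, found by Nussinov-style dynamic programming) rather than thermodynamic free energy minimization. A real analysis would pass in an energy function backed by a free energy minimization program.

```python
import random
import statistics

# Watson-Crick and G-U wobble pairs allowed by the toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov dynamic programming).

    Assumption: a crude stand-in for a folding model, NOT a free energy calculation.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def toy_energy(seq):
    """Hypothetical energy: -1 unit per base pair (more negative = more stable)."""
    return -nussinov_pairs(seq)


def folding_bias(seq, energy_fn, n_shuffles=100, seed=0):
    """Compare the native energy with mononucleotide-shuffled controls.

    Returns (native_energy, mean_shuffled_energy, z_score). A negative z-score
    means the native sequence folds more stably than its shuffled versions.
    """
    rng = random.Random(seed)
    native = energy_fn(seq)
    bases = list(seq)
    shuffled = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # preserves base composition, destroys order
        shuffled.append(energy_fn("".join(bases)))
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    z = (native - mean) / sd if sd > 0 else 0.0
    return native, mean, z


if __name__ == "__main__":
    native, mean, z = folding_bias("GGGGGGAAAACCCCCC", toy_energy, n_shuffles=50)
    print(f"native={native} shuffled_mean={mean:.2f} z={z:.2f}")
```

Because the energy function is passed in as an argument, the shuffling and scoring logic stays independent of the folding model. Genes could then be sorted into the "more stable", "less stable", or "same" classes by applying a threshold to the z-score; the choice of threshold is left open here.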