"This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5)."
Boston College has received an award to study noncoding functionality within eukaryotic coding regions. Most regulatory sequences in eukaryotic genomes remain poorly understood. Regulatory sequences within coding regions in particular have been characterized only rudimentarily, despite a growing body of evidence for their critical functional importance. Such sequences could be highly relevant to gene expression, because they are well positioned to influence either post-transcriptional regulation of mRNA or transcription initiation from DNA. The functions they influence may include splicing, microRNA targeting, RNA-protein binding, and transcription factor binding to DNA. This project will develop novel computational algorithms to identify motifs and sequence blocks in coding nucleotide sequences whose functions are distinct from the encoded peptide sequence. This research will be coordinated with two experimental approaches that examine the functions of the strongest such noncoding signals at the RNA and DNA levels. These studies build on the laboratory's previous molecular evolutionary investigations, which characterized mutational and selective pressures on synonymous sites throughout the mammalian and fungal phylogenies. The research outcomes will include:
1. Novel algorithms and software for detecting sequence motifs and blocks with noncoding functionality in coding sequences.
2. Computational prediction of binding motifs from RNA-protein binding data covering the human genome, followed by high-resolution experimental validation of individual sites.
3. Computational analysis and experimental assays of the transcriptional regulatory effects of highly conserved vertebrate coding regions.
By closely integrating algorithm development, data analysis, and experiments, these projects will substantially increase understanding of the prevalence and evolution of functions encoded in the nucleotide sequences of coding regions.
User-friendly software implementations of the algorithms will be produced, giving researchers tools to identify functional sequences in coding regions based on rigorous decoupling of protein-level from nucleotide-level effects. The software will be applicable to sequence data from any phylogeny and will be made publicly available as open-source code and through a web server. A distinctive broader impact will be a series of annual science writing workshops for high school students, led by the PI and drawing on the PI's experience in science journalism. These workshops will culminate in the students' production of a genomics-focused issue of the Greentimes science newsletter (circulation over 30,000). Graduate students and undergraduates with biology and computing backgrounds will work together, gaining training in collaborative interdisciplinary research. Additional information about the project is available at http://bioinformatics.bc.edu/chuanglab.