Whole genome sequencing creates numerous opportunities for comparative analysis of different organisms, elucidating the modes of conservation as well as the patterns of divergence that lead to species diversification, robustness, fitness, and taxonomical organization. In particular, selective evolutionary forces create variable rates of conservation at different functional sites, thereby producing distinctive comparative signatures in different genomic regions. These signatures can be exploited by computational methods for improved detection of functionally important regions such as protein-coding exons, RNA genes, promoters, 3' UTRs, and other as yet unexpected features. The exact identification of genes in the human genome remains a challenge: the number of predicted genes was significantly lower than previous estimates indicated, and the actual predictions disagree substantially and vary with the specific gene-finding methodology deployed. Since the pattern of conservation differs among functional regions of the genome, a comparative computational analysis can, in principle, significantly improve the computational identification of genes in the human genome by using a reference genome such as the mouse genome. However, this comparative methodology depends critically on three factors: 1) the selection of the comparative features that provide the most accurate signatures for comparative gene recognition; 2) the selection of a reference genome at the right evolutionary distance from the human genome, so that the patterns of conservation in different regions are distinctive enough to aid gene recognition; 3) the selection of the gene recognition architecture that is most effective in interpreting the comparative signatures.
In this proposal we develop a general computational framework for comparative analysis of genomic sequences, focusing on achieving a substantial improvement in gene recognition accuracy. We propose a specific architecture for a comparative computational gene recognition system based on evidence integration frameworks. Based on this architecture, we propose to develop a modular and highly portable system for comparative sequence analysis. We plan to use it for mouse-human sequence analysis as well as for related genomes soon to be sequenced, including generating an improved annotation of the Drosophila sequence using related genomes.
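As an illustration of the kind of comparative signature discussed above, the following minimal Python sketch computes two simple features from a pre-aligned pair of sequences: windowed percent identity, and the fraction of mismatches that fall on the third codon position (coding regions tend to tolerate third-position changes more readily). The function names, the toy sequences, and the choice of these two features are assumptions made for illustration only; they are not the features or the architecture of the proposed system, which would feed signals of this kind, among others, into an evidence integration step.

```python
from typing import List, Optional


def window_identity(human: str, mouse: str, window: int) -> List[float]:
    """Fraction of identical, ungapped columns in each non-overlapping window.

    Both inputs are rows of one pairwise alignment (hypothetical input format),
    so they must have the same length; '-' marks a gap.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(human) - window + 1, window):
        columns = zip(human[start:start + window], mouse[start:start + window])
        matches = sum(1 for a, b in columns if a == b and a != "-")
        scores.append(matches / window)
    return scores


def third_position_bias(human: str, mouse: str, frame: int = 0) -> Optional[float]:
    """Fraction of ungapped mismatches that fall on the third codon position.

    `frame` is the alignment column at which a codon starts (0, 1, or 2).
    Returns None when the alignment contains no mismatches.
    """
    mismatches = [
        i for i, (a, b) in enumerate(zip(human, mouse))
        if a != b and a != "-" and b != "-"
    ]
    if not mismatches:
        return None
    third = sum(1 for i in mismatches if (i - frame) % 3 == 2)
    return third / len(mismatches)


# Toy aligned fragments, made up for this example (not real genomic data).
human_seq = "ATGGCCAAGTTTGGA"
mouse_seq = "ATGGCTAAATTCGGA"

identity = window_identity(human_seq, mouse_seq, window=5)
bias = third_position_bias(human_seq, mouse_seq, frame=0)
print(identity, bias)
```

In this toy pair every mismatch lies on a third codon position, which is the periodic pattern one would hope to see in a conserved coding region; a value near one third would be expected if mismatches were spread evenly over codon positions.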
Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A et al. (2009) Biological process linkage networks. PLoS One 4:e5313
Molla, Michael; Delcher, Arthur; Sunyaev, Shamil et al. (2009) Triplet repeat length bias and variation in the human transcriptome. Proc Natl Acad Sci U S A 106:17095-100
Dotan-Cohen, Dikla; Melkman, Avraham A; Kasif, Simon (2007) Hierarchical tree snipping: clustering guided by prior knowledge. Bioinformatics 23:3335-42
Zhang, Lingang; Kasif, Simon; Cantor, Charles R (2007) Quantifying DNA-protein binding specificities by using oligonucleotide mass tags and mass spectroscopy. Proc Natl Acad Sci U S A 104:3061-6
Alon, Noga; Asodi, Vera; Cantor, Charles et al. (2006) Multi-node graphs: a framework for multiplexed biological assays. J Comput Biol 13:1659-72
Rachlin, John; Cohen, Dikla Dotan; Cantor, Charles et al. (2006) Biological context networks: a mosaic view of the interactome. Mol Syst Biol 2:66
Lee, Soohyun; Kohane, Isaac; Kasif, Simon (2005) Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics 6:168
Rachlin, John; Ding, Chunming; Cantor, Charles et al. (2005) Computational tradeoffs in multiplex PCR assay design for SNP genotyping. BMC Genomics 6:102
Zheng, Yu; Anton, Brian P; Roberts, Richard J et al. (2005) Phylogenetic detection of conserved gene clusters in microbial genomes. BMC Bioinformatics 6:243
Wu, Chang-Jiun; Kasif, Simon (2005) GEMS: a web server for biclustering analysis of expression data. Nucleic Acids Res 33:W596-9