Comparative Cross-Species Genomic Analysis System

Kasif, Simon

Abstract

Whole-genome sequencing creates numerous opportunities for comparative analysis of different organisms, elucidating the modes of conservation as well as the patterns of divergence that lead to species diversification, robustness, fitness, and taxonomic organization. In particular, selective evolutionary forces create variable rates of conservation at different functional sites, thereby producing distinctive comparative signatures in different genomic regions. These signatures can be exploited by computational methods for improved detection of functionally important regions such as protein-coding exons, RNA genes, promoters, 3' UTRs, and other as yet unanticipated features. The exact identification of genes in the human genome remains a challenge: the number of predicted genes was significantly lower than previous estimates indicated, and the actual predictions disagree substantially, varying with the specific gene-finding methodology deployed. Since the pattern of conservation differs among functional regions of the genome, comparative computational analysis can, in principle, lead to significantly improved identification of genes in the human genome by using a reference genome such as the mouse genome. However, this comparative methodology critically depends on three factors: 1) the selection of comparative features that provide the most accurate signatures for comparative gene recognition; 2) the selection of a reference genome at the right evolutionary distance from the human genome to provide sufficiently distinctive patterns of conservation in different regions to aid gene recognition; and 3) the selection of the specific gene recognition architecture that is most effective in interpreting the comparative signatures.
In this proposal we develop a general computational framework for comparative analysis of genomic sequences, focusing on achieving a substantial improvement in gene recognition accuracy. We propose a specific architecture for a comparative computational gene recognition system based on evidence integration frameworks. Based on this architecture, we propose to develop a modular and highly portable system for comparative sequence analysis that we plan to use for mouse-human sequence analysis as well as for related genomes soon to be sequenced, including generating an improved annotation of the Drosophila sequence using related genomes.
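
As a rough illustration of what a "comparative signature" might look like in practice, the following minimal Python sketch computes two simple features from a gapped pairwise alignment: sliding-window identity and the distribution of mismatches over alignment column modulo 3 (protein-coding regions often concentrate substitutions at the third codon position). This is not the system described in the proposal; the function names, window parameters, and toy sequences are hypothetical and chosen only for illustration.

```python
def window_identity(seq_a, seq_b, window=9, step=3):
    """Return (start, identity) pairs for sliding windows over an alignment.

    Identity is the fraction of columns in the window where both
    sequences carry the same non-gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        cols = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in cols if a == b and a != "-")
        scores.append((start, matches / window))
    return scores


def mismatch_phase_counts(seq_a, seq_b):
    """Count ungapped mismatches by alignment column modulo 3.

    Assumes column 0 is in codon phase 0; a strong excess in one
    phase is a crude hint of coding-like conservation.
    """
    counts = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != "-" and b != "-" and a != b:
            counts[i % 3] += 1
    return counts


# Toy, made-up aligned fragments (hypothetical data, not real genes).
human = "ATGGCCAAGCTGACC"
mouse = "ATGGCTAAACTGACA"

print(window_identity(human, mouse))
print(mismatch_phase_counts(human, mouse))  # [0, 0, 3]
```

In this toy pair every mismatch falls in phase 2, the kind of periodic pattern a comparative gene finder could treat as one line of evidence among several.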

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Exploratory/Developmental Grants Phase II (R33)
Project #: 5R33HG002850-03
Application #: 7120158
Study Section: Special Emphasis Panel (ZRG1-SSS-Y (11))
Program Officer: Good, Peter J

Project Start: 2004-09-24
Project End: 2009-08-31
Budget Start: 2006-09-01
Budget End: 2009-08-31
Support Year: 3
Fiscal Year: 2006
Total Cost: $341,773
Indirect Cost

Institution

Name: Boston University
Department: Engineering (All Types)
Type: Schools of Engineering
DUNS #: 049435266

City: Boston
State: MA
Country: United States
Zip Code: 02215

Related projects


NIH 2006 R33 HG	Comparative Cross-Species Genomic Analysis System Kasif, Simon / Boston University	$341,773
NIH 2005 R33 HG	Comparative Cross-Species Genomic Analysis System Kasif, Simon / Boston University	$350,000
NIH 2004 R33 HG	Comparative Cross-Species Genomic Analysis System Kasif, Simon / Boston University	$350,000

Publications

Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A et al. (2009) Biological process linkage networks. PLoS One 4:e5313

Molla, Michael; Delcher, Arthur; Sunyaev, Shamil et al. (2009) Triplet repeat length bias and variation in the human transcriptome. Proc Natl Acad Sci U S A 106:17095-100

Dotan-Cohen, Dikla; Melkman, Avraham A; Kasif, Simon (2007) Hierarchical tree snipping: clustering guided by prior knowledge. Bioinformatics 23:3335-42

Zhang, Lingang; Kasif, Simon; Cantor, Charles R (2007) Quantifying DNA-protein binding specificities by using oligonucleotide mass tags and mass spectroscopy. Proc Natl Acad Sci U S A 104:3061-6

Alon, Noga; Asodi, Vera; Cantor, Charles et al. (2006) Multi-node graphs: a framework for multiplexed biological assays. J Comput Biol 13:1659-72

Rachlin, John; Cohen, Dikla Dotan; Cantor, Charles et al. (2006) Biological context networks: a mosaic view of the interactome. Mol Syst Biol 2:66

Zheng, Yu; Anton, Brian P; Roberts, Richard J et al. (2005) Phylogenetic detection of conserved gene clusters in microbial genomes. BMC Bioinformatics 6:243

Wu, Chang-Jiun; Kasif, Simon (2005) GEMS: a web server for biclustering analysis of expression data. Nucleic Acids Res 33:W596-9

Rachlin, John; Ding, Chunming; Cantor, Charles et al. (2005) MuPlex: multi-objective multiplex PCR assay design. Nucleic Acids Res 33:W544-7

Lee, Soohyun; Kohane, Isaac; Kasif, Simon (2005) Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes. BMC Genomics 6:168

Showing the most recent 10 out of 11 publications
