The rapid accumulation of genomic sequences has increased the demand for methods to decipher the genetic information gathered in data banks such as GenBank. While many methods for thorough micro-analysis of small sequences have been developed in the past, there is still a shortage of powerful procedures for macro-analysis of large DNA sequences. Combining statistical analysis with modern computing power makes it feasible to search long sequences rapidly for diagnostic patterns and to evaluate similarities and differences between them, in order to recover much of the biochemical information hidden in these organic molecules. The objectives of the proposed study are to develop novel methods of computerized statistical analysis and simultaneously to apply them to available large genomic sequences, such as the genomes of the medically important herpesvirus family. The methods aim at the identification of biologically active sites (e.g., origins of replication and nucleosome positioning signals). The anticipated approach is based on automated scanning of long DNA letter sequences to find similarities or differences between sequences, e.g., by using dinucleotide distance measures. Probabilistic criteria (such as r-scan statistics and spectral envelopes) will be applied to all detected sites of possible biological interest to sort out the most likely candidates among them. This project seeks close collaboration between experimental and theoretical scientists, namely virologists and mathematicians. The interdisciplinary approach ensures that the theoretical work is linked to current biomedical problems and that newly developed methods are immediately applied.
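
As an illustration of what a dinucleotide distance measure can look like, the following is a minimal sketch, not the project's own implementation. It assumes one common form of such a measure: the mean absolute difference between the dinucleotide relative abundances (observed dinucleotide frequency divided by the product of the two mononucleotide frequencies) of two sequences. The function names `relative_abundances` and `dinucleotide_distance` are hypothetical, and the sketch works on a single strand only, without the reverse-complement symmetrization that some formulations use.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def relative_abundances(seq):
    """Return rho[xy] = f_xy / (f_x * f_y) for all 16 dinucleotides.

    Assumption: single-strand counts; ambiguous symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable dinucleotides")

    rho = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        fxy = di[x + y] / n_di
        # If a base never occurs, the ratio is undefined; use 0.0 here.
        rho[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else 0.0
    return rho


def dinucleotide_distance(seq_a, seq_b):
    """Mean absolute difference of relative abundances over 16 dinucleotides."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

Under these assumptions, a sequence compared with itself gives a distance of zero, and the measure is symmetric in its two arguments. Larger values indicate more dissimilar dinucleotide usage.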