The applicant's long-term goal is to understand and elucidate structure and organization within DNA sequences and to uncover their relationship to biological function. The objective of this application, a step toward that goal, is to develop techniques for elucidating occult structural features in bacterial DNA that can be used to identify and differentiate microbial organisms from short fragments of DNA, including organisms whose genomes have not been completely sequenced. A parallel goal is to achieve investigative independence as a computational biologist; this will be accomplished through coursework and training in the laboratories of Dr. S. Hinrichs, M.D. The need for rapid identification tests for biological materials has become more urgent because of the threat posed by bioterrorism. Rapid identification of both the fact and the mode of attack is essential for timely therapeutic intervention. The ability to identify bacteria from short, incomplete, or possibly corrupted sequences allows for hazard detection, automation, and low-cost distributed sensing. The identification techniques will be developed using three tools: the average mutual information (AMI) profile, which reflects statistical relationships between bases along the DNA sequence; a cluster analysis technique, developed by the applicant and co-workers, which identifies genome-specific trinucleotide clustering patterns; and a parsing technique for identifying polynucleotide sequences of interest. Components of the AMI profile that possess discriminatory capability will be identified by decomposing the profile and analyzing the resulting coefficients using both supervised and unsupervised classification. The clustering strategy will be refined by correlating parameters in the technique with known biological behavior. Signature trinucleotide and polynucleotide clustering patterns will be identified for organisms of interest. The different classifications will be combined into a tree-structured test for a model panel of bacteria of medical interest.
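To make the AMI profile concrete, the following is a minimal sketch of one common way to compute it: for each lag k, the mutual information between the base at position i and the base at position i+k, estimated from counts. This is not the applicant's implementation. The function name `ami_profile`, the `max_lag` parameter, the use of base-2 logarithms, and the choice to estimate the marginals from the same lagged pairs (and to skip ambiguous bases such as N) are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def ami_profile(seq, max_lag=8):
    """Return [I(1), ..., I(max_lag)] in bits for a DNA string.

    Hypothetical helper: I(k) is the mutual information between the
    base at position i and the base at position i + k.
    """
    seq = seq.upper()
    profile = []
    for k in range(1, max_lag + 1):
        # Pairs of bases k positions apart; skip ambiguous symbols (assumption).
        pairs = [(a, b) for a, b in zip(seq, seq[k:])
                 if a in "ACGT" and b in "ACGT"]
        n = len(pairs)
        if n == 0:
            profile.append(0.0)
            continue
        joint = Counter(pairs)                  # counts of (X, Y) at lag k
        left = Counter(a for a, _ in pairs)     # marginal counts of X
        right = Counter(b for _, b in pairs)    # marginal counts of Y
        ami = 0.0
        for (a, b), count in joint.items():
            p_ab = count / n
            ami += p_ab * log2(p_ab / ((left[a] / n) * (right[b] / n)))
        profile.append(ami)
    return profile
```

Under these assumptions, a homopolymer has zero mutual information at every lag, while a perfectly periodic sequence such as `"ACGT" * 50` reaches 2 bits at lag 4, since each base fully determines the base four positions later. Profiles from short fragments could then be compared as fixed-length vectors, which is the kind of input a supervised or unsupervised classifier would take.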
|Nalbantoglu, Ozkan U; Way, Samuel F; Hinrichs, Steven H et al. (2011) RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinformatics 12:41|
|Russell, David J; Way, Samuel F; Benson, Andrew K et al. (2010) A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences. BMC Bioinformatics 11:601|
|Nalbantoğlu, O U; Russell, D J; Sayood, K (2010) Data Compression Concepts and Algorithms and their Applications to Bioinformatics. Entropy (Basel) 12:34|
|Sayood, Khalid; Hoffman, Federico; Wood, Charles (2009) Use of average mutual information for studying changes in HIV populations. Conf Proc IEEE Eng Med Biol Soc 2009:3861-4|
|Russell, David J; Otu, Hasan H; Sayood, Khalid (2008) Grammar-based distance in progressive multiple sequence alignment. BMC Bioinformatics 9:306|
|Bauer, Mark; Schuster, Sheldon M; Sayood, Khalid (2008) The average mutual information profile as a genomic signature. BMC Bioinformatics 9:48|