The applicant's long-term goal is to understand and elucidate structure and organization within DNA sequences and to uncover their relationship to biological function. The objective of this application, which is a step toward that long-term goal, is to develop techniques for elucidating occult structural features in bacterial DNA that can be used, with only short fragments of DNA, to identify and differentiate microbial organisms, including organisms whose genomes have not been completely sequenced. A parallel goal is to achieve investigative independence as a computational biologist. This goal will be accomplished through coursework and training in the laboratories of Dr. S. Hinrichs, M.D. The urgent need for rapid identification tests for biological materials has intensified because of the threat posed by bioterrorism. Rapid identification of both the fact and the mode of an attack is essential for timely therapeutic intervention. The ability to identify bacteria from short fragments of incomplete or possibly corrupted sequence enables hazard detection, automation, and low-cost distributed sensing. The identification techniques will be developed using three tools: the average mutual information (AMI) profile, which reflects statistical relationships between bases along the DNA sequence; a cluster analysis technique, developed by the applicant and co-workers, which identifies genome-specific trinucleotide clustering patterns; and a parsing technique for identifying polynucleotide sequences of interest. Components of the AMI profile that possess discriminatory capability will be identified by decomposing the profile and analyzing the coefficients using both supervised and unsupervised classification. The clustering strategy will be refined by correlating parameters in the technique with known biological behavior. Signature trinucleotide and polynucleotide clustering patterns will be identified for organisms of interest. The different classifications will be combined into a tree-structured test for a model panel of bacteria of medical interest.
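
As a rough illustration of the first of the three tools, the AMI profile can be sketched as the mutual information between bases separated by k positions, computed for k = 1, 2, and so on. The abstract does not give a formula, so the definition used here, I(k) = sum over base pairs (X, Y) of p_k(X, Y) log2[p_k(X, Y) / (p(X) p(Y))], is an assumption based on the standard definition of mutual information. The function name `ami_profile`, the `max_lag` parameter, and the choice to skip ambiguous bases are hypothetical and are not taken from the application.

```python
from collections import Counter
from math import log2


def ami_profile(seq, max_lag=16):
    """Return [I(1), ..., I(max_lag)] in bits for a DNA string.

    Illustrative sketch only: I(k) is the mutual information between
    the base at position i and the base at position i + k.
    """
    seq = seq.upper()
    bases = "ACGT"
    profile = []
    for k in range(1, max_lag + 1):
        # Collect pairs of bases k positions apart, skipping ambiguous symbols.
        pairs = [(a, b) for a, b in zip(seq, seq[k:]) if a in bases and b in bases]
        n = len(pairs)
        if n == 0:
            profile.append(0.0)
            continue
        joint = Counter(pairs)                  # counts of (X, Y) at lag k
        left = Counter(a for a, _ in pairs)     # marginal counts of X
        right = Counter(b for _, b in pairs)    # marginal counts of Y
        mi = 0.0
        for (a, b), c in joint.items():
            p_ab = c / n
            # p_ab / (p_a * p_b), written in terms of raw counts
            mi += p_ab * log2(p_ab * n * n / (left[a] * right[b]))
        profile.append(mi)
    return profile


if __name__ == "__main__":
    # A strictly periodic toy sequence: each base fully determines the base k positions later.
    print(ami_profile("ACGT" * 50, max_lag=4))
```

The resulting vector could then be fed to a decomposition or classifier of the kind the abstract mentions; which decomposition is meant is not stated in the source, so none is shown here.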

Agency
National Institutes of Health (NIH)
Institute
National Institute of Allergy and Infectious Diseases (NIAID)
Type
Mentored Quantitative Research Career Development Award (K25)
Project #
5K25AI068151-03
Application #
7371917
Study Section
Microbiology and Infectious Diseases B Subcommittee (MID)
Program Officer
Beanan, Maureen J
Project Start
2006-02-01
Project End
2011-01-31
Budget Start
2008-02-01
Budget End
2009-01-31
Support Year
3
Fiscal Year
2008
Total Cost
$152,705
Indirect Cost
Name
University of Nebraska Lincoln
Department
Engineering (All Types)
Type
Schools of Engineering
DUNS #
555456995
City
Lincoln
State
NE
Country
United States
Zip Code
68588
Nalbantoglu, Ozkan U; Way, Samuel F; Hinrichs, Steven H et al. (2011) RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinformatics 12:41
Russell, David J; Way, Samuel F; Benson, Andrew K et al. (2010) A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences. BMC Bioinformatics 11:601
Nalbantoğlu, O U; Russell, D J; Sayood, K (2010) Data Compression Concepts and Algorithms and their Applications to Bioinformatics. Entropy (Basel) 12:34
Sayood, Khalid; Hoffman, Federico; Wood, Charles (2009) Use of average mutual information for studying changes in HIV populations. Conf Proc IEEE Eng Med Biol Soc 2009:3861-4
Russell, David J; Otu, Hasan H; Sayood, Khalid (2008) Grammar-based distance in progressive multiple sequence alignment. BMC Bioinformatics 9:306
Bauer, Mark; Schuster, Sheldon M; Sayood, Khalid (2008) The average mutual information profile as a genomic signature. BMC Bioinformatics 9:48