The goal of this project is to develop new algorithms for protein function prediction. Rapid advances in a range of technologies are producing biological data of unprecedented volume and complexity, and computational methods are becoming essential components of modern biomedical research. One of the greatest challenges facing bioinformaticians is discovering connections among different data sets and generating novel biological knowledge or hypotheses. Predicting the molecular function of novel proteins is an urgent task in the post-genomic era. In particular, a recent assessment of structural genomics efforts revealed a gap between experimental protein structure determination and the use of that structural knowledge to understand the biological function of proteins at the molecular level. We will employ recent developments in discriminative machine learning to construct a residue-level classification system for predicting function from structure. Existing systems for predicting function from structure use either global structural and sequence similarity over the entire protein chain or localized similarity, such as putative functional sites. Our system will leverage information from both global and local similarities and will identify important residues, and clusters of residues, that distinguish different functional families. Our approach builds on and extends an efficient optimization framework that we developed for protein superfamily classification. We expect that these methodological developments will not only improve the performance of state-of-the-art function prediction but also help illuminate the interplay of sequence and structure in defining functional variation among protein families.
Beyond this major project, we will work on an additional project that extends the graph-theoretical models for multiple sequence alignment that we developed earlier to meet the challenge of domain annotation for large sets of new sequences.

Public Health Relevance

The advancement of medical research depends in part on a detailed understanding of the functions of genes and proteins. Our research will improve our understanding of protein evolution and function at the molecular level, and our computational approach will speed up the discovery of biological knowledge from the large data sets generated by high-throughput methods.

National Institutes of Health (NIH)
National Center for Research Resources (NCRR)
Research Transition Award (R00)
Study Section: Special Emphasis Panel (NSS)
Program Officer: Sheeley, Douglas
University of Alabama Birmingham
Biostatistics & Other Math Sci
Schools of Public Health
United States
Jahandideh, Samad; Zhi, Degui (2014) Systematic investigation of predicted effect of nonsynonymous SNPs in human prion protein gene: a molecular modeling and molecular dynamics study. J Biomol Struct Dyn 32:289-300
Jahandideh, Samad (2013) Diversity in structural consequences of MexZ mutations in Pseudomonas aeruginosa. Chem Biol Drug Des 81:600-6
Lin, Wan-Yu; Yi, Nengjun; Lou, Xiang-Yang et al. (2013) Haplotype kernel association test as a powerful method to identify chromosomal regions harboring uncommon causal variants. Genet Epidemiol 37:560-70
Wu, Guodong; Zhi, Degui (2013) Pathway-based approaches for sequencing-based genome-wide association studies. Genet Epidemiol 37:478-94
Zhang, Kui; Zhi, Degui (2013) Joint haplotype phasing and genotype calling of multiple individuals using haplotype informative reads. Bioinformatics 29:2427-34
Zhi, Degui; Wu, Jihua; Liu, Nianjun et al. (2012) Genotype calling from next-generation sequencing data using haplotype information of reads. Bioinformatics 28:938-46
Lin, Wan-Yu; Yi, Nengjun; Zhi, Degui et al. (2012) Haplotype-based methods for detecting uncommon causal variants with common SNPs. Genet Epidemiol 36:572-82
Zhi, Degui; Chen, Rui (2012) Statistical guidance for experimental design and data analysis of mutation detection in rare monogenic mendelian diseases by exome sequencing. PLoS One 7:e31358
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui (2012) Comprehensive comparative analysis and identification of RNA-binding protein domains: Multi-class classification and feature selection. J Theor Biol 312C:65-75
Jahandideh, Samad; Mahdavi, Abbas (2012) RFCRYS: sequence-based protein crystallization propensity prediction by means of random forest. J Theor Biol 306:115-9

Showing the most recent 10 out of 20 publications