The goal of this project is to develop new algorithms for protein function prediction. Rapid technological advances are producing biological data of unprecedented volume and complexity, and computational methods are becoming essential components of modern biomedical research. One of the greatest challenges facing bioinformaticians is discovering connections among different data sets and generating novel biological knowledge or hypotheses. Predicting the molecular function of novel proteins is an urgent task for the post-genomics era. In particular, a recent assessment of structural genomics efforts revealed a gap between experimental protein structure determination and the use of that structural knowledge to understand the biological function of proteins at the molecular level. We will employ recent developments in discriminative machine learning to construct a residue-level classification system for predicting function from structure. Existing systems for function prediction from structure use either global structural and sequence similarities over the entire protein chain or localized similarities such as putative functional sites. Our system will leverage information from both global and local similarities and will identify important residues and clusters of residues that are distinctive among different functional families. Our approach builds on and extends an efficient optimization framework that we developed for protein superfamily classification. We expect that these methodological developments will not only improve on the performance of state-of-the-art function prediction, but also help illuminate how sequence and structure interact to define functional variation among protein families.
Beyond this major project, we will pursue an additional project that extends the graph-theoretical models for multiple sequence alignment that we developed earlier, in order to meet the challenge of domain annotation for large new sequence sets.
The advancement of medical research depends in part on a detailed understanding of the functions of genes and proteins. Our research will improve our understanding of protein evolution and function at the molecular level. Our computational approach will speed up the discovery of biological knowledge from large data sets generated by high-throughput methods.