I am a postdoctoral scholar associated with Steven Brenner's lab at Berkeley, working on structural biology and computational genomics. My long-term vision is to develop new algorithms for inferring protein evolution and function from sequence and structure. Currently I am working on algorithms that automatically classify a protein into its proper superfamily. The long-term goal of this project is to improve the accuracy of protein structure classification and function prediction.

The superfamily defines ancient protein homology. Protein superfamily classification remains a challenging task, even when 3D structure is available, and it still requires manual work by experts. We believe that the classification of protein superfamilies relies on the integration of sequence information and structure information. We will employ recent breakthroughs in kernel-based machine learning approaches for combining different sources of information. We will also develop structure-based discriminative profile models for protein superfamilies. We expect these algorithmic developments not only to result in a practical tool for superfamily classification but also to improve our understanding of how sequence and structure together define very remote homology.

We will extend our structure-based discriminative profile models for protein classification to function prediction. We will develop new methods for identifying structure-sequence signatures of protein function. In addition, we will extend the graph-theoretical models for multiple sequence alignment that I developed during my Ph.D. study to meet the challenge of domain annotation for large new sequence sets.

The advancement of medical research is partly based on our detailed understanding of the functions of genes and proteins. My research will improve our understanding of protein evolution and function at the molecular level. Our computational approach will speed up the discovery of biological knowledge from large data sets generated by high-throughput methods.

Agency
National Institutes of Health (NIH)
Institute
National Center for Research Resources (NCRR)
Type
Career Transition Award (K99)
Project #
5K99RR024163-02
Application #
7495992
Study Section
Special Emphasis Panel (ZGM1-BRT-9 (KR))
Program Officer
Sheeley, Douglas
Project Start
2007-09-15
Project End
2009-06-30
Budget Start
2008-07-01
Budget End
2009-06-30
Support Year
2
Fiscal Year
2008
Total Cost
$73,440
Indirect Cost
Name
University of California Berkeley
Department
Other Basic Sciences
Type
Schools of Earth Sciences/Natur
DUNS #
124726725
City
Berkeley
State
CA
Country
United States
Zip Code
94704
Gao, Liyan; Fang, Zhide; Zhang, Kui et al. (2011) Length bias correction for RNA-seq data in gene set analyses. Bioinformatics 27:662-9
Zhi, Degui; Shatsky, Maxim; Brenner, Steven E (2010) Alignment-free local structural search by writhe decomposition. Bioinformatics 26:1176-84