To improve human health, a goal of the Human Genome Project is to translate the genome sequence into an understanding of human biology. An important step in this process is knowledge of the structure of human proteins and of the effects of sequence polymorphisms on structure and function. Currently, the structures of only 1000 human proteins are known, but the structures of up to about one third of human proteins can be modeled based on the structures of homologous proteins in the Protein Data Bank. This fraction will increase rapidly due to structural genomics efforts. Unfortunately, general principles of what works in homology modeling and what does not have remained elusive. There are several reasons for this: 1) insufficient benchmarking of most prediction methods; 2) reliance on out-of-date statistical analyses of protein structures, performed without modern statistical methods; 3) the assumption in most modeling methods of a relatively high level of sequence identity (>35 percent) between the template structure and the sequence to be modeled, when most proteins of unknown structure are only distantly related to proteins of known structure. The PI proposes benchmarking, new statistical analysis, and new algorithms for each of the three major aspects of homology modeling: alignment, building backbone coordinates for insertion-deletion regions, and side-chain placement. The primary tools will be Bayesian statistical analysis, including hierarchical models and non-parametric methods based on the Dirichlet process. The increase in size of the sequence and structure databases makes the new statistical analysis timely, both because of the increased power the new data provide and because of the numerous applications afforded by more sequences and structures.

Agency
National Institutes of Health (NIH)
Institute
National Human Genome Research Institute (NHGRI)
Type
Research Project (R01)
Project #
1R01HG002302-01A1
Application #
6440018
Study Section
Molecular and Cellular Biophysics Study Section (BBCA)
Program Officer
Felsenfeld, Adam
Project Start
2001-12-14
Project End
2006-11-30
Budget Start
2001-12-14
Budget End
2002-11-30
Support Year
1
Fiscal Year
2002
Total Cost
$284,369
Indirect Cost
Name
Institute for Cancer Research
Department
Type
DUNS #
872612445
City
Philadelphia
State
PA
Country
United States
Zip Code
19111
Krivov, Georgii G; Shapovalov, Maxim V; Dunbrack Jr, Roland L (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778-95
Wang, Qiang; Canutescu, Adrian A; Dunbrack Jr, Roland L (2008) SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling. Nat Protoc 3:1832-47
Shapovalov, Maxim V; Canutescu, Adrian A; Dunbrack Jr, Roland L (2007) BioDownloader: bioinformatics downloads and updates in a few clicks. Bioinformatics 23:1437-9
Shapovalov, Maxim V; Dunbrack Jr, Roland L (2007) Statistical and conformational analysis of the electron density of protein side chains. Proteins 66:279-303
Wang, Guoli; Dunbrack Jr, Roland L (2005) PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 33:W94-8
Wang, Guoli; Jin, Yumi; Dunbrack Jr, Roland L (2005) Assessment of fold recognition predictions in CASP6. Proteins 61 Suppl 7:46-66
Kahsay, Robel Y; Wang, Guoli; Gao, Guang et al. (2005) Quasi-consensus-based comparison of profile hidden Markov models for protein sequences. Bioinformatics 21:2287-93
Jin, Yumi; Dunbrack Jr, Roland L (2005) Assessment of disorder predictions in CASP6. Proteins 61 Suppl 7:167-75
Tress, Michael; Tai, Chin-Hsien; Wang, Guoli et al. (2005) Domain definition and target classification for CASP6. Proteins 61 Suppl 7:8-18
Wang, Guoli; Dunbrack Jr, Roland L (2004) Scoring profile-to-profile sequence alignments. Protein Sci 13:1612-26

Showing the most recent 10 out of 14 publications