Bayesian Statistics and Algorithms for Homology Modeling

Dunbrack, Roland

Abstract

To improve human health, a goal of the human genome project is to translate the genome sequence into an understanding of human biology. An important step in this process is knowledge of the structure of human proteins and the effects of sequence polymorphisms on structure and function. Currently, the structures of only 1000 human proteins are known, but the structures of up to roughly one third of human proteins can be modeled based on the structures of homologous proteins in the Protein Data Bank. This fraction will increase rapidly due to structural genomics efforts. Unfortunately, general principles of what works in homology modeling and what does not have remained elusive. There are several reasons for this: 1) insufficient benchmarking of most prediction methods; 2) reliance on out-of-date statistical analysis of protein structures, performed without modern methods of statistics; 3) the assumption in most modeling methods of a relatively high level of sequence identity (>35 percent) between the template structure and the sequence to be modeled, when most proteins of unknown structure are only distantly related to proteins of known structure. The PI proposes benchmarking, new statistical analysis, and new algorithms for each of the three major aspects of homology modeling: alignment, building backbone coordinates for insertion-deletion regions, and side-chain placement. The primary tools will be Bayesian statistical analysis, including hierarchical models and non-parametric methods based on the Dirichlet process. The increase in size of the sequence and structure databases makes the new statistical analysis timely, because of both the increased power the new data provide and the numerous applications afforded by more sequences and structures.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project (R01)
Project #: 1R01HG002302-01A1
Application #: 6440018
Study Section: Molecular and Cellular Biophysics Study Section (BBCA)
Program Officer: Felsenfeld, Adam

Project Start: 2001-12-14
Project End: 2006-11-30
Budget Start: 2001-12-14
Budget End: 2002-11-30
Support Year: 1
Fiscal Year: 2002
Total Cost: $284,369
Indirect Cost

Institution

Name: Institute for Cancer Research
Department
Type
DUNS #: 872612445

City: Philadelphia
State: PA
Country: United States
Zip Code: 19111

Related projects


NIH 2006 R01 HG	Bayesian Statistics and Algorithms for Homology Modeling	Dunbrack, Roland L. / Institute for Cancer Research	$246,078
NIH 2005 R01 HG	Bayesian Statistics and Algorithms for Homology Modeling	Dunbrack, Roland L. / Institute for Cancer Research	$252,000
NIH 2004 R01 HG	Bayesian Statistics and Algorithms for Homology Modeling	Dunbrack, Roland L. / Institute for Cancer Research	$252,000
NIH 2003 R01 HG	Bayesian Statistics and Algorithms for Homology Modeling	Dunbrack, Roland L. / Institute for Cancer Research	$252,000
NIH 2002 R01 HG	Bayesian Statistics and Algorithms for Homology Modeling	Dunbrack, Roland L. / Institute for Cancer Research	$284,369

Publications

Krivov, Georgii G; Shapovalov, Maxim V; Dunbrack Jr, Roland L (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778-95

Wang, Qiang; Canutescu, Adrian A; Dunbrack Jr, Roland L (2008) SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling. Nat Protoc 3:1832-47

Shapovalov, Maxim V; Canutescu, Adrian A; Dunbrack Jr, Roland L (2007) BioDownloader: bioinformatics downloads and updates in a few clicks. Bioinformatics 23:1437-9

Shapovalov, Maxim V; Dunbrack Jr, Roland L (2007) Statistical and conformational analysis of the electron density of protein side chains. Proteins 66:279-303

Wang, Guoli; Dunbrack Jr, Roland L (2005) PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 33:W94-8

Wang, Guoli; Jin, Yumi; Dunbrack Jr, Roland L (2005) Assessment of fold recognition predictions in CASP6. Proteins 61 Suppl 7:46-66

Kahsay, Robel Y; Wang, Guoli; Gao, Guang et al. (2005) Quasi-consensus-based comparison of profile hidden Markov models for protein sequences. Bioinformatics 21:2287-93

Jin, Yumi; Dunbrack Jr, Roland L (2005) Assessment of disorder predictions in CASP6. Proteins 61 Suppl 7:167-75

Tress, Michael; Tai, Chin-Hsien; Wang, Guoli et al. (2005) Domain definition and target classification for CASP6. Proteins 61 Suppl 7:8-18

Wang, Guoli; Dunbrack Jr, Roland L (2004) Scoring profile-to-profile sequence alignments. Protein Sci 13:1612-26

Showing the most recent 10 out of 14 publications
