Determination of Protein Secondary Structure with the Aid of Protein Folding Si

Jha, Abhishek

Abstract

This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

The three-dimensional structure of a protein is vital to the study of its function, the identification of its binding partners, and its potential use as a therapeutic. Because experimental methods to determine a protein structure consume large amounts of time and resources, and because of the explosion in the number of new proteins obtained from genomes, a computational means of deducing the structure of a protein from its amino acid sequence is highly desired. The best protein tertiary structure prediction methods rely on an initial guess as to the secondary structure of the target amino acid sequence. Many servers exist to predict secondary structure, but these can be poor in confidence and accuracy, especially for target sequences with few or no homologous sequences in the Protein Data Bank (PDB). Thus, to improve our ability to predict the three-dimensional structure of a protein, we need to develop new techniques that significantly enhance the quality of secondary structure prediction. Current secondary structure prediction techniques scan the PDB for structure fragments whose sequences are homologous to some section of the target sequence. The secondary structure assignments of these fragments are then aligned to the homologous section of the target sequence, and the most probable secondary structure for each position is calculated. For target sequences with low homology to proteins of known structure, however, very few fragments are available, and the fragments that are available can be inaccurate.
As such, there is no way to generate an accurate secondary structure assignment for such a sequence, and it would therefore be exceedingly difficult to predict the three-dimensional structure of the protein. Our algorithm aims to overcome this limitation by taking advantage of the idea that the folding of a protein into a three-dimensional structure often determines the preference for specific types of secondary structure. Hence, we suggest that the formation of secondary and tertiary structure should be a coupled process in which each type of structure supplies information about the other. Our model features a reduced representation of protein structure in order to be computationally efficient. Specifically, we represent protein structure by the protein backbone and the beta carbon of each side chain, and we calculate the energy of a structure using a statistical potential based on the pairwise distances between atoms. By not explicitly treating the full amino acid side chains in our representation, we do not incur the computational cost of sampling the side-chain rotamers or calculating their energy. A 3-D protein structure with a given secondary structure assignment is generated by Monte Carlo simulated annealing, which minimizes the statistical potential while sampling backbone phi/psi angles from a PDB-based, secondary-structure-dependent trimer library. An ensemble of 100 independently minimized structures is used to calculate the positional probabilities of the different secondary structure types for each residue. When a secondary structure type occurs in this ensemble at a given amino acid with sufficiently low probability, sampling of that secondary structure type is disallowed for the residue in the subsequent iteration of folding. The process is repeated until no additional sampling restrictions can be made and only one type of secondary structure remains for most amino acids.
The final result is a folding-enhanced secondary structure prediction that coincides with the tertiary structure prediction selected as the lowest-energy structure from the final folding iteration. We have tested this algorithm on a set of 30 sequences and have found significant improvement over current secondary structure prediction methods. However, to demonstrate the efficacy of this method, we need the computational resources to fully test its generality on a large number of target sequences of lengths 100-200+ residues. These resources would also be vital for testing possible improvements to our model, such as the inclusion of side chains in an additional iteration of folding. This could allow us to improve the quality of our three-dimensional prediction without dramatically decreasing our computational efficiency. A developmental allocation from the TeraGrid would greatly help us meet these needs.
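
The iterative restriction step described in the abstract (ensemble probabilities per residue, followed by disallowing rare secondary structure types) can be illustrated with a minimal Python sketch. This is not the investigators' code. The three-state alphabet (H, E, C), the 0.05 probability cutoff, the max_rounds cap, and the function names are all assumptions made for illustration; the abstract only says "sufficiently low probability". The fold argument is a hypothetical stand-in for one Monte Carlo simulated annealing run, and toy_fold is a random placeholder, not a folding simulation.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Set

SS_TYPES = ("H", "E", "C")  # assumed alphabet: helix, strand, coil


def positional_probabilities(ensemble: List[str]) -> List[Dict[str, float]]:
    """Fraction of ensemble members showing each type at each residue."""
    n = len(ensemble)
    probs = []
    for i in range(len(ensemble[0])):
        counts = Counter(structure[i] for structure in ensemble)
        probs.append({t: counts.get(t, 0) / n for t in SS_TYPES})
    return probs


def prune_allowed(allowed: List[Set[str]],
                  probs: List[Dict[str, float]],
                  threshold: float) -> List[Set[str]]:
    """Disallow types seen with probability below threshold at a residue."""
    pruned = []
    for types, p in zip(allowed, probs):
        kept = {t for t in types if p[t] >= threshold}
        if not kept:
            # Never leave a residue with no allowed type: keep the most likely.
            kept = {max(sorted(types), key=lambda t: p[t])}
        pruned.append(kept)
    return pruned


def iterate_restrictions(fold: Callable[[List[Set[str]]], str],
                         length: int,
                         n_structures: int = 100,  # ensemble size from the abstract
                         threshold: float = 0.05,  # assumed cutoff
                         max_rounds: int = 50) -> List[Set[str]]:
    """Repeat fold -> probabilities -> prune until no restriction is added."""
    allowed = [set(SS_TYPES) for _ in range(length)]
    for _ in range(max_rounds):
        ensemble = [fold(allowed) for _ in range(n_structures)]
        new_allowed = prune_allowed(
            allowed, positional_probabilities(ensemble), threshold)
        if new_allowed == allowed:
            break  # converged: no additional sampling restrictions
        allowed = new_allowed
    return allowed


_rng = random.Random(0)
_TOY_WEIGHTS = {"H": 0.90, "E": 0.02, "C": 0.08}  # arbitrary illustrative bias


def toy_fold(allowed: List[Set[str]]) -> str:
    """Placeholder for one annealing run: draws only from allowed types."""
    out = []
    for types in allowed:
        options = sorted(types)
        weights = [_TOY_WEIGHTS[t] for t in options]
        out.append(_rng.choices(options, weights=weights)[0])
    return "".join(out)


if __name__ == "__main__":
    print(iterate_restrictions(toy_fold, length=10))
```

In a real application, fold would return the secondary structure string of one independently minimized structure, sampled only from the types still allowed at each residue.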

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Biotechnology Resource Grants (P41)
Project #: 5P41RR006009-18
Application #: 7723325
Study Section: Special Emphasis Panel (ZRG1-BCMB-Q (40))

Project Start: 2008-08-01
Project End: 2009-07-31
Budget Start: 2008-08-01
Budget End: 2009-07-31
Support Year: 18
Fiscal Year: 2008
Total Cost: $473
Indirect Cost

Institution

Name: Carnegie-Mellon University
Department: Biostatistics & Other Math Sci
Type: Schools of Arts and Sciences
DUNS #: 052184116

City: Pittsburgh
State: PA
Country: United States
Zip Code: 15213
