A basic tenet of the protein folding problem is that the information contained in the amino acid sequence is sufficient to dictate the three-dimensional folded structure of a protein. The goal of the present study is to understand and quantify this idea using techniques from information theory and complexity theory. Potential applications of these approaches to protein structure determination and prediction will also be explored.

From an information-theoretic point of view, protein folding can be envisioned as a communication process by which sequence information is transmitted to the three-dimensional structure. Several questions can be asked about this information transfer. How much information is transferred from sequence to structure? How redundant is that information? Is protein folding a "noisy" or a "noiseless" communication channel?

These questions are approached by taking advantage of recent advances in our understanding of the relationship between thermodynamic entropy, information entropy, and algorithmic complexity. The information content of the sequence is determined from its information entropy, and the information content of the three-dimensional structure is related to its algorithmic complexity, which measures the length of the shortest computational representation (program) that generates the structure. With these quantities, the information content (or data compression) of sequence and structural data will be determined. Using maximum entropy techniques, the shared, or mutual, information between sequence and structure will also be determined. Knowledge of this shared information will be used to develop models of the "communication channel" of protein folding. The same approach can also be used to compare structure prediction algorithms quantitatively. A long-term goal is to incorporate this shared information into a maximum entropy algorithm for X-ray and NMR structure determination.
This approach will also provide an algorithm for determining structures by jointly fitting X-ray and NMR data.
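The abstract names three quantities: the information entropy of a sequence, the algorithmic complexity of a structure, and the mutual information between the two. The minimal Python sketch below illustrates each one under loudly stated assumptions. A per-residue secondary-structure string stands in for the three-dimensional structure. Algorithmic complexity is approximated by zlib-compressed length. Mutual information is estimated with simple plug-in frequency counts, not with the maximum entropy techniques the project proposes. The function names (`shannon_entropy`, `compressed_bits`, `mutual_information`) and the toy data are hypothetical illustrations, not the project's method.

```python
# Hypothetical sketch: plug-in entropy, compression-based complexity proxy,
# and plug-in mutual information. Not the project's maximum entropy method.
import math
import random
import zlib
from collections import Counter


def shannon_entropy(symbols):
    """Plug-in Shannon entropy of a symbol sequence, in bits per symbol."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_bits(text):
    """Size in bits of the zlib-compressed text.

    Algorithmic (Kolmogorov) complexity is uncomputable; a compressed length
    only bounds it from above, up to the constant size of the decompressor.
    """
    return 8 * len(zlib.compress(text.encode("ascii"), 9))


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in bits, from aligned pairs (xs[i], ys[i])."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be aligned (equal length)")
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    joint = Counter(zip(xs, ys))
    # I(X;Y) = sum p(x,y) log2[p(x,y) / (p(x) p(y))]; with raw counts the
    # ratio inside the log becomes c * n / (n_x * n_y).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


if __name__ == "__main__":
    # Hypothetical toy data: a residue string and an invented per-residue
    # secondary-structure string (H = helix, E = strand, C = coil).
    seq = "MKTAYIAKQRQISFVKSHFSRQ"
    ss = "CCHHHHHHHHHCCEEEECCCCC"
    print(f"H(sequence)    = {shannon_entropy(seq):.3f} bits/residue")
    print(f"I(residue; SS) = {mutual_information(seq, ss):.3f} bits/residue")

    # A periodic structure string compresses far better than a shuffled copy
    # with the same composition, i.e. it has lower estimated complexity.
    rng = random.Random(0)
    regular = "HHHHEEEECC" * 100
    shuffled = "".join(rng.sample(regular, len(regular)))
    print(compressed_bits(regular), "<", compressed_bits(shuffled))
```

Plug-in estimates are biased upward when the alphabet is large relative to the sample (here 20 amino acid types over 22 positions). The printed mutual information therefore illustrates the calculation and should not be read as a measurement. Correcting that kind of small-sample bias is one motivation for the maximum entropy treatment described above.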
Dewey, T G (2001) A sequence alignment algorithm with an arbitrary gap penalty function. J Comput Biol 8:177-90
Dewey, T G (2000) Information dynamics of in vitro selection-amplification systems. Pac Symp Biocomput:602-13
Dewey, T G (1999) Statistical mechanics of protein sequences. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 60:4652-8
Dewey, T G; Delle Donne, M (1998) Non-equilibrium thermodynamics of molecular evolution. J Theor Biol 193:593-9