The general research aims of my group are to use molecular modeling and bioinformatics to analyze the structure, function, and molecular evolution of membrane proteins. Membrane proteins are one of the most important classes of proteins. They account for about 30% of the proteins encoded by most genomes and are involved in many biological processes. They are especially important in biomedical research because most targets of current pharmaceutical projects are membrane proteins. Unfortunately, their structures are difficult to determine experimentally. We fill some of this structural void by developing computational methods for analyzing sequences and building structural models of membrane proteins. We use computational analyses to do the following: 1) Address questions that are not answered by crystal structures. 2) Assist in understanding similarities and differences among homologous proteins. 3) Relate structural and sequence information to functional properties. 4) Assist in the design and interpretation of experimental studies.

Our current projects can be classified into three areas: 1) models of the structure and gating mechanisms of the large mechanosensitive channel, MscL; 2) models of the structure and gating mechanisms of potassium channels and their relatives; and 3) development of methods to analyze sequences and construct structural models of membrane proteins.

Project 1: Models of the Mechanosensitive Channel, MscL

This project exemplifies our general approach to modeling the structures and functional mechanisms of membrane proteins. We have modeled the structure of the prokaryotic mechanosensitive channel, MscL, as it undergoes a very large conformational change from a closed conformation to an open pore with a diameter greater than 30 Å. The crystal structure of TbMscL, from M. tuberculosis, was determined in 1998. It forms a homopentamer in which each subunit has two transmembrane α helices, M1 and M2. However, the crystal structure left several questions unanswered.
The channel was closed in the crystal structure, so it did not explain how the pore with a diameter greater than 30 Å opens when the membrane is stretched. Furthermore, the first twelve residues, which are highly conserved among MscL homologs and are essential for MscL function, were unresolved in the crystal. Finally, most experiments had been performed on EcoMscL from E. coli, so it was unclear how the TbMscL structure related to the functional properties of EcoMscL. Sergei Sukharev from the University of Maryland was the first to clone EcoMscL and has performed biophysical studies of its properties. Working with him, we first developed an EcoMscL homology model from the TbMscL structure. His earlier studies suggested that the transmembrane portion of the protein expands substantially before the channel opens. In the crystal structure, the gate that closed the pore appeared to be formed by the M1 transmembrane helix. However, our modeling indicated that it was impossible to expand the transmembrane structure substantially without opening a transmembrane pore. To overcome this problem, we modeled the unresolved N-terminus as an amphipathic α helix, which we called S1. We modeled five S1 helices (one from each of the five identical subunits) as a pentameric bundle that blocks the pore until the transmembrane region expands substantially. We found that the expansion motion that best satisfied a series of modeling criteria involved dramatically increasing the tilt of both M1 and M2 relative to the axis of the pore. When the transmembrane pore expanded substantially, tension through the short segment linking S1 to M1 was hypothesized to pull the S1 bundle apart to open the channel. Sukharev's group has performed mutagenesis experiments to test most of the essential features of this model. So far they have found the following: 1) Replacement of any residue on the hydrophobic face of the putative S1 helix by a cysteine cross-links adjacent subunits.
The channel cannot open when two pairs of S1 segments are cross-linked and can open only to a substate when one pair is cross-linked. These findings indicate that S1 helices interact in the closed conformation and come apart when the channel opens. 2) For the mutant in which the last hydrophobic residue on S1 is replaced with cysteine, the disulfide bridge formed between adjacent S1 segments is reduced more readily by mercaptoethanol applied from the opposite (extracellular) side of the membrane than from the same (cytoplasmic) side. This result is consistent with our model because this residue is in the radial center of the S1 'plug', where it is exposed to the transmembrane pore; thus extracellular mercaptoethanol can reach this site by diffusing through the transmembrane pore, whereas intracellular mercaptoethanol must diffuse through the tightly packed S1 bundle to reach it. 3) When the channel is opened, a disulfide bridge can form between a cysteine placed at the first hydrophobic position in an S1 helix and one placed within the M2 helix near its end. Formation of this bridge tends to lock the channel in open conformations. This supports our model in which the S1 helices form part of the transmembrane wall of the open pore. 4) Altering the length of the segment linking S1 to M1 affects gating in the predicted manner; i.e., making it longer inhibits activation while making it shorter facilitates activation. 5) When the membrane is stretched, a cysteine placed near the beginning of the M1 transmembrane helix can cross-link to one placed near the end of M1 of an adjacent subunit. Formation of this bridge stabilizes the open conformation. This supports our hypothesis that adjacent M1 helices interact and are very tilted when the channel opens. 6) Disulfide bridge formation between residues that are in proximity in the crystal structure and are in M1 and M2 helices of adjacent subunits does not substantially alter the gating of the channel.
This supports our model in which interactions between M1 and M2 of adjacent subunits remain relatively unaltered when the channel opens. 7) Replacement by cysteine of hydrophobic residues on the C-terminal cytoplasmic helices cross-links adjacent subunits but has no detectable effect on the gating of the channel. This supports a model in which these C-terminal helices form a second pentameric bundle that remains intact when the channel opens.

This project epitomizes the synergy between experimentation and modeling. The models would not have been developed in the absence of the preliminary data and input from the experimentalists. Likewise, the experiments would never have been performed without the models because they required precise predictions about which amino acid residues interact in both the closed and open conformations.

Project 2: Models of Potassium Channels and Their Relatives

Our project on K+ channels and their homologs is much more extensive and difficult than the MscL project, but it is also more important. K+ channels and other channels and transporters that evolved from K+ channels comprise one of the largest and most diverse groups of membrane proteins. These proteins are found in almost all cells, from bacteria to higher organisms. This category of membrane proteins contains several diverse superfamilies of channels, including Na+, Ca2+, cyclic nucleotide-gated, TRP and its homologs, glutamate-activated, and Ca2+ release channels, plus some K+ symporters and transporters. The smallest of these proteins are 2TM K+ channels that have four identical subunits, each of which has only two transmembrane helices, M1 and M2. A 'P' hairpin segment that spans only the outer half of the transmembrane region is located between M1 and M2. The P segment determines the selectivity of the channel.
6TM K+ channels are more complex, with each alpha subunit having four additional transmembrane segments, S1-S4, that precede the pore-forming S5-P-S6 motif (analogous to the M1-P-M2 motif of 2TM channels). Voltage-gated Ca2+ and Na+ channels have only one alpha subunit; however, it contains four homologous 6TM motifs. The most significant development related to the structures of these proteins was the determination of the crystal structure of the closed K+ channel from Streptomyces lividans and of the open K+ channel from Methanobacterium thermoautotrophicum. These are among the simpler K+ channels; each is a homotetrameric 2TM channel. The computational challenge is to utilize these known bacterial structures to develop structural and functional models of many other homologous proteins that have greater biomedical relevance. We are using the crystal structures to develop homology models of the pore region of at least one member of every major family of related proteins. We are also adding the S1-S4 transmembrane segments of the 6TM families to these models of the channel core. To better understand the gating mechanism, we are developing models of closed, open, and several intermediate conformations. These models are constrained both by experimental results (many mutagenesis experiments have been performed on some of these channels) and by a series of modeling criteria that we have developed.

Project 3: Development of Methods to Analyze Membrane Protein Sequences and to Develop Structural Models of Transmembrane Regions

We continue to develop methods and criteria for building structural models of specific membrane proteins. Many of our methods rely on first aligning many homologous protein sequences from distantly related families. Most existing methods of aligning protein sequences rely upon information that has been generated by statistical analyses of water-soluble proteins.
We have found these methods to be unreliable for transmembrane segments of membrane proteins. We are currently developing a new alignment algorithm that will first identify likely transmembrane segments and then use parameters based on statistical analyses of transmembrane segments to align these segments while using parameters based on soluble proteins to align the other regions of the protein. Once a multisequence alignment has been made for a protein family, sequence profiles can then be developed that predict the probability that each of the 20 amino acids, or that an insertion or deletion, will occur at each position in these alignments. Developing sequence profiles helps in aligning distantly related families since profile-profile alignments yield substantially better and less ambiguous alignments than do sequence-sequence or sequence-profile alignments. The following examples illustrate how these data can be used to identify which residues form functionally important sites or are exposed to either water or lipids on the surface of the proteins: 1) Residues that are poorly conserved among closely related proteins tend to be on the surface of the protein; they tend to be hydrophilic when exposed to water and hydrophobic when exposed to lipid alkyl chains. Transmembrane helices that are positioned at the protein-lipid interface tend to have a lipid-exposed face composed of poorly conserved hydrophobic residues. 2) Residues that are highly conserved among protein families tend to form spatial clusters at functionally important regions or sites. 3) Residues that are highly conserved among set A families and among set B families but that are not conserved between set A and set B families tend to cluster at functionally important sites and the sites tend to function differently in the two sets of families. We are developing quantitative algorithms to identify these categories of residues and to use results of these calculations to constrain structural models. 
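As an illustration only, the sketch below shows one simple way that a sequence profile and the conservation categories described above could be computed from gapped alignments. It is not the algorithm under development in our group: the function names, the entropy-based conservation score, the 0.9 threshold, and the toy sequences are all assumptions made for this example, and it omits the sequence weighting and pseudocounts that a practical method would need.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol


def column_profile(alignment):
    """Return, for each alignment column, a dict mapping symbol -> frequency."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        unknown = set(counts) - set(ALPHABET)
        if unknown:
            raise ValueError(f"unexpected symbols in column {i}: {sorted(unknown)}")
        total = sum(counts.values())
        profile.append({sym: counts.get(sym, 0) / total for sym in ALPHABET})
    return profile


def conservation(column):
    """1 minus normalized Shannon entropy; 1.0 means one symbol only."""
    entropy = -sum(p * math.log(p) for p in column.values() if p > 0)
    return 1.0 - entropy / math.log(len(column))


def consensus(column):
    """Most frequent symbol in a profile column."""
    return max(column, key=column.get)


def classify_columns(set_a, set_b, threshold=0.9):
    """Label each column by how it is conserved within and between two sets.

    'conserved_in_both': conserved in A and in B with the same consensus.
    'set_specific':      conserved in A and in B but with different consensus.
    'variable':          poorly conserved in at least one set.
    """
    prof_a, prof_b = column_profile(set_a), column_profile(set_b)
    if len(prof_a) != len(prof_b):
        raise ValueError("both sets must share one alignment length")
    labels = []
    for col_a, col_b in zip(prof_a, prof_b):
        if conservation(col_a) >= threshold and conservation(col_b) >= threshold:
            same = consensus(col_a) == consensus(col_b)
            labels.append("conserved_in_both" if same else "set_specific")
        else:
            labels.append("variable")
    return labels


# Hypothetical toy alignments, not real protein families.
set_a = ["GYGDL", "GYGEL", "GYGDL"]
set_b = ["GFGKL", "GFGRL", "GFGKL"]
for position, label in enumerate(classify_columns(set_a, set_b)):
    print(position, label)
```

In this toy case the second column (Y in set A, F in set B) is the kind of position described in category 3, and the fourth column is the poorly conserved kind described in category 1. Deciding whether such residues are water-exposed or lipid-exposed, or whether they cluster spatially, requires structural information that this sketch does not use.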
We are also developing better methods to predict the transmembrane topology of membrane proteins and to identify proteins that are distantly related to K+ channels. Conventional methods of predicting topologies of proteins with transmembrane α helices identify hydrophobic segments that are sufficiently long to span a lipid bilayer as an α helix. These methods do not work well for K+ channels and their relatives because they typically identify the hairpin P segment as a transmembrane helix and do not identify the positively charged S4 segment of the 6TM channels as a transmembrane helix. We are developing hidden Markov model methods to search for and identify such segments.
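To make the limitation of the conventional approach concrete, the following is a minimal sketch of a sliding-window hydropathy scan using the published Kyte-Doolittle scale with its customary 19-residue window and 1.6 cutoff. It illustrates the conventional method only, not the hidden Markov approach under development. The function names and the test sequences are hypothetical: the "S4-like" segment is a synthetic repeat with an arginine at every third position, not a real channel sequence.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the threshold.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    segments = []
    for i, mean in enumerate(window_means(seq, window)):
        if mean >= threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous hit
            else:
                segments.append((start, end))
    return segments


# Hypothetical test sequence: hydrophilic flanks around two 19-residue segments.
flank = "DEKSN" * 4
hydrophobic_helix = "LIVALLAVILLIVALLAVI"  # uniformly hydrophobic
s4_like = "LLRLLRLLRLLRLLRLLRL"            # arginine at every third position
sequence = flank + hydrophobic_helix + flank + s4_like + flank

for start, end in predict_tm_segments(sequence):
    print(start, end, sequence[start:end])
```

With these inputs the scan reports a segment around the uniformly hydrophobic stretch but nothing for the S4-like stretch, because the regularly spaced arginines pull every window average below the cutoff. A purely hydropathy-based scan also has no way to distinguish a re-entrant hairpin such as the P segment from a helix that fully crosses the membrane.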