Our goal is to compare microbial genomes in terms of membrane-protein structure, work that falls within the emerging field of structural genomics. We will focus on the occurrence of membrane proteins composed of transmembrane (TM) helices and on the interactions between pairs of these proteins. (i) Our first aim is to inventory all the TM-helix proteins in the recently sequenced microbial genomes. Initially, we will use membrane-protein prediction methods based on transfer energy scales. We will then try to improve upon these by building a Hidden Markov Model to identify membrane proteins. This probabilistic approach will allow us to systematically combine, in a Bayesian framework, prior information from biophysical scales with statistical information from the known membrane proteins. (ii) Our second aim is to look at protein-protein interactions among helical membrane proteins from a database perspective. We will find all the common helix-helix interfaces in the database of known structures and compare these to the TM-helix oligomerization motifs found in genetic screens by the Beckwith and Engelman groups. In particular, we will measure the packing efficiency of all the helix-helix interfaces to determine whether membrane-protein interfaces are packed less tightly than soluble ones. We will also count how often sequence motifs associated with TM-helix oligomerization occur in a number of genomes, and so estimate the fraction of proteins in a genome that could potentially interact via these motifs. (iii) Our final aim is to integrate into a comprehensive database the information on the occurrence and interaction of membrane proteins from the first two parts, together with further information, e.g. related to expression. This will allow us to compare genomes in terms of membrane-protein fold usage and to look for TM-proteins common to many diverse organisms.
It will also allow us to put the patterns of occurrence of TM-proteins into context by comparing them to those of soluble proteins. We expect that our analysis will initially involve approximately 10 genomes, with this number increasing to approximately 100 during the funding period.
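
As an illustration of the initial prediction step in aim (i), the following is a minimal sketch of a sliding-window hydropathy scan for candidate TM helices. It assumes the Kyte-Doolittle hydropathy scale as a stand-in for the transfer energy scales mentioned above, a 19-residue window, and an average-hydropathy cutoff of 1.6. These choices and the function names (`window_averages`, `predict_tm_segments`) are illustrative assumptions, not the methods the project will necessarily use.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in for a transfer energy scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(seq, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    # Unknown residue codes (e.g. 'X') are scored as 0.0, an arbitrary neutral choice.
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    averages = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue to the right.
        total += scores[i] - scores[i - window]
        averages.append(total / window)
    return averages


def predict_tm_segments(seq, window=19, threshold=1.6):
    """Merge consecutive above-threshold windows into candidate TM segments.

    Returns a list of (start, end) pairs, 0-based with end exclusive. Each
    segment is the union of the qualifying windows, so its boundaries are
    only approximate and can extend past the hydrophobic core.
    """
    segments = []
    start = None
    end = None
    for i, avg in enumerate(window_averages(seq, window)):
        if avg >= threshold:
            if start is None:
                start = i
            end = i + window
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))
    return segments


def is_candidate_membrane_protein(seq, window=19, threshold=1.6):
    """Flag a protein as a candidate TM-helix protein if any segment is found."""
    return len(predict_tm_segments(seq, window, threshold)) > 0
```

Applying `is_candidate_membrane_protein` to every open reading frame in a genome would give a first rough inventory. A scan of this kind uses only the biophysical scale, which is the limitation the probabilistic model in aim (i) is meant to address.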