The therapeutic value of any protein can easily be compromised by low solubility, which often adversely affects purification, yield, activity, shelf-life, and delivery. Solubility concerns are thus frequent obstacles to the development and subsequent FDA approval of pharmaceutical compounds derived from protein models. These facts make the development of a quantifiable description of protein solubility, as well as a design platform that can be used to rationally and efficiently re-engineer therapeutic proteins with increased solubility, a high priority. Previous studies on a database of leptin mutants have shown that a sequence-based analysis of leptin solubility, in which amino acid properties (e.g., hydrophobicity, charge, and solvation free energy) are summed over a protein sequence and then correlated to experimental solubility measurements, can provide high predictability (0.96 correlation) for additional mutants when information for similar mutations is already in the training dataset. However, predictability fails for mutation types not found in the training set. In contrast, when ensemble-based parameters derived from structural models of the individual mutants are correlated to the experimental solubilities, predictability readily extends to substitutions unknown to the training set, revealing an apparent structural-thermodynamic component to the solubility of proteins. In a blind test of this model on an additional mutant dataset, the ensemble-based approach predicted whether a mutation would increase or decrease leptin solubility with 86% accuracy, with an overall correlation of 0.80 with the actual experimental values. Initial tests also indicate that the ensemble-based parameterization of leptin solubility is readily transferable to non-leptin structures.
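As a rough illustration of the sequence-based analysis described above, the following Python sketch sums two per-residue properties over a sequence and computes a Pearson correlation between such a summed feature and a list of solubility measurements. It is a minimal sketch under stated assumptions, not the parameterization used in the leptin studies: the Kyte-Doolittle hydropathy scale and the simple integer side-chain charges are stand-ins chosen for illustration, the solvation free energy term is omitted, and the function names `sequence_features` and `pearson` are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index per residue (assumed stand-in for the
# hydrophobicity scale; the source does not say which scale was used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified formal side-chain charges near neutral pH (assumption:
# histidine and the chain termini are treated as uncharged).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def sequence_features(seq):
    """Return (summed hydropathy, net charge) for a one-letter sequence."""
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq)
    net_charge = sum(CHARGE.get(aa, 0) for aa in seq)
    return hydropathy, net_charge


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

In use, one would compute `sequence_features` for each mutant sequence and pass one feature column, together with the measured solubilities, to `pearson`. Because such features depend only on amino acid composition, a substitution type absent from the training data contributes nothing the fit has seen before, which is consistent with the loss of predictability noted above.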
The goal of this Phase I SBIR is to provide a proof of principle of the generality of the ensemble-based model of protein solubility by applying the same parameterization routine used on leptin to a second medically relevant compound, the small mitogenic protein human epidermal growth factor (EGF). EGF is a target for cancer inhibitor drugs, making analogs designed for optimal solution properties likely to be valuable to the pharmaceutical industry. In a subsequent Phase II project application, a general and automated optimization strategy for therapeutic proteins will be developed using human erythropoietin and human granulocyte-colony stimulating factor as the test systems.