This SBIR Phase II research develops methods to improve the manufacture of recombinant protein products produced in foreign hosts. Cost-effective production of proteins generally relies on organisms that are well suited to protein engineering and large-scale production. Establishing a suitable production system for a protein is often a time-consuming, trial-and-error process and can be a significant barrier to commercialization of the protein. Even where production systems are found, they are often far from optimized because of the time and cost required and our currently limited understanding of the critical parameters. In Phase I, several gene design variables were assessed for their importance to protein expression in the bacterium Escherichia coli, a commonly used production organism. The data suggested novel approaches to gene optimization that conventional wisdom would not have predicted. In Phase II, the relevant gene design variables suggested by Phase I will be explored in order to develop a refined model of the relationship between gene design and protein expression in E. coli as well as in other useful production organisms.
The broader impacts of this research are improved manufacturing techniques for recombinant protein-based products. Protein products constitute a rapidly growing worldwide market, currently exceeding $40 billion, that includes industrial enzymes, diagnostic enzymes, and protein pharmaceuticals. The tools developed in this project will drastically increase the speed, reduce the cost, and remove the uncertainties of modern protein manufacturing, factors that significantly limit this market. Improved production will also accelerate the study of proteins with therapeutic or otherwise marketable potential, expanding the field of candidate proteins for commercialization.
The ability to introduce modified or foreign (heterologous) genes into an organism for the production of proteins of interest is fundamental to essentially all biotechnology. Some applications use cells to produce large amounts of a protein for studies of function and structure. Others involve production of high-value protein therapeutics. More advanced applications involve expression of a pathway of several protein catalysts that can convert low-cost feedstock to high-value materials, such as sugars to fuels. All of these require an understanding of how to design genes that best support protein expression in the host organism used. Gene synthesis technologies allow researchers to construct complete DNA sequences that encode protein sequences and govern protein production levels. They give complete access to the critical information in gene sequences, provided the knowledge exists to interpret that information. While such technologies have greatly facilitated access to and tailoring of genes, they also open the door to unprecedented study of gene function. In this SBIR-funded project, we sought to develop methods and technologies that leverage gene synthesis and informatics to determine how gene sequence influences protein yield in various organisms. Because of the redundancy of the genetic code, there are more ways to encode an average protein sequence than there are particles in the universe. Our technology, which we call GeneGPS, uses a systematic sampling method to simultaneously assess several different properties of genes that might affect protein production. Advanced multivariate analyses allow us to identify the significant variables and quantify their impacts on protein levels in the organism. This knowledge is then captured in a model that can be used to design synthetic genes for applications in that host organism.
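To give a sense of the scale behind the redundancy claim, the short sketch below counts the synonymous DNA encodings of a protein by multiplying the number of codons available for each residue in the standard genetic code. It is an illustration only and is not part of GeneGPS; the function names and the 200-residue example sequence are hypothetical, chosen here for demonstration.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the three stop codons are excluded).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein: str) -> int:
    """Exact number of distinct DNA sequences that encode `protein`."""
    count = 1
    for residue in protein.upper():
        count *= CODON_DEGENERACY[residue]
    return count


def log10_encodings(protein: str) -> float:
    """Base-10 logarithm of the encoding count, convenient for long proteins."""
    return sum(math.log10(CODON_DEGENERACY[r]) for r in protein.upper())


if __name__ == "__main__":
    # Hypothetical 200-residue sequence, used only to illustrate the scale.
    example = "MKLSRAGT" * 25
    print(f"residues: {len(example)}")
    print(f"log10(number of encodings): {log10_encodings(example):.1f}")
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive testing of gene variants is impractical and a systematic sampling approach is needed.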
We have shown that we can dramatically and efficiently increase protein yield for several valuable protein products and in several host organisms, including bacteria, yeasts, and mammalian cells. This capability can substantially reduce the cost and time of protein product development and facilitate the engineering of new systems for a wide variety of biotechnology goals.