Protein overexpression is desirable for many biotechnology applications, ranging from vaccine production to drug discovery. Based on expression data for about 20,000 genes in common expression vectors, obtained in collaboration with the Northeast Structural Genomics Consortium, we observe that the free energy of the first ~50 coding nucleotides is strongly predictive of polypeptide expression level. Because multiple codons may map to the same amino acid, many mRNA sequences encode exactly the same protein sequence. The recent emergence of experimental datasets of gene expression levels has created an opportunity to maximize protein expression through modeling and algorithm design. We propose to develop algorithms that will enable biologists to evaluate whether native mRNA sequences are likely to express highly and to build synonymous mRNA sequences designed to optimize gene expression. Having relatively little mRNA secondary structure at the start of the coding region is of particular importance, as that is where the ribosome assembles. Extensive mRNA secondary structure later in the gene also appears to be deleterious. In translation, splicing, and small interfering RNA gene regulation mechanisms, a region of messenger RNA must be unfolded to allow binding of the ribosome, splice factors, or microRNAs. Understanding these unfolding free energy costs offers opportunities both to understand the biology of gene expression and to engineer changes in it algorithmically.
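
To give a sense of how large the space of synonymous mRNA sequences is, the following is a minimal sketch that counts the coding sequences for a given protein. It assumes the standard genetic code and one-letter amino acid codes; the names `CODON_DEGENERACY` and `count_synonymous_mrnas` are hypothetical and are not part of the proposed algorithms.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the three stop codons are not included).
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def count_synonymous_mrnas(protein: str) -> int:
    """Count the distinct coding sequences that encode `protein`.

    Each residue can be written with any of its synonymous codons, so the
    total is the product of the per-residue codon counts.
    Raises KeyError for characters that are not standard amino acids.
    """
    return prod(CODON_DEGENERACY[aa] for aa in protein.upper())


# Met (1 codon) x Lys (2 codons) x Thr (4 codons)
print(count_synonymous_mrnas("MKT"))  # 8
```

Because the count grows multiplicatively with protein length, exhaustive enumeration is impractical for real genes, which is one reason a design algorithm is needed to search among synonymous sequences.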

Public Health Relevance

Engineering mRNA sequences to optimize polypeptide expression will facilitate production of proteins for therapies, for the study of protein structure and for drug discovery. Synonymous mutations have been linked to human disease risk.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Academic Research Enhancement Awards (AREA) (R15)
Project #
1R15GM106372-01A1
Application #
8689532
Study Section
Biochemistry and Biophysics of Membranes Study Section (BBM)
Program Officer
Preusch, Peter
Project Start
2014-06-01
Project End
2017-05-31
Budget Start
2014-06-01
Budget End
2017-05-31
Support Year
1
Fiscal Year
2014
Total Cost
$255,304
Indirect Cost
$60,304
Name
Williams College
Department
Physics
Type
Schools of Arts and Sciences
DUNS #
020665972
City
Williamstown
State
MA
Country
United States
Zip Code
01267
Aalberts, Daniel P; Boël, Grégory; Hunt, John F (2017) Codon Clarity or Conundrum? Cell Syst 4:16-19
Boël, Grégory; Letso, Reka; Neely, Helen et al. (2016) Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529:358-363