Efficient production of functional proteins is arguably the most important function of a cell. Ribosomes synthesize proteins by decoding mRNA codons, and N-terminal portions of proteins can begin to fold even while synthesis is still underway. The genetic code is degenerate, meaning that most amino acids can be encoded by more than one codon. Because synonymous codon substitutions do not alter the amino acid sequence of the encoded protein, they have historically been regarded as "silent". However, it is now known that some synonymous substitutions can disrupt the expression, folding, targeting and/or function of the encoded protein, although the precise mechanisms are poorly understood. Computational analyses have attempted to identify connections between the locations of synonymous codons and features of the encoded protein, but to date have yielded conflicting results, and there have been few attempts to experimentally test predictions made from these computational studies. Hence we currently lack a systematic understanding of the connections between synonymous codon usage and protein biogenesis. Establishing these connections would broadly transform our interpretation of synonymous codon substitutions, including single-nucleotide polymorphisms (SNPs) associated with human disease and synonymous substitutions in genome-wide association studies (GWAS). It would also make coding sequence design an integral part of the rational design of novel gene products (proteins). Thus, we aim to design an innovative, integrative computational and experimental strategy with which to identify connections between codon usage patterns and protein biogenesis.
We will search broadly for such connections, developing and applying several novel approaches: (i) computational approaches to track, quantify and align synonymous codon usage patterns in homologous proteins, (ii) network approaches to map codon usage onto all levels of protein structure, and (iii) an innovative combination of broad and targeted experimental approaches to test the importance and specific effects of altering codon usage on protein biogenesis. Throughout the project, rigorous statistical methods will be applied to test the validity of identified connections, and cell-based experiments will be used both to test and refine hypotheses resulting from the computational analyses and to develop new hypotheses that will feed back into the computational analyses. The goal of this project is to transform our understanding of the connections between synonymous codon usage and protein biogenesis. The endpoint for this project period is the development of a set of general principles for codon usage, including user-friendly open-source software to enable the biomedical community to analyze genes of interest for synonymous codon usage features likely to affect protein biogenesis. At the same time, our methodology will be generalizable, to allow the public to search for additional connections between sequence and/or network patterns and protein function, as well as for similar connections in other domains.

Public Health Relevance

The same protein sequence can be encoded by more than one nucleic acid sequence. Historically, "synonymous" changes in a nucleic acid sequence were thought to be "silent", but it is now emerging that some synonymous changes can alter the folding of the encoded protein and/or its transport into a membrane. We will integrate novel computational analyses and laboratory experiments to uncover the connections between synonymous substitutions and protein production in the cell.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 5R01GM120733-02
Application #: 9315195
Study Section: Biodata Management and Analysis Study Section (BDMA)
Program Officer: Ravichandran, Veerasamy
Project Start: 2016-07-15
Project End: 2020-04-30
Budget Start: 2017-05-01
Budget End: 2018-04-30
Support Year: 2
Fiscal Year: 2017
Total Cost:
Indirect Cost:

Name: University of Notre Dame
Department: Biostatistics & Other Math Sci
Type: Biomed Engr/Col Engr/Engr Sta
DUNS #: 824910376
City: Notre Dame
State: IN
Country: United States
Zip Code: 46556

Rodriguez, Anabel; Wright, Gabriel; Emrich, Scott et al. (2018) %MinMax: A versatile tool for calculating and comparing synonymous codon usage and its impact on protein folding. Protein Sci 27:356-362
Chaney, Julie L; Steele, Aaron; Carmichael, Rory et al. (2017) Widespread position-specific conservation of synonymous rare codons within coding sequences. PLoS Comput Biol 13:e1005531
Faisal, Fazle E; Newaz, Khalique; Chaney, Julie L et al. (2017) GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison. Sci Rep 7:14890