Like all species, grasses such as maize, rice, wheat, oats and barley contain genes that encode the amino acid sequences of all the proteins the plant synthesizes to germinate, grow and reproduce. All genes in an organism use a simple, four-letter alphabet (A, C, G, T) to make three-letter words (codons), each of which specifies one amino acid of a protein. The genetic code is redundant: multiple, similar codons often code for the same amino acid. Within any given species, the overall G+C content of genes is often very similar. However, grass species have both high-GC and low-GC genes, and there is evidence that these two classes of genes may be transcribed from DNA into RNA and translated from RNA into protein differently. This project will test whether high-GC and low-GC genes are transcribed and translated with equal efficiency and, specifically, whether any differential regulation is used when responding to heat, cold and drought stress. This work will also discover rules about codon usage and protein synthesis that can be used to design transgenes for efficient protein production. Finally, current gene prediction software does not account for the high-GC and low-GC genes found in grasses. This project will create a new gene prediction program that predicts genes in grasses more accurately. Better gene prediction will benefit the larger community of plant researchers, including those who work with economically important grasses such as maize, rice, wheat, oats and barley.
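To make the idea of codon redundancy and GC content concrete, the following is a minimal illustrative sketch in Python, not part of the project's methods. The two short sequences are hypothetical examples constructed for illustration, the codon table is deliberately partial (it covers only the codons used here), and the function names are assumptions. It shows that two sequences can encode the same peptide while differing sharply in overall GC content and in GC content at the third codon position (often called GC3).

```python
# Partial standard genetic code: only the codons used in this example.
CODON_TO_AA = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",
    "TTA": "L", "CTG": "L",
    "GGT": "G", "GGC": "G",
    "TCT": "S", "AGC": "S",
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using the partial table above."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """GC fraction at the third position of each codon."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    return gc_content(cds[2::3])


# Hypothetical sequences: same peptide, different synonymous codons.
low_gc_cds = "ATGGCTTTAGGTTCT"
high_gc_cds = "ATGGCCCTGGGCAGC"

if __name__ == "__main__":
    for name, cds in [("low-GC", low_gc_cds), ("high-GC", high_gc_cds)]:
        print(name, translate(cds),
              f"GC={gc_content(cds):.2f}", f"GC3={gc3_content(cds):.2f}")
```

Because synonymous codons differ mostly at the third position, GC3 is where the two versions diverge most strongly.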
Grass genes have a bimodal GC distribution and a significant 5' to 3' GC gradient. Little is known about the consequences of such variation for gene and protein expression. High gene GC content and strong negative 5' to 3' gradients in grasses strongly affect codon usage bias. It is possible that both GC-biased gene conversion and codon usage bias are important in shaping the unusual GC features found in grass genes. GC-biased gene conversion may help drive mutations in a gene toward a particular codon usage pattern, and selection on that codon bias may then maintain the gene's GC content and gradient. GC-biased gene conversion will be examined in rice. Transcriptional and translational characteristics of native genes and transgenes that differ only in their GC content will also be studied in rice. Translational efficiency may also be affected by tissue- or condition-specific regulation of tRNAs. Therefore, tRNA regulation will be examined to determine whether changes in tRNA abundance affect translational efficiency. Recently published data indicate that, because of the variation in GC content among grass genes, a notable number of genes in grasses have been missed or mis-annotated by existing gene finders. A new gene prediction tool that accounts for the bimodal GC distribution of grass genes will be designed to improve gene annotation in grasses. The results from this project will provide novel insights into gene evolution in grasses as well as important guidance for crop improvement via genetic modification.
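As an illustration of what a 5' to 3' GC gradient means in practice, the sketch below splits a coding sequence into equal-sized bins from the 5' end to the 3' end, computes the GC fraction of each bin, and fits a least-squares slope across the bins. This is one simple way to summarize a gradient, offered as an assumption rather than the measure used in this project; the function name, the number of bins and the toy sequence are all hypothetical. A negative slope corresponds to GC content that declines toward the 3' end.

```python
def gc_gradient(cds: str, n_bins: int = 4) -> tuple[list[float], float]:
    """Return per-bin GC fractions (5' to 3') and their least-squares slope.

    The slope is expressed as change in GC fraction per bin; the bin count
    is an arbitrary illustrative choice.
    """
    cds = cds.upper()
    if n_bins < 2:
        raise ValueError("need at least two bins to estimate a slope")
    if len(cds) < n_bins:
        raise ValueError("sequence is shorter than the number of bins")

    # GC fraction of each consecutive bin along the sequence.
    fractions = []
    for i in range(n_bins):
        start = i * len(cds) // n_bins
        end = (i + 1) * len(cds) // n_bins
        chunk = cds[start:end]
        fractions.append(sum(base in "GC" for base in chunk) / len(chunk))

    # Ordinary least-squares slope of GC fraction against bin index.
    xs = list(range(n_bins))
    mean_x = sum(xs) / n_bins
    mean_y = sum(fractions) / n_bins
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fractions))
    return fractions, sxy / sxx


# Toy sequence (not real data): GC-rich 5' half, AT-rich 3' half.
toy_cds = "GC" * 30 + "AT" * 30

if __name__ == "__main__":
    fractions, slope = gc_gradient(toy_cds)
    print("GC by bin (5' to 3'):", fractions)
    print("slope per bin:", slope)
```

Applied to many genes, the distribution of such slopes, together with each gene's overall GC content, would give a rough picture of the gradient and bimodality described above.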
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.