Identifying genetic code reassignments in nucleotide sequence databases

Shulgina, Yekaterina

Abstract

Biological discoveries made in other organisms inform our understanding of human gene function because homologous protein sequences can be compared across species. Recent efforts to sequence a greater diversity of species for comparative analysis have been carried out primarily at the DNA level, with protein sequences subsequently translated in silico under an assumed genetic code. However, there is currently no informed way to select the correct genetic code for a newly sequenced organism, a choice that is critical for accurately translating predicted protein sequences. As more diverse organisms are sequenced, species using variant genetic codes continue to be found, suggesting that there may be a hidden diversity of alternative genetic codes across the tree of life.
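To illustrate why the assumed genetic code matters, the following minimal Python sketch translates the same nucleotide sequence under the standard code and under a variant in which the codon TGA encodes tryptophan instead of a stop (as in NCBI translation table 4). It is only an illustration of in silico translation, not the tool proposed here; the names `STANDARD`, `TGA_TRP`, and `translate`, and the example sequence, are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional compact layout:
# codons enumerated with bases in T, C, A, G order, third position varying fastest.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Variant code (illustrative): TGA is read as tryptophan (W) rather than stop.
TGA_TRP = {**STANDARD, "TGA": "W"}


def translate(seq, code):
    """Translate a DNA sequence in frame 0; stop codons are shown as '*'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(code[seq[i:i + 3]] for i in range(0, usable, 3))


# Made-up example sequence: ATG TGG TGA AAA TAA
example = "ATGTGGTGAAAATAA"
print(translate(example, STANDARD))  # MW*K*
print(translate(example, TGA_TRP))   # MWWK*
```

Under the standard code the in-frame TGA appears as a premature stop, whereas under the variant code the same codon yields a tryptophan, so choosing the wrong code truncates or alters the predicted protein.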
Aim 1 proposes building a computational tool to predict the genetic code used by an organism from nucleotide sequence alone. This would fill a critical gap in genome annotation pipelines and help ensure the accuracy of protein sequence databases, which are predominantly composed of predicted protein sequences.
In Aim 2, the computational tool will be used to infer the genetic code of all publicly available genomes, and any new genetic codes will be validated by computational analysis of tRNA genes, experimental confirmation of tRNA expression via Northern blotting, and confirmation of altered codon translation via proteomic mass spectrometry.
In Aim 3, the updated distribution of alternative genetic codes will be used to address long-standing hypotheses in the field about how the genetic code evolves.
This research training plan is intended to prepare the PI for a career as an independent and interdisciplinary researcher. The training environment will be a collaborative computational laboratory, with access to a lab bench and shared lab equipment for the proposed experiments. The training plan will also include development of science communication skills, including oral presentations and writing.

Public Health Relevance

Efforts are underway to sequence genomes from across the tree of life, but there is currently no informed way of selecting which genetic code to use when translating predicted protein sequences in silico. This proposal outlines a plan to build a computational tool to infer the genetic code used by an organism from nucleotide sequence data alone, and then to characterize the distribution of alternative genetic codes in all sequenced organisms. This would not only help ensure the accuracy of protein sequence databases but would also allow us to address fundamental questions about how disruptive changes to protein translation can evolve.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Predoctoral Individual National Research Service Award (F31)
Project #: 5F31HG010984-02
Application #: 10075793
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Cubano, Luis Angel

Project Start: 2019-12-16
Project End: 2021-12-15
Budget Start: 2020-12-16
Budget End: 2021-12-15
Support Year: 2
Fiscal Year: 2021
Total Cost
Indirect Cost

Institution

Name: Harvard University
Department: Microbiology/Immun/Virology
Type: Graduate Schools
DUNS #: 082359691

City: Cambridge
State: MA
Country: United States
Zip Code: 02138

Related projects


NIH 2021 F31 HG	Identifying genetic code reassignments in nucleotide sequence databases Shulgina, Yekaterina / Harvard University
NIH 2020 F31 HG	Identifying genetic code reassignments in nucleotide sequence databases Shulgina, Yekaterina / Harvard University
