CHALLENGE: The evolution of novel genes underlies novelty in cellular networks and phenotypic traits, as well as adaptations to environmental perturbations. It is becoming increasingly clear that novel protein-coding genes frequently arise de novo from sequences that were previously non-coding, although this process was long perceived as improbable for lack of a credible mechanism. The sequencing revolution has revealed de novo gene birth as a major source of novel genes in the genomes of viruses, bacteria, fungi, plants and animals. In the human genome, de novo genes tend to be expressed in the brain, and several have been linked to disease, including Alzheimer's disease. Overall, however, de novo genes tend to be uncharacterized and understudied because of their unexplained origins, leaving the most species-specific molecular mechanisms impenetrable.

GOAL: The fundamental paradox of de novo gene birth lies in the biochemical improbability that non-coding sequences would spontaneously encode the complex arrangements of codons necessary to generate useful proteins. The proposed research aims to resolve this paradox.

INNOVATION: The PI pioneered an evolutionary model according to which novel genes emerge de novo through intermediate "proto-genes" generated by widespread translation of non-coding transcripts. Proto-genes are transitory genetic elements that are currently undergoing the transition from non-coding to coding. The proposed research will investigate the past, present and potential future of proto-genes by taking advantage of the awesome power of yeast genetics and genomics. Recent computational and experimental developments in ancestral reconstruction and high-throughput microscopy will be deployed on proto-genes for the first time to uncover the patterns governing the emergence of novel elements in specific cellular locations. Focused experiments will reconstruct 14 million years of evolutionary trajectory for one specific adaptive proto-gene, revealing when its coding capacities arose and how it is integrating into pre-existing cellular processes. The possibility that the evolutionary fate of proto-genes may be predictable from their biochemical propensities will be investigated by integrating modeling with a novel pipeline for high-throughput detection of adaptive potential developed by the PI.

IMPACT: By revealing how non-coding sequences can transition to become novel protein-coding genes, this project will uncover the hidden potential of genomes and its impact on the emergence of novel cellular processes. These transformative concepts will provide novel insights into the molecular determinants of species-specificity and the mechanisms of adaptation.

Public Health Relevance

Genomes consist of sequences that code for proteins (coding DNA) and sequences that don't (non-coding DNA). I propose to study how non-coding DNA can evolve to become coding, creating novel proteins never before seen in nature. By advancing understanding of this mysterious process, my research will help uncover the origins of what makes humans different from other species at the molecular level.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
NIH Director’s New Innovator Awards (DP2)
Study Section
Special Emphasis Panel (ZRG1)
Program Officer
Janes, Daniel E
University of Pittsburgh
Schools of Medicine
United States