We propose the first comprehensive characterization of sequence space around an ancestral protein. This work will 1) characterize the effects on function of all possible mutations and pairs of mutations across the protein's entire length and of all possible combinations of mutations at a key subset of sites, 2) illuminate how the distribution of function through this multidimensional sequence space would have affected the processes of protein evolution (a key goal in molecular evolution), and 3) quantify the complete set of main-effect and epistatic genetic determinants of DNA specificity in a transcription factor and elucidate their biochemical causes, an important goal for protein biochemistry and molecular gene regulation. We use the steroid hormone receptor DNA-binding domain as an ideal model system, because it is of great biomedical importance; it is experimentally and phylogenetically tractable; and its specificity for DNA targets diversified through a well-understood evolutionary process, with a known set of historical mutations and biophysical mechanisms. The proposed work will reveal why this history occurred relative to the many other mutational trajectories the protein could have taken as it evolved its new specificity. With the map of sequence space in hand, we will then apply locus-specific, replicated experimental evolution to the ancestral protein, placing it under strong selection to explore sequence space and evolve the same novel specificity that it acquired during historical evolution. By identifying commonalities and differences among the historical trajectory, experimental evolution trajectories, and the many other possible pathways through sequence space, we will gain fundamental insight into the roles of contingency and determinism in evolution and illuminate underlying mechanistic factors that caused those phenomena.
Specific questions include: how many ways were there to evolve the derived DNA specificity, and how many were accessible under selection and drift? Did the historical outcome evolve because it was the optimal genotype, because it was the best or only accessible genotype, or simply due to chance? If more optimal genotypes exist, what prevented the evolving protein from reaching them? To what extent must new specificities evolve through promiscuous intermediates, and how many mutations does it take to evolve a new specificity? We will also characterize sequence space and experimental evolutionary trajectories around ancient receptors that existed at different times during history; this will reveal how the protein's evolvability and robustness fluctuated over evolutionary time due to epistatically acting mutations. Finally, by fully characterizing the main and epistatic genetic determinants of the protein's DNA specificity, we will identify common biophysical mechanisms that underlie DNA recognition, contributing to an important goal in molecular biology, biochemistry, cell biology, and development. The methods and conceptual tools we develop will be applicable to studying other transcription factors and the evolution of many other protein families.
Steroid hormone receptors bind specific DNA sequences and regulate the expression of genes that play key roles in a wide variety of diseases, including cancer, reproductive and immune dysfunction, cardiovascular disease, and disruptions of metabolic and osmotic homeostasis. Understanding how and why they evolved to recognize their specific DNA targets can help explain the mechanisms by which these proteins work as they do, aiding efforts to predict how genetic differences affect DNA recognition, while also answering important questions about the nature of evolutionary processes. This project combines techniques from protein biochemistry and evolutionary biology to characterize how a reconstructed ancestral protein's DNA binding specificity is affected by every possible mutation at every site and pair of sites that could have occurred in the deep past, as well as by all possible combinations of mutations at a subset of functionally essential sites. It will also study the mutations the ancient protein accumulates during evolution in the laboratory under selection for the same change in DNA specificity that occurred during history. Together, these experiments will characterize the many possible histories that evolution could have taken, contributing to our understanding of fundamental issues in protein biochemistry, gene regulation, and evolution.
Venkat, Aarti; Hahn, Matthew W; Thornton, Joseph W (2018) Multinucleotide mutations cause false inferences of lineage-specific positive selection. Nat Ecol Evol 2:1280-1288
Starr, Tyler N; Flynn, Julia M; Mishra, Parul et al. (2018) Pervasive contingency and entrenchment in a billion years of Hsp90 evolution. Proc Natl Acad Sci U S A 115:4453-4458
Liu, Qinwen; Onal, Pinar; Datta, Rhea R et al. (2018) Ancient mechanisms for the evolution of the bicoid homeodomain's function in fly development. Elife 7:
Hochberg, Georg K A; Thornton, Joseph W (2017) Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu Rev Biophys 46:247-269
Starr, Tyler N; Thornton, Joseph W (2017) Exploring protein sequence-function landscapes. Nat Biotechnol 35:125-126
Starr, Tyler N; Picton, Lora K; Thornton, Joseph W (2017) Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549:409-413
Siddiq, Mohammad A; Hochberg, Georg K A; Thornton, Joseph W (2017) Evolution of protein specificity: insights from ancestral protein reconstruction. Curr Opin Struct Biol 47:113-122