Adjacent protein domains interact to fold into a functional protein structure, and adjacent binding sites in DNA interact with multiprotein complexes for efficient binding. Yet both protein domains and clusters of binding sites are encoded as one-dimensional arrays in the genome. This project tests the hypothesis that specific and correct interactions require optimal distances between these functional units, and that these distances are maintained across evolutionary time. Because insertions and deletions of DNA (indels) change the distance between functional units, the genome should be under evolutionary constraint against indel mutations that alter that distance. To test this hypothesis, the investigator will develop software to systematically estimate and compare changes in distance between functional elements, along with statistical models to assess the likelihood of the observed events. Upon project completion, the scientific community will have tools that identify conserved distances and predict the effect and importance of indel mutations occurring in these genomic regions. The investigator will hold workshops for girls in grades 6-12 to develop software games that model the concept of evolutionary constraint. She will also develop undergraduate and graduate classes with hands-on activities on molecular evolution and computational sequence analysis.
The goal of this project is to use variation in indel rates to infer the evolutionary constraint on the distance between functional elements in the genome. Experimental evidence has been accumulating for selection against indels in the loops and linkers within proteins, and in the spaces between binding sites of regulatory elements. However, studies on the evolution of these sequences are almost nonexistent because the sequences are difficult to align. This project addresses this challenge by applying methods that model variable indel rates across sites, or methods that model length instead of relying on alignments. Using these methods, the investigator will produce phylogenetic and quantitative estimates of indel rates for a significant proportion of the genome that has been neglected so far. In Objective 1, using new software she has developed, the investigator will estimate variable site-specific indel rates across loops between protein motifs to identify structural motifs with strong constraints on their distance. In Objective 2, the investigator will develop new software based on birth-death processes to estimate indel rates without relying on alignments. Using this software, she will test the hypothesis that the distance between tandem homologous domains is under stronger constraint than the distance between non-homologous domains. In Objective 3, the investigator will use the software described above to test the hypothesis that the distance between binding sites of homodimers is under stronger constraint than the distance between binding sites of heterodimers. This study will integrate knowledge gained in structural biology and developmental biology into a phylogenomic context, and provide tools for the community to test specific evolutionary hypotheses on the distance between functional elements of interest. The results of the project will be presented at https://github.com/HanLabUNLV.
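As a rough illustration of how a birth-death process can describe sequence length without an alignment, the Python sketch below treats the length of a linker as a count that grows by one with each insertion and shrinks by one with each deletion. It is not the investigator's software. The function names, the per-residue rates, and the restriction to single-residue indels are all assumptions made for this example.

```python
import math
import random


def expected_length(n0, ins_rate, del_rate, t):
    """Mean length after time t under a linear birth-death process."""
    return n0 * math.exp((ins_rate - del_rate) * t)


def simulate_length(n0, ins_rate, del_rate, t, rng):
    """Gillespie simulation of one lineage; returns the length at time t.

    Assumption: every residue gains or loses one neighbor at the given
    per-residue rates, so the total event rate scales with the length.
    """
    length, clock = n0, 0.0
    per_site = ins_rate + del_rate
    while length > 0 and per_site > 0:
        # Waiting time to the next indel is exponential in the total rate.
        clock += rng.expovariate(length * per_site)
        if clock > t:
            break
        # Choose insertion or deletion in proportion to the two rates.
        if rng.random() < ins_rate / per_site:
            length += 1
        else:
            length -= 1
    return length


# Hypothetical values: a 20-residue linker with equal indel rates.
rng = random.Random(0)
lengths = [simulate_length(20, 0.1, 0.1, 1.0, rng) for _ in range(2000)]
mean_simulated = sum(lengths) / len(lengths)
mean_expected = expected_length(20, 0.1, 0.1, 1.0)
```

With equal insertion and deletion rates the expected length stays at its starting value, while the spread of simulated lengths grows with the rate. Under this simplified model, a distance under strong constraint would show up as a lower effective indel rate, and so a narrower spread of lengths across lineages.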
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.