My laboratory is pursuing the genetic basis of recently evolved human traits that are not observed in chimpanzees or other great apes. We are working towards this goal by identifying and functionally characterizing the fastest-evolving regions in the human genome. While this line of research has been pursued before, previous studies have focused on the 5% of the genome that shows strong cross-species sequence constraint in non-human mammals and is therefore clearly functional. However, we now know that many functional regions do not show strong constraint across diverse mammals. Moreover, many phenotypic transitions seen on the branch to humans, such as brain expansion and upright posture, are also seen in other mammalian lineages. For example, dolphins and elephants have larger brains than humans, and lemurs and macropods (e.g. kangaroos) also have a spine that is generally perpendicular to the ground. For these reasons, we hypothesize that many genomic elements underlying human adaptations and disease risks will not be restricted to the 5% of the genome showing strong cross-species conservation in other mammals. My laboratory is therefore venturing into the other 95% of the human genome, in search of regions that have changed rapidly in humans but may also vary among other species. In doing so, we face the same problems that originally restricted researchers to the highly constrained 5% of mammalian genomes: how do we identify which of our 2089 regions are functional, and how do we know whether the sequence changes on the human lineage alter their function? This is a common disconnect in genomics research: a computational screen generates thousands of interesting mutations, whether fast-evolving regions in humans or haplotypes from genome-wide association studies, but mouse models for in-depth study can be made for only one to ten of them.
The result is a gap of at least two orders of magnitude between the mutations we can identify and those we can analyze in depth. The vast majority of the 2089 fastest-evolving regions in the human genome lie outside protein-coding exons and likely regulate the spatial and temporal expression of genes during development. While there are high-throughput methods for testing enhancer activity in cell lines, the cell type in which a given element is active is usually not known beforehand, and cell lines do not capture the complexity of the cell types transiently present during development. To address this problem for our own work, and with the intention that other groups adopt it, we are developing a method to assay the gene regulatory potential of thousands of DNA segments in developing tissue at single-cell resolution.
Many disease susceptibilities are caused not by changes in the genes themselves, but by changes in the regulatory elements that control when and where a gene turns on and off during development. However, there are currently no high-throughput methods for measuring the regulatory potential of DNA sequences during development, or for determining how a mutation changes that function. We are developing such a method, and we expect it to lead to a better understanding of the genetic differences that underlie disease risk and, ultimately, to improved treatment options.