Genetic variants that disrupt the functionality of regulatory sequences, and thereby alter gene expression levels, are major contributors to both evolutionary divergence between species and differences in risk for complex disease among humans. However, due to the complexity of the gene regulatory programs encoded in mammalian genomes and their rapid turnover between species, evaluating the function of non-protein-coding mutations is challenging. This is a major roadblock to tracing the evolution of human-specific biology. In addition, since the majority of disease-associated variants are non-coding, this difficulty impairs our ability to map the genetics of complex disease. The long-term mission of my lab is to interpret the complex gene regulatory programs encoded in the human genome and accurately model the effects of genetic mutations to these elements on phenotypes relevant to disease and human evolution. We work toward these goals by integrating cutting-edge machine learning, statistical modeling of evolution, and the analysis of genotypes and phenotypes from large-scale clinical biobanks. In particular, my lab is uniquely well positioned to build on our previous work to address the following fundamental questions:
1. How have evolutionary transitions on the human lineage modified the genome, in particular gene regulatory programs, to produce human-specific biology? And how do these modifications relate to human-specific disease risk?
2. What are the combinatorial rules underlying how TF binding patterns specify precise control of gene regulation? And how do these gene regulatory "programs" evolve between species?
3. How do genetic and epigenetic mechanisms interact to specify the dynamic gene regulatory programs that drive cellular development? And how are these programs perturbed in disease?
4. How can we interpret non-protein-coding mutations identified in patient genomes to inform treatment and preventative care?
Our work will produce much-needed methods for understanding the effects of mutations to gene regulatory regions and identify mutations responsible for differences in disease risk between human populations.
The genetic mutations that distinguish humans from other great apes and that influence risk for complex disease are mostly found outside of genes. Many of these mutations influence when and where genes are expressed. However, we do not know how to interpret the effects of most such mutations on gene regulation, disease risk, or human evolution. We will develop computational models that leverage machine learning, evolutionary patterns, and large-scale clinical biobanks to identify mutations that disrupt proper control of genes and lead to disease.