Natural selection can be challenging to study. Fitness differences that are too small to measure directly can have profound evolutionary consequences. This has motivated the creation of statistical tools for characterizing natural selection from data sets of naturally occurring genetic variation. Selection operates on phenotype, but these statistical tools tend to ignore this. Instead, the fitness associated with each allele or genotype is often treated as a free parameter. Our research employs computational methods for predicting phenotype from DNA sequence data. This enables our statistical procedures to extract more information about selection from data sets, and it facilitates studies of how phenotype affects the evolution of genotype. Another unconventional feature of our research is that we analyze interspecific data but frame estimates with respect to population genetics. We do this because most of evolutionary history can be studied only through interspecific comparisons and because population genetics is the natural framework within which to study selection. Our research focuses on natural selection to maintain protein structure, but our inference strategies can assess the evolutionary impact of other phenotypes. To better understand the influence of tertiary structure, we will simultaneously examine the evolutionary roles of context-dependent mutation, codon usage, and mRNA abundance. The main consequence of our more realistic evolutionary models will be better population genetic inferences about natural selection from interspecific data, but the models also have the potential to assist with applications ranging from ancestral sequence reconstruction to inferring adaptive landscapes. Simulation will help to evaluate the quality of our population genetic inferences from interspecific data and will let us determine how to improve these inferences.
We will devote particular attention to the situation where populations have concurrent fitness-affecting polymorphisms that interfere with each other via the Hill-Robertson effect. Because our interspecific models are framed with respect to population genetics, we can combine interspecific and intraspecific data in a principled way. This explicit evolutionary perspective will improve an already successful approach for predicting which nonsynonymous variation affects human health.
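As a rough illustration of what framing interspecific estimates in population genetic terms can mean, the sketch below uses the standard weak-selection diffusion result for the substitution rate of a class of mutations relative to the neutral rate. It is an assumption-laden toy and not the project's model: the function name `relative_substitution_rate` is hypothetical, `S` is assumed to be a scaled selection coefficient (for example 4Ns under diploid genic selection), and each mutation is assumed to fix or be lost independently, which ignores the Hill-Robertson interference discussed above.

```python
import math


def relative_substitution_rate(S: float) -> float:
    """Substitution rate relative to neutrality for scaled selection coefficient S.

    Uses the weak-selection approximation S / (1 - exp(-S)).
    Assumes independent sites (no interference between polymorphisms).
    """
    if abs(S) < 1e-8:
        # Neutral limit: S / (1 - exp(-S)) tends to 1 as S tends to 0.
        return 1.0
    if S < -700.0:
        # Strongly deleterious: the rate is effectively zero, and
        # exp(-S) would overflow a double here.
        return 0.0
    # 1 - exp(-S) equals -expm1(-S); expm1 is more accurate for small |S|.
    return S / -math.expm1(-S)


if __name__ == "__main__":
    for S in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"S = {S:+.1f}  relative rate = {relative_substitution_rate(S):.4f}")
```

Under these assumptions, deleterious classes (S < 0) substitute more slowly than neutral ones and advantageous classes (S > 0) substitute faster. This is how a rate estimated from between-species comparisons can be read as a statement about selection within populations.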
This project will lead to improved understanding of the role that natural selection plays in shaping genetic variation. Building on this understanding, we will develop statistical techniques for identifying which variation in protein-coding genes is likely to be deleterious to human health.