Background: A major challenge in evolutionary genomics is to characterize the forces shaping present-day patterns of genetic variation. For instance, the extent and manner in which natural selection affects genetic diversity remains highly controversial. Researchers have largely addressed this problem by developing statistical tests or summaries of genome sequence variation that provide insights into the evolutionary forces at play. However, because such approaches typically rely on a single univariate summary of the data, valuable discriminatory information present in the original dataset is lost. A more fruitful strategy would thus be to use multidimensional summaries of genomic data (e.g. a large vector of summary statistics) or even the totality of the input data (e.g. a matrix-representation of a sequence alignment) to make more accurate inferences. An even more powerful approach is to utilize data sets in which the same population is sampled at multiple time points, allowing one to observe evolutionary dynamics in action. Although such genomic time-series data are becoming more prevalent, the development of appropriate computational methodologies has lagged behind the proliferation of such data. Proposal: The Schrider Lab seeks to develop and apply powerful machine learning methods for evolutionary inference. Our work over the next five years will yield powerful software tools leveraging novel representations of genomic datasets, including time-series data. These efforts will dramatically improve researchers' ability to make accurate evolutionary inferences from both population genomic and phylogenetic data. Indeed, preliminary results demonstrate that our methods vastly outperform current approaches in evolutionary genetics. More importantly, we will use these tools to answer pressing evolutionary questions. In particular, our use of time-series data will reveal loci responsible for recent adaptation with much greater confidence than currently possible. Our efforts will help to resolve the controversy over the role of adaptation in shaping patterns of diversity across the human genome. This research has important implications for public health as well, as genes underlying recent adaptations are enriched for disease-associations. Moreover, we are constructing a time-series dataset in the mosquito vector species Aedes aegypti and Aedes albopictus. We will interrogate these data for evidence of recent and ongoing adaptation?this work will reveal loci responsible for the evolution of resistance to insecticides and other control efforts. Encouraging preliminary data also suggest that our work in phylogenetics will substantially improve inferential power in this important research area. More broadly, the success of the novel approaches described in this proposal has the potential to transform the methodological landscape of evolutionary genomic data analysis.
The work proposed here seeks to develop and apply powerful machine-learning based software tools for evolutionary genetic inference in humans, mosquito vectors, and other species. Such efforts have important health implications, as they can identify genes involved in adaptation, which in humans are often associated with disease and in mosquitos are often associated with resistance to insecticides and other control efforts.