Many features of virus populations make them ideal candidates for population genetic study, including very high mutation rates, high levels of nucleotide diversity, exceptionally large census population sizes, and frequent positive selection. However, these attributes also mean that special care must be taken in population genetic inference. For example, highly skewed progeny distributions, frequent and severe population bottlenecks associated with infection and compartmentalization, and strong selection all affect the distribution of genetic variation but are generally not taken into account. Thus, improved inference in viral populations will require not only theoretical development, but also the implementation of that theory in statistical inference tools capable of analyzing thousands of viral genomes in a computationally efficient manner. Here, I propose these necessary developments (Aims 1-2) and present an application to two exceptionally deep datasets to which we have unique access via our consortium affiliations (Aim 3). In total, this proposal not only represents a significant step toward understanding population genetics in these extreme parameter spaces, but will also provide valuable clinical insights that are expected to improve future patient treatment strategies.
Having demonstrated the strong population genetic mis-inference that may occur when the pervasive effects of skewed progeny distributions are not taken into account, we will here develop novel theory and statistical methodology to infer the parameters underlying this process, as well as the demographic and adaptive history of the populations in question. This inference will be of particular value in studying viral evolution, and I highlight two examples with meaningful evolutionary and clinical implications.