Post-zygotic mutations, both those arising early in life (mosaic) and those acquired in somatic tissues throughout life, are present in a subpopulation of cells and have been implicated in a variety of disorders, but their prevalence in the general population is largely unknown due to limitations in tools and reference datasets. Recent studies have suggested that post-zygotic mutations can have a profound impact on disease, particularly neuropsychiatric and neurodegenerative disorders. These studies indicate that post-zygotic mutations are prevalent in the population and could contribute substantially to human disease, but systematic analyses in very large reference sets using tightly benchmarked tools are needed. Prior studies have also been limited in scope, focusing solely on single nucleotide variants (SNVs) and small indels, or on a narrow class of megabase-scale structural variation (SV). The emergence of population-scale whole-genome sequencing (WGS) in controls, and in case cohorts comprising tens of thousands of individuals, offers the first opportunity to interrogate the mutational spectrum of mosaic variation, including SV, SNVs, and indels, at WGS resolution. In this fellowship, I propose that post-zygotic mutations are abundant in the population across the size spectrum, increase with age, and can have a measurable impact on gene function. To further describe this cryptic class of variation in the population, across ages, and in disease cohorts, I will first optimize and benchmark recently developed mosaic SV detection algorithms and then use these tools to derive allelic fraction thresholds that distinguish mosaic variation from somatic variation (Aim 1). I will then apply these tools and annotations to a population-scale WGS reference dataset in the Genome Aggregation Database (gnomAD).
These analyses will determine the incidence and prevalence of post-zygotic mutations at sequence resolution and quantify the variance explained by age in the accumulation of somatic mutations in the population (Aim 2). Finally, I will directly test prior hypotheses that post-zygotic mutations influence disease risk using WGS data from an early-onset neurodevelopmental disorder (autism) cohort and a late-onset neurodegenerative disorder (Alzheimer's disease) cohort, as well as matched controls, to determine the differential influence of early mosaic variation and the accumulation of somatic variation on disease risk and brain function (Aim 3). Collectively, the aims outlined in this proposal will leverage unique tools and resources to further characterize this underappreciated class of genomic variation and its influence on human disease, and the fellowship will provide outstanding mentorship in each of my targeted areas of development during my PhD training.
Post-zygotic mutation, also known as mosaic and somatic variation, refers to genetic differences present in only a portion of an individual's cells, and has been shown to influence disease. In this proposal, I will systematically process whole-genome and exome sequencing data from hundreds of thousands of individuals to explore post-zygotic mutations in the general population, build the first population maps of such variation at sequence resolution, and then quantify the contribution of this unique class of variation to neurodevelopmental and neurodegenerative disease. This work will derive fundamental resources for the field, yield new insight into the prevalence of different post-zygotic variant classes, and provide context for their contribution to genome function and disease.