Mutations are the ultimate source of genetic variation and one of the driving forces of evolution. Both the absolute mutation rate and the relative rates of mutation subtypes fluctuate along the genome, influenced by flanking nucleotide motifs and by local features such as GC content and replication timing. Characterizing regional variation in mutation patterns is critical for understanding genome evolution and for identifying variants that cause genetic diseases. However, mutation rate and molecular spectrum are difficult to measure at high resolution, genome-wide, and without bias. Estimates based on common variants and between-species substitutions are confounded by natural selection, population demographic history, and biased gene conversion (BGC). Methods relying on the incidence of monogenic diseases or on de novo variants found by trio sequencing can inform global trends, but do not provide enough data to estimate fine-scale local parameters. This study will overcome these limitations by using extremely rare variants (ERVs) as a new data source to characterize patterns of recent germline variation in humans. ERVs, defined in this study as singletons in 30,000 samples, are becoming available through large-scale whole-genome sequencing (WGS) of population samples. Unlike common variants or substitutions, ERVs arose very recently and are therefore largely unaffected by selection, BGC, and related confounders. We will analyze 200-300 million singleton variants observed in 30,000 subjects sequenced at 20-30X coverage. The regional distribution of ERV subtypes will establish a quantitative atlas of the rate and spectrum of human germline mutations that is largely unaltered by selection. We will share this resource with the research community and apply it to determine the impact of local genomic features and epigenomic attributes on mutation.
We will use systematic departures between ERVs and higher-frequency variants (polymorphisms and substitutions) to infer local effects of selection, which may uncover previously unknown functional regions of the genome. By comparing mutation signatures in ERVs with those of somatic variants observed in diverse cancers, we will attribute distinct mutational signatures to known biochemical processes and thus infer the major contributors to new germline mutations in the human genome. This subtype-specific atlas will also be used to predict the probability of observing every possible single-base mutation in the genome, facilitating the interpretation of candidate causal variants for human diseases. We will assess differences in mutation patterns among European Americans, African Americans, and Latinos, and seek to discover genetic modifiers of germline mutation rate by finding functionally damaging mutations that are accompanied by increased ERV counts in the surrounding genomic region, potentially identifying both known and previously unknown mutator genes that affect transmission fidelity in humans. This research will provide an essential resource for studying the genesis and maintenance of germline mutations in humans. Understanding such a fundamental process will lay the basis for a deeper understanding of human evolution and disease.
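The subtype-specific atlas rests on a simple idea: classify each singleton by its base change and flanking sequence, then divide the count in each subtype by the number of reference sites where that change could occur. The Python sketch below illustrates that ratio estimator under stated assumptions: a 1-bp flanking (trinucleotide) context, strand-collapsed so the reference base is C or T, and a toy reference string. The function names (`subtype_rates`, `site_probability`) and the input layout are hypothetical, and the study's actual models may use wider motifs and additional genomic covariates. The final lookup shows how such an atlas could assign an empirical rate to any possible single-base change at a given site.

```python
from collections import Counter

# Complement table; N (unknown base) is left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse(context: str, alt: str) -> tuple[str, str]:
    """Strand-collapse a (trinucleotide, alt) pair so the central base is C or T."""
    if context[1] in "CT":
        return context, alt
    return revcomp(context), alt.translate(COMPLEMENT)


def count_opportunities(ref: str) -> Counter:
    """Count strand-collapsed trinucleotides, i.e. the sites at risk per context."""
    opp = Counter()
    for i in range(1, len(ref) - 1):
        ctx = ref[i - 1:i + 2]
        if "N" in ctx:
            continue  # skip unresolved reference bases
        opp[ctx if ctx[1] in "CT" else revcomp(ctx)] += 1
    return opp


def subtype_rates(ref: str, singletons: list[tuple[int, str]]) -> dict:
    """Hypothetical estimator: singleton count per subtype / matching reference sites.

    singletons: (0-based position, alt allele) for each ERV (assumed input layout).
    """
    opp = count_opportunities(ref)
    counts = Counter()
    for pos, alt in singletons:
        if pos < 1 or pos > len(ref) - 2:
            continue  # no full trinucleotide context at the sequence ends
        ctx = ref[pos - 1:pos + 2]
        if "N" in ctx:
            continue  # context was not counted as an opportunity
        counts[collapse(ctx, alt)] += 1
    return {key: n / opp[key[0]] for key, n in counts.items()}


def site_probability(ref: str, pos: int, alt: str, rates: dict) -> float:
    """Look up the empirical rate for mutating ref[pos] to alt (0.0 if unseen)."""
    if pos < 1 or pos > len(ref) - 2:
        return 0.0
    return rates.get(collapse(ref[pos - 1:pos + 2], alt), 0.0)


if __name__ == "__main__":
    ref = "ACGTTCGA"  # toy reference, 0-based positions
    ervs = [(1, "T"), (2, "A"), (5, "T")]  # hypothetical singleton calls
    rates = subtype_rates(ref, ervs)
    print(rates)  # {('ACG', 'T'): 1.0, ('TCG', 'T'): 0.5}
    # A G>A change in CGA collapses to the TCG C>T subtype on the other strand.
    print(site_probability(ref, 6, "A", rates))  # 0.5
```

Collapsing complementary changes (for example, G>A on one strand and C>T on the other) into a single subtype halves the number of categories and doubles the data behind each estimate, which matters when many motif-specific rates must be estimated at once.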

Public Health Relevance

We will study the patterns of inherited mutations in humans using approximately 250 million extremely rare DNA variants from human populations. Our results will allow prediction of the rate of new mutations at every site in the genome from features of the surrounding DNA sequence, providing a common resource for studying the arrival and maintenance of mutations in humans. Understanding such a basic process is important for answering fundamental questions about human evolution, the causes of inherited diseases, and the role of DNA abnormalities in cancer and aging.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer: Krasnewich, Donna M
University of Michigan Ann Arbor
Schools of Medicine
Ann Arbor
United States
Carlson, Jedidiah; Li, Jun Z; Zöllner, Sebastian (2018) Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets. BMC Genomics 19:845
Carlson, Jedidiah; Locke, Adam E; Flickinger, Matthew et al. (2018) Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 9:3753