Mutations are the ultimate source of genetic variation and one of the driving forces of evolution. Both the absolute mutation rate and the relative rate among mutation subtypes fluctuate along the genome, affected by adjacent nucleotide motifs and local features such as GC content and replication timing. Characterizing regional variation of mutation patterns is critical for understanding genome evolution and for identifying variants that cause genetic diseases. However, mutation rate and molecular spectrum are difficult to measure at high resolution, genome-wide, and in an unbiased fashion. Estimates based on common variants and between-species substitutions are confounded by natural selection, population demographic history, and biased gene conversion (BGC). Methods relying on incidence rates of monogenic diseases or finding de novo variants by trio sequencing can inform global trends, but do not provide sufficient data to assess fine-scale local parameters. This study will overcome these limitations by using extremely rare variants (ERVs) as a new data source to characterize patterns of recent germline variation in humans. ERVs, defined in this study as singletons in 30,000 samples, are becoming available via large-scale whole-genome sequencing (WGS) of population samples. Unlike common variants or substitutions, ERVs arose very recently and are largely unaffected by selection, BGC, and similar confounders. We will analyze 200-300 million singleton variants observed in 30,000 subjects at 20-30X coverage. The regional distribution of ERV subtypes will establish a quantitative atlas of the rate and spectrum of human germline mutations mostly unaltered by selection. We will share this resource with the research community and apply it to determine the impact of local genomic features and epigenomic attributes.
We will use the systematic departures between ERVs and variants of higher frequencies (polymorphisms and substitutions) to infer local effects of selection, which may uncover hitherto unknown functional regions of the genome. By comparing mutation signatures in ERVs with those in somatic variants observed in diverse cancers, we will attribute distinct mutational signatures to known biochemical processes and thus infer the major contributors to new germline mutations in the human genome. This subtype-specific atlas will also be used to predict the probability of observing every possible single-base mutation in the genome, thus facilitating the interpretation of candidate causal variants of human diseases. We will assess mutation pattern differences among European Americans, African Americans, and Latinos, and seek to discover genetic modifiers of germline mutation rate by finding functionally damaging mutations that show increased ERV counts in the surrounding genomic region, potentially identifying both known and previously unknown mutator genes that play a role in transmission fidelity in humans. This research will provide an essential resource to study the genesis and maintenance of germline mutations in humans. Understanding such a fundamental process will be the basis for a deeper understanding of human evolution and disease.
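
As a rough illustration of how singleton counts could be turned into subtype-specific relative rates that depend on adjacent nucleotides, the sketch below tallies single-base changes by their 3-mer context and divides by the number of times each motif occurs in the reference. It is a minimal sketch under stated assumptions, not the project's pipeline: the function names (`revcomp`, `subtype`, `motif_counts`, `relative_rates`), the input layout (0-based position plus alternate base), the 3-mer window, and the choice to collapse strands onto the pyrimidine reference base are all hypothetical choices made for this example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def subtype(context, alt):
    """Collapse a single-base change to (motif, alt) on the strand whose
    central reference base is a pyrimidine (C or T). Assumed convention."""
    mid = len(context) // 2
    if context[mid] in "AG":  # purine reference: report the opposite strand
        context, alt = revcomp(context), revcomp(alt)
    return context, alt


def motif_counts(reference, k=3):
    """Count strand-collapsed k-mer motifs available in the reference."""
    mid = k // 2
    counts = Counter()
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        if set(kmer) - set("ACGT"):  # skip windows containing N or other codes
            continue
        if kmer[mid] in "AG":
            kmer = revcomp(kmer)
        counts[kmer] += 1
    return counts


def relative_rates(reference, singletons, k=3):
    """Singletons per available motif, keyed by (motif, alt).

    `singletons` is an iterable of (0-based position, alternate base)."""
    mid = k // 2
    motifs = motif_counts(reference, k)
    observed = Counter()
    for pos, alt in singletons:
        if pos - mid < 0 or pos + mid >= len(reference):
            continue  # no full context window at the sequence edge
        context = reference[pos - mid:pos + mid + 1]
        if set(context) - set("ACGT") or alt == context[mid]:
            continue  # ambiguous context or not a real substitution
        observed[subtype(context, alt)] += 1
    return {key: n / motifs[key[0]] for key, n in observed.items()}


# Toy input: an 8-base "reference" and three hypothetical singletons.
toy_reference = "TCGATCGA"
toy_singletons = [(1, "T"), (5, "A"), (2, "A")]
toy_rates = relative_rates(toy_reference, toy_singletons)
```

In this toy input, the G>A change at position 2 is reported on the opposite strand and therefore counts toward the same subtype as the C>T change at position 1. On real data the same tallies would be made per genomic window, and wider motifs would capture more of the sequence-context effect at the cost of sparser counts.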

Public Health Relevance

We will study the patterns of inherited mutations in humans using approximately 250 million extremely rare DNA variants in human populations. Our results will allow the prediction of the rate of new mutations at every site in the genome based on features of the surrounding DNA sequence, thus providing a common resource to study the arrival and maintenance of mutations in humans. Understanding such a basic process is important for answering fundamental questions in human evolution, the cause of inherited diseases, and the role of DNA abnormality in cancer and aging.

Agency
National Institutes of Health (NIH)
Institute
National Institute of General Medical Sciences (NIGMS)
Type
Research Project (R01)
Project #
5R01GM118928-02
Application #
9275505
Study Section
Genomics, Computational Biology and Technology Study Section (GCAT)
Program Officer
Krasnewich, Donna M
Project Start
2016-06-01
Project End
2020-05-31
Budget Start
2017-06-01
Budget End
2018-05-31
Support Year
2
Fiscal Year
2017
Total Cost
$300,742
Indirect Cost
$103,242
Name
University of Michigan Ann Arbor
Department
Genetics
Type
Schools of Medicine
DUNS #
073133571
City
Ann Arbor
State
MI
Country
United States
Zip Code
48109
Carlson, Jedidiah; Li, Jun Z; Zöllner, Sebastian (2018) Helmsman: fast and efficient mutation signature analysis for massive sequencing datasets. BMC Genomics 19:845
Carlson, Jedidiah; Locke, Adam E; Flickinger, Matthew et al. (2018) Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 9:3753