Long spans of tandemly arrayed, highly repetitive sequences known as satellite repeats form an important and underexplored component of eukaryotic genomes. These satellite regions play vital roles, especially in centromere and telomere function, and their dysregulation has been associated with several human genetic disorders. The repetitive nature of satellite sequences renders them nearly impossible to assemble, thwarting efforts at further analysis. We have devised a pair of approaches that let us explore patterns of kmer "word" repeats and large-scale structures from unassembled reads to quantitatively describe the satellite repeat composition of a genome. We have discovered that these features are highly variable across lines, opening the door to understanding how satellite repeat variability evolves and how it impacts the biology of the organism.
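As an illustration only, the idea of counting kmer "word" repeats directly from unassembled reads can be sketched as follows. This is not the project's actual software: the function names `canonical_kmer` and `count_tandem_kmers`, the `min_copies` threshold, and the choice to merge rotations and reverse complements of a repeat unit are all assumptions made for this sketch.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmer(kmer):
    """Smallest rotation of a kmer or of its reverse complement (hypothetical convention)."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (kmer, revcomp) for i in range(len(s))]
    return min(rotations)


def count_tandem_kmers(reads, k, min_copies=3):
    """Count copies of kmers found in runs of at least min_copies tandem repeats.

    Simplification: a unit whose true period is shorter than k (e.g. AAAAA)
    is still counted as a kmer of length k.
    """
    counts = Counter()
    for read in reads:
        i = 0
        while i + k <= len(read):
            unit = read[i:i + k]
            copies = 1
            # Extend the run while the next k bases repeat the same unit.
            while read[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and set(unit) <= set("ACGT"):
                counts[canonical_kmer(unit)] += copies
                i += copies * k  # skip past the whole tandem run
            else:
                i += 1
    return counts


reads = ["AACATAACATAACATAACAT", "GGATGTTATGTTATGTTATC"]
print(count_tandem_kmers(reads, k=5))  # Counter({'AACAT': 7})
```

The second read carries the reverse complement of the same repeat unit, so both reads contribute to a single canonical entry. Per-line counts of this kind are what would be compared across genomes.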
In Specific Aim 1 we will characterize the patterns and rates of turnover of satellite repeats in Drosophila. We will apply our novel computational methods to quantitatively characterize satellite repeats using DNA sequence data from population re-sequencing surveys and reference genomes of Drosophila species. Long-read sequences from PacBio runs and from GemCode technology (10x Genomics) will allow us to anchor a subset of repeats to neighboring euchromatin. These data will allow us to identify and quantitate satellite repeats, determine their species distribution, and infer rates of change and inter-satellite correlations within species and along the phylogeny. We will apply Gaussian process models to recapitulate the rates and patterns of change of satellite repeats in fly genomes, and test whether patterns of satellite repeats are consistent with models of mutation-drift balance.
In Specific Aim 2 we will determine mutational profiles of satellite DNA changes in mutation-accumulation lines of Drosophila. Satellite repeats display rapid turnover across closely related species, which motivates examining whether a purely mutational process is driving the evolutionary divergence. By contrasting satellites from genomes of flies derived from mutation-accumulation lines, we will quantify key attributes of the mutational divergence. We will also construct reversible mutants of Su(var)3-9, which is required for heterochromatin maintenance, and test its effects as a sensitized background for mutation accumulation.
In Specific Aim 3 we will test models of centromere- and telomere-drive by scoring segregation in F1s that have large differences in satellite repeat abundances. During female meiosis, any factor with an increased chance of ending up in an ovum instead of a polar body will have a great evolutionary advantage. Prior publications suggest that such meiotic distorters play a role in shaping the evolution and distribution of satellite repeats. We will make F1 females in crosses between lines with specific differences in satellite repeats, and then score the progeny of these F1 females by sequencing to search genome-wide for departures from Mendelian segregation. In sum, this project leverages novel analytical approaches to address long-standing and fundamental questions about satellite repeat evolution.
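As a minimal sketch of what scoring departures from Mendelian segregation could look like, the example below applies an exact two-sided binomial test against the 1:1 expectation at each marker, with a Bonferroni correction across markers. The input layout (per-marker counts of progeny carrying each maternal allele), the function names, and the choice of test and correction are assumptions for illustration, not the project's stated analysis.

```python
from math import exp, lgamma, log


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes no more likely than the observed one.
    Log-space terms keep the calculation stable for large n.
    """
    def pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_choose + i * log(p) + (n - i) * log(1 - p))

    probs = [pmf(i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point asymmetry.
    return min(1.0, sum(x for x in probs if x <= observed * (1 + 1e-7)))


def flag_distorted_markers(marker_counts, alpha=0.05):
    """Return (marker, p-value) pairs that deviate from 1:1 after Bonferroni correction.

    marker_counts maps a hypothetical marker name to (n_allele_a, n_allele_b).
    """
    threshold = alpha / len(marker_counts)
    flagged = []
    for marker, (n_a, n_b) in marker_counts.items():
        p_value = binomial_two_sided_p(n_a, n_a + n_b)
        if p_value < threshold:
            flagged.append((marker, p_value))
    return flagged


# Hypothetical progeny counts at two markers.
counts = {"2L:1000000": (52, 48), "X:5000000": (80, 20)}
for marker, p_value in flag_distorted_markers(counts):
    print(marker, p_value)
```

With these made-up counts only the second marker is flagged. A genome-wide scan would likely need to account for linkage between neighboring markers, which this sketch ignores.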

Public Health Relevance

This project aims to apply novel bioinformatics approaches and experimental designs to quantitatively describe and understand the process of turnover of heterochromatic satellite DNA sequences in the genome. Focusing on the genomes of Drosophila species, we will analyze both short repeated "words" or "kmers" and complex satellite structures in the genome sequences of inbred lines of diverse species and of mutation-accumulation lines. We will model satellite changes as a Gaussian process, and score their meiotic behavior by testing for departures from Mendelian segregation by genome sequencing of backcross progeny.

Agency: National Institutes of Health (NIH)
Institute: National Institute of General Medical Sciences (NIGMS)
Type: Research Project (R01)
Project #: 1R01GM119125-01A1
Application #: 9238932
Study Section: Genetic Variation and Evolution Study Section (GVE)
Program Officer: Janes, Daniel E
Project Start: 2017-03-03
Project End: 2021-02-28
Budget Start: 2017-03-03
Budget End: 2018-02-28
Support Year: 1
Fiscal Year: 2017
Total Cost: $329,970
Indirect Cost: $119,970
Name: Cornell University
Department: Biochemistry
Type: Schools of Arts and Sciences
DUNS #: 872612445
City: Ithaca
State: NY
Country: United States
Zip Code: 14850
Lower, Sarah Sander; McGurk, Michael P; Clark, Andrew G et al. (2018) Satellite DNA evolution: old ideas, new approaches. Curr Opin Genet Dev 49:70-78
Flynn, Jullien M; Lower, Sarah E; Barbash, Daniel A et al. (2018) Rates and Patterns of Mutation in Tandem Repetitive DNA in Six Independent Lineages of Chlamydomonas reinhardtii. Genome Biol Evol 10:1673-1686
Wei, Kevin H-C; Lower, Sarah E; Caldas, Ian V et al. (2018) Variable Rates of Simple Satellite Gains across the Drosophila Phylogeny. Mol Biol Evol 35:925-941
Kelsey, Keegan J P; Clark, Andrew G (2017) Variation in Position Effect Variegation Within a Natural Population. Genetics 207:1157-1166
Flynn, Jullien M; Caldas, Ian; Cristescu, Melania E et al. (2017) Selection Constrains High Rates of Tandem Repetitive DNA Mutation in Daphnia pulex. Genetics 207:697-710