The intent of this proposed research is to contribute to the basic understanding of the structure and function of specific DNA sequences found near centromeres. Despite extensive sequencing of the human genome, the DNA sequences residing within the centromere and the adjacent regions (the pericentromere) remain largely unknown because of their highly repetitive nature. These sequences are subject to strict regulation, so they are not normally transcribed into RNA. However, in cancer cells, HSATII, a tandemly repeated pericentric satellite sequence, is aberrantly transcribed into RNA. In these cells, HSATII RNA accumulates in the nucleus, adjacent to its site of transcription, where it recruits nuclear regulatory proteins. While HSATII DNA is found on eleven different human chromosomes, only a few of these locations transcribe HSATII RNA, suggesting that regulation and sequence composition may vary from one chromosome to another. Thus, a full analysis of the HSATII sequences residing on individual human chromosomes is overdue and promises to drive studies that uncover the consequences of pericentric satellite transcription. To study these tandemly repeated sequences, a combinatorial approach is proposed to map and define HSATII variants within individual chromosomes by integrating in situ and in silico datasets, which will produce a rich repository of HSATII sequences to inform functional studies. It is hypothesized that individual HSATII loci will harbor unique sequence variants and that integrated sites of HSATII expression will be more prone to chromosomal breakage, resulting in cell division defects. The hypothesis will be tested by mapping chromosome-specific HSATII variants in the human genome (Aim 1) and by testing the role of HSATII expression in promoting cell division defects (Aim 2).
The functional effect of HSATII expression will be further tested to determine its capacity to interact with additional nuclear regulatory proteins; this will be accomplished by identifying the suite of proteins that bind to nuclear accumulations of HSATII RNA (Aim 3). It is anticipated that all of the proposed research will be conducted by undergraduate students under the close supervision of the PI. Thus, the proposed project promises to engage and train undergraduate researchers in innovative genomic, cytological, and proteomic techniques, which will propel future careers in genomics and biomedical research.

Public Health Relevance

The goal of this project is to understand the basic structure and function of specific DNA sequences found near chromosomal centromeres, the constricted regions to which spindle fibers attach, enabling duplicated chromosomes to segregate to opposite poles of a dividing cell. Tandemly repeated DNA sequences reside within these regions of chromosomes and have historically been poorly studied because repetitive DNA sequences are difficult to assemble. In cancer cells, these repetitive sequences are misregulated such that they become expressed, and we aim to understand both their sequence diversity and the consequences of their expression.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Academic Research Enhancement Awards (AREA) (R15)
Study Section
Molecular Genetics A Study Section (MGA)
Program Officer
Gaillard, Shawn R
Swarthmore College
Schools of Arts and Sciences
United States