Exhaustive Analysis of Microsatellite Loci in the 1000 Genomes Project

Garner, Harold

Abstract

The study of microsatellites, a class of repetitive DNA that exhibits 10,000-fold higher mutability than single nucleotide polymorphisms, has been hampered by the lack of data at microsatellite-containing loci. That is changing with the emergence of data from the 1000 Genomes Project. We hypothesize that these hypervariable loci, once analyzed in depth, will yield a new appreciation for their value and role in the genome as new biomarkers and functional elements. Baseline measurements of the variability at these loci in the substantial 1000 Genomes Project cohort will provide important information required to exploit these loci, both computationally and in the laboratory. The primary goal of the proposed research is to complete an exhaustive analysis and interpretation of the ~700,000 microsatellite loci using the ~2,500 sets of genome sequence becoming available from the 1000 Genomes Project, to measure their size-, purity- and motif-dependent distributions, and then to overlay those data with metadata (gene ontologies, conservation and more) to create a resource where we and others can explore the significant, yet underappreciated, role of microsatellite polymorphism in human variation and disease. We have demonstrated the techniques required, and impactful preliminary results confirm feasibility, value and potential.
Specific aims: 1) align all 1000 Genomes Project sequence data to the microsatellite-containing loci to measure the allelic distribution, polymorphism rate, characteristics and quality of the sequence in these repetitive regions; inspect and characterize groups of motif lengths and families (AAT, AAAT, AATT, etc.) to look for evidence of selection pressure, bias and genome-wide trends; 2) compare the distributions with models for estimating polymorphism propensity as a function of specific sequence motifs, motif size, copy number and purity (whether there are any SNPs), thus identifying any general replication or error-correction mechanism bias, which we suspect; 3) annotate each locus with ontology, conservation and other positional data to identify any process, functional or disease propensity correlations; and 4) create a web resource to distribute our findings and other reagents derived from this study so others can investigate microsatellite sequence variability at individual loci or across the genome.
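
The per-locus quantities named in the aims (motif, copy number and purity) can be illustrated with a minimal Python sketch. This is not the project's pipeline: the function names `find_microsatellites` and `purity`, the thresholds, and the definition of purity as the fraction of bases matching a perfect repeat are all assumptions made for illustration only.

```python
import re


def find_microsatellites(seq, min_motif=1, max_motif=6, min_copies=3):
    """Return perfect tandem repeats in seq as (start, motif, copies) tuples.

    Hypothetical helper: thresholds are illustrative, not from the grant.
    """
    # The lazy quantifier prefers the shortest motif; the backreference \1
    # then requires at least (min_copies - 1) further copies of that motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits


def purity(allele, motif):
    """Fraction of bases in allele that match a perfect repeat of motif.

    Assumed definition: 1.0 means no interrupting substitutions.
    """
    perfect = (motif * (len(allele) // len(motif) + 1))[: len(allele)]
    matches = sum(a == b for a, b in zip(allele, perfect))
    return matches / len(allele)


if __name__ == "__main__":
    print(find_microsatellites("TTCACACACACAGG"))
    print(purity("CACATACACA", "CA"))
```

A regular expression of this kind only finds perfect repeats in a single sequence; genotyping microsatellites from aligned short reads, as the aims describe, requires considerably more machinery.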

Public Health Relevance

The human genome contains over 500,000 areas of repeated DNA sequence (e.g. CACACACACA) called microsatellites. They are extremely variable, cause numerous diseases, are used in forensics and paternity testing, and may alter many of our characteristics, but they are understudied and underappreciated. The 1000 Genomes Project data enable their thorough analysis en masse by our methods.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Research Project--Cooperative Agreements (U01)
Project #: 5U01HG005719-02
Application #: 8099068
Study Section: Special Emphasis Panel (ZHG1-HGR-M (J1))
Program Officer: Brooks, Lisa

Project Start: 2010-06-26
Project End: 2013-04-30
Budget Start: 2011-05-01
Budget End: 2013-04-30
Support Year: 2
Fiscal Year: 2011
Total Cost: $265,043

Institution

Name: Virginia Polytechnic Institute and State University
Type: Organized Research Units
DUNS #: 003137015

City: Blacksburg
State: VA
Country: United States
Zip Code: 24061

Related projects


NIH 2011 U01 HG	Exhaustive Analysis of Microsatellite Loci in the 1000 Genomes Project Garner, Harold R. / Virginia Polytechnic Institute and State University	$265,043
NIH 2010 U01 HG	Exhaustive Analysis of Microsatellite Loci in the 1000 Genomes Project Garner, Harold R. / Virginia Polytechnic Institute and State University	$263,999

Publications

Tae, Hongseok; Karunasena, Enusha; Bavarva, Jasmin H et al. (2014) Large scale comparison of non-human sequences in human sequencing data. Genomics 104:453-8

Tae, Hongseok; Kim, Dong-Yun; McCormick, John et al. (2014) Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs. Bioinformatics 30:652-9

McIver, L J; McCormick, J F; Martin, A et al. (2013) Population-scale analysis of human microsatellites reveals novel sources of exonic variation. Gene 516:328-34

Tae, Hongseok; McMahon, Kevin W; Settlage, Robert E et al. (2013) ReviSTER: an automated pipeline to revise misaligned reads to simple tandem repeats. Bioinformatics 29:1734-41

Tae, Hongseok; Settlage, Robert E; Shallom, Shamira et al. (2012) Improved variation calling via an iterative backbone remapping and local assembly method for bacterial genomes. Genomics 100:271-6

Garner, H R (2011) Combating unethical publications with plagiarism detection services. Urol Oncol 29:95-9

McIver, L J; Fondon 3rd, J W; Skinner, M A et al. (2011) Evaluation of microsatellite variation in the 1000 Genomes Project pilot studies is indicative of the quality and utility of the raw data and alignments. Genomics 97:193-9

Galindo, Cristi L; McIver, Lauren J; Tae, Hongseok et al. (2011) Sporadic breast cancer patients' germline DNA exhibit an AT-rich microsatellite signature. Genes Chromosomes Cancer 50:275-83
