Cotton has one of the lowest levels of genetic diversity of any major crop plant in the U.S. Much of the genetic diversity present in wild cotton was lost over the thousands of years of breeding that followed the original founding population. Developments in DNA sequencing technology have enabled new strategies to quantify and characterize this missing genetic diversity. This project will select and re-sequence 506 accessions of cotton and its wild, diploid relatives. These DNA sequences will be used to identify individual plants that contain unique genetic diversity. A better understanding of this diversity will illuminate basic biological processes in cotton and will provide genomic resources for improving the domestic cotton crop. This project will also provide summer research training internships for teachers (grades 7-12) at both Brigham Young University and Iowa State University. The program will introduce the conceptual and experimental foundations of genome analysis strategies. Teachers will then be able to share their knowledge of genome sequencing with their middle school and high school students.
The amount and apportionment of genetic diversity in domesticated plants and their wild relatives are key concerns for crop improvement. This project will re-sequence 506 accessions of cotton that include a broad representation of the commercially important allopolyploid (AD-genome) species Gossypium hirsutum and G. barbadense, as well as related wild species representing other allopolyploids and the progenitor diploid (A, D) genomes. Sequence reads generated through high-coverage sequencing of DNA libraries for each accession will be mapped to the reference cotton genome using a variety of computational tools to begin to identify and quantify nucleotide variation along the chromosomes of each accession. Population genomics analyses will reveal unique alleles, single nucleotide polymorphisms (SNPs), and copy number variants (CNVs) in accessions within the public cotton germplasm collection. Using this information, a pan-genome of cotton will be synthesized to better understand the scope, pace, and pattern of genomic variation that arises during speciation, population differentiation, domestication, and breeding. All project outcomes will be made available to the public through a project website. Sequence data will be accessible through long-term repositories such as GenBank, NCBI's Sequence Read Archive (SRA), and Phytozome. Data will also be available long-term through CottonGen (www.cottongen.org/), a community genomics, genetics, and breeding database for cotton.
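As an illustration of what identifying "unique alleles" can mean in practice, the sketch below finds alleles carried by exactly one accession in a small table of variant calls. It is a minimal example under stated assumptions, not the project's pipeline: the abstract does not name its tools or data formats, so the function name `private_alleles`, the input layout, the accession labels, and the chromosome name `A01` are all hypothetical.

```python
from collections import defaultdict


def private_alleles(calls):
    """Return alleles observed in exactly one accession.

    `calls` is a hypothetical layout assumed for this sketch:
    {accession: {(chromosome, position): set_of_called_alleles}}.
    """
    # Record which accessions carry each (site, allele) combination.
    carriers = defaultdict(set)
    for accession, sites in calls.items():
        for site, alleles in sites.items():
            for allele in alleles:
                carriers[(site, allele)].add(accession)

    # Keep only the alleles carried by a single accession.
    private = defaultdict(list)
    for (site, allele), accessions in carriers.items():
        if len(accessions) == 1:
            private[next(iter(accessions))].append((site, allele))

    return {acc: sorted(found) for acc, found in private.items()}


# Made-up calls for three hypothetical accessions at two sites.
example_calls = {
    "acc_1": {("A01", 100): {"A"}, ("A01", 250): {"C", "T"}},
    "acc_2": {("A01", 100): {"A"}, ("A01", 250): {"C"}},
    "acc_3": {("A01", 100): {"G"}, ("A01", 250): {"C"}},
}

for accession, found in private_alleles(example_calls).items():
    print(accession, found)
```

In this toy input, `acc_1` alone carries `T` at position 250 and `acc_3` alone carries `G` at position 100, so those two accessions are the ones flagged. A real analysis would also need to account for missing calls, sequencing error, and the homoeologous A and D subgenomes of allopolyploid cotton, none of which this sketch attempts.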