Plant biologists need many completely sequenced and functionally annotated genomes within each species to fully exploit the power of evolution to understand how an organism functions and adapts to its environment. Researchers interested in natural variation in Arabidopsis propose to generate genomic DNA sequences from over 1,000 inbred strains, driving technology development both in hardware for the DNA sequencing itself and in software to make sense of the DNA sequence data. The goal of this research project is to record the genetic variation in the entire genome of many strains of the reference plant Arabidopsis thaliana. We will develop and apply cutting-edge DNA sequencing approaches using the reference plant A. thaliana to address questions of fundamental importance about plant evolution and gene function. The complete genome sequences for 200 accessions, produced as a result of this project, will provide the first complete view of haplotype structure for Arabidopsis thaliana and will allow future studies of epigenetic variation among different individuals in a population or within a species, a potential source of phenotypic diversity. The patterns of sequence and structural variation will offer insights into the dynamics of genome change and pinpoint potentially functionally important sources of genetic and epigenetic variation. Moreover, these data will enable subsequent mechanistic studies through experimental manipulation of Arabidopsis strains.
The 1001 Arabidopsis Genomes Project (http://1001genomes.org) will provide detailed genotyping data for wild strains that will complement the efforts of individual investigators to phenotype these same accessions for thousands of traits of interest. For example, this research has the potential to rapidly advance our understanding of the mechanisms by which plants adapt to various climates, utilize soil nutrients and resist pathogen infection. The knowledgebase produced from the 1,001 Arabidopsis Genomes Project will yield direct and measurable outcomes for deployment of similar traits in economically important crops for a changing global environment.
Broader Impacts of the Proposed Research
The impact of this project will be in two broad areas. First, the completion of the planned research will result in important new resources for the plant biology community: large-scale information on genetic variation among closely related genotypes. The very limited availability of whole-genome sequence variation information has hampered a variety of research endeavors, such as the understanding of adaptive evolution and the development of association mapping. All of the DNA sequence data will be made freely and easily accessible to the research community. The long-term impact of these enabling tools and technologies on agriculture and forestry is expected to be profound, providing fundamental knowledge for the construction of new plant varieties with superior agronomic traits. Second, an equally important aspect of this program is training, which will be provided at a variety of levels, including outreach to high school and undergraduate students as well as postdoctoral mentoring.
Summary of results.
Understanding the relationship between genotype and the resulting phenotype is one of the greatest challenges in plant biology today. Complex environments constantly challenge a plant's ability to thrive in nature, forcing it to adapt locally to its surroundings. Identifying genes that are important for adaptation is paramount to our understanding of complex traits and their eventual translation into crop biology. A complete catalog of the genetic variation that exists within the plant kingdom would be an ideal starting point for understanding all phenotypic forms, but it is currently unrealistic. Instead, a collaboration among plant scientists formed and initiated the 1,001 Genomes Project. This 24-month project aimed to identify the genetic variation that exists within the species Arabidopsis thaliana. Thousands of Arabidopsis thaliana accessions have been collected from a wide range of geographical locations. Furthermore, these accessions display a broad range of phenotypes that are shared among many species within the plant kingdom. Our laboratory joined the worldwide public effort to produce 1,001 Arabidopsis thaliana genomes. We contributed 171 genome sequences and the associated structural and nucleotide sequence variation information. In addition to genetic variation, epigenetic variation is clearly an influential factor in determining phenotype. We developed methods to determine genome-wide DNA methylation profiles (methylomes) at single-base resolution. Because of the recent increase in sequencing capacity per instrument run, we were able to generate DNA methylome and strand-specific transcriptome information for most accessions as an additional part of our plan to capture all possible forms of variation. The methylomes and transcriptomes were not part of the original aims, but because of the decreasing cost of sequencing we were able to carry out these studies without additional funding.
Publications Resulting from this NSF Award.
Schmitz RJ, Schultz MD, Lewsey MG, O'Malley RC, Urich MA, Libiger O, Schork NJ, Ecker JR. Science. (2011) 334:369-73.
Summary of description of Data, Samples, Physical collections and Products.
We have completed sequencing of the genomes of 171 accessions and developed a new type of genome browser that displays SNP information, allowing rapid community access to and searching of the data prior to deposition of finished sequences. All next-generation genome sequences have been deposited into the GenBank SRA and have been made freely available on our web site (http://signal.salk.edu/atg1001/index.php) and on the consortium web site (http://1001genomes.org).