Intellectual merit: Plant biologists need many completely sequenced and functionally annotated genomes within each species to fully exploit the power of evolution and to understand how an organism functions and adapts to its environment. This project will contribute to an international effort to generate genomic DNA sequences from over 1,000 inbred strains of Arabidopsis thaliana. It will also drive technology development both in DNA sequencing hardware and in the software needed to interpret the sequence data. Specifically, this project will generate genome, transcriptome, and methylome sequences. The genome sequences will provide an unparalleled view of haplotype structure and structural variation for Arabidopsis thaliana. The transcriptome sequence data will assist in genome annotation and in the identification of naturally occurring splice isoforms. Lastly, the DNA methylome sequencing data will permit identification of epiallelic variation on a population-wide level. These three data sets (genomes, transcriptomes and DNA methylomes) will complement one another to provide a comprehensive set of variants within the Arabidopsis thaliana population. Combining genetic, epigenetic and transcriptional variation from different individuals in a population or within a species will provide a resource for dissecting phenotypic diversity through quantitative trait locus (QTL) mapping or genome-wide association studies (GWAS). Moreover, these data will enable subsequent mechanistic studies through experimental manipulation of Arabidopsis thaliana strains and will complement the efforts of individual investigators who are using the same accessions to catalog phenotypes for thousands of traits.
Broader impacts: This project will have impact in two broad areas. First, the completion of the planned research will result in important new resources for the plant biology community: large-scale information on genetic, epigenetic and transcriptional variation within a species. All of the DNA, cytosine methylation and RNA sequence data will be made freely and easily accessible to the research community. The long-term impact of these enabling tools and technologies on agriculture and forestry is expected to be profound, providing fundamental knowledge for developing new plant varieties with superior agronomic traits. Second, and equally important, the program will provide training at a variety of levels, including outreach to high school, undergraduate and graduate students as well as postdoctoral mentoring.
Natural epigenetic variation is a source of phenotypic diversity, but understanding its contribution to that diversity requires further investigation of how it interacts with genetic variation. Cytosine DNA methylation is a covalent base modification that can be stably transmitted through mitotic and meiotic cell divisions. Depending on the location and sequence context of the methylated base, DNA methylation can alter local chromatin structure and the transcriptional activity of the genome. Determining methylation status at base resolution is therefore important for understanding the cellular pathways that establish and maintain this modification. In plant cells, multiple molecular pathways mediate the methylation of cytosines in distinct sequence contexts (CG, CHG and CHH, where H = A, C or T). Genic CG methylation is associated with constitutively expressed loci, whereas regions targeted by both CG and non-CG methylation are actively silenced by the RNA-directed DNA methylation pathway. Although the rates of spontaneous DNA methylation variation and of genetic mutation can be decoupled in the laboratory, in natural settings these two features of genomes co-evolve to create the phenotypic diversity on which natural selection can act. To understand the types and extent of natural DNA methylation variants in A. thaliana, an international collaboration determined over 1,001 epigenomes of genotypically distinct wild accessions, collected from throughout the Northern Hemisphere, using MethylC-sequencing, RNA-sequencing and genomic DNA sequencing. Integrative analysis of these data provides new insights into population-level interactions between genomic and epigenomic variation.
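To make the three sequence contexts concrete, the following Python sketch classifies every cytosine in a short reference sequence, on both strands, as CG, CHG or CHH, and computes a per-site methylation level from MethylC-seq read counts. It assumes that bisulfite treatment leaves methylated cytosines as C while unmethylated cytosines are read as T, and it uses the common per-site estimate C / (C + T), ignoring sequencing errors and incomplete conversion. The names `Site`, `cytosine_context`, `classify_cytosines` and `methylation_level` are hypothetical illustrations, not the analysis pipeline used in this project.

```python
from typing import List, NamedTuple, Optional

# Hypothetical helpers for illustration only; not the project's actual pipeline.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
H_BASES = {"A", "C", "T"}  # H = A, C or T (any base except G)


class Site(NamedTuple):
    position: int  # 0-based forward-strand coordinate of the cytosine
    strand: str    # "+" if the C is on the forward strand, "-" if on the reverse
    context: str   # "CG", "CHG" or "CHH"


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return the context of seq[i], read 5'->3' on the same strand, or None."""
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H_BASES or i + 2 >= len(seq):
        return None  # ambiguous base (e.g. N) or context runs off the end
    if seq[i + 2] == "G":
        return "CHG"
    if seq[i + 2] in H_BASES:
        return "CHH"
    return None


def classify_cytosines(seq: str) -> List[Site]:
    """Classify every cytosine on both strands of a reference sequence."""
    seq = seq.upper()
    n = len(seq)
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    sites = []
    for i in range(n):
        ctx = cytosine_context(seq, i)
        if ctx is not None:
            sites.append(Site(i, "+", ctx))
    for j in range(n):
        ctx = cytosine_context(rev_comp, j)
        if ctx is not None:
            # Index j on the reverse complement maps to n - 1 - j on the forward strand.
            sites.append(Site(n - 1 - j, "-", ctx))
    return sorted(sites)


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction of reads that still show C after bisulfite conversion: C / (C + T)."""
    total = c_reads + t_reads
    return c_reads / total if total else None


if __name__ == "__main__":
    for site in classify_cytosines("CGCAGCAT"):
        print(site)
    print(methylation_level(3, 1))
```

For example, `classify_cytosines("CGCAGCAT")` reports a CG pair at positions 0 (+ strand) and 1 (- strand), a CHG pair at positions 2 (+) and 4 (-), and a single CHH site at position 5 (+), illustrating that CG and CHG sites occur on both strands while CHH sites are asymmetric.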