PI: Susan R. Wessler (University of California - Riverside) CoPIs: Jeffrey L. Bennetzen (University of Georgia), R. Kelly Dawe (University of Georgia), Ning Jiang (Michigan State University), Phillip SanMiguel (Purdue University)

Collaborator: Byron Freeman (University of Georgia, Georgia Museum of Natural History)

Transposable elements (TEs) are the most abundant component of all characterized genomes of higher eukaryotes and the genome of maize is recognized as having the most dynamic TE component. As such, it is the organism of choice for understanding how TEs contribute to gene and genome evolution. In addition to identifying all TEs in maize, this project will pay particular attention to the characterization of TEs such as Pack-MULEs and Helitrons that routinely capture and amplify gene fragments and thereby confound gene annotation. This project will generate a comprehensive and rigorously annotated TE database that will greatly assist all future maize genome annotations. Computational analysis will serve to identify candidate active TEs whose mobility will be validated using the transposon display technique in conjunction with a wide spectrum of maize genomic DNAs. This project will, for the first time, extend the analysis of TEs to the pericentromere, a region that in other sequenced genomes has been unfinished and/or unannotated.

The scientific goals of this project and the familiarity of maize provide outstanding opportunities for student training and for connections between the research community and the broader public. This project dedicates over 10% of its resources to this mission with the centerpiece being the development of web-based and traveling museum exhibits that describe the history of maize as a crop, as a model organism for genome research, and as a key component of many Native American cultures. To this end, collaborations have been established with the University of Georgia Museum of Natural History, the Smithsonian Institution and the U.S. Botanic Garden.

Access to Project Outcomes

All information from this project will be made freely available to the Maize Genome Sequencing Project and to long-term repositories such as MaizeGDB (www.maizegdb.org/). Software and other tools generated will be freely available at the project website (accessible via www.plantbio.uga.edu/wesslerlab/).

Project Report

Transposable elements (TEs) are fragments of DNA that can insert into new chromosomal locations and often duplicate themselves in the process. With the advent of large-scale DNA sequencing, it has become apparent that TEs are the single largest component of the genetic material of most eukaryotes. They account for at least 50% of the human genome and 50-90% of some plant genomes. TEs were discovered in maize by Barbara McClintock almost 75 years ago as the genetic agents responsible for the sectors of pigmentation on otherwise colorless mutant kernels. To this day, the genome of maize is recognized as having the most dynamic TE component. As such, it is the organism of choice for understanding how TEs contribute to gene and genome evolution. The Maize TE Project, initiated in 2006, coincided with the maize genome sequencing project and had two major goals that focused on giving meaning to that sequence. The first goal was to identify and characterize virtually all of the TEs in the genome. This required the development of new software tools and packages to tease apart the roughly 85% of the maize genome derived from TEs. The second goal was to organize the output of the computational analysis into a comprehensive and rigorously characterized TE database that, over subsequent years, has assisted ongoing genome annotation efforts. Taken together, this project performed all of the TE identification for the maize genome sequencing project published in 2009. Several software tools, including TARGeT, DAWGPAWS, and HelSearch, were developed and published during the course of this project to identify the numerous types of TEs in maize.
These programs led to the discovery of several new TE types including 172 new LTR retrotransposon families (bringing the total to ~350), 29 new LINE families, 72 new MULE families (bringing the total to 137), ~1000 Pack-MULEs (fewer than expected), and ~2000 intact Helitrons (about 2% of the genome) classified into 8 families (5 previously unknown, with one family comprising 98% of all maize Helitrons). In addition, we found that members of the CACTA superfamily were the most numerous coding elements while the Tc1/mariner superfamily had the fewest elements. The output from these searches was organized into ~1500 TE "exemplars" that are being used worldwide to annotate virtually all maize TEs (http://maizetedb.org/~maize/). Finally, in collaboration with the NSF iPlant project, the TARGeT pipeline was added to the DNA Subway, a user-friendly search tool used by undergraduates and high school teachers in addition to researchers. The Maize TE Project also analyzed the frequency of gene fragment acquisition, a phenomenon shown previously to occur for members of the Mutator and Helitron superfamilies. Early versions of the maize genome annotation identified ~106,000 possible genes. Our project demonstrated that, of these, ~4061 were gene fragments acquired by Helitrons while 937 were associated with Pack-MULEs. In addition, ~13,000 other TE-derived sequences were found to have been incorrectly annotated as genes. With regard to Pack-MULEs, the project discovered that they acquire GC-rich gene fragments and that, when they transpose, they insert preferentially near the 5’ end of genes. As a result, Pack-MULEs have been amplifying GC-rich gene sequences and modifying the 5’ end sequence of genes, and they are at least partially responsible for the formation of the GC-rich 5’ ends of genes in grasses. Such a global impact had not been reported previously for any other TEs.
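To put the annotation counts above in proportion, the short Python sketch below totals the TE-related sequences among the early gene candidates. The counts are taken from the report (several are approximate there); the function name `te_misannotation_summary` and the category labels are illustrative, and the calculation assumes the three categories do not overlap, which the report does not state explicitly.

```python
# Counts as given in the project report; values marked "~" there are approximate.
CANDIDATE_GENES = 106_000      # early maize gene annotation (~106,000)
TE_CATEGORIES = {
    "helitron_fragments": 4_061,   # gene fragments acquired by Helitrons (~4061)
    "pack_mule_fragments": 937,    # candidates associated with Pack-MULEs
    "other_te_derived": 13_000,    # other TE-derived sequences (~13,000)
}


def te_misannotation_summary(total, categories):
    """Return (flagged count, flagged fraction, per-category fractions).

    Assumes the categories are mutually exclusive (an assumption of this
    sketch, not a claim made in the report).
    """
    flagged = sum(categories.values())
    shares = {name: count / total for name, count in categories.items()}
    return flagged, flagged / total, shares


flagged, fraction, shares = te_misannotation_summary(CANDIDATE_GENES, TE_CATEGORIES)
remaining = CANDIDATE_GENES - flagged

print(f"TE-related candidates: {flagged} ({fraction:.1%} of {CANDIDATE_GENES})")
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
print(f"Candidates left after removing them: {remaining}")
```

Under these assumptions, roughly one in six of the early gene candidates was TE-related, which illustrates why a curated TE database matters for gene annotation.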
Another focus was the characterization of TEs associated with centromeric and pericentromeric regions of the maize chromosomes. The project discovered that the CRM class of TEs targets centromere cores. Further, the fact that most of the CRM elements are very old (>0.75 million years) suggests that centromere position and the associated epigenetic states are quite stable. Characterization of the CRMs allowed us to use CRM markers to determine the map positions of maize centromeres as part of the maize genome sequencing project. In the final phases of the project we characterized the interaction between TEs and other features of the chromatin, including DNA methylation and histone modifications. The results revealed that TEs are heavily methylated and that this DNA methylation can sometimes spread into genes and affect gene expression. TEs are often targeted for inactivation by RNA interference (RNAi), which results in the production of small RNAs. Studies under this award revealed that only a small subset of TEs, those close to genes, are subject to this form of inactivation. TEs between genes lie in condensed regions known as heterochromatin, whereas the RNAi-susceptible TEs close to genes are not heterochromatic and are instead marked by histone modifications that closely resemble those found at genes. These genome-wide studies were among the first to demonstrate that TEs can impact gene expression by modifying the DNA methylation and chromatin states of promoter areas.

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Application #: 1118550
Program Officer: Diane Okamuro
Project Start:
Project End:
Budget Start: 2010-10-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2011
Total Cost: $1,501,205
Indirect Cost:
Name: University of California Riverside
Department:
Type:
DUNS #:
City: Riverside
State: CA
Country: United States
Zip Code: 92521