The genome of the model plant Arabidopsis thaliana has been annotated as containing close to 30,000 genes. Many of these genes are both predicted by computer programs and supported by experimental evidence or by their similarity to characterized genes from plants, animals and microbes. However, 15-20% of the gene annotations are based only on computational evidence. In addition, comparison of the Arabidopsis genome sequence with that of its close relative Brassica oleracea (the species that includes cabbage, cauliflower and Brussels sprouts) reveals many regions of sequence conservation that are not at present annotated as genes. Preliminary experimental analysis of some of these regions indicates that a significant fraction do encode genes that have so far been unrecognized by the annotation process. Full-length cDNAs provide both the basis for greatly improved gene structure annotation, through alignment of their sequences with genomic DNA, and the reagents with which to analyze gene function by expression and other kinds of analysis. The objective of this research is to generate full-length cDNAs for approximately 2,000 of these genes, which represent the least well-understood genes in the genome. We will use 5' and 3' RACE to define the precise structure of each gene and then generate full-length cDNA clones for protein-coding genes (ORFs) in a recombination vector suitable for functional studies by the research community. We will generate and sequence clones, reaching a rate of approximately 100 clones per month within three months of the start of the project, with a goal of producing 2,000 novel and previously uncharacterized clones over the period of the project. Sequences of the clones will be submitted to GenBank as they are generated and will also be available from the TIGR ftp site. The clones themselves will be made freely available to the research community through the Arabidopsis Biological Resource Center (ABRC).
Details of the project will be maintained at a project-specific web site, www.tigr.org/tdb/e2k1/ath1/2010_cDNAs. At the scientific level, this project will enhance the Arabidopsis genome annotation by increasing the total gene count by ~10% and will be a major step towards completing the identification and validation of all the genes in Arabidopsis thaliana, the first fully sequenced plant genome. The cDNAs will be available as community resources. Integrated into the proposed research is a curriculum development component that will give high school science educators research experience: they will work on the project and use project materials to develop new curricular modules that introduce high school students to fundamental concepts in plant genomics.