With the rapid development of next-generation sequencing (NGS) technologies, the genome sequences of over 80 plant species are now available, and hundreds to thousands of individual plants within a species are currently being sequenced. While these genome sequences provide the "parts lists" for tens of thousands of protein-coding genes in each species, the vast majority of genes in most plant species have yet to be functionally annotated, which has become a major bottleneck for the entire field of plant biology. Any effort to determine protein functions on a genome-wide scale requires a set, or library, of bacterial clones whose inserts represent only the protein-coding regions, or open reading frames (ORFs), of genes. Such a library is often referred to as an ORFeome. A significant hurdle in constructing an ORFeome library is that tens of thousands of clones must be sequenced individually. Because NGS technologies cannot be applied directly to this task, the process is extremely labor-intensive and cost-prohibitive. This EAGER project aims to develop a massively parallel sequencing technology called PLATE-seq (PCR-mediated linkage of barcoded adapters to nucleic acid elements for sequencing), which would drastically increase throughput and reduce the cost of large-scale sequencing efforts. As a proof of concept and to demonstrate the utility of PLATE-seq, the project will construct the first fully sequenced, single-colony rice (Oryza sativa, cv. Nipponbare) ORFeome library covering ~3,000 genes. The project is interdisciplinary, represents a mid-career reorientation of the Principal Investigator's research toward plant science, and will provide research training for undergraduate and graduate students in Science, Technology, Engineering and Mathematics (STEM) majors. All protocols will be made accessible through publications, seminars, and training workshops.
In addition, all ORF clones, E. coli and yeast strains, computational tools, and sequence data generated in association with this project will be made openly available.
NGS technologies require pooling tens of thousands of samples so that they can be sequenced en masse. Multiplexing strategies offer a partial solution to the need to track individual samples, but they can be prohibitively expensive when sequences must be matched to thousands of individual samples, as is the case when constructing an ORFeome library. As a result, large-scale Sanger sequencing, despite its high cost, is still necessary for many such applications. PLATE-seq is based on an innovative but unproven design that requires large-scale nested stitch PCRs on thousands of E. coli or yeast colonies. In one of these steps, a ~150 bp double-stranded DNA fragment is used as a primer, which can significantly decrease PCR efficiency. If successful, PLATE-seq will eliminate the need for large-scale Sanger sequencing. It is predicted to improve the sequencing efficiency of ORFeome libraries ~1,000-fold over existing multiplexed NGS approaches, and by an even larger factor over traditional Sanger sequencing. PLATE-seq therefore has major implications beyond ORFeome construction. These include many functional genomics and reverse proteomics applications in which sequencing reads must be traced back to individual samples, or in which associations between samples must be tracked. Examples are yeast two-hybrid (Y2H) and other genetic screens, where pairs of DNA molecules are selected and identified.