The technology for monitoring gene expression in a single cell, within a tissue, or across an entire organism has advanced tremendously over the past decade. As a direct result, tens of thousands of publicly available data sets now provide snapshots of how plants modulate the transcription of their genetic material to produce a phenotype. To appreciate the transcriptional complexity leading to phenotype, it is first necessary to understand the full composition of the transcriptome itself. Aside from protein-coding RNAs and small RNAs, a third class of transcript has recently been uncovered: long non-coding RNAs (lncRNAs). LncRNAs are emerging as key regulatory molecules that shape how plants respond to changes in their environment, such as temperature and water availability. Despite their many important roles, lncRNAs remain poorly annotated in plants. They are difficult to predict from genomic sequence alone; identifying them often requires extensive transcriptional data and the capacity to process those data. To overcome difficulties in lncRNA annotation and functional classification, this project aims to mine all publicly available transcriptomic data for the fifteen most studied model and agriculturally significant plant species. In each of the fifteen species, lncRNAs will be identified, their cross-species conservation will be determined, and putative functional pathways will be inferred. These three outputs (identification, conservation, and functional prediction) will not only provide a more holistic view of plant transcriptomes, but also help researchers studying the complex relationship between genome and phenome. The project also initiates a novel bioinformatics and molecular biology training curriculum for undergraduates called LIVE for Plants, together with a summer research component aimed at delivering enhanced training in both areas.

Long non-coding RNAs (lncRNAs) function in numerous developmental pathways in eukaryotes. In plants, lncRNAs are known for their roles in regulating responses to different environmental stimuli. The extent to which characterized plant lncRNAs represent the total suite of functional classes remains an open question. Moreover, the fraction of plant transcriptomes made up of lncRNAs is also unknown, hindering discovery of new functional types. These gaps in knowledge are due in part to poor sampling in plants, as well as the lack of a consistently applied methodology among identification efforts. This project aims to document and annotate lncRNA repertoires from fifteen angiosperms by mining more than 100 terabases of publicly available RNA-seq data. This computational curation will entail identifying lncRNAs for each species and then annotating them based on expression, conservation, and predicted biological process. The bioinformatics workflow developed as part of this project will benefit other groups attempting similar large-scale curations in their own systems. All data and metadata will be disseminated through CyVerse's Data Store, with visualization provided by popular public resources such as EPIC-CoGe and BAR's eFP Browsers. Taken together, the project aims will yield an innovative resource of accessible and integrated information about plant lncRNAs, as well as the information and tools to identify lncRNAs with important biological functions.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Agency: National Science Foundation (NSF)
Institute: Division of Integrative Organismal Systems (IOS)
Type: Standard Grant (Standard)
Application #: 2021753
Program Officer: Gerald Schoenknecht
Project Start:
Project End:
Budget Start: 2019-11-01
Budget End: 2021-07-31
Support Year:
Fiscal Year: 2020
Total Cost: $224,448
Indirect Cost:
Name: Boyce Thompson Institute Plant Research
Department:
Type:
DUNS #:
City: Ithaca
State: NY
Country: United States
Zip Code: 14853