The Carnegie Institution of Washington is awarded a grant to develop an existing model organism database into a resource for comparative plant genomics. The Arabidopsis Information Resource (TAIR) provides online access to genomic data for the reference plant Arabidopsis thaliana to over 35,000 plant biology researchers worldwide each month. Data housed in this resource include Arabidopsis gene structure and function data gathered from the extensive research literature for this plant, which can be applied to better understand the genetic basis for traits in other plants, including food, feed and bioenergy crop species as well as ecologically important plant species. To promote such understanding and make the transfer of knowledge from Arabidopsis to other plant species more efficient, the project will modify the existing database and website to store and display genomic data from other plants and provide tools to compare genome maps, nucleotide and protein sequences, and orthologous genes from other plant genomes. The proposed work will also expand a pilot project in which academic journals partner with TAIR to efficiently incorporate newly discovered gene function data into the database, increasing the availability and impact of the new results.

Understanding basic plant biology is essential to meeting current and future worldwide food and energy needs. TAIR serves as a foundational component of the plant biology research and education infrastructure, providing access to fundamental plant biology datasets and tools for students and researchers at institutions of all types and sizes across the United States and around the globe. An email-based help desk handles over a thousand questions per year ranging from requests for basic plant biology facts by high school students and undergraduates to requests for highly specialized custom datasets from graduate students and postdoctoral fellows. An extensive outreach program includes conference workshops, tutorials and online help. TAIR can be accessed at http://arabidopsis.org.

Project Report

TAIR (http://arabidopsis.org) is a worldwide resource for Arabidopsis data and a leader in the field of biological data curation. TAIR's team of professional curators and software developers organizes, integrates, curates and provides access to the most complete body of experimental data and biological resources available for any plant species. TAIR's overall goal is to provide a complete, consistent and computable gold standard dataset of Arabidopsis gene structure and function that will continue to serve as the most important reference dataset in plant biology.

INTELLECTUAL MERIT: The TAIR10 Arabidopsis thaliana genome release, made public on November 17, 2010, incorporated a wide range of high-throughput experimental evidence as well as hand-curated gene models for some genes to ensure maximum quality. Since its release, this dataset has been adopted by virtually every public resource for plant genomic data as the most complete and highest quality Arabidopsis dataset available. Over the five-year period from September 2009 to August 2014, we associated 11,041 published research articles with 15,683 Arabidopsis genes using a computationally assisted curation pipeline. We also extracted experimentally validated information on gene function, subcellular localization, expression pattern, gene symbols, alleles and phenotypes, and extended the number of Arabidopsis genes with at least one experimental gene function annotation from 8,794 to 11,761. TAIR curators also assisted researchers with finding and using data through the TAIR helpdesk. Over the past five years, TAIR curators received and answered 5,830 questions from researchers.

BROADER IMPACTS: TAIR datasets are used extensively by plant biology researchers around the globe, as evidenced by TAIR's high usage of 60,000 unique visitors per month and its high citation rate. TAIR is frequently mentioned in plant biology research articles as a source of data, including 959 articles published in 2009 and 1,460 articles published in 2014, an increase of 52% over the funding period. The cumulative impact over the five years from 2010 through 2014 is approximately 6,250 articles. The impact of TAIR derives from its production of gold standard reference datasets, including the highest quality plant genome available and the most up-to-date and complete gene function dataset for plants. These datasets are used to annotate new plant genomes and interpret the results of high-throughput experiments in Arabidopsis and other plants. Additionally, the well-organized and integrated data presented in TAIR saves researchers many hours of effort that would be required to gather a similar amount of information from the literature on their own, and ensures that new research is guided by knowledge of work done by others, speeding the progress of research and avoiding unnecessary duplication of work.

TAIR data is used widely by agricultural biotechnology companies, as evidenced by usage from a large number of companies detected in TAIR's usage statistics and by the willingness of these companies to support TAIR financially through sponsorships in 2010-2013 and through subscriptions beginning in 2014. Given the value these companies place on TAIR, it is clear that the gold standard reference datasets produced by TAIR are important for the development of new crop varieties and other agricultural products. Arabidopsis gene function data also serves as a reference for research on other species, including humans. TAIR's work to gather and present Arabidopsis gene function data in a unified and integrated portal allows biologists working in a range of fields, including human health, to better understand the function of conserved genes that have been intensively studied in plants.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Application #: 0850219
Program Officer: Peter H. McCartney
Project Start:
Project End:
Budget Start: 2009-09-01
Budget End: 2014-08-31
Support Year:
Fiscal Year: 2008
Total Cost: $4,170,595
Indirect Cost:
Name: Carnegie Institution of Washington
Department:
Type:
DUNS #:
City: Washington
State: DC
Country: United States
Zip Code: 20005