Recent technical advances in large-scale sequencing and genomics methods have triggered a scientific revolution with immense potential for extending biological knowledge. They have also posed an immense challenge: how to make optimal use of vast quantities of biological data. Without long-term, high-quality mechanisms for accessing and analyzing these data, the resources used to generate them are in danger of going to waste. Three things must be established and supported for the long term: databases to house the data, tools and interfaces to provide access to them, and curatorial efforts to organize them for maximum utility. For the model plant Arabidopsis, two efforts have produced great quantities of valuable data: the successful effort to sequence the genome, and the current effort of the 2010 Functional Genomics Initiative to understand the function of every Arabidopsis gene. These data can be put to maximal use only if the information is stored, interrelated, updated and made accessible to researchers for the long term. TAIR (the Arabidopsis Information Resource, www.arabidopsis.org) has taken up the challenge posed by this flood of data in its first five years of funding. It has developed a data structure to house the diversity of available data, along with the tools and interfaces needed to access and analyze it. The aim of the second phase of this project is to maintain the data and tools currently in TAIR and to add new efforts in key areas. These include maintaining and improving the genome annotation, extracting phenotype and gene expression data from the literature, and making TAIR's resources more accessible to all biologists, teachers, students and the general public through new tutorials and improved site navigation.
TAIR will continue to develop controlled vocabularies and standardized data exchange mechanisms for maximal interoperability with other biological databases and will provide data in explicitly defined and structured formats to facilitate programmatic data retrieval. TAIR will also continue to play a central role within the 2010 Functional Genomics Initiative as the site where researchers can browse existing projects, view lists of genes already under investigation by other groups, and search for genes with no known function. TAIR will incorporate the resulting functional data and develop a tool that tracks progress toward the 2010 goal of complete functional annotation.

Intellectual Merit: Transforming large quantities of data into useful knowledge is one of the biggest problems in the post-sequencing era of biological research. Extracting experimentally verified information from the research literature and encoding it with explicitly defined, computable ontologies that represent biological concepts will facilitate the analysis and interpretation of large-scale data sets. In addition, creating and populating a data model for the complex relationships between biological objects will provide a comprehensive computational framework for mining, exploring and retrieving the data. The development of standardized data curation methods will also benefit the whole biological research community.

Broader Impact: The Internet is a widely used source of information for researchers and students. Interactive online tutorials and user guides will be created for three audiences: students, plant researchers with little or no Arabidopsis knowledge, and Arabidopsis experts. These materials can be used in classrooms, laboratories and workshops, and for self-paced learning. Disseminating research methods and data through TAIR will benefit scientists all over the world and foster international collaboration. Continuous integration and annotation of data by TAIR's curators will give researchers access to current, accurate data that would otherwise take significant time and effort to amass individually. Through collaborations with educators, college and pre-college students will use research materials to generate authentic data while they learn about scientific inquiry and plant biology. TAIR will then disseminate these data to the research community.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Cooperative Agreement (Coop)
Application #: 0417062
Program Officer: Peter H. McCartney
Budget Start: 2004-09-01
Budget End: 2009-08-31
Fiscal Year: 2004
Total Cost: $7,988,952
Name: Carnegie Institution of Washington
City: Washington
State: DC
Country: United States
Zip Code: 20005