PI: Lincoln Stein (Cold Spring Harbor Laboratory)
Co-PIs: Doreen Ware (Cold Spring Harbor Laboratory), Susan McCouch, Edward Buckler, and Pankaj Jaiswal (Cornell University; subawardee)
The Gramene database (www.gramene.org) is an online resource jointly supported by NSF and the US Department of Agriculture's Agricultural Research Service. It integrates genomic, genetic, and phenotypic information on rice, maize, and other cereals, giving scientists and other end users easy access to this integrated information. This project will enhance Gramene by incorporating biological pathway and genetic diversity information from maize, rice, wheat, sorghum, and other cereals into the resource. Comparative genomics tools will also be developed, allowing researchers to use knowledge gained in one plant species to identify and characterize functionally significant genes and other elements in the genomes of other species. Scientists can use the resource to advance our fundamental understanding of economically important plant processes such as hybrid vigor, grain development, seed dormancy, drought tolerance, and disease resistance. In addition, the tools developed will allow estimation of the breeding value of individual genetic variants, giving breeders the ability to select the ideal combinations of seed stocks to create varieties with desirable traits such as robustness, the ability to grow in marginal environments, or high potential as a source of biofuels and other materials of high economic value.
Information resources developed, and being developed, through genomics efforts are key elements for advancing the fundamental knowledge base of a future bio-based economy and for addressing the expected need to feed an expanding world population. Many of these information resources are still underutilized because the datasets are fragmented and tools to make meaningful connections among them are lacking. To fully unlock the potential of plant genome data, the diverse datasets must be integrated so that information is shared both within and between species. Providing that integration is one of Gramene's goals. Another goal of the project is to deliver the integrated dataset into the hands of plant geneticists, molecular biologists, evolutionary biologists, and breeders through compelling, intuitive user interfaces. Lastly, the project will reach out to students, the public, and underrepresented minorities through a series of online tutorials and on-site workshops that involve a novel, cost-effective public/private partnership.
All the information resources generated by Gramene will be available to scientists, breeders, and members of the general public free of charge and without intellectual property restrictions.
Gramene is an integrated web resource for comparing plant genomes and analyzing genes and other genomic features to deepen our understanding of how plants function. Put another way, the biological information that this resource provides on crops and model species enables plant researchers and breeders to make powerful comparisons across plant reference genomes and pathways. This, in turn, enables more accurate estimates of the breeding value of individual genetic variants, which is the basis of crop improvement through plant breeding, i.e., selecting combinations of seed stocks to create varieties that have desirable traits such as robustness, the ability to grow in marginal environments, or potential as a source of biofuels. Ultimately, this knowledge will better equip our society to overcome some of the global challenges that agriculture faces in the modern world, such as food, feed (e.g., biofuels), and fiber production for a growing population, with a shrinking rural workforce and in the face of climate change. When this award was made in 2007, only a handful of fully sequenced plant genomes were available. Since then, and with substantial further support during the past 2.5 years from NSF award #1127112 (Gramene - Exploring Function through Comparative Genomics and Network Analysis), the Gramene database has grown to host 39 reference genomes, genetic or structural variation data for 10 plant species, and pathway networks for selected plant species. In addition, comparative analyses are now available through whole-genome alignments, gene family tree views, synteny views, lists of orthologous and paralogous genes, orthologous projections, and more. Throughout the life of the project, we fostered numerous international collaborations, the most important of which are with the Ensembl Plants (plants.ensembl.org) and Reactome (reactome.org) projects.
By adopting the Ensembl infrastructure for genome data and coordinating data releases, we minimized redundant effort and attained much more ambitious goals than originally intended. The inception of the Plant Reactome, which represents curated plant pathways and supports pathway-based analyses, was another major project achievement. It currently hosts over 200 curated rice pathways, and orthologous pathway projections for 33 plant species will be released in January 2015. In addition, project outreach activities included organizing workshops and community resource booths, as well as oral and poster presentations at an average of 10 international scientific meetings per year; the project also produced about 10 peer-reviewed publications, reviews, or book chapters per year. In the past 3 years, during its no-cost extension phase, this award played a pivotal role in numerous outreach activities, including bioinformatics resource booths at prominent annual scientific conferences such as Plant and Animal Genomes, the Maize Genetics Conference, and Plant Biology. Funding from this extension period also enabled the following two high-impact meetings:

Genomes to Germplasm Workshop
This workshop was held from 28 February to 2 March 2013 at the French National Institute of Agricultural Research (INRA) in Versailles, France, in collaboration with the EU-funded TransPlant project and INRA. It brought together 43 experts from 38 institutions in the European Union, the United States, and international organizations, including researchers, breeders, bioinformaticians, data managers, and private sector participants.
The participants discussed (i) how natural variation is being sampled and the likely future applications of these data; (ii) informatics needs and solutions: what infrastructure and data standards are available, what components are currently missing or underdeveloped, and how these deficiencies might be addressed, particularly in the context of globally distributed activities; (iii) connections between germplasm resources and genomic databases; and (iv) the tools needed to apply these data in practice for plant breeding and crop improvement.

High Performance Computing in Undergraduate Biology Education
This meeting was held from 3 to 5 September 2014 at the Banbury Center of Cold Spring Harbor Laboratory in New York. It brought together computer scientists and undergraduate biology educators, who came to a surprising conclusion: plenty of time is available on high performance computers for undergraduate research projects, but most teaching faculty don't know how to access it. Representatives of the National Science Foundation's (NSF) supercomputer system XSEDE (Extreme Science and Engineering Discovery Environment) and of two major cyberinfrastructure projects, NSF's iPlant Collaborative and the Department of Energy's KnowledgeBase (KBase), agreed to make increased efforts to reach faculty at primarily undergraduate-serving institutions (PUIs).

Project websites:
Main project website: www.gramene.org
Gramene Genomes (Ensembl interface): http://ensembl.gramene.org
Gramene Pathways - Plant Reactome: http://plantreactome.gramene.org
Gramene Pathways - Cyc databases at iPlant: http://pathway.iplantcollaborative.org
Internal project documentation: http://gwiki.gramene.org
Project data repository: https://github.com/warelab/gramene-*
Gramene Facebook page: www.facebook.com/Gramene