Collaborative grants have been awarded to the University of Maryland, the University of Iowa, and St. Bonaventure University to develop a methodology that exploits the wealth of annotation knowledge, notably Gene Ontology (GO) and Plant Ontology (PO) annotations of Arabidopsis genes. Motivated by the availability of rich and as yet insufficiently tapped collections of gene annotations, the project aims to facilitate the discovery of hidden knowledge that could serve as the basis for further scientific research. The methodology will extract patterns of interest from annotation graphs (pattern discovery). Literature-based methods will then extract sentences that validate the biological meaning underlying these patterns (pattern validation). To demonstrate the methodology, the PattArAn tool (Patterns in Arabidopsis Annotations) will be customized for Arabidopsis. PattArAn will provide the user with a graphical presentation of patterns of Arabidopsis genes and their associated GO and PO controlled vocabulary (CV) terms. The project will develop graph data mining techniques and efficient algorithms to identify dense subgraphs (DSG) and to perform graph summarization (GS). It will also develop algorithms to mine the literature for sentences relevant to an extracted pattern (referred to as the imprint). PattArAn will enable iterative exploration and will incorporate allied steps such as consulting gene function predictions. The project will involve collaboration with biologists to build and refine annotation graphs and to validate patterns, ensuring relevance to their research.
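As an illustration only, dense subgraph identification over an annotation graph could be sketched as follows. The sketch uses the well-known greedy peeling heuristic, which gives a 2-approximation of the maximum edges-per-node density; this choice is an assumption made for the example and is not necessarily the algorithm the project will develop. The gene and term identifiers (`gene1`, `go1`, `po1`, and so on) and the function name `densest_subgraph` are hypothetical placeholders, not real annotations or part of PattArAn.

```python
from collections import defaultdict


def densest_subgraph(edges):
    """Greedy peeling: return (nodes, density) of the densest subgraph found.

    Density is (number of edges) / (number of nodes). The procedure repeatedly
    removes a minimum-degree node and remembers the best intermediate subgraph.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(nodes)
    best_density = n_edges / len(nodes) if nodes else 0.0

    while len(nodes) > 1:
        # Pick a minimum-degree node; ties are broken by name for determinism.
        u = min(nodes, key=lambda x: (len(adj[x]), x))
        n_edges -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        del adj[u]
        nodes.remove(u)

        density = n_edges / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density

    return best_nodes, best_density


# Hypothetical toy annotation graph: each edge links a gene to a GO or PO term.
TOY_EDGES = [
    ("gene1", "go1"), ("gene1", "go2"), ("gene1", "po1"),
    ("gene2", "go1"), ("gene2", "go2"), ("gene2", "po1"),
    ("gene3", "go1"), ("gene3", "go2"), ("gene3", "po1"),
    ("gene4", "go1"), ("gene4", "go3"),
    ("gene5", "po2"),
]

pattern_nodes, pattern_density = densest_subgraph(TOY_EDGES)
print(sorted(pattern_nodes), pattern_density)
```

On this toy input the procedure keeps the three genes that share the same two GO terms and one PO term (density 1.5) and discards the sparsely annotated genes. A pattern of this kind is what a curator would then inspect and check against the literature.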
The project makes broad contributions to the Arabidopsis thaliana community. PattArAn may help Arabidopsis curators manage GO-PO annotations and may complement existing tools such as Textpresso and AraNet. It can also be used to bootstrap an annotation database for other plant species, provided that their genome sequence information is available. The project offers significant research and educational experiences for graduate students (University of Maryland and University of Iowa) and undergraduate students (St. Bonaventure University). Among other activities, team members will continue to mentor women and students from under-represented communities, participate in outreach, and lead a Journal Club. The outcomes of this research project will be disseminated via biology and bioinformatics venues. More information may be obtained at the project website: https://wiki.umiacs.umd.edu/clip/pattaran/.