This Small Business Innovation Research (SBIR) Phase II project focuses on the development of a commercial software system, PubAssist, which will greatly simplify access to and analysis of protein and cellular function data and will assist in navigating the scientific literature. In essence, the system is an advanced literature search engine built around MedScan, the ontology-based NLP pipeline developed and tested during Phase I of this grant. The result of text processing by MedScan is a set of functional links (interaction, regulation, involvement in, etc.) between normalized domain-specific concepts, such as cellular processes, cellular components, proteins, small molecules, diseases, and functional groups. PubAssist will consist of two components: (i) a server-side search engine that will extract, maintain, and index protein and cellular function-related information and provide efficient access to it and to the corresponding MEDLINE documents, and (ii) a graphical client that will organize and visualize the extracted data and documents. The PubAssist server will support (i) generic keyword-based search; (ii) concept-based document retrieval for domain-specific concepts (e.g., "Find documents that describe a protein of interest"); and, most importantly, (iii) semantic searches (e.g., "Find references that describe the transcriptional regulation of my gene in a specific tissue") not available elsewhere. The results of such queries can be either documents or graphical summaries presented in forms familiar to the research community (pathway diagrams, expression and genome maps). The latter are snapshots of the information extracted from the queried abstracts and make the data considerably easier to interpret. The PubAssist Windows client will be sold as an inexpensive, self-contained desktop product.
It will redirect searches to public sites to retrieve abstracts or full-text articles for processing by MedScan. In addition, it will use the Ariadne Genomics web service to interface with the semantic indexing engine deployed on our website. The client-server edition will be offered as an enterprise-class solution. Customers who install the server-based semantic indexing engine in addition to the PubAssist client will be able to process their proprietary documents and customize the NLP algorithm.
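The three search modes described above (keyword, concept-based, and semantic) can be illustrated with a minimal sketch over a toy in-memory index. This is not the PubAssist or MedScan implementation. The `FunctionalLink` record, its field names, the function names, and all sample documents and concept names (`GENE_X`, `TF_Y`, and so on) are hypothetical placeholders chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionalLink:
    """Hypothetical record for one extracted link between two normalized concepts."""
    source: str                      # concept acting as regulator or partner
    relation: str                    # e.g. "interaction", "regulation"
    target: str                      # concept being acted upon
    doc_id: str                      # document the link was extracted from
    tissue: Optional[str] = None     # optional context (assumed field)
    mechanism: Optional[str] = None  # optional qualifier (assumed field)


# Made-up documents and links; they do not represent real abstracts.
DOCS = {
    "doc1": "TF_Y activates transcription of GENE_X in liver.",
    "doc2": "PROT_A binds PROT_B.",
    "doc3": "GENE_X expression is reduced by DRUG_Z in kidney.",
}

LINKS = [
    FunctionalLink("TF_Y", "regulation", "GENE_X", "doc1",
                   tissue="liver", mechanism="transcription"),
    FunctionalLink("PROT_A", "interaction", "PROT_B", "doc2"),
    FunctionalLink("DRUG_Z", "regulation", "GENE_X", "doc3",
                   tissue="kidney", mechanism="expression"),
]


def keyword_search(docs, term):
    """Generic keyword search: case-insensitive substring match on raw text."""
    term = term.lower()
    return sorted(d for d, text in docs.items() if term in text.lower())


def concept_search(links, concept):
    """Concept-based retrieval: documents with any link involving the concept."""
    return sorted({l.doc_id for l in links if concept in (l.source, l.target)})


def semantic_search(links, relation, target, tissue=None, mechanism=None):
    """Semantic search: filter links by relation type, target, and optional context."""
    hits = []
    for l in links:
        if l.relation != relation or l.target != target:
            continue
        if tissue is not None and l.tissue != tissue:
            continue
        if mechanism is not None and l.mechanism != mechanism:
            continue
        hits.append(l)
    return hits
```

Under these assumptions, a query such as "transcriptional regulation of my gene in a specific tissue" becomes `semantic_search(LINKS, "regulation", "GENE_X", tissue="liver", mechanism="transcription")`, which filters on the structured links rather than on the words in the text.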
Daraselia, Nikolai; Yuryev, Anton; Egorov, Sergei et al. (2007) Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks. BMC Bioinformatics 8:243