Clemson University and the University of Illinois at Chicago have been awarded grants to develop a dynamically scalable collaboration community, G-SESAME Cloud, for biological knowledge discovery. The first aim of this project is to enhance the popular G-SESAME tools (http://bioinformatics.clemson.edu/G-SESAME) in terms of methodology, functionality, accuracy, efficiency, and scalability, addressing the immediate needs of researchers who use G-SESAME tools in their daily biological research. The ultimate goal is to build a community-based, scalable cloud computing infrastructure (G-SESAME Cloud) that helps biological researchers disseminate their research results. This infrastructure will provide a set of Web-based tools that let biological researchers automatically (or semi-automatically) convert their GO-based biological programs, developed in any programming language on any platform, into Web services and publish them on the G-SESAME Cloud. Researchers can also use a configuration utility developed in this project to easily add their computing facilities to the G-SESAME Cloud. This project will produce a complete set of Web services for measuring the functional similarity of biological entities using different methods, and for discovering biological knowledge based on such similarity values. The G-SESAME Cloud will provide a community-based, effective, and self-scalable cloud computing environment in which researchers can easily publish their biological application software as a service (SaaS) and share their computing infrastructure as a service (IaaS). The G-SESAME Cloud and its GO-based Web services will relieve biological researchers of the burden of learning Web technologies and maintaining their own computing facilities, so that they can focus on their research.
The success of this project will set an example for building self-scalable, community-based biological clouds that promote resource sharing and the SaaS, IaaS, and PaaS (Platform as a Service) concepts in the biological research community. This project will also be used to train Computer Science students, including women and minority students, in distributed computing, data mining, and Web technologies.

Project Report

The G-SESAME Web tools have been used by many researchers in biomedical fields, and the technologies and methods have had a strong impact in related fields. This is reflected in the number of citations (more than 350 according to Google Scholar) that our original G-SESAME paper has received in past years. The new biomedical search engine, G-Bean, has started to attract researchers who use it to search for biomedical papers. In the last two months, the search engine has been used more than 2,000 times by people around the world. The ontology-based approaches developed in this project can be easily adapted to other disciplines to solve related problems. For instance, the same method can be adapted to measure the semantic similarity of natural language words, or of ontological terms in economics, the social sciences, and other fields. The methods we developed for mining information networks are useful in many disciplines of science and engineering. For example, we have shown their applications in data cleaning and data integration, role discovery, ranking, classification, and clustering in various kinds of social and information networks. Furthermore, in collaboration with neuroscientists at Northwestern Memorial Hospital, the data mining methods have been applied to fMRI brain images to detect brain defects. Working with the cheminformatics group at Indiana University, the network mining methods have been applied to drug discovery. In collaboration with the medical informatics group at UIC, the mining methods have been applied to aid the discovery of systematic reviews of randomized controlled trials in support of evidence-based medicine. Finally, in collaboration with the computational biology group working on the open science data cloud at the University of Chicago, the mining methods have been applied to efficiently discover geometric patterns in genomic data, which can provide valuable information about biological function, such as the activation or repression of genes.

Agency: National Science Foundation (NSF)
Institute: Division of Biological Infrastructure (DBI)
Type: Standard Grant (Standard)
Application #: 0960443
Program Officer: Peter H. McCartney
Budget Start: 2010-07-01
Budget End: 2014-06-30
Fiscal Year: 2009
Total Cost: $322,555

Name: University of Illinois at Chicago
City: Chicago
State: IL
Country: United States
Zip Code: 60612