In recent years, many studies have focused on improving the accuracy of transmembrane segment prediction, and significant results have been achieved. Despite these results, existing methods cannot explain how a learning result is reached or why a particular prediction is made. Such explanations are important for the acceptance of machine learning technology in bioinformatics applications such as protein structure prediction. Support vector machines (SVMs) have shown strong generalization ability in a number of application areas, including protein structure prediction, but they are black-box models that are hard to interpret. In our current NSF project (CCR 0514750), rough set data analysis has been proposed for data mining. In this project, we propose to extend our data mining results from the current NSF project to bioinformatics. In particular, we propose an innovative approach to rule generation for understanding the prediction of transmembrane segments, integrating the merits of rough set theory, SVMs, and association-rule-based classifiers. We believe the new approach can be used not only to predict transmembrane segments but also to explain the predictions. The predictions and their interpretations can be used to guide biological experiments.

Intellectual Merits: This proposal combines rough set theory from our current NSF project, SVMs, and association-rule-based classifiers to develop a new approach for predicting transmembrane segments and understanding those predictions. While several methods exist for the same purpose, the proposal seeks better performance with respect to both accuracy and the number of generated patterns. We hope that the generated patterns will be easy to understand and biologically meaningful, so that biologists can use them to guide their experiments. This collaborative approach draws on the PI's strengths in machine learning and the co-PI's expertise in transmembrane proteins to validate the new method. The resulting software will not only predict transmembrane segments but also show how each prediction is reached.

Broader Impacts: While this proposal focuses on predicting transmembrane segments and understanding those predictions, the new approach and the software tools developed are intended to be general and applicable to other domains, such as protein secondary structure prediction. The collaborators believe that the process described in this proposal is as important as the end product, because the proposal is inherently collaborative and cross-disciplinary. As such, the proposal serves as a starting point for increasing interaction between computer scientists and biologists. This interaction is important not only to modern research approaches for tackling difficult problems in cell biology and complex systems, but also for exposing students and researchers to the cutting-edge research and open problems in each of our respective fields.

Budget Start: 2006-09-01
Budget End: 2008-02-29
Fiscal Year: 2006
Total Cost: $30,000
Name: Georgia State University Research Foundation, Inc.
City: Atlanta
State: GA
Country: United States
Zip Code: 30303