This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

The tsunami of data generated by genome sequencing projects, coupled with rapid advances in automated data-collection instrumentation, has created a need for novel informatics methodologies that support functional understanding of these data. However, many of these datasets, both in organized databases and in unstructured domains, frequently suffer from what is commonly referred to as the 'curse of dimensionality.' The challenges posed by bioinformatics data differ significantly from those of traditional multidimensional databases, because the data are highly correlated and contain far more dimensional attributes (features) than samples. In this research, we have developed novel dimensionality reduction methods based on computational principles of data mining. Three different yet complementary algorithms, each with its own computational framework, have been developed and applied to several challenging problems in protein structural mining. The superiority of these methods is demonstrated by the identification of discriminatory residue interaction patterns shared among proteins of the same family. These patterns can then be used for both the structural and the functional annotation of proteins. We have evaluated the efficacy of the approach in several experiments and obtained more than 85% average classification accuracy with a significantly smaller feature vector than previously reported results in the area.
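The abstract does not describe its three algorithms, so the following is only a generic illustration of the idea it reports: keep a small set of features that discriminate between protein families, then classify in the reduced space. It assumes each protein is encoded as a binary vector of residue interaction patterns (1 = pattern present). The frequency-difference scoring rule, the nearest-centroid classifier, and all function names are hypothetical stand-ins, not the authors' method.

```python
from collections import defaultdict


def select_discriminative_features(vectors, labels, k):
    """Return the k feature indices that best separate families (hypothetical rule).

    Score of feature j = max over families of
    |mean(feature j inside family) - mean(feature j outside family)|.
    """
    n_features = len(vectors[0])
    families = sorted(set(labels))
    scores = []
    for j in range(n_features):
        best = 0.0
        for fam in families:
            inside = [v[j] for v, y in zip(vectors, labels) if y == fam]
            outside = [v[j] for v, y in zip(vectors, labels) if y != fam]
            if not outside:  # only one family present: nothing to contrast
                continue
            diff = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
            best = max(best, diff)
        scores.append((best, j))
    # Highest score first; ties broken by lower index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


def project(vectors, features):
    """Keep only the selected feature columns (the reduced feature vector)."""
    return [[v[j] for j in features] for v in vectors]


def nearest_centroid_fit(vectors, labels):
    """Compute the mean reduced vector of each family."""
    sums = {}
    counts = defaultdict(int)
    for v, y in zip(vectors, labels):
        if y not in sums:
            sums[y] = [0.0] * len(v)
        sums[y] = [a + b for a, b in zip(sums[y], v)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_centroid_predict(centroids, v):
    """Assign v to the family whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(sorted(centroids), key=lambda y: dist(centroids[y]))
```

The point of the sketch is the shape of the pipeline: scoring features against family labels shrinks the vector before classification. A real system would use a stronger selection criterion and classifier than these placeholders.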
In another approach, we have developed a unique algorithm that dynamically 'shrinks' the dimensions of high-dimensional data, such as microarray datasets, using the results of spectral aggregation. Mining the adjusted data leads to superior sensitivity and specificity in clustering applications. The presentation will include highlights of the results, key outcomes, and our future plans.
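The abstract does not specify how its spectral aggregation works. To illustrate the general notion of 'shrinking' dimensions by merging redundant ones, the sketch below swaps in a much simpler, non-spectral rule. Columns (for example, genes in a samples-by-genes microarray matrix) are greedily grouped when their Pearson correlation with a group's first column reaches a threshold, and each group is replaced by its per-sample mean. The 0.9 threshold and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_correlated_columns(matrix, threshold=0.9):
    """Greedily group columns positively correlated (r >= threshold) with a seed column.

    This is a simple correlation-based stand-in, NOT a spectral method.
    """
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    groups = []
    assigned = set()
    for j in range(n_cols):
        if j in assigned:
            continue
        group = [j]  # column j seeds a new group
        assigned.add(j)
        for k in range(j + 1, n_cols):
            if k not in assigned and pearson(columns[j], columns[k]) >= threshold:
                group.append(k)
                assigned.add(k)
        groups.append(group)
    return groups


def shrink(matrix, groups):
    """Replace each group of columns by the per-row mean of its members."""
    return [[sum(row[j] for j in g) / len(g) for g in groups] for row in matrix]
```

Only positively correlated columns are merged, so averaging never cancels anti-correlated signals. The reduced matrix could then be passed to any clustering routine.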