This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator.

Clustering algorithms provide insight into data by discovering memberships and potential hierarchical relationships, supplemented by relationships among records. Because so many algorithms exist, and because it is computationally infeasible to choose a single algorithm that suits a variety of situations and data sets, there is a pressing need for ways to tackle this problem. The problem is exacerbated by the exponential growth of data, which drives the need for new approaches. We design new analytical and visual tools and techniques that provide insight into the results of single and multiple clustering algorithms. Our work focuses on tools that enable biomedical scientists to analyze results and that aid data exploration, using visual techniques to present the data. The basic tools of the suite we are developing enable visualizations (e.g., scatterplots, parallel coordinates) and data analyses (e.g., clustering, RadViz). We designed and used the CDSM measure, which displays clusters as an organized heatmap visualization of any combination of clusterings or as a hierarchical tree layout based on this metadata. The algorithm clusters data into meta-groups and determines the largest number of clusterings that each record shares with other records. Graph layouts (line-oriented or force-directed) show clusters of records using color, size, position, and connectivity. We extend the two-dimensional displays into the third dimension using a number of indices and measures for inter- and intra-cluster tightness through a set of interaction methods.
The techniques we present enable record comparison, cluster size comparison, proximity analysis, cluster consolidation and representation in 2D and 3D space. We show examples of these techniques using a biomedical data set.
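
The idea of counting how many clusterings each record shares with other records can be illustrated with a small sketch. This is not the CDSM measure itself, whose definition is not given here. It is a plain co-association count, assumed for illustration only. The function names `co_association` and `max_shared` and the five-record example data are hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Count, for every pair of records, how many clusterings
    place both records in the same cluster.

    `clusterings` is a list of label lists, one per clustering result;
    position i in each list is the cluster label of record i.
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("every clustering must label the same records")
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


def max_shared(counts):
    """For each record, the largest number of clusterings it shares
    with any single other record (0 if there is no other record)."""
    return [
        max((c for j, c in enumerate(row) if j != i), default=0)
        for i, row in enumerate(counts)
    ]


# Hypothetical data: three clustering results over five records.
clusterings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
counts = co_association(clusterings)
shared = max_shared(counts)
print(shared)  # [3, 3, 2, 2, 2]
```

In this example records 0 and 1 are grouped together by all three clusterings, so each has a maximum shared count of 3. The symmetric `counts` matrix is the kind of pairwise data that could be reordered and drawn as a heatmap, or passed to a hierarchical method to build a tree layout.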