Clustering Software for Biomedical Applications

Chen, Yu

Abstract

We propose to provide clustering software for very large databases and for categorical data. Investigators in virtually all areas of research seek to discover patterns and relationships in data. Computer-intensive exploratory analysis, or data mining, is having a huge impact in science and industry (e.g., Berkhin 2002, Maitra 2002). However, the availability of software for obtaining partitions and visualizing them lags far behind the proliferation of proposed methods and the growth in size of available databases. We believe that implementing new algorithms for clustering large datasets that may include non-numeric attributes, and visualizing cluster properties, will open new opportunities for data analysis.

In Phase I, we developed scalable implementations of clustering methods, including k-means and its extensions to categorical and mixed-mode data, and demonstrated that combining clustering and visualization reveals features of the data that neither alone could provide. Our ultimate goal in Phases II and III is to develop a modular addition to the S-PLUS language called S+CLUSTER that provides the following key features:

- A suite of clustering algorithms suitable for large and possibly high-dimensional datasets that may include categorical attributes;
- Extensive capabilities for visual exploration of clustering results; and
- Tools for validation and diagnostics that facilitate objective assessment of clustering results.

We intend to create software that is flexible and easy to use, and that enables the analysis and understanding of data from a wide range of applications. Clustering, or unsupervised classification, has been used in genetics research, protein classification, psychiatric research, analysis of biomedical signals, and segmentation of medical images, among other areas. The software will be part of an integrated environment for data analysis, and it will permit customization of the clustering process, which will extend the ability of biomedical researchers to understand complex data. New insights into microarrays, epidemiological data, and protein databases may have high potential in drug discovery, disease diagnosis, and treatment.

Funding Agency

Agency: National Institutes of Health (NIH)
Institute: National Center for Research Resources (NCRR)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 5R44RR016386-03
Application #: 7294271
Study Section: Biomedical Computing and Health Informatics Study Section (BCHI)
Program Officer: Brazhnik, Olga

Project Start: 2001-07-01
Project End: 2008-09-29
Budget Start: 2007-09-30
Budget End: 2008-09-29
Support Year: 3
Fiscal Year: 2007
Total Cost: $364,606

Institution

Name: Insightful Corporation
DUNS #: 150683779

City: Seattle
State: WA
Country: United States
Zip Code: 98109

Related projects


NIH 2007 R44 RR	Clustering Software for Biomedical Applications	Chen, Yu / Insightful Corporation	$364,606
NIH 2006 R44 RR	Mr. Vetro: A Collective Simulation Framework for Health Science Education	Koperski, Krzysztof / Insightful Corporation	$359,770
