Data Mining in the Presence of Quantitatively and Qualitatively Diverse Information

Denton, Anne

Abstract

Real data often show a more complex structure than is assumed in much of statistics, machine learning, or data mining. Objects may be characterized by diverse types of information such as numerical quantities, text, and properties of a network neighborhood. The goal of the project is to develop techniques to integrate information components that differ both quantitatively and qualitatively. Classification algorithms that are based on homogeneous attributes can be evaluated exclusively by their overall classification quality. In the presence of qualitatively and quantitatively diverse information, the search space of all possible combinations of techniques and parameters is too large to be evaluated by any reasonable amount of test data. Three goals are pursued: (1) defining intermediate, homogeneous attributes that allow effective use of uniform classification and clustering techniques; (2) developing robust criteria that allow identification of suitable intermediate attributes and do not exclusively rely on overall classification accuracy; and (3) developing efficient and effective approaches to generate intermediate attributes from data with network connectivity, time-dependent data, text and other types of data. Starting with a specific classification problem in bioinformatics, the project attempts to find solutions that are applicable to a wide range of data mining problems. The work is ideally suited to teach students a broad range of research activities from fundamental concepts to applications, both in thesis and course work. Results will be of relevance to a large number of practical applications in bioinformatics and other sciences. 
The project Web site (www.cs.ndsu.nodak.edu/~adenton/IDM/) is used to disseminate up-to-date results, including software, demonstrations of newly developed techniques, and comprehensive examples based on biological data to benefit researchers in the biological sciences. Generally understandable examples (such as one based on a movie database) will enhance outreach and demonstrate the generality of the results.

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0415190
Program Officer: Maria Zemankova

Project Start
Project End
Budget Start: 2005-07-01
Budget End: 2009-06-30
Support Year
Fiscal Year: 2004
Total Cost: $272,557
Indirect Cost

North Dakota State University Fargo, Fargo, ND, United States
