Real data often have a more complex structure than is assumed in much of statistics, machine learning, and data mining. Objects may be characterized by diverse types of information, such as numerical quantities, text, and properties of their network neighborhood. The goal of the project is to develop techniques for integrating information components that differ both quantitatively and qualitatively. Classification algorithms based on homogeneous attributes can be evaluated solely by their overall classification quality. When the information is qualitatively and quantitatively diverse, however, the search space of all possible combinations of techniques and parameters is too large to be evaluated with any reasonable amount of test data. Three goals are pursued: (1) defining intermediate, homogeneous attributes that allow effective use of uniform classification and clustering techniques; (2) developing robust criteria that identify suitable intermediate attributes without relying exclusively on overall classification accuracy; and (3) developing efficient and effective approaches for generating intermediate attributes from data with network connectivity, time-dependent data, text, and other types of data. Starting with a specific classification problem in bioinformatics, the project seeks solutions that apply to a wide range of data mining problems. The work is well suited to teaching students a broad range of research activities, from fundamental concepts to applications, in both thesis and course work. The results will be relevant to many practical applications in bioinformatics and other sciences.
The project Web site (www.cs.ndsu.nodak.edu/~adenton/IDM/) is used to disseminate up-to-date results, including software, demonstrations of newly developed techniques, and comprehensive examples based on biological data for researchers in the biological sciences. Generally understandable examples (such as a movie database) will enhance outreach and demonstrate the generality of the results.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0415190
Program Officer: Maria Zemankova
Budget Start: 2005-07-01
Budget End: 2009-06-30
Fiscal Year: 2004
Total Cost: $272,557

Name: North Dakota State University Fargo
City: Fargo
State: ND
Country: United States
Zip Code: 58108