The goal of this project is to conduct in-depth research on a series of fundamental, open, and important issues, leading to a unified theory of revolutionary methods for general relational data community discovery and learning. The specific approaches include novel methods in statistical machine learning and data mining, with unified views that link them to the existing literature in both areas. The intellectual merit includes a comprehensive understanding of general unsupervised relational data clustering and learning, as well as the expected breakthroughs in community discovery and learning methodologies.

The broader impacts include the timely and effective dissemination of knowledge on relational data mining and statistical machine learning, including through the project's web site (www.fortune.binghamton.edu/nsf-iis-0812114.htm), as well as technology development and transfer in a wide range of applications. Educational and outreach activities include providing educational and research experience for university students. In addition, the activities emphasize advancing and enhancing high school science education and syllabi, and developing an integrated model for high schools' research and services for the local community and beyond.

Project Report

The intellectual merit of this project includes a revolutionized understanding of the literature on relational data knowledge discovery and learning, with outcomes that have substantially advanced the literature in the related areas, notably data mining and machine learning. The broader impacts of this project are twofold. Educationally, the development, implementation, and evaluation of the innovative community outreach activities in this project have promoted timely and effective knowledge dissemination in the related areas and have further enriched the pedagogical literature; the knowledge disseminated to the collaborating high school has further advanced and enriched the school's education services to the local community and its syllabi, so that its students are better trained. Technologically, the research outcomes of this project have led to the development of a suite of powerful technologies that serve as effective solutions to a wide spectrum of important real-world problems and that are expected to generate substantial societal impact.
It is well observed that the world is full of data, and that the data are typically highly related through different types of data objects such as people, organizations, and events. In many real-world applications, the aim is to discover the hidden structures in such relationships among different types of data objects. For example, in designing effective commercial sales promotion strategies, it is often necessary to discover the connections between specific consumer groups and specific commercial products, between specific products and specific product providers, and between product providers and product manufacturers.
The major outcomes of the research activities of this project are a suite of novel theories on relational data community discovery and learning, together with the related solutions to a wide spectrum of real-world problems relevant to relational data learning, with both academic and societal impacts. These problems range from knowledge discovery from citation networks, to effective recommendation of scientific papers, to transferring knowledge discovery and learning from one application domain to another, to automatically understanding and learning textual descriptions of the semantic content of non-textual documents such as imagery and video, to pattern change discovery from high-dimensional data. The major outcomes of the educational and community outreach activities of this project are the effective dissemination of the resulting findings to the staff of the collaborating partners, the successful training of graduate, undergraduate, and high school students as well as school teachers, and the integration of an innovative scientific independent-study component for high school students into the existing high school curriculum. Two examples of the major research outcomes are knowledge discovery from citation networks and pattern change discovery from high-dimensional data. Citation networks are collections of text documents connected by links, and such data are found almost everywhere today. Examples of citation networks include emails, webpages, and scientific papers.
The citation network outcome enables technologies that automatically discover the topics, as well as their distributions, in a given collection of textual documents; that automatically discover the relationships among the discovered topics; and that automatically predict future "hot" topics when the text documents evolve over time (e.g., predicting the hot research topics in physics in five years given all the physics literature at present). High-dimensional data are data represented in a high-dimensional space, and they are likewise found almost everywhere today. Examples of high-dimensional data include text documents such as news articles and scientific papers, imagery archives, and surveillance video. The pattern change discovery outcome enables technologies that automatically detect breaking news, automatically detect outliers in text or image archives, and automatically detect events in surveillance video. Overall, this project has been successfully completed. It has substantially advanced the related literature and generated significant academic impact through the publication of a large number of research papers at highly competitive and selective venues; it has produced a large number of powerful technologies as solutions to many important real-world problems; it has completed the training of a number of graduate, undergraduate, and high school students and teachers; and it has developed an innovative component of the existing high school curriculum.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0812114
Program Officer: Maria Zemankova
Project Start:
Project End:
Budget Start: 2009-01-01
Budget End: 2013-12-31
Support Year:
Fiscal Year: 2008
Total Cost: $443,000
Indirect Cost:
Name: SUNY at Binghamton
Department:
Type:
DUNS #:
City: Binghamton
State: NY
Country: United States
Zip Code: 13902