Automated analysis of public health data represents a critical need, but effective analysis must look beyond individual data points. Much of the data that is collected is structural, consisting not only of entities but also of relationships (e.g., spatial,temporal) between the entities. As a result, a need exists to develop methods for discovering knowledge and learning concepts specifically for this type of structural data. A graph-based data mining technique that can perform pattern discovery, concept learning, and hierarchical clustering on data represented as graphs. This approach, implemented in the Subdue system, has demonstrated success in a numbeof scientific and industrial databases. The proposed effort will investigate the viability ofgraph-based data mining approach as a foundation for representing and ministructural data found in public health databases and related applications.
The effort will contribute 1) an analysis of public health data that explores data points, relationships between the data points, and integration of data from related domains to strengthen the results, 2) design of a graph-based mining system that can handle streaming data in an online fashion, 3) development of a new approach to concept learning that processes training examples embedded n a single interconnected graph, and 4) construction of a toolset that can provide early detection and assessment of epidemics and other public health crises. The project depends on a strong partnership between computer scientists and an expert in public health. A collaboration between the University of Texas at Arlington and the University of North Texas Health Science Center has already received initial support from the two schools The collaboration will be fostered through monthly seminars and research meetings. The results of this project will thus have an impact on the computerscience community and an equal, if not greater, impact on the domain community The code and data will be available for general dissemination over the Internet, and results will be integrated into the classroom and into a book on graph-based data mining.