Many scientific endeavors generate data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, and database schemas and ontologies. Mining and analyzing these annotated and probabilistic graphs is crucial for advancing scientific research, for accurately modeling and analyzing existing systems, and for engineering new systems. The goal of this research project is to develop a set of scalable querying and mining tools for graph databases by integrating techniques from databases, bioinformatics, machine learning, and algorithms. New algorithms are being developed and evaluated for quality and running time on real datasets. The first set of algorithms addresses subgraph and similarity querying in graph databases. The second set addresses the mining of significant subgraphs, or motifs; for this purpose, a novel significance model is being developed that transforms graphs into histograms of primitive components and assesses the significance of motifs in the transformed domain. The third set of algorithms targets the discovery of well-connected clusters in large probabilistic graphs. The project integrates research and education by introducing the research results into undergraduate and graduate courses. Robust open-source tools based on the developed algorithms will be released for other researchers. These tools will help in studying the structure and organization of large networks, which are becoming increasingly common.
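
The abstract does not define the "primitive components" used by the significance model. As a rough illustration only, the sketch below assumes that a primitive component is an edge typed by the unordered pair of its endpoint labels, and it turns a small labeled graph into a histogram of those edge types. The function name `edge_type_histogram` and the example graph are hypothetical and are not taken from the project.

```python
from collections import Counter


def edge_type_histogram(node_labels, edges):
    """Count edges by the unordered pair of their endpoint labels.

    Assumption for illustration: a "primitive component" is an edge type.
    node_labels maps node id -> label; edges is a list of (u, v) pairs.
    """
    hist = Counter()
    for u, v in edges:
        # Sort the two labels so that (C, O) and (O, C) count as the same type.
        key = tuple(sorted((node_labels[u], node_labels[v])))
        hist[key] += 1
    return hist


# Hypothetical toy graph with atom-like node labels.
node_labels = {0: "C", 1: "C", 2: "O", 3: "N"}
edges = [(0, 1), (1, 2), (1, 3), (0, 2)]

hist = edge_type_histogram(node_labels, edges)
for edge_type, count in sorted(hist.items()):
    print(edge_type, count)
```

Under this assumption, each graph becomes a fixed-length count vector, so the question of whether a motif is significant could be posed on histograms rather than on graph structure directly. How the project actually scores significance is not stated here.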

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0612327
Program Officer: Sylvia J. Spengler
Budget Start: 2006-07-15
Budget End: 2010-06-30
Fiscal Year: 2006
Total Cost: $530,094

Institution
Name: University of California Santa Barbara
City: Santa Barbara
State: CA
Country: United States
Zip Code: 93106