III: Small: Techniques for Integrated Analysis of Graphs with Applications to Cheminformatics and Bioinformatics

Singh, Ambuj

Abstract

The first research thrust examines primitives for graph data management and graph mining. A declarative query language for graphs is being investigated. This language is based on a formal language for graphs and a graph algebra, and separates the concerns of specification and implementation. Scalability of techniques for similarity search on graphs and mining for significant patterns is being investigated as a part of this thrust.

The second research thrust applies the developed techniques to the domain of Cheminformatics. The specific tasks being examined are searching for similar compounds, mining for significant motifs, diversity analysis, and analysis of macromolecular complexes.

The final research thrust applies the developed methods to the domain of Bioinformatics. There has been an explosion of widely diverse biological data, arising from genome-wide characterization of transcriptional profiles, protein-protein interactions, genomic structure, genetic phenotype, gene interactions, gene expression, proteomics, and other techniques. The techniques being developed can integrate and analyze data from multiple sources and models efficiently, while accelerating interaction and function prediction and pathway discovery.

Further information about the project can be found at the project web page www.cs.ucsb.edu/~dbl/0917149.php.

Project Report

A number of scientific endeavors generate data that can be modeled as graphs: high-throughput biological experiments on protein interactions, high-throughput screening of chemical compounds, social networks, ecological networks and food webs, and database schemas and ontologies. Access to and analysis of the resulting annotated and probabilistic graphs are crucial for advancing scientific research, for accurately modeling and analyzing existing systems, and for engineering new systems. This project developed a set of scalable querying and mining tools for graph databases by integrating techniques from databases and data mining. The research was theoretical as well as empirical: new theoretical ideas and algorithms were developed and applied to the domains of Cheminformatics and Bioinformatics. We worked on the following two specific problems.

1. Analysis of global-state networks: Global-state networks provide a powerful mechanism to model the increasing heterogeneity in data generated by current systems. Such a network comprises a series of network snapshots with dynamic local states at nodes, and a global network state indicating the occurrence of an event. These networks arise in biology (pathways implicated in a disease), learning (brain regions activated in a learning task), and social networks (sentiments of users).

2. Top-k representative queries on graph databases: We investigated top-k representative queries on graph databases. Such queries are useful when a user wants to obtain a quick summary of a large collection of graphs based on their own definition of relevance.
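To make the idea of a top-k representative query concrete, the following is a minimal sketch under stated assumptions; it is not the algorithm developed in the project. It assumes each graph is reduced to a set of labeled edges, that similarity is measured by Jaccard distance on those edge sets, that a relevant graph counts as "represented" when it lies within a distance threshold `theta` of a selected graph, and that representatives are picked by a simple greedy coverage heuristic. The names `jaccard_distance`, `top_k_representatives`, `is_relevant`, and `theta` are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

# Assumption: a graph is summarized as a set of labeled edges.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def jaccard_distance(a: Graph, b: Graph) -> float:
    """Distance in [0, 1] between two edge sets (0 means identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def top_k_representatives(
    graphs: Sequence[Graph],
    is_relevant: Callable[[Graph], bool],
    k: int,
    theta: float,
) -> List[int]:
    """Greedily pick up to k relevant graphs that together cover
    as many relevant graphs as possible within distance theta."""
    # Keep only the graphs that satisfy the user's relevance definition.
    relevant = [i for i, g in enumerate(graphs) if is_relevant(g)]

    # covers[i] = relevant graphs that graph i would represent.
    covers = {
        i: {j for j in relevant
            if jaccard_distance(graphs[i], graphs[j]) <= theta}
        for i in relevant
    }

    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(relevant))):
        candidates = [i for i in relevant if i not in chosen]
        # Pick the candidate that newly covers the most relevant graphs.
        best = max(candidates, key=lambda i: len(covers[i] - covered))
        if not covers[best] - covered:
            break  # every relevant graph is already represented
        chosen.append(best)
        covered |= covers[best]
    return chosen


# Toy collection: edges are pairs of atom labels (illustrative data only).
graphs = [
    frozenset({("C", "C"), ("C", "O")}),
    frozenset({("C", "C"), ("C", "O"), ("C", "N")}),
    frozenset({("N", "N")}),
    frozenset({("S", "S")}),
]

# Hypothetical relevance definition: the graph has no sulfur-sulfur edge.
def no_sulfur(g: Graph) -> bool:
    return ("S", "S") not in g

summary = top_k_representatives(graphs, no_sulfur, k=2, theta=0.5)
print(summary)
```

The relevance predicate is passed in as a function so that each user can supply a different definition of relevance without changing the selection logic. A real graph database would replace the edge-set distance with a proper graph similarity measure and would need indexing to avoid the quadratic number of pairwise comparisons used here.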

Funding Agency

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Type: Standard Grant (Standard)
Application #: 0917149
Program Officer: Sylvia J. Spengler

Budget Start: 2009-09-15
Budget End: 2013-08-31
Fiscal Year: 2009
Total Cost: $509,261

Institution

University of California Santa Barbara, Santa Barbara, CA, United States
