Evidence chaining and "dipping" are powerful analytical paradigms for exploring entities and their connections in semantically rich, multi-source databases. Starting from a small set of seeds, such as known suspects or the result of an exploratory query, an analyst can draw on various data sources to explore the space of entities connected to these seeds via any number of relations. However, because each entity is often associated with many relations, facts, attributes and transactions, entities can be richly interconnected, and the number of objects linked to the initial seeds can quickly become very large.
Relationship simplification is one approach to reducing this space effectively and providing a more abstract view for analysts by (1) reducing the number of different relations through abstraction and normalization, and (2) focusing on strongly and relevantly connected objects by computing a measure of connection strength or relevance. To address (1) we propose to use knowledge representation and reasoning (KR&R) technology, which allows us to represent evidence at very high fidelity, use sophisticated ontologies and domain theories, represent abstraction and meta-knowledge naturally, map easily between different representations, and exploit powerful inference procedures to make implicit relationships explicit. To address (2) we need to consolidate and aggregate all relations between two objects, contrast them statistically with connections to and among other entities, and compute a measure of closeness or interestingness with which to filter out irrelevant or uninteresting objects and connections. To compute connection strength dynamically, we propose to use an information-theoretic model that determines the weight of each relation and takes the context of a relationship into account. This will allow us to aggregate all relations between objects and measure the closeness between them, so that a large data space can be compressed into a more abstract, simplified view.
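To make approach (2) concrete, the following is a minimal sketch of one possible information-theoretic weighting. It does not describe the Relationship Simplifier's actual model. It assumes that evidence is available as (entity, relation, entity) triples, that the weight of a relation type is its self-information (the negative base-2 logarithm of its relative frequency in the data, so rare relations count for more than common ones), and that the connection strength of a pair is the sum of the weights of all relations linking it. The function names, the sample relations and the threshold are hypothetical, and relationship context is not modeled here.

```python
import math
from collections import Counter, defaultdict


def relation_weights(facts):
    """Weight each relation type by its self-information, -log2 p(relation).

    Assumption: p(relation) is estimated as the relation's share of all facts.
    """
    counts = Counter(rel for _, rel, _ in facts)
    total = sum(counts.values())
    return {rel: -math.log2(n / total) for rel, n in counts.items()}


def connection_strengths(facts):
    """Aggregate all relations between each pair of entities into one score."""
    weights = relation_weights(facts)
    strength = defaultdict(float)
    for a, rel, b in facts:
        pair = tuple(sorted((a, b)))  # treat links as undirected
        strength[pair] += weights[rel]
    return dict(strength)


def simplify(facts, seeds, threshold):
    """Keep only pairs that involve a seed and reach the strength threshold."""
    return {
        pair: score
        for pair, score in connection_strengths(facts).items()
        if score >= threshold and (pair[0] in seeds or pair[1] in seeds)
    }


# Hypothetical sample evidence: phone calls are common, shared accounts rare.
facts = [
    ("A", "calls", "B"),
    ("A", "calls", "C"),
    ("A", "calls", "D"),
    ("A", "co_owns_account", "B"),
]

for pair, score in sorted(simplify(facts, {"A"}, threshold=1.0).items()):
    print(pair, round(score, 3))
```

In this sample only the pair (A, B) survives the filter, because the rare shared-account relation contributes far more weight than the common call relation. A context-sensitive model would replace the global frequency estimate with one conditioned on the entities or data sources involved.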
We will integrate the Relationship Simplifier with the BLACKBOOK system for uniform access to data and results and for communication with other components. In addition, our components can access relational data directly from one or more relational databases, which is useful when dealing with very large datasets.