People routinely encounter and seek to make sense of large collections of data that include both unstructured reports or stories and loosely structured logs or spreadsheets. In many cases, relevant information is scattered across a large number of documents, and it is the task of an analyst to read the documents and "put the pieces together." For instance, police investigators sifting through a multitude of observations, case reports, and witness testimonies must develop a coherent view of the events that really occurred. Academic researchers investigating a new domain pore over large numbers of paper abstracts, citations, and articles to develop a better understanding of the state of work in that area. The process of connecting individual pieces of information such as these into a more coherent narrative is a component of investigative analysis, the main focus of this project. One common element of analytic sense-making activities is that they are cognitively very challenging, frequently involving data collections that tax a person's memory, deduction, reasoning, and general analytic capabilities. Investigative analysis today is made even more challenging by the ever-increasing torrent of available data: one can access vast databases and run internet searches that return, in seconds, more documents than any human can read and assimilate in a reasonable amount of time. Technological means of augmenting human memory and analytic reasoning, however, hold great potential as investigative aids. This project explores the development of computational systems that make investigative analysts more effective and more efficient. The PI's approach centers on providing multiple visual representations of the individual pieces of data gathered during an investigation, both to highlight connections or potential connections among them and to help analysts decide which pieces of data to examine next from a large collection of evidence. 
The PI will draw upon his experience in information visualization and visual analytics to design and build a system to help analysts, and upon his experience in human-computer interaction to evaluate whether the system is effective. The work will include fundamental research on challenges such as representing reliability and uncertainty in a visualization display, developing collaborative capabilities so that analysts can work together, and integrating sophisticated automated text analysis with the human-directed exploration that visual interfaces provide. Careful evaluation of each new analytic capability will accompany its design.

Broader Impacts: Investigative analysis is a fundamental activity in law enforcement and in intelligence activities that are important to our national security. This project will invent next-generation visual analytic techniques and technologies that can be used to develop investigative analysis systems in the future. Other domains such as news reporting, academic research, and business intelligence also require investigative analysis, so this project has the potential to impact those fields as well.

Project Report

Many different people and organizations routinely work with large collections of text documents and seek to make sense of them. The documents may be unstructured text such as reports and stories, or they may be loosely structured documents such as logs and spreadsheets that combine narrative text with other data. In many cases, relevant information is scattered across a large number of documents, and it is the task of an analyst to read the documents and understand the big picture. For instance, police investigators sifting through a multitude of case reports, witness testimonies, and observations must develop a coherent view of the events that really occurred. Academic researchers investigating a new domain pore over large numbers of paper abstracts, citations, and articles to develop a better understanding of the state of work in that area. The process of connecting individual pieces of information like these into a more coherent narrative is a component of investigative analysis, the main focus of this project. An investigative analyst gathers individual chunks of evidence that may range from incident reports filed by detectives in the field to open-source news reports such as articles gathered via web searches. The analyst must then look for connections across the individual documents to link separate activities into a larger plot or scheme. Sometimes the connections are relatively clear, for instance, a particular individual mentioned in several reports. Other connections are harder to discern, for instance, approximate overlaps in time or location. One common element of all these kinds of analytic sense-making activities is that they are cognitively very challenging, frequently involving large collections of data that tax a person’s memory, deduction, reasoning, and general analytic capabilities. 
In this project we developed new technologies and techniques to help investigative analysts explore collections of text documents more efficiently and more effectively. Our approach combines computational text analysis algorithms with interactive visualizations of the documents, their text, and the analysis results; this combination of computational analysis and visualization is called visual analytics. We embodied the resulting algorithms and visualization techniques in a visual analytics system we created, called Jigsaw. Jigsaw pairs computational analysis of the documents with a collection of visualizations, each portraying a different aspect of the documents, including connections between entities in the documents’ text (such as persons, places, dates, and organizations). The system thus acts as a visual index onto a document collection, highlighting connections between entities and allowing the investigator to understand the context of events more quickly and accurately. Jigsaw helps analysts "put the pieces together" and link initially unconnected activities into a more coherent story across a document collection. We have made the Jigsaw system available on the web for anyone to download and use. The project website, www.cc.gatech.edu/gvu/ii/jigsaw, includes links to our academic publications about Jigsaw, example videos of its use on different document collections, a manual, and tutorial videos to help users learn the system. Hundreds of people and organizations have downloaded the system and are using it in their own work, including analysts in law enforcement, intelligence, fraud investigation, genomics, investigative reporting, software development, product reviews, and banking and finance, among many other domains. 
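The idea of a "visual index" that highlights connections between entities can be illustrated with a small data structure. The following is a minimal sketch, not Jigsaw's implementation. It assumes that entities have already been extracted from each document and treats two entities as connected when they appear in the same document. The function name `build_entity_index`, the document IDs, and the entity names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_index(docs):
    """Build a simple entity index over a document collection (hypothetical helper).

    docs maps a document ID to the entities already extracted from it.
    Returns (entity_to_docs, connections):
      entity_to_docs[e] -> set of document IDs that mention entity e
      connections[e]    -> set of entities that co-occur with e in some document
    """
    entity_to_docs = defaultdict(set)
    connections = defaultdict(set)
    for doc_id, entities in docs.items():
        unique = set(entities)  # count each entity once per document
        for ent in unique:
            entity_to_docs[ent].add(doc_id)
        # Connect every pair of entities that share this document.
        for a, b in combinations(sorted(unique), 2):
            connections[a].add(b)
            connections[b].add(a)
    return entity_to_docs, connections


# Hypothetical mini-collection: two field reports and one news article.
docs = {
    "report_1": ["Alice Ames", "Atlanta", "Acme Corp"],
    "report_2": ["Alice Ames", "Bob Berg"],
    "news_7": ["Bob Berg", "Acme Corp", "March 3"],
}

entity_to_docs, connections = build_entity_index(docs)
print(sorted(entity_to_docs["Alice Ames"]))  # ['report_1', 'report_2']
print(sorted(connections["Alice Ames"]))     # ['Acme Corp', 'Atlanta', 'Bob Berg']
```

In this toy collection, "Alice Ames" never appears in the news article, yet the index shows she is one hop from "Bob Berg" and "Acme Corp," both of which it mentions. This is the kind of link across separate documents that an analyst might then choose to follow up.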
Additionally, we used Jigsaw while competing in several academic contests and challenges on document analysis, and we received a number of awards for our performance. The research project also included an evaluation study comparing how participants carried out a hypothetical intelligence analysis scenario with Jigsaw versus other existing tools and techniques. Study participants using Jigsaw generally performed better and more accurately identified a hidden threat. We also interviewed six people who had been using the system for 2 to 14 months to better understand how they were using it, which aspects of the system they found most helpful, and what limitations could be addressed or improvements made.

Agency: National Science Foundation (NSF)
Institute: Division of Information and Intelligent Systems (IIS)
Application #: 0915788
Program Officer: Ephraim P. Glinert
Budget Start: 2009-08-01
Budget End: 2013-07-31
Fiscal Year: 2009
Total Cost: $489,671
Name: Georgia Tech Research Corporation
City: Atlanta
State: GA
Country: United States
Zip Code: 30332