The primary goal of this project is to provide a novel framework and software that empower users to more effectively understand and apply information discovered in large distributed data environments. The approach is to provide tested and novel data analytics techniques, supported by a framework that integrates the expertise and insight of human users. Combining the skills, abilities and experience of human users with the sheer processing power of distributed computation offers a synergistic information-processing potential, termed Interactive Automation. This framework places the user explicitly at the center of the Design, Execution, and Analysis processing modes, allowing them to switch between these contexts fluidly as well as react to and guide the runtime processing of complex, distributed and dynamic datasets.
A secondary goal is to provide an authoritative, accurate, anonymized and openly available set of ground truth data in the law enforcement domain. At present, no such ground truth dataset exists, and analytics tools are evaluated using proprietary, confidential or otherwise closed datasets. Stable releases of the system modules, documentation, related publications and results from usability studies are available at the project website (www.dimacs.rutgers.edu). The broader impacts of this work lie regionally with our partners (in law enforcement, medicine and education), and broadly through dissemination of the dataset and software in academia and industry. In addition, the availability of the dataset will support related research efforts by providing a foundation for objective, comparative and scientific analysis of law enforcement data analytics tools.
Advanced Analytics for Data Processing
This NSF research project focused on developing new methods to automatically analyze data using data analytics technology. The foundational technology at the heart of the data analytics tools is Higher Order Learning (HOL), which has patents both granted and pending. HOL provides a set of capabilities that, when utilized in data analytics tools, demonstrate significant improvements in performance compared with other widely used tools, including Support Vector Machines (SVMs). Notably, HOL also delivers this improved performance when provided with very little data, a common situation in web-based as well as embedded systems. HOL has been applied in diverse domains including cyber-security, e-commerce and counter-terrorism. The HOL technology and related components also serve as the foundation for a commercial product line of entity resolution, link analysis, prediction and other information processing tools.
Higher Order Learning
Higher Order Learning (HOL) has been developed through extensive core research and formal in situ evaluations. When compared with other prominent analytics tools (such as Naive Bayes and Support Vector Machines), HOL-enhanced algorithms demonstrate statistically significant improvements in performance across a number of sample sizes, especially on very small samples of data. In one set of experiments, a HOL-enhanced version of Naive Bayes was able to distinguish high quality discriminators down to a 5% sample size, far beyond the capabilities of standard Naive Bayes. The HOL framework’s acuity delivers automated analytics more efficiently, across more data sources, and at a lower knowledge engineering cost than traditional analytics technology. For example, when deployed on live data sets from a U.S. Air Force collaborator, HOL technologies achieved performance values more than double those of traditional approaches.
Technology Commercialization
HOL technology has been licensed to Intuidex, Inc., an early-stage technology development company that offers customer organizations advanced information management and decision support solutions. Intuidex has strong R&D and active projects with key organizations in its primary target markets of Law Enforcement, Defense, Intelligence and Fortune 500 businesses. Intuidex brings together strong working relationships, including software engineering and co-development with the Pacific Northwest National Laboratory, cutting-edge research with the CCICADA DHS Center of Excellence in Advanced Data Analysis (ccicada.org), and on-site design and development with Law Enforcement Agencies including the Port Authority of NY and NJ. This research effort was also acknowledged by the National Science Foundation for increasing crime prevention capabilities through advanced analytics while preserving data privacy and civil liberties. A video interview with the Principal Investigator on this topic is featured on the NSF web site. HOL technology has also been incorporated into a real-time news recommender system developed by Intuidex that is now in use by a major news and media company on its website.