Pattern matching is a key function in many data intensive computing applications ranging from deep packet inspection, text processing, to genomic research. The explosive growth of digital information in the form of webpages, XML documents, network traffic and scientific data has put an enormous pressure on the performance requirements of large-scale pattern matching. This research will study the use of innovative algorithms and architectures on ASIC/FPGA and multi-core platforms to accelerate large-scale pattern matching for network security, data mining and filtering applications. Various types of pattern matching will be considered, including regular expression matching, dictionary-based string matching, and extended regular expression matching. The intellectual merit of this proposal includes the innovation in algorithms and architectures for matching large pattern sets against high bandwidth data input.
The proposed research will be conducted from two perspectives: (1) Novel algorithms and data structures for large-scale pattern matching; such as finite automata, dynamic search tree, formal language and graph theory. (2) Practical optimization techniques for pipelining, partitioning, scalable and modular designs on parallel architectures with ASICs/FPGAs, multi-core processors and general-purpose graphics processors (GPGPUs). Instead of producing heuristics specific to a particular input or pattern set, the proposed research aims to improve the fundamental understanding of large-scale pattern matching, and apply the understanding to both algorithmic and architectural innovations. This allows exploration of the design limits and tradeoffs in using practical optimizations on state-of-the-art computing platforms. The designs will be mapped onto parallel architectures based on both FPGA and multi-core technologies, including CPU-FPGA and CPU-GPU heterogeneous architectures.
The increasing complexity and heterogeneity of cyberspace has led to an increase in the sophistication of network attacks and in the number of malicious patterns in the Internet. Recent advances in computing technology can dramatically accelerate innovation to solve complex problems of pattern matching and filtering. These state-of-the-art computing platforms include multi-core and many-core processors, general-purpose graphics processing units (GPGPU), and field programmable gate arrays (FPGA). These platforms provide distinctive features of performance beneficial to network processing. This project aimed to leverage the disruptive potential of parallel computing platforms for efficient pattern matching in Network Instruction Detection (NID) or Deep Packet Inspection (DPI). The project developed formalisms for regular expression matching on parallel architectures that goes beyond state-of-the-art string matching schemes. The goal of the project was to produce conceptual designs that will enable scientists to leverage the disruptive potential of parallel computing platforms for efficient pattern matching. Successful research led to the following capabilities on state-of-the-art reconfigurable computing platforms: 1. A framework of scalable pattern matching architectures using Non-deterministic Finite Automata (NFA) 2. A context switching mechanism to match patterns for a large number of packet flows, based on the NFA framework 3. An NFA-based efficient pattern matching engine on state-of-the-art FPGA, achieving high throughput and resource efficiency. Through these engagements, the PI concluded that for the cybersecurity domain high performance could be achieved for pattern matching using efficient algorithms and reconfigurable hardware architectures.