Gnort: High Performance Intrusion Detection Using Graphics Processors
Giorgos Vasiliadis, Spiros Antonatos, Michalis Polychronakis, Evangelos Markatos, Sotiris Ioannidis
Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH)
General Idea
• How to speed up the processing throughput of intrusion detection systems by offloading the pattern matching operations to the GPU.
Giorgos Vasiliadis, ICS-FORTH
Introduction
• The problem
– Network Intrusion Detection Systems (NIDS) rely on string matching to detect and prevent well-known attacks
– String matching accounts for up to 75% of the total CPU processing
• String matching algorithms
– Aho-Corasick
• Specialized hardware devices (NPs, FPGAs, ASICs)
– Complex to modify and program
– Poor flexibility
• Graphics cards
– Easy to program
– Powerful and ubiquitous
– Researchers have begun exploring ways to tap their power for non-graphics applications
Why use the GPU?
• The GPU is specialized for compute-intensive, highly parallel computation
NVIDIA GeForce SIMD Architecture
• Many Multiprocessors
• Each Multiprocessor contains many Stream Processors
• Memory model
– Shared on-chip memory: 1 cycle
– Constant memory: 400-600 cycles; 1 cycle if cached
– Texture memory: 400-600 cycles; 1 cycle if cached
– Global device memory: 400-600 cycles
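A rough way to read these latencies is a simple back-of-the-envelope model (the cache hit rate h is an assumed parameter, not something the slide measures): if a fraction h of constant- or texture-memory fetches hit the cache, the expected fetch latency is approximately

```latex
\bar{t} \approx h \cdot 1 + (1 - h)\, t_{\mathrm{miss}}, \qquad t_{\mathrm{miss}} \in [400, 600]\ \text{cycles}

h = 0.9,\ t_{\mathrm{miss}} = 500:\quad \bar{t} \approx 0.9 + 0.1 \times 500 = 50.9\ \text{cycles}
```

so even a moderate hit rate keeps cached fetches roughly an order of magnitude cheaper than uncached global-memory reads.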
The GPU can be used as a general-purpose processor, capable of executing many threads in parallel
The Aho-Corasick Algorithm
• Used in most modern NIDSes
• Scans for multiple patterns simultaneously
• Preprocesses all patterns to build a state machine
• The state machine scans the input for all patterns in a single pass, in linear time
– Scanning complexity is independent of the number of patterns
Example: P = {he, she, his, hers}
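The example automaton can be reproduced with a short sketch. The Python below is a textbook Aho-Corasick construction (a trie of goto edges, failure links computed breadth-first, output sets merged along failure links), written for illustration only; it is not Snort's or Gnort's code, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure and output tables.

    States are integers; state 0 is the root. goto[s] maps a character
    to the next state along the trie.
    """
    goto = [{}]      # goto[s][ch] -> next state (trie edges only)
    fail = [0]       # fail[s] -> state for the longest proper suffix
    output = [[]]    # output[s] -> patterns recognized in state s

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, nxt in goto[s].items():
            queue.append(nxt)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that are suffixes of this state's string.
            output[nxt] += output[fail[nxt]]
    return goto, fail, output


def search(text, automaton):
    """Scan text once; return (start_offset, pattern) for every match."""
    goto, fail, output = automaton
    s, matches = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]          # follow failure links on a mismatch
        s = goto[s].get(ch, 0)
        for pat in output[s]:
            matches.append((i - len(pat) + 1, pat))
    return matches
```

For the input "ushers", search reports she at offset 1 and both he and hers at offset 2, all found in a single left-to-right pass over the input.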
Mapping Aho-Corasick on the GPU
• How should the state machine be represented?
• Snort represents each state as an array of pointers
– Pointer-based structures are difficult to map onto GPU memory
• Transform the state machine into a 2D array
– Can easily be bound to Texture Memory
• Texture fetches are cached
– Aho-Corasick exhibits strong locality of reference
– Memory reads are random-access
Using Texture Memory improves GPU execution time by about 19%
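The slides do not show Gnort's exact table layout, so the following Python sketch only illustrates the general transformation, under stated assumptions: each state holds 256 references (one per byte value, None meaning "return to the root"), states are numbered breadth-first with the root as 0, and the result is one flat row-major array in which the entry for state s and byte b sits at s * 256 + b. A one-row-per-state array like this is the kind of structure that could be bound to a 2D texture. The class and function names are hypothetical.

```python
from collections import deque

ALPHABET = 256  # one column per possible byte value


class State:
    """Pointer-based state: next[b] is the State reached on byte b.

    None stands for 'go back to the root'.
    """

    def __init__(self, accepting=False):
        self.next = [None] * ALPHABET
        self.accepting = accepting


def flatten(root):
    """Number states breadth-first and emit a row-major transition table.

    Returns (table, accepting) where table[s * ALPHABET + b] is the id of
    the state reached from state s on byte b; the root gets id 0.
    """
    ids = {root: 0}          # State objects hash by identity
    order = [root]
    queue = deque([root])
    while queue:
        st = queue.popleft()
        for nxt in st.next:
            if nxt is not None and nxt not in ids:
                ids[nxt] = len(order)
                order.append(nxt)
                queue.append(nxt)

    table = [0] * (len(order) * ALPHABET)
    for s, st in enumerate(order):
        for b, nxt in enumerate(st.next):
            table[s * ALPHABET + b] = 0 if nxt is None else ids[nxt]
    accepting = [st.accepting for st in order]
    return table, accepting


def scan(table, accepting, data):
    """Scan bytes using only integer lookups; return end offsets of matches."""
    s, hits = 0, []
    for i, b in enumerate(data):
        s = table[s * ALPHABET + b]
        if accepting[s]:
            hits.append(i)
    return hits
```

After flattening, scanning needs only integer arithmetic and one table read per input byte, which is the kind of access pattern the texture cache can serve well.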
Parallelizing Packet Searching (1/2)
• Assigning a single packet to each Multiprocessor
– Each packet is copied to the shared memory of the Multiprocessor
– Stream Processors search different parts of the packet concurrently
• Overlapping computation
– Matching patterns may span consecutive chunks of the packet
• Same amount of work per Stream Processor
– Stream Processors stay synchronized
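The following Python sketch illustrates the chunking scheme under simple assumptions: the packet is split into (nearly) equal chunks, one per Stream Processor; each worker reads max(pattern length) - 1 extra bytes past its chunk so that patterns spanning a boundary are still found; and a worker reports only matches that start inside its own chunk, so nothing is counted twice. The per-chunk matcher is a naive stand-in for the Aho-Corasick scan, the loop runs the workers one after another where the GPU would run them concurrently, and all names are hypothetical.

```python
def chunk_bounds(length, workers, overlap):
    """Split [0, length) into `workers` contiguous chunks.

    Each tuple is (start, end, scan_end): a worker owns matches that start
    in [start, end) but reads up to scan_end so that patterns crossing
    into the next chunk are still seen.
    """
    size = -(-length // workers)  # ceiling division: (nearly) equal chunks
    bounds = []
    for w in range(workers):
        start = min(w * size, length)
        end = min(start + size, length)
        bounds.append((start, end, min(end + overlap, length)))
    return bounds


def scan_chunk(packet, patterns, start, end, scan_end):
    """Naive matcher for one worker (stand-in for a per-thread AC scan)."""
    window = packet[start:scan_end]
    hits = []
    for off in range(end - start):          # only offsets this worker owns
        for pat in patterns:
            if window.startswith(pat, off):
                hits.append((start + off, pat))
    return hits


def parallel_search(packet, patterns, workers):
    """Overlap = longest pattern - 1, enough for any boundary-crossing match."""
    overlap = max(len(p) for p in patterns) - 1
    hits = []
    for start, end, scan_end in chunk_bounds(len(packet), workers, overlap):
        # On the GPU these calls would run concurrently, one per thread.
        hits += scan_chunk(packet, patterns, start, end, scan_end)
    return sorted(hits)
```

For b"ashers!!" split across four workers, the match she starts at offset 1 in the first chunk but ends in the second; it is found only because the first worker reads past its chunk boundary.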
Parallelizing Packet Searching (2/2)
• Assigning a single packet to each Stream Processor
– Each packet is processed by a different Stream Processor
• No overlapping computation
• Different amount of work per Stream Processor
– Stream Processors of the same Multiprocessor have to wait until all have finished
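A back-of-the-envelope cost model makes this drawback concrete. The Python sketch below assumes, for illustration only, that scanning costs one step per byte and that the Stream Processors of one Multiprocessor are released together, so each group of packets takes as long as its longest member; the function name is hypothetical.

```python
def group_cost(packet_lengths, group_size):
    """Estimate the cost of one-packet-per-thread scheduling.

    Assumptions (illustrative only): one step per byte scanned, and the
    threads of a group finish together, so a group takes as long as its
    longest packet. Returns (steps, utilization), where utilization is the
    fraction of available thread-steps that did useful work.
    """
    steps = 0
    busy = 0
    for g in range(0, len(packet_lengths), group_size):
        group = packet_lengths[g:g + group_size]
        steps += max(group)      # everyone waits for the slowest thread
        busy += sum(group)       # useful work actually done
    slots = steps * group_size   # thread-steps that were available
    return steps, (busy / slots if slots else 1.0)
```

With four threads and packets of 1500, 64, 64 and 64 bytes, the group takes 1500 steps while only about 28% of the available thread-steps do useful work; equal-sized packets keep utilization at 100%.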
Software Mapping
• Packets are transferred to the GPU in batches
– Performs much better than transferring each packet separately
– Packets are stored in a buffer that is copied to the GPU when it gets full
• Use page-locked memory to store the packets
– Higher transfer throughput from host to device
– Copies are performed using DMA, without occupying the CPU
• CPU and GPU execution can overlap
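A minimal sketch of the batching policy, in Python for readability: packets are appended to one buffer, and the whole buffer is handed off in a single transfer when the next packet would not fit. The class name and the `flush` callback are hypothetical stand-ins for the host-to-device copy and kernel launch. In a CUDA implementation the buffer would typically be page-locked memory (for example from `cudaHostAlloc` or `cudaMallocHost`) copied with `cudaMemcpyAsync`, so that the CPU could keep filling a second buffer while the GPU works; that double-buffering detail is an assumption, not something the slides spell out.

```python
class PacketBatcher:
    """Accumulate packets in one buffer and hand it off when it is full.

    `flush(data, offsets)` is a caller-supplied function standing in for
    the host-to-device copy plus kernel launch (hypothetical).
    """

    def __init__(self, capacity, flush):
        self.capacity = capacity
        self.flush = flush
        self.buf = bytearray()
        self.offsets = []            # start offset of each packet in buf

    def add(self, packet):
        if len(packet) > self.capacity:
            raise ValueError("packet larger than batch buffer")
        if len(self.buf) + len(packet) > self.capacity:
            self.drain()             # buffer full: ship it in one transfer
        self.offsets.append(len(self.buf))
        self.buf += packet

    def drain(self):
        """Hand off whatever is buffered (also call this at end of input)."""
        if self.offsets:
            self.flush(bytes(self.buf), list(self.offsets))
            self.buf = bytearray()
            self.offsets = []
```

The offsets list lets the device side find packet boundaries inside the batch, so one large copy replaces many small ones.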
Evaluation (1/2)
• Scalability as a function of the number of patterns
• We ran Snort using randomly generated patterns
• All patterns are matched against every packet
• The payload trace contained 800-byte UDP packets with random payload
Throughput remains constant as the number of patterns increases
2.4x faster than the CPU
Evaluation (2/2)
• Throughput as a function of packet size
• Ran Snort using 1000 random patterns
• All patterns are matched against every packet
2.3 Gbit/s for full packets: 3.2x faster than the CPU
The two GPU implementations show no significant difference in performance
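Taken together, the two figures on this slide imply a CPU baseline for full-size packets of roughly the following (derived arithmetic, not a number stated on the slide):

```latex
T_{\mathrm{CPU}} \approx \frac{T_{\mathrm{GPU}}}{\text{speedup}} = \frac{2.3\ \text{Gbit/s}}{3.2} \approx 0.72\ \text{Gbit/s}
```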
Evaluation with real input and rules
• Experimental setup
– Two PCs connected via a 1 Gbit/s Ethernet switch
• To directly compare with prior work [Jacob et al.], we re-implemented the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms on the GPU.
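For contrast with Aho-Corasick, here is a textbook KMP matcher as a Python sketch; it is not the GPU code from [Jacob et al.] or from this work, and the function names are hypothetical. KMP handles one pattern per pass, so matching k patterns means k scans over each packet, whereas an Aho-Corasick automaton covers all patterns in a single scan. This is one plausible reason single-pattern algorithms fall behind when thousands of rules are loaded.

```python
def prefix_table(pat):
    """KMP failure function: fail[i] is the length of the longest proper
    prefix of pat[:i+1] that is also a suffix of it."""
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pat):
    """Return start offsets of pat in text in one left-to-right pass."""
    fail, k, hits = prefix_table(pat), 0, []
    for i, ch in enumerate(text):
        while k and ch != pat[k]:
            k = fail[k - 1]          # fall back without re-reading text
        if ch == pat[k]:
            k += 1
        if k == len(pat):
            hits.append(i - k + 1)
            k = fail[k - 1]          # allow overlapping matches
    return hits


def kmp_multi(text, patterns):
    """One full pass over text per pattern: k patterns cost k scans."""
    hits = []
    for pat in patterns:
        hits += [(start, pat) for start in kmp_search(text, pat)]
    return sorted(hits)
```

On "ushers" with P = {he, she, his, hers}, kmp_multi finds the same three matches as an Aho-Corasick scan, but only after reading the input four times.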
Evaluation with real input and rules
• Snort loaded about 8000 patterns
• Preprocessors and PCRE were disabled
Original Snort (AC) cannot process all packets at rates higher than 300 Mbit/s
GPU-assisted Snort (AC1, AC2) begins to lose packets at 600 Mbit/s: a 2x improvement
The KMP and BM algorithms used in [Jacob et al.] perform worse in all cases
Conclusion
• Graphics cards can be used effectively to speed up Network Intrusion Detection Systems
– Low cost
– Easy to program
• Future work includes
– Transferring packets directly from the NIC to the GPU
– Utilizing multiple GPUs on multi-slot motherboards