Building High-Performance Networked Systems with ... · Incapability of Snort for High Speed...

Building High-Performance Networked Systems with Innovative Hardware and Software Techniques

Kai Zhang University of Science and Technology of China

The Growth Trend of Business Data

The volume of business data worldwide is expected to double every 2 years

Source: Oracle

The Growth Trend of Network Speed

Source: IEEE 802.3 Higher Speed Study

Group - Tutorial

Linux, FreeBSD,

Solutions for Next Generation Networked Systems

NETWORKED SYSTEM

OPERATING SYSTEM

HARDWARE

Networked Intrusion Detection System

Firewall

IPSec Gateway

Load BalancerRouter

Key-Value Store

… …

TCP/IP Stack

NIC Driver

Socket

DPDK, Netmap, …New drivers, bypass the OS

How to utilize?

GPUs CPUs with Integrated GPUs Xeon Phi

Solutions for Next Generation Networked Systems

• Demonstration of Two Networked Systems

• Mega-KV• A key-value store system with the highest throughput• 100x higher throughput than Memcached

• Snort with DPDK• Enhance the efficiency of network I/O of Snort with DPDK• A cooperation between USTC and Intel for educational

purpose• To be a course lab for Advanced Computer Networks

Mega-KV: A Case for GPUs to Maximize the Throughput of In-Memory Key-Value Stores

Key-Value Stores๏ A simple but effective method to manage data where a data record

(or a value) is stored and retrieved with its associated key • variable type and length of record (value)• simple or no schema• easy software development for many applications

๏ Key-value stores have been widely deployed in data processing production systems:

Simple and Easy Interfaces of Key-Value Stores

• set (key, value)• value = get (key)

38john_age

Variable-length keys & values

Client

key-value store

GET Key: john_age Value: 38

Key-Value Stores: Examples

Keys Values

Amazon Customer ID Customer profile (e.g., credit card, buying history)

Facebook, Twiter User ID User profile (e.g., friends, photos, posts)

iCloud/iTunes Movie/song name Movie, Song

Distributed file Systems Block ID Block

Workflow of a Typical In-Memory Key-Value Store

Network Processing

Memory Management

Index Operations

Access Value

Workflow of a Typical In-Memory Key-Value Store

TCP/IP Processing

Query ParsingNetwork Processing

MemoryFull

MemoryNot Full

Evict Allocate Memory Management

Delete from Index

Insert into Index

Search in Index Index Operations

Read & Send Value Access Value

SETDELETE

Where does time go in KV-Store MICA [NSDI’14]

Four Data Sets

1) 128B Key 1024B Value

4) 8B Key 8B Value

Access Value

Index Operations

Network Processing (w/ DPDK) & Memory

Management

Index operation becomes one of the major bottlenecks12

Random Memory Accesses in In-Memory Key-Value Stores

QueryNetwork Processing & Memory Management

Access Value

Random Memory Accesses

Index Operation

Hash Table

Random Memory Accesses of Indexing are ExpensiveTi

Number of Memory Accesses1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Sequential memory accessRandom memory access

CPU: Intel Xeon E5-2650v2

Memory: 1600 MHz DDR3

Inabilities for CPUs to Accelerate Random Memory Accesses

2. Prefetch

1. Cache

3. Multithreading

• Not easy to predict next memory address

• Working set is large (~100 GB), CPU cache is small (~10 MB)

• Limited number of hardware threads• Limited number of Miss Status Holding Registers (MSHRs)

CPU spends a large portion of its time idling, waiting for data

Mega-KV addresses two issues: large number of queries and random memory access delay

Network Processing

Memory Management

Access Value

Index Operation To accelerate it by GPUs

DPDK, Multiget, UDP

Bitmap, Optimistic concurrent access

Prefetch

The Goal of Mega-KV

๏ Throughput is the critical issue in big data environment• Throughput measures the capability of a key-value store

system to process a growing amount of queries on an increasingly large data set

๏ Acceptable in-memory key-value store latency• < 1 millisecond, e.g. Facebook, Amazon, …

๏ Our goal: Maximize throughput subject to an acceptable latency

CPU vs. GPU

Intel Xeon E5-2650v2:2.3 billion Transistors

8 Cores59.7 GB/s memory bandwidth

Nvidia GTX 780:7 billion Transistors

2,304 Cores (12 SMXs)288.4 GB/s memory bandwidth

Massive ALUs

ControlCache

Two Advantages of GPUs for Key-Value Stores

GPU Core

cache miss

Thread A

Thread B

Thread C

Instruction Buffer memory request issued, switch to another threadmemory request issued, switch to another thread

1. Massive Processing Units to Address Large Number of Concurrent Queries

• KV Store — simple independent memory access operations• GPU — thousands of cores for parallel processing

2. Massively Hiding Memory Access Latency • KV Store — random memory accesses in index operations• GPUs can effectively hide memory access latency with massive hardware

threads and zero-overhead thread scheduling (a GPU hardware support)

Basic Design of Mega-KV

Pre-Processing

GPU Processing

Post-Processing

Index OperationsNetwork processing,Memory management Read & Send Value

RXDPDK

TXDPDK

Basic Design

Pre-Processing

Network processing,Memory management

Basic Design

Pre-Processing

Basic Design

Pre-Processing

Parallel Processing in GPUs

Basic Design

Pre-Processing

Post-Processing

Basic Design

Pre-Processing

Read & Send Value

Post-Processing

Basic Design

Pre-Processing

Read & Send Value

Challenges of Offloading Index Operations to GPUs

1. GPUs’ memory capacity is small: ~10 GB• Working set may be hundreds of gigabytes

2. Low PCIe bandwidth• PCIe is generally the bottleneck of GPUs if large bulk of data

needs to be transferred

3. Handling variable-length data is inefficient for GPUs• Imbalance load between GPU cores

Our Approach

C C C C

Input data (Keys) Index

GPU optimized cuckoo hash table thatstores key signatures and value locations

C C C C

Compress

Compressed fixed-length signatures

Address challenges 2, 3 (PCIe bandwidth and variable length data)

Address challenges 1, 3(GPU memory capacity and variable length data)

GPU Cuckoo Hash Table

Signaturecompress

Our Approach: Search Index

key comparison

Send to client

ValueKeyKV object

Evaluation - Hardware Setup

Intel Xeon E5-2650v2 octa-core, 2.6GHz

Total 16 CPU cores

Nvidia GTX 780, 2304 cores, 863MHz

Total 4608 cores

Intel dual-port 10Gbps NIC

Total 40 Gbps

Reaching a Record High ThroughputTh

020406080

100120140160180

Data Sets

8B key8B value

16B key64B value

32B key512B value

128B key1024B value

Fastest CPU-based KV store Mega-KV

2.1x1.9x

2.8x2.1x

LatencyC

Round Trip Time (microsecond)

0 60 120 180 240 300 360 420 480 540 600

160 MOPS

95th: 390

50th: 256

Compared withFacebook

1,200 (95th)

300 (50th)

Accelerating the Network I/O of Snort with DPDK

• Snort is a multi-mode packet analysis tool• Sniffer• Packet Logger• Forensic Data Analysis tool• Network Intrusion Detection System

• Snort is able to perform network traffic analysis both in real-time and for forensic post processing

• Snort “Metrics” • Fast (High probability of detection for an attack on high speed networks)• Configurable (Easy rules language, many reporting/logging options)

Snort Architecture

Packet I/O

Packet Decoder

Preprocessor

Detection Engine

Output

Incapability in Handling 10Gbps Network Traffic

Cycles Needed in Snort

1,200 400 - 25,000…+

Packet I/O Detection, etc.

Your Budget

10Gbps, min-sized packets, quad-core 2.66GHz CPUs

(in x86, cycle numbers are from RouteBricks [Dobrescu09] and PacketShader[Han10])

Incapability of Snort for High Speed Networks

• Snort was designed to detect attacks on 100Mbps links• Current network speed reaches 10Gbps and 40Gbps• Snort becomes incapable of detecting intrusions on current backbone

and data center network

• Accelerate network I/O with DPDK• Snort 2.9 introduces the Data Acquisition library (DAQ) for packet I/O• Current supported: pcap, AF_PACKET, netmap, ipfw• Our work: add DPDK support in DAQ

Opportunities from DPDK

Cycles Needed 1,200 400 - 25,000…+

Packet I/O Detection, etc.

EXPERIMENTAL SETUP

• Hardware• CPU: Intel(R) Xeon(R) CPU E5-2650 v3 • NIC: 82599ES 10-Gigabit SFI/SFP+ Network Cards

• Software • Linux 3.19.0 • Snort 2.9.8.0 • DPDK 2.1.0

Snort Modes

• Two Snort modes: Passive Mode and Inline Mode• Snort is bypassed in the Passive Mode• Snort filters packets in the Inline Mode

PassiveMode

SnortInlineMode

Snort Throughput (Inline Mode w/o DPI)

Throughput Stability of DPDK and Netmap

1 2 3 4 5 6

14.86 14.88 14.86 14.88 14.86 14.88

13.7213.90

netmap dpdk

Latency (Inline Mode w/o DPI)

Latency Stability of DPDK and Netmap

1 2 3 4 5 6

0.15 0.16 0.15 0.15 0.15 0.14

0.26 0.25

0.34 0.33

netmap dpdk

Snort Throughput (Passive Mode w/ DPI)

netmap

AF_PACKET

Throughput (Mpps)

0 1.25 2.5 3.75 5

Summary

๏ Mega-KV provides the highest throughput• Fastest in-memory key-value store on commodity processors• More than 100x faster than Memcached• Open source at http://kay21s.github.io/megakv/

๏ Integrating DPDK into Snort• Improving the latency and throughput of Snort• For educational purpose: To be a course assignment in USTC

Thanks!

Building High-Performance Networked Systems with ... · Incapability of Snort for High Speed...

Documents

UNIVERSITY OF JYVÄSKYLÄ “INK AND INCAPABILITY” Verbal

Snort - DEPARTAMENTO INFORMÁTICA · snort snort-common snort-common-libraries snort-rules-default ... plain Text INS . rules Devices Floppy Drive Computer Home Desktop Documents

Intrusion Detection - Snort...Snort configuration file • By default: /etc/snort/snort.conf – A long file (900+ lines of code) – Many . pre-processor. entries • Snort pre-processors

Snort asmt

FUZZY ESTIMATIONS OF PROCESS INCAPABILITY INDEX

Dissecting Snort - Pearson UKcatalogue.pearsoned.co.uk/samplechapter/157870281X.pdf · 44 Chapter 3 Dissecting Snort Figure 3.1 Snort component dataflow. Feeding Snort Packets with

Martin Roesch Sourcefire Inc.. Topics Background –What is Snort? Using Snort Snort Architecture

Snort & ACID

Snort IPS · Snort IPS TheSnortIPSfeatureenablesIntrusionPreventionSystem(IPS)orIntrusionDetectionSystem(IDS)for

The Snort Project April 26, 2010 - Huihoodocs.huihoo.com/snort/snort-2.8.6-manual.pdf · formatin the SnortCVS repositoryat/doc/snort_manual.tex.Smalldocumentationupdatesaretheeasiestwayto

Intrusion Detection using Snort Session E6academy.delmar.edu/Courses/ITSY2430/eBooks/Snort(IntrusionDetectionOverview).pdfOverview of Snort Snort Is . . . zA lightweight Network Intrusion

Snort & Windows 2000 - 8BallNews.com€¦ · Leverage 2000's crypto capabilities ... Grab “Snort.conf” -Install Create 3 Folders: "C:\Snort\" - "C:\Snort\Bin\" - "C:\Snort\Logs\"

Snort Linux

Snort Lecture10

Snort Tutorial

Original Article Antibiotics has incapability of reducing ... · Original Article Antibiotics has incapability of reducing unnecessary prostate biopsies: a meta-analysis involving

Snort Manual

Financial Incapability Structured Clinical Assessment done

Snort & Nmap

Implementasi snort