Shyamal Pandya

Implementation of Network Processor Packet Filtering and Parameterization for Higher Performance Network Processors

Implementation of IXP1200 Network Processor Packet Filtering Software and Parameterization for Higher Performance Network Processors

Shyamal H. Pandya

Agenda

• Introduction and Goal of the Thesis

• Brief description of IXP1200 Network Processor and the ENP-2505 ESB

• Software Environment

• Packet Filter Design

• Implementation

• Tests, Results and Parameterization

• Conclusion

Introduction

• Network Processors

– A class of programmable processors designed for network applications

– Flexible and efficient alternative to ASICs and general-purpose processors

– Employ several architectural features to achieve their design goals:

• A number of processing elements

• Intelligent and fast memory units and buses

• Instruction set architecture specifically tailored for packet processing operations

– Examples: Intel IXP1200, IBM PowerNP series, Vitesse IQ2200

IXP1200

• Belongs to the IXP family of Network Processors from Intel (IXP1200, IXP2400, IXP2800)

• Major Components

– Intel StrongARM core processor

– Six programmable RISC microengines

• 4 hardware contexts per microengine

• instruction set tailored to suit network applications

– Memory Units

• 32-bit SRAM unit supporting up to 8 MB

• 64-bit SDRAM unit supporting up to 256 MB

• 8 KB of 32-bit Scratchpad Memory

Goal

• Network Processors are targeted towards network applications, e.g. routers, VoIP, intrusion detection, packet filtering.

• These applications are characterized by the need to process packets at extremely fast rates to keep up with the speed of network traffic.

• Goal: to investigate the programmability of the IXP1200 through the design and implementation of a packet filter.

• Linux IP Tables - the Linux packet filtering framework, chosen as the basis of our packet filter.

• Parameterization - based on the experience of implementing the packet filter on the IXP1200, the architectural enhancements of the IXP2400, a higher performance network processor of the same family, are analyzed to estimate their benefits.

IXP1200 in more Detail

IXP1200 in Operation

ENP-2505

• ESB based on IXP1200

• Pluggable in a PCI slot of a host computer

• Supports four 10/100 Mbps Ethernet ports

• 8 MB SRAM, 256 MB SDRAM

• StrongARM core processor and Microengines operate at 232 MHz

• 8 MB of flash memory that holds a RAM disk.

ENP-2505 and Host Setup

Programming Model

• The ACE framework - a software framework for designing applications that consist of isolated software components performing well-defined tasks

– An ACE encapsulates the tasks or modules performing independent packet processing functions

– An ACE has one or more input targets and one or more output targets

– Packets arrive at the input targets, are processed within the ACE and are transmitted through one of its output targets

– An ACE can be bound to another by binding its output target to the other’s input target

– An application is comprised of several ACEs bound to each other
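
The binding model described above can be pictured with a small sketch. This is not the Intel ACE API: the `Ace` class, its `bind` and `receive` methods, the dict-based packet and the three example stages are hypothetical names chosen only to illustrate how an output target of one ACE is bound to the input target of another.

```python
from typing import Callable, Dict, Optional

Packet = dict  # hypothetical packet representation for this sketch


class Ace:
    """Toy stand-in for an ACE: one input target, named output targets."""

    def __init__(self, name: str, process: Callable[[Packet], Optional[str]]):
        self.name = name
        # process() returns the name of the output target to use,
        # or None when the packet is consumed by this ACE.
        self.process = process
        self.outputs: Dict[str, "Ace"] = {}

    def bind(self, output_target: str, downstream: "Ace") -> None:
        """Bind one of this ACE's output targets to another ACE's input."""
        self.outputs[output_target] = downstream

    def receive(self, packet: Packet) -> None:
        """Input target: process the packet, then pass it downstream."""
        packet.setdefault("path", []).append(self.name)
        target = self.process(packet)
        if target is not None and target in self.outputs:
            self.outputs[target].receive(packet)


# A three-stage forwarding application built by binding ACEs together.
ingress = Ace("ingress", lambda p: "next")
forwarder = Ace("forwarder", lambda p: "next" if p.get("ttl", 0) > 1 else None)
egress = Ace("egress", lambda p: None)

ingress.bind("next", forwarder)
forwarder.bind("next", egress)

pkt = {"ttl": 64}
ingress.receive(pkt)
print(pkt["path"])  # ['ingress', 'forwarder', 'egress']
```

Each stage knows only its own output targets, which is what lets an application be assembled from independent components.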

Example ACE Application (Packet Forwarder)

MicroACE

• An extension to the ACE model: part of the ACE is implemented on the core processor, the other part on the microengines

• Microblock performs fast path packet processing

• The core component is a conventional ACE that manages the microblock

• MicroACE model can be exploited to divide the tasks between the microengines and the core processor

Forwarding Application using MicroACEs

Packet Filter Design

• IP Tables

– Packet filtering infrastructure for the Linux OS

– A set of modules that maintain tables of rules

– A rule contains specifications, in terms of values that header fields must match, and a target (ACCEPT/DROP)

– Tables correspond to the kind of manipulation a packet undergoes, e.g. filter table, NAT table etc.

– A table contains a number of chains, each chain to be traversed at a particular point in the packet's path, e.g. INPUT, OUTPUT, FORWARD

– Extensibility - each rule has at a minimum specs for IP header matching. More examination can be specified by adding match structures, e.g. a tcp_match structure holds specifications for matching packet TCP headers.
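
The relationship between tables, chains, rules and match structures can be sketched as follows. The class and field names here are illustrative assumptions; they are neither the Linux kernel's structure layout nor the data structures used in the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IpSpec:
    """Mandatory IP header specification; None means 'match anything'."""
    src: Optional[str] = None
    dst: Optional[str] = None
    protocol: Optional[str] = None


@dataclass
class TcpMatch:
    """Optional extension: extra constraints on the TCP header."""
    dst_port: Optional[int] = None


@dataclass
class Rule:
    ip: IpSpec
    target: str                                   # "ACCEPT" or "DROP"
    matches: List[TcpMatch] = field(default_factory=list)


@dataclass
class Table:
    name: str
    chains: Dict[str, List[Rule]] = field(default_factory=dict)


# A filter table with the three chains named above.
filter_table = Table("filter", {"INPUT": [], "OUTPUT": [], "FORWARD": []})

# A rule with a TCP extension, followed by a catch-all policy rule.
filter_table.chains["FORWARD"].append(
    Rule(IpSpec(protocol="tcp"), "DROP", [TcpMatch(dst_port=23)])
)
filter_table.chains["FORWARD"].append(Rule(IpSpec(), "ACCEPT"))
```

Keeping the extension matches in a list on the rule is what makes the scheme extensible: a rule with an empty list needs only the IP header comparison.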

Packet Filter Design - Data Structures

Packet Filter Design - Algorithm

• For each rule in the chain of interest

– Match the packet IP header against the specs in the rule. If the match succeeds, look for other match structures in the rule.

– Match the packet against each match structure found in the rule. If the packet satisfies all matches, the packet has successfully matched the rule.

– For a successful match, look at the target of the rule

• if the target is ACCEPT, let the packet pass

• if the target is DROP, drop the packet and free its resources

– For an unsuccessful match, go to the next rule and repeat the process

• The last rule matches all packets. Its target is the default policy.
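
The traversal above can be sketched in a few lines. This is a minimal illustration, assuming packets are plain dictionaries and extension matches are callables; `Rule`, `ip_header_matches`, `filter_packet` and `tcp_dport` are hypothetical names, not the microcode macros of the actual implementation.

```python
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]
Match = Callable[[Packet], bool]


class Rule:
    def __init__(self, ip_spec: Dict[str, object], target: str,
                 matches: Optional[List[Match]] = None):
        self.ip_spec = ip_spec          # field name -> required value
        self.target = target            # "ACCEPT" or "DROP"
        self.matches = matches or []    # extension matches, e.g. TCP


def ip_header_matches(packet: Packet, ip_spec: Dict[str, object]) -> bool:
    """Every field named in the spec must equal the packet's value."""
    return all(packet.get(k) == v for k, v in ip_spec.items())


def filter_packet(packet: Packet, chain: List[Rule]) -> str:
    """Return the target of the first rule the packet matches."""
    for rule in chain:
        if not ip_header_matches(packet, rule.ip_spec):
            continue                    # IP header mismatch: try next rule
        if all(match(packet) for match in rule.matches):
            return rule.target          # every extension match succeeded
    raise ValueError("chain must end with a catch-all policy rule")


def tcp_dport(port: int) -> Match:
    """Build an extension match on the TCP destination port."""
    return lambda p: p.get("dport") == port


forward_chain = [
    Rule({"proto": "tcp"}, "DROP", [tcp_dport(23)]),   # block telnet
    Rule({}, "ACCEPT"),                                # default policy
]

print(filter_packet({"proto": "tcp", "dport": 23}, forward_chain))  # DROP
print(filter_packet({"proto": "tcp", "dport": 80}, forward_chain))  # ACCEPT
```

Because the final rule has an empty specification and no extension matches, it matches every packet, so the loop always terminates with a verdict.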

Implementation

• Task division between the core processor and the microengines

– Data Plane (Microengines): Ingress, Filtering, Forwarding, Egress

– Control Plane (Core): Filter table and route table management

– Management Plane (Core): User Interface, Deployment

• Chains - INPUT, OUTPUT, FORWARD

– INPUT and OUTPUT chains are traversed infrequently

– FORWARD chain is used most frequently, hence implemented on microengines

• Software Components

– Ingress, Egress, Forwarder MicroACEs and Stack ACE, provided as part of the SDK

– PacketFilter MicroACE, designed and implemented as part of the thesis

Implementation

Application Design in terms of MicroACEs

Implementation

• User Interface - iptables command

– used to manipulate filter table by adding, deleting, inserting, replacing rules

– an executable and libraries implement the user interface

– Algorithm

• parse the command line, validate all the options and arguments

• obtain a local copy of the filter table by making a cross-call to the PacketFilter core component

• modify the local copy according to the command

• make a cross-call to the PacketFilter core component to replace old filter table with the new one, passing the modified filter table as argument
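
The four steps above follow a copy, modify, replace pattern, which can be sketched as below. Only the name `do_replace` comes from the slides; `PacketFilterCore`, `get_table`, `run_command` and the reduced command syntax (`-A <chain> -j <target>`) are assumptions made for illustration, and a plain method call stands in for the cross-call mechanism.

```python
import copy
from typing import Dict, List

FilterTable = Dict[str, List[dict]]


class PacketFilterCore:
    """Stand-in for the core component; holds the live filter table."""

    def __init__(self) -> None:
        self._table: FilterTable = {"FORWARD": []}

    def get_table(self) -> FilterTable:        # hypothetical cross-call
        return copy.deepcopy(self._table)

    def do_replace(self, new_table: FilterTable) -> None:
        self._table = copy.deepcopy(new_table)  # swap the table in one step


def run_command(core: PacketFilterCore, argv: List[str]) -> None:
    # 1. Parse and validate; only "-A <chain> -j <target>" is handled here.
    if len(argv) != 4 or argv[0] != "-A" or argv[2] != "-j":
        raise ValueError("usage: -A <chain> -j <ACCEPT|DROP>")
    chain, target = argv[1], argv[3]
    if target not in ("ACCEPT", "DROP"):
        raise ValueError(f"unknown target {target!r}")

    table = core.get_table()                    # 2. obtain a local copy
    if chain not in table:
        raise ValueError(f"unknown chain {chain!r}")
    table[chain].append({"target": target})     # 3. modify the local copy
    core.do_replace(table)                      # 4. replace the old table


core = PacketFilterCore()
run_command(core, ["-A", "FORWARD", "-j", "DROP"])
print(core.get_table())  # {'FORWARD': [{'target': 'DROP'}]}
```

Working on a local copy means a command that fails validation never leaves the live table half-modified.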

Implementation

• PacketFilter Core Component

– Initialization

• Control Data Structures, filter table allocation in SRAM, patching filter table address to microcode

– Cross-call Interface

• function do_replace, used by user interface to replace the current filter table with a new filter table in the SRAM

Implementation

• Microcode - Each microengine can run more than one microblock

• Flow of control is governed by a dispatch loop running on each enabled microengine

• Microblock partitioning across microengines

Implementation

• Dispatch Loops - Microengine 0

– Initialize the Ingress and PacketFilter Microblocks

– In an infinite loop, do the following

• Call Ingress Microblock

• If a packet has arrived, call the PacketFilter Microblock, else if there is an exception, queue the packet for Ingress core component, else continue from beginning of the loop

• If PacketFilter microblock returns ACCEPT, queue the packet for Microengine 2, running the Forwarder

• If PacketFilter microblock returns DROP, drop the packet

– Every SA_CONSUME_NUM times around the loop, poll the Core to ME packet queue for packets from core components. If there is a packet, determine its source (Ingress core or PacketFilter Core) and call the corresponding microblock

– SA_CONSUME_NUM - tunable parameter to control frequency of memory accesses w.r.t. Core to ME packet queue
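
The control flow of this dispatch loop, including the role of SA_CONSUME_NUM, can be sketched as follows. The sketch is an assumption-laden model rather than microcode: the loop is bounded so that it terminates, the value 4 for SA_CONSUME_NUM is arbitrary, and the queues, event tuples and `dispatch_loop_me0` function are hypothetical.

```python
from collections import deque

SA_CONSUME_NUM = 4   # illustrative value; the slides do not give the real one


def dispatch_loop_me0(rx_events, core_queue, packet_filter, iterations):
    """Run a bounded number of loop iterations and report what happened."""
    to_forwarder, to_ingress_core, dropped, core_polls = [], [], [], 0
    for i in range(1, iterations + 1):
        # "Call Ingress": either nothing arrived, a packet, or an exception.
        event = rx_events.popleft() if rx_events else None
        if event is not None:
            kind, packet = event
            if kind == "exception":
                to_ingress_core.append(packet)   # queue for Ingress core
            elif packet_filter(packet) == "ACCEPT":
                to_forwarder.append(packet)      # queue for microengine 2
            else:
                dropped.append(packet)
        # Poll the core-to-microengine queue only every SA_CONSUME_NUM trips.
        if i % SA_CONSUME_NUM == 0:
            core_polls += 1
            if core_queue:
                core_queue.popleft()  # would go to the owning microblock
    return to_forwarder, to_ingress_core, dropped, core_polls


rx = deque([
    ("packet", {"dport": 80}),
    ("packet", {"dport": 23}),
    ("exception", {"arp": True}),
])
verdict = lambda p: "DROP" if p.get("dport") == 23 else "ACCEPT"
result = dispatch_loop_me0(rx, deque(), verdict, iterations=8)
print(result[3])  # 2
```

Raising SA_CONSUME_NUM reduces how often the loop spends a memory access on the core-to-microengine queue, at the cost of higher latency for packets coming from the core.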

Implementation

• Dispatch Loops - Microengine 2

– Initialize the Forwarder Microblock

– In an infinite loop, do the following

• Poll the packet queue from Microengine 0 to see if there is a packet.

• If packet available, call the Forwarder microblock, else continue from the beginning

• If Forwarder microblock returns success, queue the packet for microengine 5 to be scheduled for output, else if it returns an exception, queue the packet for the core component, else drop the packet

– Poll the Core to ME packet buffer every SA_CONSUME_NUM times, and if there is a packet from the core component, call the microblock

Implementation

• Dispatch Loops - Microengine 5

– Initialize Egress microblock

– 4 output queues hold the packets for each output port

– Context 0 polls the 4 output queues in a round-robin manner

– Contexts 1-3 fill up the TFIFO with data from the current packet to be transmitted

• PacketFilter Microblock macros

– PacketFilter() - main macro

– ip_packet_match() - called from PacketFilter()

– ipt_tcp_match() - TCP extension to core packet filtering code

Implementation - Microengine Re-tasking

• Triggered when the first rule specifying TCP match specs is added to the table

• Implementation

– Core component sends inter-thread signals to all threads of microengine 0

– Each time around the dispatch loop, each thread checks for a signal

– If signal is present, the thread stops its execution and sends interrupt to the StrongARM

– Interrupt Handler - when an interrupt is received from each of the 4 threads of microengine 0, it wakes up the process sleeping on the interrupt (PacketFilter core component)

– The core component disables microengine 0, reloads it with a new image containing ipt_tcp_match() macro and enables the microengine

• This design ensures that microengines are not interrupted while processing a packet, thus preventing packet loss
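
The re-tasking handshake can be modelled as a small deterministic simulation. Everything here is a sketch under assumptions: real inter-thread signals, interrupts and image loading are replaced by plain method calls, and the class names and image names (`"filter"`, `"filter+tcp"`) are hypothetical.

```python
NUM_THREADS = 4  # hardware contexts on microengine 0


class Microengine:
    def __init__(self) -> None:
        self.enabled = True
        self.image = "filter"                    # hypothetical image name
        self.signalled = [False] * NUM_THREADS
        self.stopped = [False] * NUM_THREADS

    def thread_iteration(self, tid: int, core: "CoreComponent") -> bool:
        """One trip around the dispatch loop; True if a packet was handled."""
        if self.stopped[tid] or not self.enabled:
            return False
        if self.signalled[tid]:                  # checked only between packets
            self.signalled[tid] = False
            self.stopped[tid] = True
            core.interrupt(tid)                  # tell the core we are idle
            return False
        return True                              # a whole packet is processed


class CoreComponent:
    def __init__(self, me: Microengine) -> None:
        self.me = me
        self.interrupts = set()

    def request_retask(self) -> None:
        """Send a signal to every thread of the microengine."""
        self.me.signalled = [True] * NUM_THREADS

    def interrupt(self, tid: int) -> None:
        """Interrupt handler: reload only once all threads have stopped."""
        self.interrupts.add(tid)
        if len(self.interrupts) == NUM_THREADS:
            self._reload("filter+tcp")

    def _reload(self, image: str) -> None:
        self.me.enabled = False                  # disable
        self.me.image = image                    # load the new image
        self.me.stopped = [False] * NUM_THREADS
        self.interrupts.clear()
        self.me.enabled = True                   # enable again


me = Microengine()
core = CoreComponent(me)
core.request_retask()
for tid in range(NUM_THREADS):
    me.thread_iteration(tid, core)
print(me.image)  # filter+tcp
```

The key property is that a thread looks at its signal only at the top of the loop, so the reload can never begin while any thread still holds a packet.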

Tests and Results

• Test setup

• Packets sent from host machine to the notebook

• Libnet library used to build packets

• Host machine runs tcpdump and the Windows laptop runs Ethereal

Tests and Results

• Experiment 1 - Code size

• Experiment 2 - Packet filtering operations

– various commands to add and delete rules from the filter table

– packet filtering operations were confirmed correct by observing packet transmission and reception with tcpdump and Ethereal

Tests and Results

• Experiment 3 - performance penalty due to task partitioning across microengines

• Experiment 4 - Microengine Re-tasking

– command to add a TCP match specs rule to the filter table

– Microengine 0 was re-tasked successfully and packet filtering operations continued

Parameterization

• IXP2400 Network Processor

– Higher performance network processor of the same family, with significant architectural enhancements

• Microstore (4 KB vs. 16 KB)

– IXP1200: 1K instruction limit, so tasks had to be split across 2 microengines

– IXP2400: 4K instructions, so the split is not necessary and the performance penalty is avoided

Parameterization

– Total number of microwords for Ingress + PacketFilter + Forwarder = 1156

– Extra instruction store space can be used for other components: UDP match, limit match, NAT, connection tracking

• Number of Microengines and Contexts

– IXP1200 serves 8 ports with 16 contexts for input and 8 contexts for output to forward packets

– On the IXP2400 the number of contexts per microengine is doubled, so each microengine can serve 4 ports for the input process (2 contexts per port, as in the IXP1200)

– With 5 microengines for input and 3 for output, the number of ports serviced could be 20
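
The arithmetic behind these estimates can be checked with a short calculation. The instruction-store sizes, the microword total and the input-side figures come from the slides; the assumption of one output context per port is inferred here from "8 contexts for output" serving 8 ports, and the function names are hypothetical.

```python
IXP1200_MICROSTORE = 1024      # instructions per microengine (1K)
IXP2400_MICROSTORE = 4096      # instructions per microengine (4K)
TOTAL_MICROWORDS = 1156        # Ingress + PacketFilter + Forwarder

CONTEXTS_PER_ME_IXP2400 = 8    # double the IXP1200's 4 contexts
INPUT_CONTEXTS_PER_PORT = 2    # 16 input contexts / 8 ports on the IXP1200
OUTPUT_CONTEXTS_PER_PORT = 1   # assumed: 8 output contexts / 8 ports


def fits(microwords: int, store: int) -> bool:
    """Does the combined code fit in one microengine's instruction store?"""
    return microwords <= store


def ports_served(input_mes: int, output_mes: int) -> int:
    """Ports supported, limited by whichever side runs out of contexts."""
    by_input = input_mes * CONTEXTS_PER_ME_IXP2400 // INPUT_CONTEXTS_PER_PORT
    by_output = output_mes * CONTEXTS_PER_ME_IXP2400 // OUTPUT_CONTEXTS_PER_PORT
    return min(by_input, by_output)


print(fits(TOTAL_MICROWORDS, IXP1200_MICROSTORE))   # False
print(fits(TOTAL_MICROWORDS, IXP2400_MICROSTORE))   # True
print(ports_served(5, 3))                           # 20
```

Under these assumptions the input side is the limiting factor: 5 microengines give 40 input contexts, enough for 20 ports, while 3 output microengines give 24 contexts.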

Parameterization

• Next neighbor register set

– Data sharing is very fast and avoids memory accesses

– IXP1200 - task partitioning between microengines requires packet queues. Inter-microengine data communication goes through SRAM accesses, which carries a performance penalty.

– IXP2400 - packet queues are avoided, because buffer handles are shared through next neighbor registers. The performance penalty is avoided.

• Memory

– ENP-2505 has 48 MB DRAM and 3 MB SRAM accessible to microengines

– SRAM could accommodate 9K rules of average size. Thus memory was enough for PacketFilter application

– Increase in memory in IXP2400 could benefit simultaneous execution of many memory hungry applications

Conclusions

• Successfully implemented Packet filter core code and TCP header match extension

• Had to split filtering and forwarding across 2 microengines due to instruction store size limits

• MicroACE software framework was ideal for the design of the packet filter

• Microengine re-tasking was complicated by the lack of a smooth interface to microengine signals and interrupt handling

• Future work: investigating simultaneous operation of more than one application, more IP Tables extensions to the packet filter.

• Future work: incorporating interface to inter-thread signals and call-backs to MicroACE Framework