IP Fabrics February 2005 A Virtual-Machine Approach to Creating Complex NPU Applications in the Blink of an Eye

Page 1

IP Fabrics, February 2005

A Virtual-Machine Approach to Creating Complex NPU Applications in the Blink of an Eye

Page 2

[Diagram labels: ASICs; Merchant silicon; Gen purpose processors; Network processor, running an application in NPU microcode; Virtualized packet processor, made up of a PPL application on a virtual machine]

Premise: network processors will be a core building block of next-generation networking equipment
• Programmability
• Versatility
• Performance

Intel and others

But a major obstacle is the difficulty of programming network processors
And code is architecture and model specific

IP Fabrics

Empowering Network Processors

Very-high-level packet processing language

Virtual machine abstracting NPU details

Built-in functionality for deep packet processing

Value proposition:
• Faster time to market
• Lower development and lifetime costs
• Scalable to new silicon
• Portable to different architectures
• Enable a larger community to use NPUs

Page 3

The NPU Catch

• Application software needs to manage interconnect, memory overlap, caching, etc.
• C programs still very low level, highly machine dependent

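[Figure: register and memory resources of one ME (micro-engine) and the instructions that reach them. Labels: "run-of-the-mill instructions"; Local_CSR inst; special form of the SRAM inst; CAP inst; MSF inst; PCI inst; access to the transfer and local CSR registers of any other ME. Numeric labels as printed: 256x16, 256, 128, 512 (128 x 4), 128, 17, 266, 8, 45, 5541, 8, 12, 59, 41x16.]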

What makes NPUs so powerful as solutions to networking-systems design is also what makes their software development a significant challenge

Parallelism (microengines, hardware threads)

Massive register resources (in IXP2800, 15,452 software visible registers not counting local mem and CAMs; 25,948 counting these)

Multiple memory types

Small program memory space

No OS underneath

Cipher units, hash, CAMs, rings, signals, etc

Pages of programming architecture quirks, errata

Page 4

Group1: Policy PATTERNS DATABASE($idslist) FIND(Rr1,0,Fuf)
Intruderset: Policy ASSOCIATE NUMBER(10000) SEARCHKEYS(IP_SOURCE) TIMEOUT(10000)
Intruders: Policy RECALL SEARCHKEYS(IP_SOURCE) LINKED(Intruderset)
Secure1: Policy CRYPTO TRANSFORM(3DES,SHA) TIMEOUT(3000) TUNNEL(10.0.42.32)
Diversion: Policy PACKET INSERT(PREP,header_size,0)
Rule EQ(TCP_SYN,1) EQ(TCP_RST,1) DROP # Protocol anomaly
Rule EQ(TCP_SYN,1) EQ(TCP_FIN,1) DROP # Another protocol anomaly
Rule EQ(IP_SOURCE,MYIPADDR) DROP # Source spoofed packet
Rule EQ(IP_SOURCE,public) APPLY(Intruders)
Rule EQ(ac,0) DROP # Previously detected intruder
Rule NE(IP_DEST,MYIPADDR) EQ(ICMP_TYPE,ECHO) DROP # no pings to the inside
Rule EQ(IP_PROT,ICMP) EQ(IP_MF,1) DROP # fragmented ICMP is DoS attack
Rule SCAN("|0D0A5B52504C5D3030320D0A|") JUMP(found_subseven_trojan)
Rule EQ(IP_DEST/24,190.10.10.0) SET(R0,192.68.0.0) ADD(R0,IP_DEST/0.0.0.255) SET(IP_DEST,R0) # Xlate 190.10.10.X to 192.68.0.X
Rule EQ(IP_DEST/24,boston_gateway) EQ(IP_SOURCE,portland_gateway/24) APPLY(secure1) FORWARD(1)
Rule EQ(IP_DEST/24,190.10.10.0) APPLY(Group1) FORWARD(2)
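The address-translation rule in the program above (the one commented "Xlate 190.10.10.X to 192.68.0.X") can be traced step by step. The following is a rough Python sketch of the same arithmetic, not PPL and not IP Fabrics code. It assumes that /24 selects the upper 24 bits of the field and that /0.0.0.255 applies a bit mask; the function name translate is hypothetical.

```python
import ipaddress

SUBNET_FROM = int(ipaddress.IPv4Address("190.10.10.0"))
SUBNET_TO = int(ipaddress.IPv4Address("192.68.0.0"))

def translate(ip_dest: str) -> str:
    """Rewrite 190.10.10.X to 192.68.0.X; leave every other address alone."""
    dest = int(ipaddress.IPv4Address(ip_dest))
    # EQ(IP_DEST/24,190.10.10.0): compare only the upper 24 bits (assumed meaning of /24)
    if (dest & 0xFFFFFF00) != SUBNET_FROM:
        return ip_dest
    r0 = SUBNET_TO                 # SET(R0,192.68.0.0)
    r0 += dest & 0x000000FF        # ADD(R0,IP_DEST/0.0.0.255): add the host byte
    return str(ipaddress.IPv4Address(r0))  # SET(IP_DEST,R0)

print(translate("190.10.10.37"))  # 192.68.0.37
```

Addresses outside 190.10.10.0/24 come back unchanged, which corresponds to the rule expression being false.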

Virtual Machine Approach to NPU Software

PPL compiler

PPL virtual machine

IPSec VPN

Basic firewall

Layer 4 load balance

Layer 7 content switch

Layer 3,4 DoS attacks

TCP offload

Two-way encryption gateway

Dynamic peephole firewall

. . .

Intrusion signature scans

Layer 7 bandwidth monitoring

Layer 7 protocol specific firewall

Dynamic intrusion blocking

SIP proxy/ offload

Encrypted content switch

Packetcable layer 7 traffic management

Content specific filters (e.g., email spam)

Lawful content listening

Content specific DoS attacks

Session Border Controller

NPU

Page 5

Two Routes

PPL: A very-high-level functional language to express packet processing
Virtual machine on NPU fully exploits parallelism while hiding it
PPL also includes very powerful primitives, e.g.,

• Scan packet payload
• Match payload to regular expression
• Encryption/authentication
• Manage connections (e.g., TCP, SIP)
• Manage “superpackets”
• High-speed multi-pattern matching

PPL Language

Virtual Machine

NPU

16 microengines

128 hardware threads

640 word local memories

Scratch rings

ALU instructions

A and B register banks

Next neighbor registers

No OS

DRAM transfer registers

SRAM transfer registers

Thread signals

Byte index register

Aligned accesses only

Dispatch loops

Errata

Instruction sequence restrictions

Register scope

Register lifetime

Context arbitration

Processor synchronization

90% of time spent on underlying tools, devices, details
10% of time spent on application value
Very specific to NPU model and family

90% of time spent on application value
Scalable
Portable

NPU

Inter-instruction timing

Page 6

PPL – a Fundamentally Different Approach

NPU tools

PPL virtual-machine environment

Tools to help you write and debug microcode. And far removed from the world of packet processing. You still need to understand the NPU’s microcode environment, create the microcode, debug it, maintain it.

Application machine. You think about packet processing and express your application in a very-high-level application language. R&D focus is on the value-add in the application, not the many many details of the NPU.

Time/$ spent on application value

Time/$ spent on underlying tools/devices

Therefore huge benefits in
• Time to market
• Life cycle software costs
• Number of NPU experts needed
• Scalability to new silicon (up and down)

Time/$ spent on application value

Page 7

Comer Bump in the Wire Example

Write the data-plane code that examines each IP packet to determine if it is TCP and destined for port 80 (HTTP). Count them. And forward all packets.

Complete PPL program (the only code you write) is:

Define port80counter="Rg20"
Event(0)
Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) ADD(port80counter,1)
Rule FORWARD

A major undertaking if you sit down to attempt this in an assembly-language or C program.

The closest thing we know about (Agere’s FPL) was 76 FPL lines in Agere’s submission to Comer’s web site, and we found two serious bugs in Agere’s code that don’t exist in the PPL code:

• If a packet is a fragment, the Agere code can mistake it for something with a TCP header
• If a packet’s layer 3 or 4 headers are malformed or malicious, behavior is unpredictable

Page 8

PPL

Powerful, easy to use, functional (not procedural) language
Main elements - rules, policies, events
• Rule: expression(s) action(s)
• Event: rules that are processed together
• Policies: major algorithms and state machines

Defines strong concurrency, yet hides all parallelism in the NPU
• All rules are evaluated concurrently. The actions of true rules in an event are processed sequentially.
• Events are processed concurrently (i.e., rules in separate events are processed concurrently).
• Multiple instances of the same event also process concurrently.
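One way to read the first concurrency bullet is as a two-phase model: every rule's expressions are tested against the packet as it arrived, and only then do the actions of the true rules run in order. The Python sketch below models that reading. It is an interpretation rather than the documented behavior of the PPL virtual machine, and run_event and the dictionary-based packet are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, int]
Expr = Callable[[Packet], bool]
Action = Callable[[Packet], None]
Rule = Tuple[List[Expr], List[Action]]

def run_event(rules: List[Rule], packet: Packet) -> None:
    # Phase 1: evaluate all rules against the same snapshot of the packet
    # (assumed meaning of "all rules are evaluated concurrently").
    snapshot = dict(packet)
    fired = [all(expr(snapshot) for expr in exprs) for exprs, _ in rules]
    # Phase 2: run the actions of the true rules sequentially, in program order.
    for (_, actions), hit in zip(rules, fired):
        if hit:
            for action in actions:
                action(packet)

rules = [
    ([lambda p: p["x"] == 1], [lambda p: p.update(x=2)]),
    ([lambda p: p["x"] == 1], [lambda p: p.update(y=1)]),
]
packet = {"x": 1, "y": 0}
run_event(rules, packet)
print(packet)  # {'x': 2, 'y': 1}
```

Under this model the second rule still fires even though the first rule's action changes x, because both expressions were evaluated before any action ran.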

[Diagram: a PPL program is made up of policies and events. Each event holds one or more rules, and rules apply policies. The events shown are tied to logical port 82, logical ports 4-7, logical ports 0,1, exceptions, and start up.]

Page 9

Example of a Rule

Rule EQ(IP_DEST/16,iptable(1)) EQ(TCP_SYNONLY,1) APPLY(tcpconn)

Means: If the upper 16 bits of the IP destination address match entry 1 in array iptable, and if the packet is a TCP packet with only the SYN flag set, apply the policy labeled tcpconn
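The same test can be written out longhand. This is a hedged Python sketch that assumes iptable holds dotted-quad IPv4 addresses and that "only the SYN flag set" refers to the six classic TCP flag bits; rule_matches is a hypothetical name, and tcp_flags is None for a non-TCP packet.

```python
import ipaddress
from typing import Optional, Sequence

TCP_FLAG_MASK = 0x3F  # FIN, SYN, RST, PSH, ACK, URG
TCP_SYN = 0x02

def rule_matches(ip_dest: str, tcp_flags: Optional[int], iptable: Sequence[str]) -> bool:
    """EQ(IP_DEST/16,iptable(1)) EQ(TCP_SYNONLY,1), spelled out."""
    dest_hi16 = int(ipaddress.IPv4Address(ip_dest)) >> 16
    entry_hi16 = int(ipaddress.IPv4Address(iptable[1])) >> 16
    syn_only = tcp_flags is not None and (tcp_flags & TCP_FLAG_MASK) == TCP_SYN
    return dest_hi16 == entry_hi16 and syn_only

iptable = ["10.0.0.0", "192.168.0.0"]
print(rule_matches("192.168.7.9", 0x02, iptable))  # True: /16 matches and only SYN is set
print(rule_matches("192.168.7.9", 0x12, iptable))  # False: SYN+ACK is not SYN-only
```

When the function returns True, the PPL rule would go on to apply the tcpconn policy.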

Page 10

Easy and Powerful

Highly robust – prevents many errors and security holes
Layer-2 interfaces are built in
• Ethernet, PoS, ATM, SPI4, CSIX, PCI

Many powerful packet-processing elements built in, e.g.,
• Payload scanning (absolute and regular expression)
• Automatic connection lookup/tracking (e.g., TCP, SIP)
• Content-addressable tables
• Rate computation
• Encryption/authentication
• High-speed, large database, multipattern matching
• Header insertion/stripping
• Management of, and operations on, superpackets
• Interface to non-PPL programs in data-, control-, or mgmt plane

Page 11

PPL Rules

Rule expression expression … action action …

Expression examples

EQ(CX_STATE,ESTABLISHED)
Is the packet’s connection state ESTABLISHED?

SCAN("sync-1.01; andy; I’m just doing my job, nothing personal, sorry",10)
Does the Mydoom.B signature string appear at offset 10 or beyond in the current packet?

SCAN(re"CALL_ID:.*? [0-9][0-9\-]{9,19}?")
Does the following regular expression occur in the current packet? The string CALL_ID: followed by 0 or more don’t care characters followed by a digit, followed by 9 to 19 digits and/or hyphens.
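The CALL_ID pattern above can be tried out with an ordinary regular-expression engine. The sketch below uses Python's re module, which accepts the pattern text as printed (including the literal space before the first digit). Whether PPL's regex dialect matches Python's in every detail is an assumption, as is the use of DOTALL so that "." can also match a line break. The function name scan is hypothetical.

```python
import re

# Pattern text copied from the SCAN example; compiled over bytes because packets are bytes.
CALL_ID = re.compile(rb"CALL_ID:.*? [0-9][0-9\-]{9,19}?", re.DOTALL)

def scan(payload: bytes) -> bool:
    """Return True if the pattern occurs anywhere in the payload."""
    return CALL_ID.search(payload) is not None

print(scan(b"INVITE sip:a@b CALL_ID: 555-123-4567 more"))  # True
print(scan(b"CALL_ID: 12345"))                             # False: too few digits
```

The second payload fails because the pattern needs one digit followed by at least nine more digits or hyphens.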

Action examples

DROP
Drop the current packet

FORWARD
Do a layer-3 forwarding of the current packet

APPLY(intruder_list)
Apply a policy

COMPUTE(CVB,Rr0q)
Convert a character-form IPv6 address in the payload to a binary 128-bit value

COMPUTE(SCSM,Re1)
Compute an incremental checksum à la RFC 1624.

Value examples (used in expressions, actions, policies)

FFE0::2: IPv6 address constant
TCP_CSUM: Packet field
CONTENT(sindx): Dynamic packet field
Rate_limits(Re4): Array element
PS_CONTENTSIZE: Packet state
CX_STATE: Connection state
Re0: Register
Intruder_list: Statement label
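The COMPUTE(SCSM,Re1) action listed among the action examples refers to the incremental checksum of RFC 1624. As a worked illustration of that RFC's equation 3, HC' = ~(~HC + ~m + m'), the Python sketch below updates a 16-bit Internet checksum after one header word changes and compares the result with a full recomputation. It shows the arithmetic only; how the PPL action takes its operands is not described in the slides, and the function names and sample header words are made up for the example.

```python
from typing import Iterable

def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def checksum(words: Iterable[int]) -> int:
    """Full Internet checksum over 16-bit words."""
    total = 0
    for w in words:
        total = ones_add(total, w)
    return ~total & 0xFFFF

def incremental_update(hc: int, m_old: int, m_new: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m')."""
    s = ones_add(~hc & 0xFFFF, ~m_old & 0xFFFF)
    s = ones_add(s, m_new)
    return ~s & 0xFFFF

# Sample header words (checksum field excluded); 0x4006 holds TTL 0x40 and protocol 6.
words = [0x4500, 0x0054, 0x1C46, 0x4000, 0x4006, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
hc = checksum(words)
new_words = list(words)
new_words[4] = 0x3F06  # TTL decremented by one
print(incremental_update(hc, 0x4006, 0x3F06) == checksum(new_words))  # True
```

The incremental form touches three values instead of the whole header, which is why it suits per-packet rewrites such as a TTL decrement or an address translation.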

Page 12

PPL Policies

ASSOCIATE, RECALL, DISASSOCIATE Define, insert, search, and remove entries from a content-addressable table

CONNECTIONS Defines a state-based table that is searched automatically for each arriving packet

RATE Maintains a time-based rate

QUEUE Defines and operates on (enqueue, dequeue, query) packet queues

CLASSIFY Multi-field, multi-criteria database search, useful for 5/6-tuple classification. Implementation can be mapped to a TCAM.

CONTROL Generic control function. The current function defines periodic (timed) event control

DEFRAG Collects related fragments until all are collected or a reassembly time is exceeded.

CIPHER Protocol-independent “light-weight” encryption / authentication

NEWPACKET Creates a new packet

PACKET Manipulates a packet (e.g., header insertion/stripping)

PATTERNS Heavy-duty comparison of a packet to a database of patterns. Uses an improved Wu-Manber algorithm and the Eatherton tree-bitmap algorithm, optimized for the NPU

MONITOR Defines how packets are monitored

PROGRAM Invokes a function outside of PPL (in IXP implementation can be in XScale, outside of PPL virtual machine in microcode, or remote program over PCI bus)

NEW-SUPERPACKET Creates a new superpacket (set of packets to be treated as one)

SUPERPACKET Performs functions on a superpacket
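As a rough mental model of the ASSOCIATE, RECALL, and DISASSOCIATE policies listed above, the Python sketch below implements a small keyed table with a capacity and an entry lifetime, mirroring the NUMBER(...) and TIMEOUT(...) parameters that appear in the policy declarations elsewhere in the deck. The exact PPL semantics (what happens when the table is full, the unit of TIMEOUT) are not given in the slides, so those choices are assumptions. The class name AssociationTable is hypothetical.

```python
import time
from typing import Callable, Dict, Hashable, Optional

class AssociationTable:
    """Toy content-addressable table: insert, look up, and remove keys."""

    def __init__(self, number: int, timeout: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.number = number      # NUMBER(n): maximum entries (assumed meaning)
        self.timeout = timeout    # TIMEOUT(t): entry lifetime; the unit is an assumption
        self.clock = clock
        self.entries: Dict[Hashable, float] = {}  # key -> insertion time

    def _expire(self) -> None:
        if self.timeout is None:
            return
        now = self.clock()
        for key in [k for k, t in self.entries.items() if now - t >= self.timeout]:
            del self.entries[key]

    def associate(self, key: Hashable) -> bool:
        """Insert the key unless it already exists; False if the table is full."""
        self._expire()
        if key in self.entries:
            return True
        if len(self.entries) >= self.number:
            return False
        self.entries[key] = self.clock()
        return True

    def recall(self, key: Hashable) -> bool:
        """Search for the key."""
        self._expire()
        return key in self.entries

    def disassociate(self, key: Hashable) -> None:
        """Remove the key if present."""
        self.entries.pop(key, None)
```

A rule that applies a RECALL policy keyed on IP_SOURCE would correspond to calling recall(source_address) here and acting on the result.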

Page 13

Complete Example

Application: Examine all packets going to TCP port 80 to see if they are a GET HTTP transaction with a URL ending with ‘redirect.html’ and containing a session cookie. For each that is found, store its IP source address in a table (unless it previously exists in the table). Then forward the packet.

Define myregex = "re ""GET.*?redirect.html[[:space:]]*?HTTP/1.*?Cookie:"""
Source_track: Policy ASSOCIATE NUMBER(100000) SEARCHKEYS(IP_SOURCE)
Event(0)
Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) SCAN(myregex) APPLY(Source_track)
Rule Forward Stop

This is the complete program – i.e., this is the entirety of what you’d have to write for the data plane of the Intel IXP 2xxx
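For readers who want to see the same logic in a general-purpose language, here is a hedged Python sketch of what the program above does to each packet. It models only the decision logic, not the NPU data plane. The POSIX class [[:space:]] is written as \s, DOTALL is assumed so that "." can span the HTTP header lines, and the function handle, its parameters, and the use of a Python set for the table are hypothetical.

```python
import re

# The slide's pattern, with [[:space:]] written as \s for Python's re module.
MYREGEX = re.compile(rb"GET.*?redirect.html\s*?HTTP/1.*?Cookie:", re.DOTALL)

MAX_ENTRIES = 100000   # NUMBER(100000) in the Source_track policy
source_track = set()   # stands in for the ASSOCIATE table keyed on IP_SOURCE

def handle(ip_source: str, ip_prot: str, l4_dport: int, payload: bytes) -> str:
    """Record the source of matching HTTP GETs, then forward every packet."""
    if ip_prot == "TCP" and l4_dport == 80 and MYREGEX.search(payload):
        # A set ignores duplicates, matching "unless it previously exists in the table".
        if ip_source in source_track or len(source_track) < MAX_ENTRIES:
            source_track.add(ip_source)
    return "FORWARD"

request = b"GET /a/redirect.html HTTP/1.1\r\nHost: x\r\nCookie: sid=1\r\n\r\n"
handle("10.1.2.3", "TCP", 80, request)
print(sorted(source_track))  # ['10.1.2.3']
```

Header parsing, fragment handling, and table management are left out here; in the PPL version those are the parts the slide says the virtual machine provides.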

Page 14

PPL DeviceMap Statement

How one describes their hardware to the virtual machine and controls configuration and mappings.

DeviceMap

NPU(2850,1400)
NPU is IXP2850 with clock speed of 1400 MHz

AVAILABLE_PROCESSORS(1,15)
Microengines 1-15 are available to PPL virtual machine (meaning 0 is being reserved for something else)

PPL_PROCESSORS(ER(10%),AE(70%))
Follow suggestion of allocating 10% of microengine cycles to Ethernet receive, 70% to PPL action processing, and best use of remaining 20%

PACKET_MEM(DRAM,128000)
Allow 128 MB for packet memory in DRAM.

CONNECTIONS_MEM(DRAM,16000)
Allow 16 MB for connection tables in DRAM.

ARRAY_MAP(SERVLIST,0,ext_$$pdkserv)
For the array SERVLIST in the PPL program, physically map it to control-plane symbol ext_$$pdkserv

LINK(0,inout,GE_ON_SPI,0,1518,0,0, 0,0,IXF1010,0)
Define a network interface as logical port 0; it is GigE SPI-4 port 0 and port 0 in MAC IXF1010

LINK(156,out,PCI)
Define logical port 156 as an output only port over PXD

PROG(excep_recorder,CONTROL)
Define a control-plane interface name that the PPL PROGRAM policy can invoke
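The clauses of the DeviceMap statement share one shape, NAME(arg, arg, ...), with arguments that may themselves contain parentheses. The small Python sketch below splits a clause into its name and top-level arguments and checks the "remaining 20%" arithmetic from the PPL_PROCESSORS explanation. It is an illustration of the notation only, not the PPL compiler's parser, and parse_clause and unallocated_percent are hypothetical names.

```python
import re
from typing import List, Tuple

CLAUSE = re.compile(r"^(\w+)\((.*)\)$")

def parse_clause(line: str) -> Tuple[str, List[str]]:
    """Split 'NAME(a,b(c),d)' into ('NAME', ['a', 'b(c)', 'd'])."""
    m = CLAUSE.match(line.strip())
    if m is None:
        raise ValueError(f"not a clause: {line!r}")
    name, body = m.groups()
    args, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            args.append(current.strip())   # a top-level comma ends one argument
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    args.append(current.strip())
    return name, args

def unallocated_percent(args: List[str]) -> int:
    """Percentage of microengine cycles not claimed by any listed share."""
    used = sum(int(p) for a in args for p in re.findall(r"(\d+)%", a))
    return 100 - used

name, args = parse_clause("PPL_PROCESSORS(ER(10%),AE(70%))")
print(name, args)                 # PPL_PROCESSORS ['ER(10%)', 'AE(70%)']
print(unallocated_percent(args))  # 20
```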

Page 15

PPL program

Software on XScale control plane

Custom or customer NPU microcode

Software on an IA host processor

• Send packet to PPL event
• Send packet to anywhere PPL program can
• Invoke PPL event (RPC)

• FORWARD packet
• PROGRAM to invoke remote program

• FORWARD packet
• PROGRAM to invoke XScale program (RPC)
• Share memory

• Share memory
• Enqueue on PPL VM input ring

• Enqueue on a ring
• Share memory

• Share memory
• Enqueue on PPL VM input ring

Interfacing to Outside Programs

Intel Portability Framework and NPF APIs

Page 16

PPL Summary

Powerful, easy to use, functional (not procedural) language
Main elements are rules, policies, events
Defines strong concurrency, yet hides all parallelism in the NPU
Highly robust – prevents many errors and security holes
Many powerful packet-processing elements built in, e.g.,
• Payload scanning (absolute and regular expression)
• Automatic connection lookup/tracking (e.g., TCP, SIP)
• Content-addressable tables
• Rate computation
• Encryption/authentication
• High-speed, large database, multipattern matching
• Header insertion/stripping
• Management of, and operations on, superpackets
• Interface to non-PPL programs in data-, control-, or mgmt plane

Page 17

Complete Software Solution

NPU data-plane microengines

XScale

PPL virtual machine

PPL system initialization, PPL debug, logging, stats

Control plane interfaces (i.e., NPF APIs)

PPL applications

“Pentium”

PXD high-speed packet interface

Windows or Linux computer

PPL compiler

PPL debug GUI

PPL transactor

Be running in, literally, days
No need to use Intel SDK, Intel microcode, learn the IXP programming details, etc. unless you want to write low-level microcode

Extensions for high-speed multi-pattern searching, IPSec, superpackets, PXD, etc

Receivers/transmitters for Ethernet, CSIX, PCI, POS/PPP, …

e.g., signature analysis, IPv4/v6 translation, layer 7 content switch, encryption gateway, …

Customer control plane software

Customer PPL

Customer mgmt plane software

Page 18

Translated to Time and Cost

[Charts: three comparisons of "Develop NPU hardware and data-plane software from scratch" against "Deploy off-the-shelf NPU hardware and PPL for data-plane software".
• Time to Market (months, axis 0 to 25), with the point "Functional, measurable, live prototype available" marked
• NPU Software Development Cost ($ million, axis 0 to 6)
• NPU Software Life-Cycle Cost* ($ million, axis 0 to 6), with "Subscription and royalty" marked]

* includes maintenance, product enhancement, one port to different NPU model