IPFabrics, February 2005
A Virtual-Machine Approach to Creating Complex NPU Applications in the Blink of an Eye
[Diagram: an application written in NPU microcode vs. a PPL application running on a virtual machine (a virtualized packet processor) on the network processor; other options shown: ASICs, merchant silicon, general-purpose processors]
Premise: network processors (from Intel and others) will be a core building block of next-generation networking equipment, because of their:
• Programmability
• Versatility
• Performance
But the major obstacle is the difficulty of programming network processors, and the code is architecture- and model-specific.
Empowering Network Processors
• Very-high-level packet-processing language
• Virtual machine abstracting NPU details
• Built-in functionality for deep packet processing
Value proposition:
• Faster time to market
• Lower development and lifetime costs
• Scalable to new silicon
• Portable to different architectures
• Enables a larger community to use NPUs
The NPU Catch
• Application software needs to manage interconnect, memory overlap, caching, etc.
• C programs are still very low level and highly machine dependent
[Diagram: microengine (ME) resources and instruction classes: "run-of-the-mill" instructions, Local_CSR instructions, a special form of the SRAM instruction, CAP instructions, MSF instructions, and PCI instructions; CAP instructions give access to the transfer and local CSR registers of any other ME]
What makes NPUs so powerful as solutions to networking-systems design is also what makes their software development a significant challenge:
• Parallelism (microengines, hardware threads)
• Massive register resources (in the IXP2800, 15,452 software-visible registers not counting local memory and CAMs; 25,948 counting these)
• Multiple memory types
• Small program memory space
• No OS underneath
• Cipher units, hash, CAMs, rings, signals, etc.
• Pages of programming-architecture quirks and errata
Group1: Policy PATTERNS DATABASE($idslist) FIND(Rr1,0,Fuf)
Intruderset: Policy ASSOCIATE NUMBER(10000) SEARCHKEYS(IP_SOURCE) TIMEOUT(10000)
Intruders: Policy RECALL SEARCHKEYS(IP_SOURCE) LINKED(Intruderset)
Secure1: Policy CRYPTO TRANSFORM(3DES,SHA) TIMEOUT(3000) TUNNEL(10.0.42.32)
Diversion: Policy PACKET INSERT(PREP,header_size,0)
Rule EQ(TCP_SYN,1) EQ(TCP_RST,1) DROP # Protocol anomaly
Rule EQ(TCP_SYN,1) EQ(TCP_FIN,1) DROP # Another protocol anomaly
Rule EQ(IP_SOURCE,MYIPADDR) DROP # Source-spoofed packet
Rule EQ(IP_SOURCE,public) APPLY(Intruders)
Rule EQ(ac,0) DROP # Previously detected intruder
Rule NE(IP_DEST,MYIPADDR) EQ(ICMP_TYPE,ECHO) DROP # No pings to the inside
Rule EQ(IP_PROT,ICMP) EQ(IP_MF,1) DROP # Fragmented ICMP is a DoS attack
Rule SCAN("|0D0A5B52504C5D3030320D0A|") JUMP(found_subseven_trojan)
Rule EQ(IP_DEST/24,190.10.10.0) SET(R0,192.68.0.0) ADD(R0,IP_DEST/0.0.0.255) SET(IP_DEST,R0) # Xlate 190.10.10.X to 192.68.0.X
Rule EQ(IP_DEST/24,boston_gateway) EQ(IP_SOURCE,portland_gateway/24) APPLY(secure1) FORWARD(1)
Rule EQ(IP_DEST/24,190.10.10.0) APPLY(Group1) FORWARD(2)
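To make the behavior of a few of these rules concrete, here is a Python sketch (not PPL, and not how the virtual machine implements them) of the two TCP protocol-anomaly rules and the 190.10.10.X to 192.68.0.X translation rule. The flag masks follow the standard TCP header bit positions; the function and constant names are hypothetical.

```python
import ipaddress

# Standard TCP header flag bits; names are illustrative.
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04

XLATE_FROM = ipaddress.IPv4Network("190.10.10.0/24")
XLATE_TO_BASE = ipaddress.IPv4Address("192.68.0.0")


def is_protocol_anomaly(tcp_flags: int) -> bool:
    """True if SYN is set together with RST or FIN (the two DROP rules)."""
    if not tcp_flags & TCP_SYN:
        return False
    return bool(tcp_flags & (TCP_RST | TCP_FIN))


def translate_dest(ip_dest: str) -> str:
    """Rewrite 190.10.10.X to 192.68.0.X; other addresses pass through unchanged."""
    addr = ipaddress.IPv4Address(ip_dest)
    if addr not in XLATE_FROM:                 # EQ(IP_DEST/24,190.10.10.0)
        return ip_dest
    host_part = int(addr) & 0x000000FF         # IP_DEST/0.0.0.255
    new_addr = XLATE_TO_BASE + host_part       # SET(R0,192.68.0.0) ADD(R0,...)
    return str(new_addr)                       # SET(IP_DEST,R0)
```

For example, translate_dest("190.10.10.7") returns "192.68.0.7", while an address outside 190.10.10.0/24 is returned unchanged.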
Virtual Machine Approach to NPU Software
[Diagram: a PPL application is compiled by the PPL compiler and runs on the PPL virtual machine on the NPU]
Example applications:
• IPSec VPN
• Basic firewall
• Layer 4 load balance
• Layer 7 content switch
• Layer 3,4 DoS attacks
• TCP offload
• Two-way encryption gateway
• Dynamic peephole firewall
• Intrusion signature scans
• Layer 7 bandwidth monitoring
• Layer 7 protocol-specific firewall
• Dynamic intrusion blocking
• SIP proxy/offload
• Encrypted content switch
• PacketCable layer 7 traffic management
• Content-specific filters (e.g., email spam)
• Lawful content listening
• Content-specific DoS attacks
• Session Border Controller
• . . .
Two Routes
Route 1: program the NPU directly, dealing with:
• 16 microengines
• 128 hardware threads
• 640-word local memories
• Scratch rings
• ALU instructions
• A and B register banks
• Next-neighbor registers
• No OS
• DRAM transfer registers
• SRAM transfer registers
• Thread signals
• Byte index register
• Aligned accesses only
• Dispatch loops
• Errata
• Instruction sequence restrictions
• Register scope
• Register lifetime
• Context arbitration
• Processor synchronization
• Inter-instruction timing
Result: 90% of time spent on underlying tools, devices, and details; 10% of time spent on application value; very specific to NPU model and family.
Route 2: the PPL language on a virtual machine on the NPU.
• PPL: a very-high-level functional language to express packet processing
• The virtual machine on the NPU fully exploits parallelism while hiding it
• PPL also includes very powerful primitives, e.g.,
  • Scan packet payload
  • Match payload to regular expression
  • Encryption/authentication
  • Manage connections (e.g., TCP, SIP)
  • Manage "superpackets"
  • High-speed multi-pattern matching
Result: 90% of time spent on application value; scalable; portable.
PPL – a Fundamentally Different Approach
NPU tools: tools to help you write and debug microcode, far removed from the world of packet processing. You still need to understand the NPU's microcode environment, create the microcode, debug it, and maintain it.
PPL virtual-machine environment: an application machine. You think about packet processing and express your application in a very-high-level application language. The R&D focus is on the value-add in the application, not the many details of the NPU.
[Chart: time/$ spent on underlying tools/devices vs. time/$ spent on application value, for each approach]
Therefore there are huge benefits in:
• Time to market
• Life-cycle software costs
• Number of NPU experts needed
• Scalability to new silicon (up and down)
Comer Bump in the Wire Example
Write the data-plane code that examines each IP packet to determine whether it is TCP and destined for port 80 (HTTP). Count those packets, and forward all packets.
The complete PPL program (the only code you write) is:
Define port80counter="Rg20"
Event(0)
Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) ADD(port80counter,1)
Rule FORWARD
This is a major undertaking if you sit down to attempt it in assembly language or C.
The closest thing we know of (Agere's FPL) was 76 FPL lines in Agere's submission to Comer's web site, and we found two serious bugs in Agere's code that don't exist in the PPL code:
• If a packet is a fragment, the Agere code can mistake it for something with a TCP header
• If a packet's layer 3 or 4 headers are malformed or malicious, behavior is unpredictable
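For comparison, here is a hand-written Python sketch (an illustration only; on the IXP this would be microcode or C) of what the two PPL rules have to do, including the two checks listed above: skip non-first fragments, which carry no TCP header, and reject truncated or malformed IPv4/TCP headers. It assumes raw IPv4 packets with the layer-2 header already stripped; the function names are hypothetical.

```python
import struct

IPPROTO_TCP = 6
HTTP_PORT = 80


def is_tcp_port80(packet: bytes) -> bool:
    """Return True only for a well-formed IPv4/TCP packet whose destination port is 80."""
    if len(packet) < 20:                        # shorter than a minimal IPv4 header
        return False
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    if version != 4 or ihl < 5:                 # malformed layer-3 header
        return False
    header_len = ihl * 4
    (total_len,) = struct.unpack_from("!H", packet, 2)
    # Require the whole IP header plus a minimal 20-byte TCP header, and no claim
    # of more bytes than were actually received.
    if total_len < header_len + 20 or total_len > len(packet):
        return False
    (flags_frag,) = struct.unpack_from("!H", packet, 6)
    if flags_frag & 0x1FFF:                     # non-first fragment: no TCP header here
        return False
    if packet[9] != IPPROTO_TCP:
        return False
    (dport,) = struct.unpack_from("!H", packet, header_len + 2)
    return dport == HTTP_PORT


def bump_in_the_wire(packets):
    """Count port-80 TCP packets and forward every packet unchanged."""
    port80_count = 0
    forwarded = []
    for pkt in packets:
        if is_tcp_port80(pkt):
            port80_count += 1
        forwarded.append(pkt)                   # all packets are forwarded, as with "Rule FORWARD"
    return port80_count, forwarded
```

A first fragment (offset 0, MF set) is still examined, because it does carry the TCP header.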
PPL
• Powerful, easy-to-use, functional (not procedural) language
• Main elements: rules, policies, events
  • Rule: expression(s) action(s)
  • Event: rules that are processed together
  • Policies: major algorithms and state machines
• Defines strong concurrency, yet hides all parallelism in the NPU
  • All rules are evaluated concurrently. The actions of true rules in an event are processed sequentially.
  • Events are processed concurrently (i.e., rules in separate events are processed concurrently).
  • Multiple instances of the same event are also processed concurrently.
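The rule semantics above can be modeled in a few lines of Python. This is one possible reading, not PPL's implementation. It assumes "evaluated concurrently" means every rule's expressions see the packet state as it was before any action runs, and that the actions of true rules then run in rule order. How DROP or STOP interrupt the action sequence is not modeled. `Rule` and `process_event` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Packet = Dict[str, int]
Condition = Callable[[Packet], bool]
Action = Callable[[Packet], None]


@dataclass
class Rule:
    conditions: List[Condition]
    actions: List[Action] = field(default_factory=list)


def process_event(rules: List[Rule], packet: Packet) -> None:
    # Phase 1 (assumed reading of "evaluated concurrently"): every rule's
    # expressions see the same packet state, taken before any action runs.
    snapshot = dict(packet)
    true_rules = [r for r in rules if all(cond(snapshot) for cond in r.conditions)]
    # Phase 2: the actions of the true rules run sequentially, in rule order.
    for rule in true_rules:
        for action in rule.actions:
            action(packet)
```

For example, suppose one rule's action changes field x from 1 to 2 and a second rule tests x == 1. Under this model both rules fire. A procedural, top-to-bottom reading would skip the second rule.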
[Diagram: structure of a PPL program: a set of policies plus a set of events, each event containing rules; rules apply policies. Events are shown associated with logical ports 0,1; logical ports 4-7; logical port 82; exceptions; and start-up]
Example of a Rule
Rule EQ(IP_DEST/16,iptable(1)) EQ(TCP_SYNONLY,1) APPLY(tcpconn)
Means: if the upper 16 bits of the IP destination address match entry 1 in array iptable, and if the packet is a TCP packet with only the SYN flag set, apply the policy labeled tcpconn.
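The test this rule performs can be written out in Python as below. This is a sketch, not PPL's implementation. It rests on three assumptions: iptable is a zero-based list of dotted-quad strings, "match" compares the upper 16 bits of both addresses, and "only the SYN flag set" considers the six classic TCP flags and ignores ECE/CWR. The names are hypothetical.

```python
import ipaddress

IPPROTO_TCP = 6
TCP_SYN = 0x02
TCP_CLASSIC_FLAGS = 0x3F   # URG, ACK, PSH, RST, SYN, FIN (ignoring ECE/CWR is an assumption)


def upper16(addr: str) -> int:
    """Upper 16 bits of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(addr)) >> 16


def tcpconn_rule_matches(ip_dest: str, ip_prot: int, tcp_flags: int, iptable: list) -> bool:
    """Both expressions must be true before APPLY(tcpconn) would run."""
    # EQ(IP_DEST/16,iptable(1))
    dest_matches = upper16(ip_dest) == upper16(iptable[1])
    # EQ(TCP_SYNONLY,1)
    syn_only = ip_prot == IPPROTO_TCP and (tcp_flags & TCP_CLASSIC_FLAGS) == TCP_SYN
    return dest_matches and syn_only
```

With iptable = ["0.0.0.0", "10.20.0.0"], a SYN to 10.20.3.4 matches. A SYN+ACK to the same address does not, and neither does a SYN to 10.21.0.1.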
Easy and Powerful
• Highly robust: prevents many errors and security holes
• Layer-2 interfaces are built in
  • Ethernet, PoS, ATM, SPI4, CSIX, PCI
• Many powerful packet-processing elements built in, e.g.,
  • Payload scanning (absolute and regular expression)
  • Automatic connection lookup/tracking (e.g., TCP, SIP)
  • Content-addressable tables
  • Rate computation
  • Encryption/authentication
  • High-speed, large-database, multipattern matching
  • Header insertion/stripping
  • Management of, and operations on, superpackets
  • Interface to non-PPL programs in the data, control, or management plane
PPL Rules
Rule expression expression … action action …
Expression examples:
• EQ(CX_STATE,ESTABLISHED): Is the packet's connection state ESTABLISHED?
• SCAN("sync-1.01; andy; I'm just doing my job, nothing personal, sorry",10): Does the Mydoom.B signature string appear at offset 10 or beyond in the current packet?
• SCAN(re"CALL_ID:.*? [0-9][0-9\-]{9,19}?"): Does the following regular expression occur in the current packet? The string CALL_ID: followed by 0 or more don't-care characters, followed by a digit, followed by 9 to 19 digits and/or hyphens.
Action examples:
• DROP: Drop the current packet
• FORWARD: Do a layer-3 forwarding of the current packet
• APPLY(intruder_list): Apply a policy
• COMPUTE(CVB,Rr0q): Convert an IPv6 address in character form in the payload to a binary 128-bit value
• COMPUTE(SCSM,Re1): Compute an incremental checksum as described in RFC 1624
Value examples (used in expressions, actions, policies):
• FFE0::2 (IPv6 address constant)
• TCP_CSUM (packet field)
• CONTENT(sindx) (dynamic packet field)
• Rate_limits(Re4) (array element)
• PS_CONTENTSIZE (packet state)
• CX_STATE (connection state)
• Re0 (register)
• Intruder_list (statement label)
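COMPUTE(SCSM,...) is described as an incremental checksum in the style of RFC 1624. For reference, here is a Python sketch of RFC 1624's equation 3, HC' = ~(~HC + ~m + m'), in 16-bit ones' complement arithmetic. It is paired with a from-scratch checksum so the two can be cross-checked. It illustrates the arithmetic only and is not the PPL implementation; the function names are hypothetical.

```python
def ones_complement_add(a: int, b: int) -> int:
    """16-bit ones' complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def full_checksum(words):
    """Internet checksum over a sequence of 16-bit words (checksum field zeroed)."""
    acc = 0
    for w in words:
        acc = ones_complement_add(acc, w)
    return ~acc & 0xFFFF


def incremental_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), in ones' complement arithmetic."""
    acc = ones_complement_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    acc = ones_complement_add(acc, new_word)
    return ~acc & 0xFFFF
```

For the IPv4 header words 4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7, full_checksum gives 0xB861. Changing the word 0x0001 to 0x0002 and calling incremental_checksum(0xB861, 0x0001, 0x0002) gives 0xB860, the same result as recomputing from scratch, but without rereading the whole header.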
PPL Policies
ASSOCIATE / RECALL / DISASSOCIATE Define, insert, search, and remove entries from a content-addressable table
CONNECTIONS Defines a state-based table that is searched automatically for each arriving packet
RATE Maintains a time-based rate
QUEUE Defines and operates on (enqueue, dequeue, query) packet queues
CLASSIFY Multi-field, multi-criteria database search, useful for 5/6-tuple classification. Implementation can be mapped to a TCAM.
CONTROL Generic control function. The current function defines periodic (timed) event control
DEFRAG Collects related fragments until all are collected or a reassembly time is exceeded.
CIPHER Protocol-independent “light-weight” encryption / authentication
NEWPACKET Creates a new packet
PACKET Manipulates a packet (e.g., header insertion/stripping)
PATTERNS Heavy-duty comparison of a packet to a database of patterns. Uses an improved Wu-Manber algorithm and the Eatherton tree-bitmap algorithm, optimized for the NPU
MONITOR Defines how packets are monitored
PROGRAM Invokes a function outside of PPL (in IXP implementation can be in XScale, outside of PPL virtual machine in microcode, or remote program over PCI bus)
NEW-SUPERPACKET Creates a new superpacket (set of packets to be treated as one)
SUPERPACKET Performs functions on a superpacket
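To illustrate the kind of state that ASSOCIATE, RECALL, and DISASSOCIATE manage, here is a Python sketch of a bounded key table with per-entry expiry. It is loosely modeled on NUMBER(...) and TIMEOUT(...) as used in the firewall example earlier, and rests on four assumptions: NUMBER is a capacity limit, TIMEOUT is a lifetime measured from insertion in the same units as the clock, lookups do not refresh entries, and inserts into a full table are refused. A real NPU implementation would use hashing or CAMs in SRAM/DRAM rather than a Python dict. The class name is hypothetical.

```python
import time
from typing import Callable, Dict, Hashable


class AssociationTable:
    """Bounded key table with per-entry expiry (hypothetical model of ASSOCIATE/RECALL)."""

    def __init__(self, number: int, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.number = number        # assumed meaning of NUMBER(...): maximum entries
        self.timeout = timeout      # assumed meaning of TIMEOUT(...): lifetime from insertion
        self.clock = clock
        self._inserted_at: Dict[Hashable, float] = {}

    def _expire(self) -> None:
        now = self.clock()
        stale = [k for k, t in self._inserted_at.items() if now - t >= self.timeout]
        for key in stale:
            del self._inserted_at[key]

    def associate(self, key: Hashable) -> bool:
        """Insert key unless already present; refuse when full. True if key is present afterwards."""
        self._expire()
        if key in self._inserted_at:
            return True
        if len(self._inserted_at) >= self.number:
            return False
        self._inserted_at[key] = self.clock()
        return True

    def recall(self, key: Hashable) -> bool:
        """Search for key; lookups do not refresh the entry (an assumption)."""
        self._expire()
        return key in self._inserted_at

    def disassociate(self, key: Hashable) -> None:
        """Remove key if present."""
        self._inserted_at.pop(key, None)
```

Passing a custom clock makes expiry deterministic to test. In a data plane, the clock would be a hardware timestamp rather than time.monotonic.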
Complete Example
Application: Examine all packets going to TCP port 80 to see whether they are an HTTP GET transaction with a URL ending in 'redirect.html' and containing a session cookie. For each one found, store its IP source address in a table (unless it already exists in the table). Then forward the packet.
Define myregex = "re ""GET.*?redirect.html[[:space:]]*?HTTP/1.*?Cookie:"""
Source_track: Policy ASSOCIATE NUMBER(100000) SEARCHKEYS(IP_SOURCE)
Event(0)
Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) SCAN(myregex) APPLY(Source_track)
Rule Forward Stop
This is the complete program, i.e., the entirety of what you'd have to write for the data plane of the Intel IXP2xxx.
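For contrast, here is a Python sketch of the same per-packet logic, assuming the packet has already been parsed into protocol, destination port, and payload. Python's re module does not support POSIX bracket classes such as [[:space:]], so \s stands in for it. The dot in redirect.html is escaped. re.DOTALL is assumed so that .*? can span the request's header lines, because PPL's newline handling is not stated here. Like SCAN, this sketch looks only within the single payload it is given. The names are hypothetical.

```python
import re

# Python translation of myregex (an assumption, not PPL's regex engine).
MYREGEX = re.compile(rb"GET.*?redirect\.html\s*?HTTP/1.*?Cookie:", re.DOTALL)
SOURCE_TRACK_NUMBER = 100_000   # capacity, from NUMBER(100000)
IPPROTO_TCP = 6


def process_packet(ip_source: str, ip_prot: int, l4_dport: int,
                   payload: bytes, source_track: set) -> str:
    """Record matching sources in source_track, then forward every packet."""
    if ip_prot == IPPROTO_TCP and l4_dport == 80 and MYREGEX.search(payload):
        # APPLY(Source_track): insert unless already present and while space remains.
        if ip_source not in source_track and len(source_track) < SOURCE_TRACK_NUMBER:
            source_track.add(ip_source)
    return "FORWARD"    # "Rule Forward Stop" applies to every packet
```

Even this simplified version omits TCP parsing, reassembly, and table aging, all of which the PPL virtual machine provides.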
PPL DeviceMap Statement
DeviceMap
NPU(2850,1400)
AVAILABLE_PROCESSORS(1,15)
PPL_PROCESSORS(ER(10%),AE(70%))
PACKET_MEM(DRAM,128000)
CONNECTIONS_MEM(DRAM,16000)
ARRAY_MAP(SERVLIST,0,ext_$$pdkserv)
LINK(0,inout,GE_ON_SPI,0,1518,0,0, 0,0,IXF1010,0)
LINK(156,out,PCI)
PROG(excep_recorder,CONTROL)
How you describe your hardware to the virtual machine and control configuration and mappings:
• NPU is an IXP2850 with a clock speed of 1400 MHz
• Microengines 1-15 are available to the PPL virtual machine (so microengine 0 is reserved for something else)
• Follow the suggestion of allocating 10% of microengine cycles to Ethernet receive and 70% to PPL action processing, making the best use of the remaining 20%
• Allow 128 MB for packet memory in DRAM
• Allow 16 MB for connection tables in DRAM
• Physically map the array SERVLIST in the PPL program to the control-plane symbol ext_$$pdkserv
• Define a network interface as logical port 0; it is GigE SPI-4 port 0 and port 0 of the IXF1010 MAC
• Define logical port 156 as an output-only port over PXD
• Define a control-plane interface name that the PPL PROGRAM policy can invoke
Interfacing to Outside Programs
[Diagram: a PPL program interfacing with software on the XScale control plane, custom or customer NPU microcode, and software on an IA host processor. Mechanisms shown: send a packet to a PPL event; send a packet anywhere a PPL program can; invoke a PPL event (RPC); FORWARD a packet; PROGRAM to invoke a remote program; XScale program (RPC); share memory; enqueue on the PPL VM input ring; enqueue on a ring; Intel Portability Framework and NPF APIs]
PPL Summary
• Powerful, easy-to-use, functional (not procedural) language
• Main elements are rules, policies, events
• Defines strong concurrency, yet hides all parallelism in the NPU
• Highly robust: prevents many errors and security holes
• Many powerful packet-processing elements built in, e.g.,
  • Payload scanning (absolute and regular expression)
  • Automatic connection lookup/tracking (e.g., TCP, SIP)
  • Content-addressable tables
  • Rate computation
  • Encryption/authentication
  • High-speed, large-database, multipattern matching
  • Header insertion/stripping
  • Management of, and operations on, superpackets
  • Interface to non-PPL programs in the data, control, or management plane
Complete Software Solution
[Diagram: components of the complete software solution]
• NPU data-plane microengines: PPL virtual machine; extensions for high-speed multi-pattern searching, IPSec, superpackets, PXD, etc.; receivers/transmitters for Ethernet, CSIX, PCI, POS/PPP, …
• XScale: PPL system initialization; PPL debug, logging, stats; control-plane interfaces (i.e., NPF APIs)
• PPL applications, e.g., signature analysis, IPv4/v6 translation, layer 7 content switch, encryption gateway, …
• "Pentium": PXD high-speed packet interface
• Windows or Linux computer: PPL compiler, PPL debug GUI, PPL transactor
• Customer components: customer PPL, customer control-plane software, customer management-plane software
Be running in, literally, days. There is no need to use the Intel SDK or Intel microcode, or to learn the IXP programming details, unless you want to write low-level microcode.
Translated to Time and Cost
[Charts: Time to Market (months); NPU Software Development Cost ($ million); NPU Software Life-Cycle Cost* ($ million). Each compares "Develop NPU hardware and data-plane software from scratch" with "Deploy off-the-shelf NPU hardware and PPL for data-plane software". Other labels: "Functional, measurable, live prototype available"; "Subscription and royalty".]
* Includes maintenance, product enhancement, and one port to a different NPU model