Upload
roderick-lynch
View
214
Download
1
Tags:
Embed Size (px)
Citation preview
Introduction Introduction
2/9 - 2005
INF5061:Multimedia data communication using network processors
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Overview
Course topic and scope
Background software-based network systems challenges and new requirements evolution of network processors
(Very) short overview of some example network processors
INF5061:The Course
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Lecturers
Carsten Griwodzemail: griff @ ifi
Pål Halvorsenemail: paalh @ ifi
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
About INF5061: Topic & Scope
Content: The course gives … … an overview of network processor cards
(architectures and use)
… an introduction of how to program Intel IXP network processors
… some ideas of how to use network processors
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
About INF5061: Topic & Scope Lab-assignments:
An important part of the course are lab-assignments where the students should make a program for the Intel IXP2400 network processor
1. wwpingbump – download and run
2. protocol statistics – extend the wwpingbump to give processor, interface and protocol statistics
3. packet bridge with ARP support – forward packet to correct interface (of 3 available)
4. transparent load balancer – balance load and forward packets to the right machine in a cluster of two with same IP address
5. HTTP protocol translator – add support in the transparent load balancer for HTTP streaming having an RTSP/RTP server
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
About INF5061: Exam (10sp) Prerequisite – mandatory assignments:
lab assignment 2: protocol statistics presentation of a relevant paper
Graded assignments: lab assignment 4: transparent load balancer
deliver code short demo/explanation of code (to lecturers only)
lab assignment 5: HTTP protocol translator deliver code and a short report present and demonstrate to the class at the end of the course
Final exam: oral exam (???/12-2005) selected chapters from the Comer book and IXP
documentation lecture slides (including slides from presented papers) content of lab assignments
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
About INF5060: “Exam” (5sp)
Mandatory assignment:
lab assignment 5: HTTP protocol translator deliver code and a short report present and demonstrate to the class at the end of the
course
approved assignment gives a “passed” course (INF5060)
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Available Resources Book:
Douglas E. Comer: “Network Systems Design using Network Processors – Intel IXP2xxx Version”, Pearson Prentice Hall, 2004
Other resources will be placed at http://www.ifi.uio.no/~paalh/INF5061 Login: inf5061 Password: ixp
Manuals for IXP2400: …/~paalh/INF5061/IXP2400
Code: …/~paalh/INF5061/code
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Disclaimer In the field of network processors, I am a tyro
Definition: Tyro \Ty’ro\, n.; pl. Tyros. A beginner in learning; one who is in the rudiments of any branch of study; a person imperfectly acquainted with a subject; a novice
Then, by definition, in the field of network processors, we are all tyros
In our defense, when it comes to network processors, everyone is a tyro
Background and Motivation
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Uses conventional, shared hardware (e.g., a PC)
Software runs the entire system allocates memory controls I/O devices performs all protocol processing
First generation network systems:
Software-Based Network System
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Review of General Data Path on Conventional Computer Hardware Architectures
communicationsystem
application
user space
kernel space
sending:
communicationsystem
application
receiving:
communicationsystem
application
forwarding:
transport (TCP/UDP)
network(IP)
link
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Review of Conventional Computer Hardware Architectures
CPU socket
RDRAM connectors
PCI connectors
I/O Controller Hub
Memory Controller Hub
Intel D850MD Motherboard - Intel Hub Architecture (850 Chipset) :
system bus
RDRAM interface
hub interface
PCIbus
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Pentium 4Processor
registers
cache(s)
Forwarding Example for an Intermediate Node: Intel Hub Architecture
I/Ocontroller
hub
memorycontroller
hub
RDRAM
RDRAM
RDRAM
RDRAM
PCI slots
PCI slots
PCI slots
system bus(64-bit, 400/533 MHz)
hub interface(four 8-bit, 66 MHz)
PCI bus(32-bit, 33 MHz)
RAM interface(two 64-bit, 200 MHz)
network card
communication system
application
communicationsystem
applicationuser space
kernel spaceNote:- one single average MPEG-II DVD stream require ~330-660 packets per second of 1500 Bytes (4-8 Mbps)
- then use smaller packets, add concurrent clients, other applications, …
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Main Packet Processing Costs Copying: used when moving a packet from one memory
location to another expensive (proportional to packet size) should be avoided whenever possible (use pointers)
Checksuming: used to detect errors expensive (proportional to packet size) transport layer: payload + header network layer: header
Fragmentation/reassembly: needed when packet is larger than smallest MTU generate headers + header checksum receiving many small data fragments
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Question: Which is growing faster?
network bandwidth processing power
Note: if network bandwidth is growing faster CPU may be the bottleneck need special-purpose hardware conventional hardware will become irrelevant
Note: if processing power is growing faster no problems with processing network/busses will be bottlenecks
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Growth Of TechnologiesMbps
year
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Packet Rates and Software Processing
Packet rates (packets per second):
Packet processing (MIPS, assuming 5K instructions per packet):
the Comer book uses 10K instructions as an upper bound per packet
it varies according to which protocols are used, implementation, data size, etc.
more if moved through a fire wall
engineering rule: 1GHz general purpose CPU = 1Gbps network data rate
Note; this is only processing – time must be added to handle interrupts and move data into memory
Thus, software running on a general-purpose processor is insufficient to handle high-speed networks because the aggregate packet rate exceeds the capabilities of the CPU
64 B 1500 B
10BASE-T (10 Mbps)
19.531
833
1000BASE-T (1 Gbps)
1.953.125 83.333
OC-192 (9.95 Gbps)
19.439.453
829.416
64 B 1500 B
10BASE-T (10 Mbps)
97,65 4,17
1000BASE-T (1 Gbps)
9.765,63 416,67
OC-192 (9.95 Gbps)
97.197,274.147,0
8
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
The Network System Challenges Data rates in general keep increasing
Network rate > CPU rate > memory, busses and I/O interfaces
Protocols and applications keep evolving
System design, implementation and testing is time consuming and expensive
Systems often contain errors
Special-purpose hardware (ASIC) designed for one type of system can usually not be reused
Host machine must inspect all incoming packets
… Challenge: find ways to improve the design and
manufacture of complex networking systems
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Statement of Hope If there is hope, it lies in …
1990: … faster CPUs
1995: … the application specific integrated circuit (ASIC) designers
2002: … the programmers!
Programmability we need a programmable device with more capability than a
conventional CPU key to low-cost hardware for next generation network systems compared to ASIC designs, it is more flexible, easier and faster
to upgrade, and thus, less expensive
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
First Generation General idea: To optimize
computation, move operations that account for the most CPU time from software into hardware
Onboard… address recognition and
filtering onboard buffering DMA buffer and operation chaining
Add hardware to NIC off-the-shelf chips for layer 2 ASICs for layer 3
Allows each NIC to operate independently effectively a multiprocessor total processing power
increased dramatically
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Second Generation (early 1990s) Designed for greater scale Decentralized architecture
additional computational power on each NIC
NIC implements classification and forwarding
High-speed internal interconnection mechanism interconnects NICs provides fast data path
Multiple network interfaces High-speed hardware
interconnects NICs General-purpose processor
only handles exceptions Sufficient for medium speed
interfaces (100 Mbps)
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Third Generation (late 1990s) Almost all packet processing
off-loaded from CPU Special-purpose ASICs handle
lower layer functions Embedded (RISC) processor
handles layer 4 CPU only handles low-demand
processing
Functionality partitioned further
Additional hardware on each NIC
Onboard… classification forwarding traffic policing monitoring and statistics …
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Third Generation (late 1990s) Enough, are third generation sufficient??
Almost!! But not quite! ;-(
What’s the problem?
high cost
long time to market
difficult to test
expensive and time-consuming to change even trivial changes require silicon respin 18-20 month development cycle
little reuse across products and versions
require in-house expertise (ASIC designers)
…
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Network Processors: The Idea in a Nutshell
Devise new hardware building blocks, but make them programmable
Include support for protocol processing and I/O General-purpose processor(s) for control tasks Special-purpose processor(s) for packet processing and table
lookup
Include functional units for tasks such as checksum computation, hashing, …
Integrate as much as possible onto one chip
Call the result a network processor
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Review of Conventional Computer Hardware Architectures
CPU socket
RDRAM connectors
PCI connectors
I/O Controller Hub
Memory Controller Hub
Intel D850MD Motherboard - Intel Hub Architecture (850 Chipset) :
system bus
RDRAM interface
hub interface
PCIbus
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Network Processors: Main Idea
Traditional system:- slow- resource demanding- shared with other operations
Network processors:- a computer within the computer- special, programmable hardware- offloads host resources
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Designing a Network Processor Depends on
operations network processor will perform role of network processor in overall system
Goals generality: sufficient for all protocols, all protocol
processing tasks and all possible networks high speed: scale to high bit rates and high packet
rates
Key point: A network processor is not designed to process a specific protocol or part of a protocol. Instead, designers seek a minimal set of instructions that are sufficient to handle an arbitrary protocol processing task at high speed
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Where to Place Network Processors
software onconventional
prosessor
networkprocessors
ASICdesigns
performance
cost
Thus, network processors is somewhere in the middle
Goal: increase performance and reduce costs
Increase performance: known issues: – must partition packet processing into separate functions – to achieve highest speed, must handle each function with separate hardware unknown issues: – which functions to choose – what hardware building blocks to use – how to interconnect building blocks
Decrease costs: Economics driving a gold rush – NPs will dramatically lower production costs for network systems – good NP designs worth lots of $$
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Explosion of Commercial Products 1990 2000: network processors transformed from
interesting curiosity to mainstream product used to reduce both overall costs and time to market
2002: over 30 vendors with a vide range of architectures
e.g., Multi-Chip Pipeline (Agere) Augmented RISC Processor (Alchemy) Embedded Processor Plus Coprocessors (Applied Micro Circuit
Corporation) Pipeline of Homogeneous Processors (Cisco) Pipeline of Heterogeneous Processors (EZchip) Configurable Instruction Set Processors (Cognigine) Extensive And Diverse Processors (IBM) Flexible RISC Plus Coprocessors (Motorola) Internet Exchange Processor (Intel) …
Agere PayloadPlusPayloadPlus:A Short Overview
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Agere PayloadPlus (APP) Agere PayloadPlus (APP)
consists of both programmable hardware and software consists of both data and control planes (i.e., slow and
fast plane)
APP defines HW architectures, SW mechanisms, interconnection mechanisms and interfaces, BUT does not specify how to implement them.
Several versions of APP exist differing in the number and types of functional units, degree of parallelism and internal bandwidth (2. generation: 5 models)
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP Conceptual Pipeline
Classifier extract packets from ingress classify packet send statistics to state engine reassemble blocks pass packet to forwarder
together with classification decision
State engine initiate, configure and control
classifier and traffic manager receives control from classifier update statistics (e.g., packet
count) check packets against profiles
(and inform classifier)
Forwarder get packet from classifier perform traffic shaping and
management fragment packet (if necessary) modify headers (if necessary)
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
Memory interfaces:- two types of physical memory
- fast cycle RAM (FCRAM) for fast memory accesses- double data rate SRAM (DDR-SRAM) for high throughput
- the different memory types are usually used like this:
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
Media interfaces:- several to form fast data paths- two external connections:
- cell-oriented (ATM)- packet-oriented (Ethernet)
-
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
PCI bus interfaces:- allows communication with host CPU- mainly to control the whole operation
Scheduling interface interfaces:- an external scheduling interface- external logic can use information about queues
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
Coprocessor interfaces:- APP550 should be able to process a packet- BUT, to accommodate special cases, e.g., adding additional headers a co-processor interface is provided
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Chip
Pattern Processing Engine- patterns specified by programmer- programmable using a special high-level language- only pattern matching instructions- parallelism by hardware using multiple copies and several sets of variables- access to different memories
State Engine- gather information (statistics) for scheduling - verify flow within bounds- provide an interface to the host- configure and control other functional units
Traffic Manager- schedule packets and shape traffic flow- programmable via scripts- sends packets to output interface- according to implemented policy:
- discard packets - choose queue
Packet (protocol data unit) assembler- collect all blocks of a frame- not programmable
Stream Editor (SED)- two parallel engines- modify outgoing packets (e.g., checksum, TTL, …)- configurable, but not programmable
Reorder Buffer Manager - transfers data between classifier and traffic manager - ensure packet order due to parallelism and variable processing time in the pattern processing-
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
APP550 Full Duplex Clock rate for APP550 is 233 MHz One chip cannot manage packet at wire speed in
both directions – often two in parallel (one each direction)
all features needed in both direction? classification only one direction
checks outgoing packets and enqueues using special queue
Intel IXP1200 / 2400IXP1200 / 2400:A Short Overview
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXA: Internet Exchange Architecture
IXA is a broad term to describe the Intel network architecture (HW & SW, control- & data plane)
IXP: Internet Exchange Processor processor that implements
IXA
IXP1200 is the first IXP chip (4 versions)
IXP2xxx has now replaced the first version
IXP1200 basic features 1 embedded 232 MHz
StrongARM 6 packet 232 MHz µengines onboard memory 4 x 100 Mbps Ethernet ports multiple, independent busses low-speed serial interface interfaces for external memory
and I/O busses …
IXP2400 basic features 1 embedded 600 MHz XScale 8 packet 600 MHz µengines 3 x 1 Gbps Ethernet ports …
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 Architecture
Serial line:- connects to the RISC- intended for control and management- rate 38 Kbps
PCI bus:- allow IXP to connect to I/O devices - enable use of host CPU- rate 2.2 Gbps
IX (Intel eXchange) bus:- enable higher rates compared to PCI- form fast path (IXP and high-speed interfaces)- interface to other IXP cards- 4.4 Gbps
SRAM bus:- shared bus (several external units)- usually control rather than data- rate 3.71 Gbps
SDRAM bus:- provide access to external SDRAM memory used to store packets- can also pass addresses, control/store operations, etc.- rate 7.42 Gbps
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 ArchitectureRISC processor:- StrongARM running Linux- control, higher layer protocols and exceptions- 232 MHz
Microengines:- low-level devices with limited set of instructions- transfers between memory devices - packet processing- 232 MHz
Access units:- coordinate access to external units
Scratchpad:- on-chip memory- used for IPC and synchronization
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 Processor Hierarchy
General-Purpose Processor:- used for control and management- running general applications
RISC processor:- chip configuration interface (serial line)- control, higher layer protocols and exceptions
I/O processors (microengines):- transfers between memory devices - packet processing
Coprocessors:- real-time clock and timers- IX bus controller- hashing unit- ...
Physical interface processors:- implement layer 1 & 2 processing
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 Memory Hierarchy
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 Memory Hierarchy
Different memory types… …are organized into different addressable data units (words or longwords)…have different access times…connected to different bussesTherefore, to achieve optimal performance, programmers must understand the organization and allocate items from the appropriate type
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP Performance Improvement: Forwarding
The forwarding latency improvement itself may only be relevant to very time-sensitive interactive applications
Offloading at least equally important
Linux 2.4 vs. IXP 1200 Intel P4 host machine
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP1200 IXP2400
SRAM
FLASH
MEMORYMAPPED
I/O
DRAM
SRAMaccess
SDRAMaccess
SCRATCHmemory
PCIaccess
IXaccess
EmbeddedRISK CPU
(StrongARM)
PCI bus
IX bus
DRAM bus
SRAM bus
microengine 2
microengine 1
microengine 5
microengine 4
microengine 3
microengine 6
multipleindependent
internalbuses
IXP1200
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP2400
IXP2400 Architecture
microengine 8
SRAM
coprocessor
FLASH
DRAM
SRAMaccess
SDRAMaccess
SCRATCHmemory
PCIaccess
MSFaccess
EmbeddedRISK CPU(XScale)
PCI bus
receive bus
DRAM bus
SRAM bus
microengine 2
microengine 1
microengine 5
microengine 4
microengine 3
multipleindependent
internalbusesslowport
access
…
transmit bus
RISC processor:- StrongArm XScale- 233 MHz 600 MHz
Microengines- 6 8- 233 MHz 600 MHz
Media Switch Fabric- forms fast path for transfers- interconnect for several IXP2xxx
Receive/transmit buses- shared bus separate busses
Slowport- shared inteface to external units- used for FlashRom during bootstrap
Coprocessors- hash unit- 4 timers- general purpose I/O pins- external JTAG connections- several bulk cyphers (IXP2850 only)- checksum (IXP2850 only)- …
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP2400 Architecture Memory
generally more of everything
generally larger gap between CPUs and memory access in terms of cycles
local memory on each microengine saving temporary results private per packet processor small (2560 bytes) low latency (one cycle) accessed through special registers
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP2400 Packet Processing
microengine 8
SRAM
coprocessor
FLASH
DRAM
SRAMaccess
SDRAMaccess
SCRATCHmemory
PCIaccess
MSFaccess
EmbeddedRISK CPU(XScale)
PCI bus
receive bus
DRAM bus
SRAM bus
microengine 2
microengine 1
microengine 5
microengine 4
microengine 3
multipleindependent
internalbusesslowport
access
…
transmit bus
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
IXP2400 Use
Easier to use and understand
Pure Linux environment (except if workbench)
More stable
Faster to reset
2005 Carsten Griwodz & Pål HalvorsenINF5061 – multimedia data communication using network processors
Summary The network challenges are manyChallenge: find ways to improve the design and
manufacture of complex networking systems
Hope (2002 version) lies in the programmers and network processors
We will use Intel IXP2400 as an example which offers… …embedded processor plus parallel packet processors …connections to external memories and buses
Next time: how to start programming these monsters