Washington University in St. Louis
[email protected] http://www.arl.wustl.edu/~fredk
(SPC) Port-Level Processing: the MSR Kernel
Fred Kuhns, Washington University
Applied Research Laboratory
Fred Kuhns - 1/8/02
Overview
• Introduction to hardware environment
• APIC core processing and buffer management
• Overview of SPC kernel software architecture and processing steps
• Plugin environment and filters
• Command Facility
[Figure: a Control Processor attached to the ATM Switch Core (Switch Fabric); each switch port has an IPP/OPP pair, a Port Processor (FPX and/or SPC), and a line card (LC).]
Line Cards (link interfaces)
Port Processors: SPC and/or FPX
Using Both an FPX and an SPC
[Figure: FPX with the IP Classifier, DQ Module, and NID; SPC with the APIC and active processing (plugin instances X.1, Z.2); Flow Control between them; a shim accompanies packets passed between FPX and SPC.]
Shim contains results of classification step
Focus on SPC as Port Processor
[Figure: Control Processor and Switch Fabric; the Input Port Processor (SPC) performs Flow/Route Lookup and Dist. Q. Ctl.; the Output Port Processor (SPC) performs Flow Lookup and Dist. Q. Ctl.]
The SPC: an Embedded Processor
[Figure: SPC board showing the Switch Interface, Link Interface, Serial Ports, APIC, CPU Module, PCI Bus, System FPGA, and DRAM.]
Typical Pentium PC Architecture
[Figure: CPU, Cache, and DRAM attached to the North Bridge; PCI Bus with PCI devices; South Bridge (PIIX3: PIC, PIT, ...) bridging to the ISA Bus with ISA devices, BIOS, and Super-IO (RTC, UARTs, Kbd/Mouse, Floppy, Parallel, ...); signals labeled Addr/Data/Ctrl, Intr, NMI, and INIT.]
SPC Hardware Architecture
[Figure: Intel Embedded Module (CPU, Cache, DRAM, North Bridge) on the PCI Bus with the APIC (Link Interface, Switch Interface) and the System FPGA (PIT, PIC, RTC', BIOS ROM, UART1 and UART2 interfaces); signals labeled Addr/Data/Ctrl, Intr, NMI, and INIT.]
SPC Components
• APIC - PCI Bus Master
• Pentium Embedded Module
– 166 MHz MMX Pentium Processor
• L1 Cache: 16KB Data, 16KB Code
• L2 cache: 512 KB
– NorthBridge
• 33 MHz, 32 bit PCI Bus
• PCI Bus Master
• System FPGA - PCI Bus Slave
– Xilinx XC4020XL-1 FPGA
– 20K Equivalent Gates, ~ 75% used
SPC Components (continued)
• Memory
– EDO DRAM
– 64MB (Max for current design)
– SO DIMM
• Switch Interface - 1 Gb Utopia
• Link Interface - 1 Gb Utopia
• UART
– Two Serial Ports
• NetBSD system console
• TTY port
APIC Descriptors
• The APIC uses a data structure called a descriptor to describe available buffers and their status.
• The hardware and software follow a well-defined protocol for jointly managing the descriptors.
• The APIC controls one or more Free Descriptor chains, with each chain representing buffers available for receive (Rx) for a predefined set (one or more) of Rx channels.
APIC Descriptors and Buffers
[Figure: APIC descriptor drawn as four rows over bits 0-31: Match/Checksum with flags (V I S O E C L X T Y); BufAddrLo (physical address of data buffer); BufAddrHi; BufLen (buffer length or amount left unused) and NextDesc (index into Desc Table). Flags: O - Read Only, E - EOF, C - CRC OK, T - Type, Y - Valid Bits.]
• Frame must be a multiple of 48 B.
• Buffers are 2048 B.
• Max frame size = 2016 B, or 42 cells.
• Reserve 8 B for shim and 8 B for trailer.
• IP Datagram MTU must be 2000 B.
• At output port, max 2016 B frame received, offset 8 bytes in buffer.
• At most 2024 B of the buffer are used.
• 24 B at end of buffer not used.
[Figure: MSR Buffer (2048B) layout: Shim offset (not used on egress), Shim, IP Datagram, AAL5 Padding and Trailer, 24 Bytes Not Used. Frame fields shown: LLC (AA.AA.03), OUI (00.00), OUI (00), Type (08.00); IP header (Version, H-len, TOS, Total length, Identification, flags, Fragment offset, TTL, protocol, Header checksum, Source Address, Destination Address, Options ??); IP data (transport header and transport data); AAL5 padding (0 - 40 bytes); CPCS-UU (0), CPCS-UU (0), Length (IP packet + LLC/SNAP), CRC.]
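The sizing rules on this slide reduce to a few lines of arithmetic. The sketch below only restates the slide's numbers; the constant and function names are illustrative assumptions, not identifiers from the MSR source, and `cells_for_datagram` is a hypothetical helper that ignores LLC/SNAP overhead.

```c
#include <stddef.h>

/* Constants taken from the slide; names are illustrative only. */
enum {
    CELL_PAYLOAD = 48,   /* frame must be a multiple of 48 B */
    MSR_BUF_SIZE = 2048, /* size of one MSR buffer */
    MAX_CELLS    = 42,   /* largest frame, in cells */
    SHIM_SIZE    = 8,    /* reserved for the shim */
    TRAILER_SIZE = 8,    /* reserved for the AAL5 trailer */
    RX_OFFSET    = 8     /* egress: frame stored 8 B into the buffer */
};

/* Largest frame: 42 cells * 48 B = 2016 B. */
size_t max_frame_bytes(void) { return (size_t)MAX_CELLS * CELL_PAYLOAD; }

/* IP MTU: frame minus shim and trailer = 2000 B. */
size_t ip_mtu_bytes(void) { return max_frame_bytes() - SHIM_SIZE - TRAILER_SIZE; }

/* Bytes left unused at the end of the buffer on egress = 24 B. */
size_t unused_tail_bytes(void) { return MSR_BUF_SIZE - (RX_OFFSET + max_frame_bytes()); }

/* Hypothetical helper: cells needed for an IP datagram of ip_len bytes
 * once shim and trailer are added and the frame is padded to a cell. */
size_t cells_for_datagram(size_t ip_len) {
    size_t frame = ip_len + SHIM_SIZE + TRAILER_SIZE;
    return (frame + CELL_PAYLOAD - 1) / CELL_PAYLOAD;
}
```

A full-size 2000 B datagram therefore occupies exactly 42 cells, which is why the MTU cannot be raised without enlarging the buffers.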
Descriptor Notes
V = Volatile Buffer
I = Interrupt/Notify on Read
S = SAM Enable
O = Read Only
E = End of Frame
C = CRC OK, RX
L = Loss Priority (CLP of last cell), RX
X = Congestion indication from last cell's PTI, RX
T = BufType: 0 -> Data; 1 -> RM; 2 -> segment OAM; 3 -> end-2-end OAM
Y = Sync: 0 -> Done, Valid Link; 1 -> Done, InValid Link; 2 -> Not Ready; 3 -> Ready
Possible values for First Word
CAFE0083 = Tx, EoF, Ready (Driver)
CAFE0080 = Tx, EoF, DoneValidLink (APIC)
CAFE0002 = Tx, NotReady
CAFE0003 = Rx, Ready, No Interrupt on Read
CAFE0403 = Rx, Ready, Interrupt on Read
XXXX00C0 = Rx, EoF, CRC OK, DoneValidLink
XXXX0040 = Rx, CRC OK, DoneValidLink
XXXX00C1 = Rx, EoF, CRC OK, DoneInValidLink
XXXX0041 = Rx, CRC OK, DoneInValidLink
[Figure: the descriptor layout from the previous slide, with words also labeled MatchFlags, Low32Addr, and SizeNext.]
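The example first-word values can be decoded mechanically. The sketch below is a minimal decoder whose bit masks are inferred only from those example values (for instance, 0x80 appears exactly when EoF is listed); the masks and all names are assumptions, not taken from APIC documentation, and flags that never appear in the examples (V, S, O, L, X, T) are left out.

```c
#include <stdint.h>

/* Masks inferred from the example values on the slide (assumptions). */
#define DESC_SYNC_MASK 0x00000003u /* Y: sync state, low two bits */
#define DESC_CRC_OK    0x00000040u /* C: CRC OK (Rx) */
#define DESC_EOF       0x00000080u /* E: end of frame */
#define DESC_INTR      0x00000400u /* I: interrupt/notify on read */

enum desc_sync {
    DESC_DONE_VALID   = 0, /* Done, valid link */
    DESC_DONE_INVALID = 1, /* Done, invalid link */
    DESC_NOT_READY    = 2,
    DESC_READY        = 3
};

struct desc_status {
    uint16_t match;        /* upper 16 bits: match/checksum (0xCAFE in examples) */
    enum desc_sync sync;
    int eof;
    int crc_ok;
    int intr_on_read;
};

/* Split a descriptor's first word into the fields named on the slide. */
struct desc_status decode_first_word(uint32_t w) {
    struct desc_status s;
    s.match        = (uint16_t)(w >> 16);
    s.sync         = (enum desc_sync)(w & DESC_SYNC_MASK);
    s.eof          = (w & DESC_EOF) != 0;
    s.crc_ok       = (w & DESC_CRC_OK) != 0;
    s.intr_on_read = (w & DESC_INTR) != 0;
    return s;
}
```

Under these assumptions, `decode_first_word(0xCAFE0083)` reports EoF and Ready, matching the "Tx, EoF, Ready (Driver)" entry.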
APIC Descriptors
[Figure: Pool X Chain Head pointing to a chain of free descriptors, each with first word 0xCAFE, BufLen 2016, flags O=0, E=0, C=0, T=00, Y=11, a NextDesc link, and BufAddrLo/BufAddrHi referencing its buffer.]
Free Descriptor chain used by APIC during receive; each descriptor contains the physical address of an available buffer.
Descriptors on a Receive Queue
[Figure: the queue for VC 101 holding two descriptors, each with a checksum in the first word. The first has BufLen 0 and flags E=0, C=0, T=00, Y=00; the second has BufLen 1016, flags E=1, C=1, T=00, Y=00, and its NextDesc marked "???".]
RX Descriptor to Buffer Mapping
[Figure: descriptors j, j+1, j+2, j+3, ..., j+N map one-to-one to 2KB buffers 0, 1, ..., N.]
Buffers (replace Mbufs)
Descriptor Layout
[Figure: descriptor table of msr_descr_count entries, index starting address := desc_area. Regions labeled: Invalid Descriptor; aal5rx_start/aal5rx_end and aal5tx_start (aal5_count each; RX/TX shared IP packet buffers, *aal5_pool); aal0rx_start/aal0rx_end and aal0tx_start/aal0tx_end (aal0_count each; RX channel 0, RX channel 1, TX channel 0, TX channel 1 with aal0_count_vci each; RX and TX cell buffers, *aal0_pool); local_start/local_end (local_count); unallocated.]
Descriptor & Buffer Relationships
[Figure: APIC (port 0, port 1, port 2; global registers; Rx channel state: notification, processing, ATM hdr, conn status, current desc; Tx channel state: resume, pacing, processing, conn status, current desc) referencing the Descriptor Table (DT), the MSR Buffers (MB), and the MSR Buffer Headers.]
• Rx desc bound (same offset) to specific buffer.
• TX desc allocated dynamically and bound to the RX desc and buffer.
• Tx offset: same as rx offset.
• Buf Hdrs: same as buf offset.
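Because an Rx descriptor and its buffer share the same offset, a single index locates both. The following is a minimal sketch of that mapping; the struct and function names are hypothetical, and the 16-byte descriptor size is an assumption based on the four 32-bit rows drawn in the descriptor figure.

```c
#include <stdint.h>

#define MSR_BUF_SIZE 2048u /* buffer size from the earlier slide */
#define DESC_SIZE    16u   /* assumption: four 32-bit words per descriptor */

/* Hypothetical holder for the two table base addresses. */
struct msr_tables {
    uintptr_t dt_base; /* Descriptor Table (DT) base */
    uintptr_t mb_base; /* MSR Buffers (MB) base */
};

/* Address of descriptor number indx. */
uintptr_t desc_addr(const struct msr_tables *t, uint32_t indx) {
    return t->dt_base + (uintptr_t)indx * DESC_SIZE;
}

/* Address of the buffer bound to descriptor number indx. */
uintptr_t buf_addr(const struct msr_tables *t, uint32_t indx) {
    return t->mb_base + (uintptr_t)indx * MSR_BUF_SIZE;
}

/* Recover the index from any address inside a buffer. */
uint32_t buf_index(const struct msr_tables *t, uintptr_t addr) {
    return (uint32_t)((addr - t->mb_base) / MSR_BUF_SIZE);
}
```

With a fixed binding like this, the driver never has to search for the buffer that belongs to a completed descriptor.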
Receiving a Packet
[Figure: cells arriving at the APIC; the Rx channel's current desc indexes the Descriptor Table (DT base + indx) and the MSR Buffers (MB base + indx); Driver and IP code shown alongside.]
1) AAL5 frame is received: APIC allocates and reads a desc from the RX pool. Then the previous Rx desc is written back (updated).
2) APIC writes the Rx'ed AAL5 frame to the buffer referenced by the new Rx desc.
Completing the Receive
[Figure: same APIC, Descriptor Table, and MSR Buffer view; the buffer now holds the IP hdr and IP data.]
1) APIC writes (updates) current desc.
2) APIC updates notification register.
3) Last: Assert Interrupt.
APIC disables interrupts on Rx channel
Sending Packet
[Figure: same view with the IP Lookup Table added; cells leave the APIC from the buffer holding the IP hdr and IP data.]
1) Allocate Tx desc and bind to Rx desc and buffer.
2) a) Write to current desc's next index. b) Write to resume Tx channel register.
3) APIC sends (reads) packet and interrupts when done.
SPC Software Architecture
[Figure: APIC-specific driver code (interrupt) between the ingress and egress APIC; IP processing (insert/process shim); Classifier (Exact Match, General Match) and Route Lookup (FIPL, Simple); Plugin Environment with plugins; Ingress/Egress decision; VOQ 0 ... VOQ 7 on APIC TX Qs (DQ adjusts VOQ pacing); DRR Service feeding paced APIC TX queues for Sub Port 0 through Sub Port 3 (SP1, SP2, ... SPN); CP command processor and debug message path (commands, command reply, debug messages).]
• DQ Service: periodic callback interrupt (D sec); handler(): read DQ report cells, set pacing, broadcast report.
• DRR Service: handler(): send budget per interval.
SPC Data Path - Simplified View
[Figure: Frame/Buffer and IP Processing; NM Filter; Flow Classifier/(channel map); Route Lookup (Shim, FIPL, Simple, cache); Plugin Environment with plugins; Ingress/Egress decision leading to DQ/In Queuing or DRR/Out Queuing.]
SPC Input (Ingress) Processing
[Figure: APIC -> APIC-specific driver code (interrupt) -> IP Processing: Insert InterPort Shim -> NM Filter, Flow Classifier/(channel map), Route Lookup (FIPL, Simple) -> Plugin Environment (PCU Framework managing instances W.1, X.1, X.2, Y.1, Z.1, Z.2; Local Resource Manager and PCU Interface; IP Options) -> Replace IntraShim with InterShim, update trailer and IP header -> VOQ 0 ... VOQ 7 on APIC TX Qs (SP1 to SP4) -> APIC.]
• Distributed Queuing: periodic callback interrupt (D sec); callback: read DQ report cells, set pacing (DQ adjusts VOQ pacing), broadcast report.
Input (Ingress) Processing
[Figure: the ingress processing diagram from the previous slide, annotated with the notes below.]
• APIC RX: the AAL5 frame (IP dgram, padding, trailer) is written into a 2KB MSR Buffer at the Rx offset, leaving room for the shim.
• Insert Shim. Intra/Inter Port Shim fields: Output VIN, Input VIN, Stream Identifier, Not Used, Flags. Intra Port Shim Flags: AFNR OPUK X. VIN Format: PN (10 bits), SPI (6 bits).
• General Match Filter: linear search over filter 1 ... filter 10 using the 5-tuple {src_addr, dst_addr, src_port, dst_port, proto}; a match maps a flow to one or more plugin instances (i1 ... i5). Search, then invoke the instance handler.
• Exact Match Classifier: a hash of the IP header (Source Address, Destination Address, Source Port, Destination Port) indexes the Flow Table. Hash field widths and offsets are configurable: msr/msr_classify.h. The route is cached in the flow entry; if none, call IP lookup (fipl/simple).
• Set input and output VIN in Shim, calculate AAL5 length, decrement IP TTL, calculate IP header checksum. Place in APIC TX queue.
• AAL5 trailer (8 Bytes): CPCS-UU (0), CPCS-UU (0), Length (IP packet + LLC/SNAP), CRC (APIC calculates and sets), preceded by AAL5 padding (0 - 40 bytes).
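Two of the ingress steps listed on this slide, decrementing the IP TTL and recalculating the IP header checksum, are standard IPv4 operations. The sketch below shows them with the usual one's-complement sum; the function names are hypothetical and this is not the MSR kernel's code. Returning -1 when the TTL is already 0 is an assumption added to avoid wrap-around, since the slide does not say how expired packets are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over an IPv4 header of hlen bytes.
 * The caller zeroes the checksum field (bytes 10-11) beforehand. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t hlen) {
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement TTL (byte 8) and rewrite the checksum (bytes 10-11).
 * Returns 0 on success, -1 on a bad header length or TTL of 0. */
int ingress_update_header(uint8_t *hdr) {
    size_t hlen = (size_t)(hdr[0] & 0x0F) * 4; /* IHL in 32-bit words */
    if (hlen < 20 || hdr[8] == 0)
        return -1;
    hdr[8]--;
    hdr[10] = 0;
    hdr[11] = 0;
    uint16_t c = ip_header_checksum(hdr, hlen);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
    return 0;
}
```

A header with a valid checksum sums to 0xFFFF, so running `ip_header_checksum` over the updated header (checksum field included) returns 0, which is a convenient self-check.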
Output Port (Egress) Processing
[Figure: APIC -> APIC-specific driver code (interrupt) -> IP processing: process shim -> Classifier, Flow Classifier/(channel map), NM Filter -> Plugin Environment (PCU Framework managing instances W.1, X.1, X.2, Y.1, Z.1, Z.2; Local Resource Manager and PCU Interface; IP Options) -> Determine Out VC, remove Shim, update AAL5 trailer and IP header -> DRR Service -> paced APIC TX queues for Sub Port 0 through Sub Port 3 (SP1, SP2, ... SPN) -> APIC.]
• DRR Service: periodic callback interrupt (D sec); handler: send budget per flow.
• DQ: report Tx queue lengths.
Output Port (Egress) Processing
[Figure: the egress processing diagram from the previous slide, annotated with the notes below.]
• APIC RX: the AAL5 frame (shim, IP, padding, trailer) is written into a 2KB MSR Buffer at the Rx offset.
• Verify Shim and adjust buffer and header references.
• General and Exact match classifiers are the same as on ingress, except the route is obtained from the output VIN in the Shim.
• Remove Shim for TX: adjust buffer, update trailer, update IP hdr. The TX offset skips the shim.
• Place in DRR queue for this flow (referenced by flow entry).
• Every D sec the DRR handler is executed. It sends up to MAX bytes per period (minus backlog), sharing available BW among the active flows.
• APIC output channels are paced such that their sum is the effective link bandwidth.
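The DRR handler described above can be illustrated with textbook deficit round robin capped by a per-period byte budget. This is a sketch under stated assumptions, not the MSR implementation: the slide does not say how the budget, quanta, or backlog are computed, so `quantum`, `budget`, the fixed-size queue, and every name here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PKTS 8 /* illustrative per-flow queue depth */

struct flow {
    uint32_t pkt_len[MAX_PKTS]; /* queued packet lengths, FIFO order */
    size_t   head;              /* index of the oldest packet */
    size_t   count;             /* packets currently queued */
    uint32_t quantum;           /* bytes credited per round */
    uint32_t deficit;           /* unused credit carried forward */
};

/* Run one period: visit flows round-robin until nothing more can be
 * sent. Returns the bytes sent, which never exceeds budget. */
uint32_t drr_period(struct flow *flows, size_t nflows, uint32_t budget) {
    uint32_t sent = 0;
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < nflows; i++) {
            struct flow *f = &flows[i];
            if (f->count == 0) {      /* idle flows keep no credit */
                f->deficit = 0;
                continue;
            }
            f->deficit += f->quantum;
            while (f->count > 0) {
                uint32_t len = f->pkt_len[f->head];
                if (len > f->deficit || len > budget - sent)
                    break;            /* out of credit or out of budget */
                f->deficit -= len;
                sent += len;
                f->head = (f->head + 1) % MAX_PKTS;
                f->count--;
                progress = 1;
            }
        }
    }
    return sent;
}
```

Packets that do not fit in the remaining budget stay queued for the next period, which is one plausible reading of "minus backlog" on the slide.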
What about Ethernet?
[Figure: an MSR connected through an Ethernet Switch to Host 1, Host 2, ... Host N and to other Routers.]
GigE Link Interface - Egress
In development
[Figure: AAL5 frames (IP Header, data, AAL5 trailer) from the FPX/SPC become Ethernet frames (Ethernet header, IP Header, data) to the next hop or end station.]
• VC numbering: 64 = SP0 (to ES); 65 = SP1, 66 = SP2, ..., 64+N = SPN (to NH).
• NH Table (4 entries): VC -> IP (VC1 -> IP1, ..., VC4 -> IP4). Software creates the NH table at boot time.
• ARP Table (M entries): IP -> MAC (IP1 -> MAC1, ..., IPM -> MACM).
• If VC != 64, look up the VC in the NH table; it returns the IP used for the ARP lookup (support N = 4).
• If VC = 64, look up the IP destination address in the packet header.
• Add Ethernet header using the destination address from the ARP table. Add our Ethernet source address.
• Maintain ARP table by snooping, sending ARPs, and responding to ARP broadcasts.
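The egress lookup on this slide is a two-step resolution: VC to IP address, then IP address to MAC. Below is a minimal sketch of that logic, assuming a small fixed-size ARP table (the slide only says M entries) and hypothetical type and function names; returning 0 for an unknown VC is also an assumption.

```c
#include <stdint.h>
#include <string.h>

#define VC_BASE   64 /* VC 64 = SP0, traffic to end stations */
#define NH_SLOTS  4  /* NH table holds 4 entries (N = 4) */
#define ARP_SLOTS 16 /* assumption: the slide only says "M entries" */

struct arp_entry {
    uint32_t ip;
    uint8_t  mac[6];
    int      valid;
};

struct gige_tables {
    uint32_t nh_ip[NH_SLOTS];       /* VC 65..68 -> next-hop IP */
    struct arp_entry arp[ARP_SLOTS];
};

/* Choose the IP address to resolve: the packet's destination for VC 64,
 * otherwise the next hop bound to that VC. Returns 0 for an unknown VC. */
uint32_t egress_resolve_ip(const struct gige_tables *t, unsigned vc,
                           uint32_t pkt_dst_ip) {
    if (vc == VC_BASE)
        return pkt_dst_ip;
    if (vc < VC_BASE + 1 || vc > VC_BASE + NH_SLOTS)
        return 0;
    return t->nh_ip[vc - VC_BASE - 1];
}

/* Linear ARP lookup: copies the MAC and returns 0 on a hit, -1 on a miss. */
int arp_lookup(const struct gige_tables *t, uint32_t ip, uint8_t mac_out[6]) {
    for (int i = 0; i < ARP_SLOTS; i++) {
        if (t->arp[i].valid && t->arp[i].ip == ip) {
            memcpy(mac_out, t->arp[i].mac, 6);
            return 0;
        }
    }
    return -1;
}
```

On an ARP miss a real interface would have to send an ARP request and hold or drop the frame; the slide does not say which, so the sketch only reports the miss.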
GigE Link Interface - Ingress
In development
[Figure: Ethernet frames (Ethernet header, IP Header, data) from the next hop or end station become AAL5 frames (IP Header, data, AAL5 trailer) to the FPX/SPC on VC 64 = SP0.]
• ARP Table (M entries): IP -> MAC (IP1 -> MAC1, ..., IPM -> MACM).
• If source MAC in table then verify, else add.
• If broadcast and ARP, process ARP
• else if broadcast and IP broadcast, goto Deliver
• else if multicast and IP multicast, goto Deliver
• else if not our destination MAC address, drop
• else if IP unicast, Deliver
• Deliver: remove Ethernet header, encapsulate in AAL5 frame, send to switch on default VC (VC = 64).
Packet Classification & Plugins
• Classification provides an opportunity to bind flows to registered plugin instances.
• General classifier - Network Management
– classification using 5-tuple
• <saddr, sport, daddr, dport, proto>
• Prefix match on address, exact match on port and proto
• 0 is a wildcard for all fields
– input and output ports
– filters added/removed via the command facility
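The general classifier's matching rule (prefix match on addresses, exact match on ports and protocol, 0 as wildcard) can be sketched as a linear search. The struct layouts and names below are hypothetical; in particular, expressing an address wildcard as a prefix length of 0 is an assumption, since the slide does not say how the kernel encodes it.

```c
#include <stdint.h>

struct five_tuple {
    uint32_t saddr, daddr; /* host byte order */
    uint16_t sport, dport;
    uint8_t  proto;
};

struct gm_filter {
    uint32_t saddr, daddr; /* prefix values */
    uint8_t  slen, dlen;   /* prefix lengths 0..32; 0 = wildcard */
    uint16_t sport, dport; /* 0 = wildcard */
    uint8_t  proto;        /* 0 = wildcard */
};

/* Network mask for a prefix length; 0 matches everything. */
static uint32_t prefix_mask(uint8_t len) {
    if (len == 0)  return 0;
    if (len >= 32) return 0xFFFFFFFFu;
    return 0xFFFFFFFFu << (32 - len);
}

/* Returns 1 if packet tuple p satisfies filter f, else 0. */
int filter_matches(const struct gm_filter *f, const struct five_tuple *p) {
    uint32_t sm = prefix_mask(f->slen), dm = prefix_mask(f->dlen);
    if ((p->saddr & sm) != (f->saddr & sm)) return 0;
    if ((p->daddr & dm) != (f->daddr & dm)) return 0;
    if (f->sport && f->sport != p->sport)   return 0;
    if (f->dport && f->dport != p->dport)   return 0;
    if (f->proto && f->proto != p->proto)   return 0;
    return 1;
}

/* Linear search: index of the first matching filter, or -1. */
int gm_classify(const struct gm_filter *tbl, int n, const struct five_tuple *p) {
    for (int i = 0; i < n; i++)
        if (filter_matches(&tbl[i], p))
            return i;
    return -1;
}
```

Returning the first match corresponds to only one of the classifier options; a Last or All policy would keep scanning instead of returning early.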
Flow Bound to a Plugin
[Figure: the SPC data path (Frame/Buffer and IP Processing, NM Filter, Flow Classifier/(channel map), Route Lookup (Shim, FIPL, Simple, cache), Plugin Environment, DQ/In Queuing or DRR/Out Queuing) with a flow bound to a plugin instance; pkt (struct ip *) points past the Shim to the IP header inside the AAL5 frame.]
instance->handle_packet(instance, packet, flags)
Call packet handler for bound instance with pointer to IP packet (struct ip *).
handle_packet(inst, pkt, flags)
{
    /* Plugin may read and/or
     * modify content but not
     * delete it unless COPY.
     * On return the framework
     * forwards packet */
    ...
    return;
}
• General Match Classifier: linear search of {src_addr, dst_addr, src_port, dst_port, proto} over Rule 1 ... Rule 10, each listing instances i1 ... i5. Search, invoke instance handler, then send packet to exact match classifier.
– General Classifier options: {First, Last, All}
– Rule Actions: {Deny, Permit, Active}
– Rule flags: {All, Copy, Stop}
• Exact Match Classifier: Hash{src_addr, dst_addr, src_port, dst_port}, then linear search of the Flow Table for the flow spec.
– Exact Match Classifier options: None
– Rule Actions: {Deny, Permit, Active, Reserve}
– Rule flags: {Pinned, Idle, Remove}
– Flow entry to plugin has a one-to-one relationship (e.g. Instance 1 {Active}).
• Exact Match: active processing is the same as for general match. The AAL5 length and IP header checksum are calculated so the plugin does not have to perform these operations.
General Match Classifier Notes
[Figure: search over Rule 1 ... Rule 10, each listing instances i1 ... i5; invoke instance handler.]
General Match Classifier: Linear search of {src_addr, dst_addr, src_port, dst_port, proto}
• General Classifier options: {First, Last, All}
• Rule Actions: {Deny, Permit, Active}
• Rule flags: {All, Copy, Stop}
Exact Match Classifier Notes
[Figure: hash into the Flow Table; a flow entry bound to Instance 1 {Active}.]
Flow entry to plugin has a one-to-one relationship.
Exact Match Classifier: Hash{src_addr, dst_addr, src_port, dst_port}, then linear search for flow spec.
• Exact Match Classifier options: None
• Rule Actions: {Deny, Permit, Active, Reserve}
• Rule flags: {Pinned, Idle, Remove}
Active Processing Environment
[Figure: General/Exact Match Classifier rules (Rule N, Rule P) bound to instances of Class A ("plugin x"), Class B ("plugin y"), and Class C ("plugin z"); instances shown: Instance 1 {Active}, Instance 2 {Active, All}, Instance 1 {Deny}, Instance 1 {Active}.]
• Plugin instance maps to at most one rule/filter.
• General classifier: rule maps to at most 5 instances.
• Exact match classifier: rule maps to at most 1 instance.
Creating an Instance
Class A, classid = 100
inst_t *create_instance(class_t *, inst_id)
create_instance() is called by the PCU framework in response to receiving a command. It creates the class instance and returns a reference to the instance.
Instance of Class A - (Base Class extended by Developer)
<Fields defined by the Base Class>
class_t *class
inst_t *next
inst_id id
fid_t bound_fid
void (*handle_packet)(inst_t *, ip_t *, flag32_t);
void (*bind_instance)(inst_t *);
void (*unbind_instance)(inst_t *);
void (*free_instance)(inst_t *);
int (*handle_msg)(inst_t *, buf_t *, flag8_t, seq_t, len_t *)
<Class Specific Data>
...
struct my_inst { inst_t base; subclass defs };
Plugin Class Specific Interface
• All plugins belong to a class. At run time a class (i.e. plugin) must be instantiated before it can be referenced.
• Plugin is passed its instance pointer (like C++) as the first argument.
• Developer may extend the base class (struct rp_instance) to include additional fields which are local to each instance.
• Plugin developer must implement the following methods:
– void (*handle_packet)(struct rp_instance *, struct ip *, u_int32_t);
– void (*bind_instance)(struct rp_instance *);
– void (*unbind_instance)(struct rp_instance *);
– void (*free_instance)(struct rp_instance *);
– int (*handle_msg)(struct rp_instance *, void *, u_int8_t, u_int8_t, u_int8_t);
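To show how a developer might extend the base class, here is a hedged sketch of a packet-counting plugin. The `struct rp_instance` below is a stand-in containing only the five method pointers listed on the slide (the real base class has more fields), `uint32_t`/`uint8_t` replace the BSD `u_int32_t`/`u_int8_t` types for portability, and `count_inst` and all `count_*` functions are hypothetical names. The meaning of the three 8-bit `handle_msg` arguments is not given here, so the sketch ignores them.

```c
#include <stdint.h>
#include <stdlib.h>

struct ip; /* opaque here; the framework passes a pointer to the IP header */

/* Stand-in for the framework's base class (assumption: method pointers only). */
struct rp_instance {
    void (*handle_packet)(struct rp_instance *, struct ip *, uint32_t);
    void (*bind_instance)(struct rp_instance *);
    void (*unbind_instance)(struct rp_instance *);
    void (*free_instance)(struct rp_instance *);
    int  (*handle_msg)(struct rp_instance *, void *, uint8_t, uint8_t, uint8_t);
};

/* Hypothetical plugin instance: base class first, then per-instance data. */
struct count_inst {
    struct rp_instance base; /* must be first so pointers can be cast */
    uint32_t pkts;           /* packets seen by this instance */
    int bound;               /* 1 while bound to a filter */
};

static void count_handle_packet(struct rp_instance *inst, struct ip *pkt, uint32_t flags) {
    (void)pkt; (void)flags;  /* this plugin only counts */
    ((struct count_inst *)inst)->pkts++;
}
static void count_bind(struct rp_instance *inst)   { ((struct count_inst *)inst)->bound = 1; }
static void count_unbind(struct rp_instance *inst) { ((struct count_inst *)inst)->bound = 0; }
static void count_free(struct rp_instance *inst)   { free(inst); }
static int count_handle_msg(struct rp_instance *inst, void *buf,
                            uint8_t a, uint8_t b, uint8_t c) {
    (void)inst; (void)buf; (void)a; (void)b; (void)c;
    return 0;                /* no messages handled in this sketch */
}

/* Allocate an instance and fill in its method table. */
struct rp_instance *count_create_instance(void) {
    struct count_inst *ci = calloc(1, sizeof *ci);
    if (ci == NULL)
        return NULL;
    ci->base.handle_packet   = count_handle_packet;
    ci->base.bind_instance   = count_bind;
    ci->base.unbind_instance = count_unbind;
    ci->base.free_instance   = count_free;
    ci->base.handle_msg      = count_handle_msg;
    return &ci->base;
}
```

Placing the base struct first is what lets the framework hold a `struct rp_instance *` while the plugin casts it back to its own type, the C equivalent of the "like C++" instance pointer mentioned above.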
Plugin Framework Enhancements
• Integrated with Command framework
– Send command cells to PCU:
• create instance, free instance, bind instance to filter, unbind instance
– Send command cells to particular plugin instances
– Send command cells to plugin base class
• Enhanced interface to address limitations noticed in Crossbow:
– instance access to: plugin class, instance id, filter id
– PCU reports describing any loaded classes, instances and filters
Command Facility Highlights
• Overview
• High level description - Application Layer
• MSR Command Interface Overview
• Cell format and field definitions
• Example
Definitions
• Session: Open connection between the CP and a specific SPC. Intended to represent open connections and command state.
• Transaction: Represents a complete command. A transaction terminates when either an EOF is received by the CP or an error occurs.
• EOF: End of File is returned to the CP when the last bit of command data is returned, in response to a Cancel message, or when an error occurs.
Overview - Cmd Interface on CP
• Synchronous Request/Response protocol
• Timeout can be specified as well as the number of retries - Per session option
– Essentially provides a reliable service
– Issue: if there is no reply, the cmd/reply msg was lost in the port, channel or CP. Retries may be a bad thing.
• Address - MSR Port and Command
– <MSR_Port, MSR_Command>
• Message destination - Callback function within the Port’s kernel (implements command)
Command Interface on CP
• Types of messages:
– New Command, Get Next set of reply data Command, Cancel Command
– Error Reply, EOF Reply, Continued Reply
• Message Identifiers - Only requires a sequence number initialized to 0 for each New Command:
– One sending entity on CP,
– One outstanding command for each port,
– Ports send exactly one reply msg per command msg,
– Command must fit within one cell,
– Replies may span multiple cells.
Command Interface on Port
• Callback function registered with MSR kernel and called under 3 cases:
– New Command
• Flags = Command; Sequence = 0; Length = valid bytes in buffer; Buffer = application data
– Next Command
• Flags = Command | Next; Sequence = previous+1; Length = valid bytes in buffer; Buffer = application data
– Cancel Command
• Flags = Command | Cancel; Sequence = previous+1; Length = 0; Buffer contains no valid data
Command Interface on Port
• Callback function must:
– Read from/Write to supplied buffer
– Set length = Bytes written to buffer (in/out param)
– Indicate if an error occurred (return -1)
– Indicate whether more data exists (return 0 => EOF, return > 0 => Not EOF, return < 0 => ERROR | EOF)
• Framework:
– generates reply message using same Command value and Sequence number.
– sets flags indicating status (EOF, Error etc)
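A callback that follows these rules might look like the sketch below. It returns a 100-byte synthetic report in 44-byte chunks selected by the sequence number. The signature mirrors the `msr_ctl_<cmd>` form given in the MSR Kernel API slide; the flag values, the name `msr_ctl_example`, and the report contents are assumptions for illustration (the real flag definitions are in msr_ctl.h).

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK      44u  /* data bytes available per cell */
#define REPORT_LEN 100u /* size of the synthetic report */

/* Hypothetical flag values; the real ones are defined in msr_ctl.h. */
#define F_CMD    0x01
#define F_NEXT   0x04
#define F_CANCEL 0x08

/* Returns > 0 if more data remains, 0 at EOF, < 0 on error.
 * On return *dlen holds the number of bytes written to buf. */
int msr_ctl_example(void *buf, uint8_t flags, uint8_t seq, uint8_t *dlen) {
    uint8_t *out = buf;
    size_t off = (size_t)seq * CHUNK;   /* seq selects the chunk */

    *dlen = 0;
    if (!(flags & F_CMD))
        return -1;                      /* not a command cell */
    if (flags & F_CANCEL)
        return 0;                       /* transaction ends, no data */
    if (off > REPORT_LEN)
        return -1;                      /* sequence number out of range */

    size_t n = REPORT_LEN - off;
    if (n > CHUNK)
        n = CHUNK;
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(off + i);    /* synthetic report: byte k holds k */
    *dlen = (uint8_t)n;
    return (off + n < REPORT_LEN) ? 1 : 0;
}
```

Because the chunk is derived purely from the sequence number, a retried Next command with the same sequence number produces the same data, which keeps the reply stream free of holes.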
Failure Modes
• Library support for lost messages:
– if (timeout > 0, retries > 0), then CP API library will re-send with RETRY flag set.
– if (timeout > 0, retries = 0 or all retries failed), then API library returns error to application.
– if (timeout = 0 - No Timeout), then send operation blocks indefinitely.
• Lost Command message
– if (timeout > 0 and retries > 0), CP resends command; same sequence number but RETRY flag set. Command buffer and flags passed to callback fn.
Failure Modes
• Lost Reply message
– if no retries, any issues?
– if retries then CP resends
• New Command - Port knows this is a duplicate command (RETRY flag). Application is responsible for handling retries. If this is an issue, the application can use unique message ids. In the extreme case, use a history (last reply message).
• Next Command - Port receives Command w/Sequence > 0, w/RETRY flag. Passed to the application, which chooses the correct course of action. The intent is to ensure there are no holes in the reply data received by the CP.
• Cancel message - same as Next command.
Possible Enhancements
• Support asynchronous messaging:
– Multiple outstanding commands per port
– Asynchronous I/O on CP
– Speed up boot process and dynamic configuration
– Facilitates implementing port monitoring (ping or heartbeat) for fault detection and recovery.
– Two methods for reporting results:
• upcall - function registered by application is called when results arrive
• poll - application periodically polls library for results.
• Support Broadcast and/or Multicast
MSR Command Layer
• Simple messaging facility optimized for MSR.
• Command message (CP sends):
– Sent by CP to a specific MSR port (unicast)
– Must fit within one AAL0 cell.
– Message header includes:
• protocol version
• Command
• Sequence number
• flags
– Application data follows header
– Library implements Request/Reply protocol.
MSR Command Layer
• Reply Message (Port sends):
– Port must send reply message in response to a Command message.
– Reply message Header:
• version and sequence number: same as command msg.
• Includes application data and flags indicating if command was successful and if more data exists (EOF).
– Application registers command specific callback function at port.
– Callback function must conform to specified interface.
MSR Command Overview
• Command Protocol description
– Control Processor sends command messages to a specific port and expects to receive a reply message indicating either Success or Failure. This is termed a Command Cycle.
– There is the notion of a Command Transaction, which may include one or more command cycles. A command transaction is terminated when the target (port) responds with a reply msg containing an EOF.
MSR Command Overview
• Command Protocol description, continued
– CP processing of Reply msg depends on EOF flag:
• If EOF is set then no further reply data is available and the command transaction is closed.
• If EOF is not set then there is remaining data and the command transaction is still open.
– If remaining data (Not EOF), then CP must follow with either a Next or Cancel command message.
• Sequence number indicates the “chunk” of data to be returned.
• Command indicates the message’s destination
• sequence number = previous + 1
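The command cycle and transaction rules above amount to a simple loop on the CP side: start at sequence 0, and after each reply either stop at EOF or issue a Next with sequence + 1. The sketch below assumes a hypothetical transport hook (`cycle_fn`) that performs one command cycle, hypothetical flag values, and a loopback stub (`fake_port_cycle`) standing in for a port; none of these come from the MSR library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values for illustration only. */
#define F_CMD    0x01
#define F_REPLY  0x02
#define F_NEXT   0x04
#define F_CANCEL 0x08
#define F_EOF    0x10
#define F_ERROR  0x20

struct reply { uint8_t flags; uint8_t len; uint8_t data[44]; };

/* One command cycle: send a command cell, wait for its reply.
 * Returns 0 when a reply arrived, nonzero on transport failure. */
typedef int (*cycle_fn)(uint8_t cmd, uint8_t seq, uint8_t flags, struct reply *out);

/* Run one transaction, appending reply data to dst (capacity cap).
 * Returns the number of bytes collected, or -1 on error. */
long run_transaction(cycle_fn cycle, uint8_t cmd, uint8_t *dst, size_t cap) {
    uint8_t seq = 0, flags = F_CMD;     /* New Command: sequence = 0 */
    size_t got = 0;
    for (;;) {
        struct reply r;
        if (cycle(cmd, seq, flags, &r) != 0 || (r.flags & F_ERROR))
            return -1;
        if (r.len > sizeof r.data || got + r.len > cap) {
            /* No room left: close the transaction with a Cancel. */
            cycle(cmd, (uint8_t)(seq + 1), F_CMD | F_CANCEL, &r);
            return -1;
        }
        memcpy(dst + got, r.data, r.len);
        got += r.len;
        if (r.flags & F_EOF)
            return (long)got;           /* transaction closed */
        seq++;                          /* sequence = previous + 1 */
        flags = F_CMD | F_NEXT;
    }
}

/* Loopback stub standing in for a port: serves 100 bytes, 44 per reply. */
int fake_port_cycle(uint8_t cmd, uint8_t seq, uint8_t flags, struct reply *out) {
    const size_t total = 100;
    size_t off = (size_t)seq * 44;
    (void)cmd;
    out->flags = F_REPLY;
    out->len = 0;
    if ((flags & F_CANCEL) || off >= total) {
        out->flags |= F_EOF;
        return 0;
    }
    size_t n = (total - off > 44) ? 44 : total - off;
    for (size_t i = 0; i < n; i++)
        out->data[i] = (uint8_t)(off + i);
    out->len = (uint8_t)n;
    if (off + n >= total)
        out->flags |= F_EOF;
    return 0;
}
```

Calling `run_transaction(fake_port_cycle, 7, buf, sizeof buf)` with a 128-byte buffer takes three command cycles and collects all 100 bytes.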
APIC Cell Format
[Figure: cell drawn as 32-bit words (bit positions 24, 16, 8, 0): APIC word with fields pin, pout, cc, ld, cid (x = unused bits); ATM header with gfc, vpi, vci, pti, cl; MSR Command Message header with ver, length, command/status, sequence number, flags.]
• Cell payload contains the MSR Command
• Command header is 4 Bytes, leaving 44 Bytes for sub-commands and data.
ATM/APIC Header
• pin (Ports-In) - Port the cell arrived on
– Tx: not used (set to 000b)
– Rx: port cell arrived on (is the below correct?)
• 001 Port 0, 010 Port 1, 100 Port 2, etc.
• pout (Ports-Out) - Set of output ports.
– Tx: Command library sets:
• 001 Fiber/Link, 010 Ribbon/Switch, 011 Both
• 101 Loopback MV0, 110 Loopback MV1
– Rx: Set by VCXT, see pin above.
ATM/APIC Cell Format
• cc (Control Cell Indicator) - Not used, set to 0b
• ld (Low Delay) - Not used, set to 0b.
– Should we use low delay?
• cid (Connection Identifier) - set to vci value.
• gfc (Generic Flow Control) - set to 0000b.
ATM/APIC Cell Format
• vpi (Virtual Path Identifier) - Set to 0x0.
• vci (Virtual Circuit Identifier) - Equal to cid.
– See presentation on MSR configurations for a complete list of VCI assignments.
• pti (Payload Type) - Set to 000b (data cell)
• cl (Cell Loss Priority) - Set to 0b (High Priority)
MSR Command Header
Header fields: ver, length, command/status, sequence number, flags
• Version (2 bits) - Protocol version. Allows for at most 4 versions. Current version set to 0.
– Field width was a trade off with the length field.
• Length (6 bits) - Number of valid data bytes.
– 0 <= Length <= 44, so 6 bits are sufficient.
– This field is indirectly set by the application or command implementation. The CP library and kernel interfaces allow applications to pass a buffer pointer and indicate the number of valid data bytes.
MSR Command Header
• Command/Status (8 bits) - CP inserts command value, SPC/port inserts status information.
– Valid Commands are listed in $SYS/msr/msr_ctl.h, also see $MSR/utils/command/*.{c,h}
– Library API on CP accepts Command as argument. Implementation in kernel: array of function pointers, uses Command as index.
– Reply msg Status indicates success or an error code (Upcall, ATM, Cmd Invalid, Cmd Not Implemented, or Other Cmd Error).
MSR Command Header
• Sequence Number (8 bits) - Is of primary use by the applications.
– When command message first sent, sequence = 0.
– If the reply does not include an EOF flag, then CP increments sequence by one for each subsequent command message.
– When EOF is received the Command Transaction is complete and the sequence number is reset to 0.
MSR Command Header
• Flags (8 bits) - Bit field, valid flags are:
– Invalid - flag = 0, should not occur
– CMD - cell contains a valid command from CP
– REPLY - cell contains reply from Port
– ERROR - Reply only, error processing on Port
– EOF - No reply data remains, end of cmd transaction
– NEXT - get next set of reply data
– CANCEL - cancel current cmd transaction
– RETRY - set if CP resends a command after it was lost
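The five header fields add up to 32 bits (2 + 6 + 8 + 8 + 8), which is the 4-byte command header. The sketch below packs and unpacks them into one word, assuming the fields are laid out most-significant first in the order the slides list them (ver, length, command/status, sequence number, flags); the actual bit order on the wire is not stated, and the struct and function names are hypothetical.

```c
#include <stdint.h>

#define MSR_MAX_DATA 44 /* 48-byte cell payload minus 4-byte header */

struct msr_cmd_hdr {
    uint8_t ver;     /* 2 bits: protocol version, currently 0 */
    uint8_t length;  /* 6 bits: valid data bytes, 0..44 */
    uint8_t command; /* 8 bits: command (CP) or status (port) */
    uint8_t seq;     /* 8 bits: sequence number */
    uint8_t flags;   /* 8 bits: CMD, REPLY, ERROR, EOF, ... */
};

/* Pack into one 32-bit word (assumed layout: ver in the top 2 bits).
 * Returns 0 on success, -1 if ver or length is out of range. */
int msr_hdr_pack(const struct msr_cmd_hdr *h, uint32_t *out) {
    if (h->ver > 3 || h->length > MSR_MAX_DATA)
        return -1;
    *out = ((uint32_t)h->ver     << 30) |
           ((uint32_t)h->length  << 24) |
           ((uint32_t)h->command << 16) |
           ((uint32_t)h->seq     <<  8) |
            (uint32_t)h->flags;
    return 0;
}

/* Unpack a word; returns -1 if the length field exceeds 44. */
int msr_hdr_unpack(uint32_t w, struct msr_cmd_hdr *h) {
    h->ver     = (uint8_t)(w >> 30);
    h->length  = (uint8_t)((w >> 24) & 0x3F);
    h->command = (uint8_t)((w >> 16) & 0xFF);
    h->seq     = (uint8_t)((w >>  8) & 0xFF);
    h->flags   = (uint8_t)(w & 0xFF);
    return h->length <= MSR_MAX_DATA ? 0 : -1;
}
```

A 6-bit length field can represent up to 63, so the explicit range check is what enforces the 44-byte limit.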
CP Library API
• Library API for application on CP:
– int sendcmd(int sid, int cmd, char *data, int flags, int *dlen)
• sid = session id,
• cmd = Command to execute on port,
• data = buffer pointer,
• flags =
– RETRY (reply timeout),
– CANCEL (cancel current command),
– NEXT (get next set of reply data)
MSR Kernel API
• MSR kernel interface $SYS/msr/msr_ctl.{h,c}
• Callback function signature:
– msr_ctl_<cmd> (void *buf, u_int8_t flags, u_int8_t seq, u_int8_t *dlen)
– buf = command buffer w/application data,
– flags =
• CMD,
• NEXT,
• RETRY or
• CANCEL,
– seq = sequence number indicating reply data set, and
– dlen is input/output parameter, data length in bytes.
Kernel State Diagram
[Figure: state diagram with nodes labeled Idle, Command, and Closed; edges labeled Command, Next, Retry, Cancel, EOF, and Proto Error.]
CP Library State Diagram
[Figure: state diagram with nodes labeled Idle, Wait (for reply), and Closed; edges labeled Open Session, Command, Next, Retry (result of a timeout), EOF, and Protocol Error.]
Example Sending Cmd to Port
[Figure: CP attached to the wugs switch; ports P0 to P7, each with an SPC/FPX and DQ, on subnets 192.168.200.X through 192.168.207.X toward their next/previous hops; addresses 192.168.202.2 and 192.168.203.2 shown.]
• CP: sendcmd(); create plugin instance: port id = 0, PluginID = 200. The command is carried as cell hdr, cmd, data.
• Port: msr_ctl looks up the sub-command, performs the function call, then reports results.
• Port: reply(); plugin instance created: Status, Instance ID.
• CP: report command completion status to application.