Washington University in St. Louis

(SPC) Port-Level Processing: the MSR Kernel

Fred Kuhns
Washington University
Applied Research Laboratory
[email protected]
http://www.arl.wustl.edu/~fredk

Fred Kuhns - 1/8/02

Overview

• Introduction to hardware environment

• APIC core processing and buffer management

• Overview of SPC kernel software architecture and processing steps

• Plugin environment and filters

• Command Facility

[Figure: MSR hardware overview. Labels: Control Processor; Switch Fabric; ATM Switch Core; Port Processors (FPX, SPC); IPP; OPP; LC. Line Cards (LC) are the link interfaces. Port Processors: SPC and/or FPX.]

Using Both an FPX and SPC

[Figure: a port using both an FPX and an SPC. Labels: NID; IP Classifier; DQ Module; Flow Control; shim; APIC; active processing (X.1, Z.2).]

Shim contains results of classification step.

Focus on SPC as Port Processor

[Figure: Control Processor and Switch Fabric with an SPC at each port. Labels: Input Port Proc.; Output Port Proc.; Flow/Route Lookup; Flow Lookup; Dist. Q. Ctl.]

The SPC: an Embedded Processor

[Figure: SPC block diagram. Labels: Switch Interface; Link Interface; APIC; PCI Bus; CPU Module; DRAM; System FPGA; Serial Ports.]

Typical Pentium PC Architecture

[Figure: typical Pentium PC block diagram. Labels: CPU; Cache; North-Bridge; DRAM; PCI Bus; PCI Devices; South Bridge (PIIX3) (PIC, PIT, …); ISA Bus; ISA Devices; BIOS; Super-IO (RTC, UARTs, Kbd/Mse, Floppy, Parallel, ...). Signals: Addr/Data, Ctrl, Addr/Data/Ctrl, Intr, NMI, INIT.]

SPC Hardware Architecture

[Figure: SPC block diagram. Intel Embedded Module: CPU, Cache, North-Bridge, DRAM. PCI Bus. APIC: Link Interface, Switch Interface. System FPGA: PIT, PIC, RTC’, BIOS ROM, UART1 Interface, UART2 Interface. UART1, UART2. Signals: Addr/Data, Ctrl, Addr/Data/Ctrl, Intr, NMI, INIT.]

SPC Components

• APIC - PCI Bus Master
• Pentium Embedded Module
  – 166 MHz MMX Pentium Processor
    • L1 Cache: 16KB Data, 16KB Code
    • L2 cache: 512 KB
  – NorthBridge
    • 33 MHz, 32 bit PCI Bus
    • PCI Bus Master
• System FPGA - PCI Bus Slave
  – Xilinx XC4020XL-1 FPGA
  – 20K Equivalent Gates, ~ 75% used

SPC Components (continued)

• Memory
  – EDO DRAM
  – 64MB (Max for current design)
  – SO DIMM
• Switch Interface - 1 Gb Utopia
• Link Interface - 1 Gb Utopia
• UART
  – Two Serial Ports
    • NetBSD system console
    • TTY port


APIC Descriptors

• APIC uses a data structure called a descriptor to describe available buffers and their status.
• The hardware and software follow a well defined protocol for jointly managing the descriptors.
• The APIC controls one or more Free Descriptor chains, with each chain representing buffers available for Rx for a predefined set (one or more) of RX channels.

APIC Descriptors and Buffers

[Figure: APIC descriptor layout, bit positions 0, 8, 16, 24, 31 marked. Fields: Match/Checksum with flag bits V I S O E C L X T Y; BufAddrLo and BufAddrHi (physical address of data buffer); BufLen (buffer length or amount left unused); NextDesc (index into Desc Table). Flags: O - Read Only, E - EOF, C - CRC OK, T - Type, Y - Valid Bits.]

• Frame must be a multiple of 48 B.
• Buffers are 2048 B.
• Max frame size = 2016 B, or 42 cells.
• Reserve 8 B for shim and 8 B for trailer.
• IP Datagram MTU must be 2000 B.
• At output port, max 2016 B frame received, offset 8 bytes in buffer.
• At most 2024 B of the buffer are used.
• 24 B at end of buffer not used.

[Figure: MSR Buffer (2048B) and frame layout. Shim (shim offset not used on egress); LLC (AA.AA.03), OUI (00.00), OUI (00), Type (08.00); IP header (Version, H-len, TOS, Total length, Identification, flags, Fragment offset, TTL, protocol, Header checksum, Source Address, Destination Address, Options ??); IP data (transport header and transport data); AAL5 padding (0 - 40 bytes) and trailer: CPCS-UU (0), CPCS-UU (0), Length (IP packet + LLC/SNAP), CRC; 24 bytes not used.]
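The buffer-size bullets above follow from simple arithmetic. A minimal sketch that reproduces the numbers; the constant and function names are illustrative and do not come from the MSR sources:

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted on the slide, in bytes (MSR_MAX_CELLS is a cell count). */
enum {
    ATM_CELL_PAYLOAD = 48,   /* AAL5 frames are a multiple of this */
    MSR_BUF_SIZE     = 2048, /* one MSR buffer */
    MSR_MAX_CELLS    = 42,   /* largest frame, in cells */
    MSR_SHIM_SIZE    = 8,    /* reserved for the shim */
    AAL5_TRAILER     = 8     /* reserved for the AAL5 trailer */
};

/* Largest AAL5 frame: 42 cells of 48 bytes each. */
size_t msr_max_frame(void)
{
    return (size_t)MSR_MAX_CELLS * ATM_CELL_PAYLOAD;
}

/* Largest IP datagram once shim and trailer space is reserved. */
size_t msr_ip_mtu(void)
{
    return msr_max_frame() - MSR_SHIM_SIZE - AAL5_TRAILER;
}

/* Bytes of the buffer touched when a full frame is stored at offset 8. */
size_t msr_buf_used(void)
{
    return MSR_SHIM_SIZE + msr_max_frame();
}

/* Bytes left unused at the end of the buffer. */
size_t msr_buf_slack(void)
{
    assert(msr_buf_used() <= (size_t)MSR_BUF_SIZE);
    return (size_t)MSR_BUF_SIZE - msr_buf_used();
}
```

The four functions return 2016, 2000, 2024 and 24, matching the figures on the slide.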

Descriptor Notes

Flags:
V = Volatile Buffer
I = Interrupt/Notify on Read
S = SAM Enable
O = Read Only
E = End of Frame
C = CRC OK, RX
L = Loss Priority (CLP of last cell), RX
X = Congestion indication from last cell's PTI, RX
T = BufType, 0 -> Data; 1 -> RM; 2 -> segment OAM; 3 -> end-2-end OAM
Y = Sync, 0 -> Done, Valid Link; 1 -> Done, InValid Link; 2 -> Not Ready; 3 -> Ready

Possible values for First Word:
CAFE0083 = Tx, EoF, Ready (Driver)
CAFE0080 = Tx, EoF, DoneValidLink (APIC)
CAFE0002 = Tx, NotReady
CAFE0003 = Rx, Ready, No Interrupt on Read
CAFE0403 = Rx, Ready, Interrupt on Read
XXXX00C0 = Rx, EoF, CRC OK, DoneValidLink
XXXX0040 = Rx, CRC OK, DoneValidLink
XXXX00C1 = Rx, EoF, CRC OK, DoneInValidLink
XXXX0041 = Rx, CRC OK, DoneInValidLink

[Figure: descriptor layout, bit positions 0, 8, 16, 24, 31 marked. Words: MatchFlags (Match/Checksum and flag bits V I S O E C L X T Y); Low32Addr (BufAddrLo); BufAddrHi (physical address of data buffer); SizeNext (BufLen, and NextDesc as index into Desc Table).]
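The example first-word values above are enough to infer where some of the flag bits sit. The sketch below decodes only the bits that can be derived from those examples (I, E, C and the two-bit Y field); the masks are inferred rather than taken from APIC documentation, and the positions of V, S, O, L, X and T are deliberately left out. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Masks inferred from the example values (e.g. CAFE0403, XXXX00C1). */
#define DESC_SYNC_MASK 0x00000003u /* Y: two-bit sync field */
#define DESC_CRC_OK    0x00000040u /* C: CRC OK (Rx) */
#define DESC_EOF       0x00000080u /* E: end of frame */
#define DESC_INTR      0x00000400u /* I: interrupt/notify on read */

/* Y (sync) values as listed on the slide. */
enum desc_sync {
    SYNC_DONE_VALID   = 0,
    SYNC_DONE_INVALID = 1,
    SYNC_NOT_READY    = 2,
    SYNC_READY        = 3
};

struct desc_status {
    uint16_t       match;  /* upper half: match/checksum (0xCAFE when posted) */
    int            eof;
    int            crc_ok;
    int            intr;
    enum desc_sync sync;
};

/* Split a descriptor's first word into the fields decoded above. */
struct desc_status desc_decode(uint32_t first_word)
{
    struct desc_status s;

    s.match  = (uint16_t)(first_word >> 16);
    s.eof    = (first_word & DESC_EOF) != 0;
    s.crc_ok = (first_word & DESC_CRC_OK) != 0;
    s.intr   = (first_word & DESC_INTR) != 0;
    s.sync   = (enum desc_sync)(first_word & DESC_SYNC_MASK);
    return s;
}
```

For instance, decoding 0xCAFE0083 yields EoF set and sync = Ready, which agrees with the "Tx, EoF, Ready" entry in the list.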

APIC Descriptors

[Figure: Pool X chain head pointing to a chain of free descriptors, each referencing a buffer. Each descriptor: Match/Checksum = 0xCAFE; flags O = 0, E = 0, C = 0, T = 00, Y = 11; BufAddrLo; BufAddrHi; BufLen = 2016; NextDesc.]

Free Descriptor chain used by APIC during receive; each descriptor contains the physical address of an available buffer.

Descriptors on a Receive Queue

[Figure: two descriptors on the VC 101’ queue. First descriptor: checksum; flags O = 0, E = 0, C = 0, T = 00, Y = 00; BufAddrLo; BufAddrHi; BufLen = 0; NextDesc. Second descriptor: checksum; flags O = 0, E = 1, C = 1, T = 00, Y = 00; BufAddrLo; BufAddrHi; BufLen = 1016; NextDesc ???.]

RX Descriptor to Buffer Mapping

[Figure: descriptors j, j+1, j+2, j+3, ..., j+N (each with Match/Checksum, flags V I S O E C L X T Y, BufAddrLo, BufAddrHi, BufLen, NextDesc) map onto 2KB buffers 0, 1, ..., N. Buffers replace Mbufs.]

Descriptor Layout

[Figure: layout of the descriptor area, indexed from starting address desc_area, msr_descr_count descriptors in total. Labels: aal5rx_start, aal5tx_start, aal5rx_end, aal5_count (RX/TX shared IP packet buffers, *aal5_pool); aal0rx_start, aal0rx_end, aal0tx_start, aal0tx_end, aal0_count (RX cell buffers, TX cell buffers, *aal0_pool); RX channel 0, RX channel 1, TX channel 0, TX channel 1 (aal0_count_vci each); local_start, local_end, local_count; unallocated; Invalid Descriptor.]

Descriptor & Buffer Relationships

[Figure: APIC (ports 0, 1, 2; global registers: notification, resume, pacing; Rx channel: processing, ATM hdr, conn status, current desc; Tx channel: processing, conn status, current desc) alongside the Descriptor Table (DT), the MSR Buffers (MB) and the MSR Buffer Headers.]

• Rx desc bound (same offset) to specific buffer; the current rx offset selects both.
• Tx offset is the same as the rx offset.
• TX desc allocated dynamically and bound to the RX desc and buffer.
• MSR Buffer Headers (Buf Hdrs): same as buf offset.
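Because an Rx descriptor and its buffer share the same offset, one index is enough to move between the Descriptor Table and the MSR Buffers. A minimal sketch of that mapping, assuming a 16-byte, four-word descriptor and contiguous 2048-byte buffers; the struct and function names are hypothetical, not the driver's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_BUF_SIZE 2048u

/* Assumed descriptor shape: four 32-bit words. */
struct apic_desc {
    uint32_t match_flags;
    uint32_t buf_addr_lo;
    uint32_t buf_addr_hi;
    uint32_t size_next;
};

/* Descriptor Table (DT) and MSR Buffers (MB) with their base addresses. */
struct msr_pool {
    struct apic_desc *dt_base;
    uint8_t          *mb_base;
    size_t            count;   /* number of descriptor/buffer pairs */
};

/* Buffer bound to the Rx descriptor at the same index. */
uint8_t *msr_buf_for_index(const struct msr_pool *p, size_t indx)
{
    assert(indx < p->count);
    return p->mb_base + indx * MSR_BUF_SIZE;
}

/* Recover the shared index from a descriptor pointer. */
size_t msr_index_of_desc(const struct msr_pool *p, const struct apic_desc *d)
{
    assert(d >= p->dt_base && d < p->dt_base + p->count);
    return (size_t)(d - p->dt_base);
}

/* Go straight from a descriptor to its bound buffer. */
uint8_t *msr_buf_for_desc(const struct msr_pool *p, const struct apic_desc *d)
{
    return msr_buf_for_index(p, msr_index_of_desc(p, d));
}
```

The same index could also select the buffer header, since the slide states that buffer headers use the buffer offset.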

Receiving a Packet

[Figure: cells arrive at the APIC (ports 0, 1, 2; global registers; Rx and Tx channels). The Descriptor Table (DT) is addressed as DT base + indx and the MSR Buffers (MB) as MB base + indx. Driver and IP code run on the CPU.]

1) AAL5 frame is received: APIC allocates and reads desc from RX pool. Then the previous Rx desc is written back (updated).
2) APIC writes Rx’ed AAL5 frame to buffer referenced by new Rx desc.

Completing the Receive

[Figure: same layout as the previous slide; the MSR buffer at indx now holds the IP hdr and IP data.]

1) APIC writes (updates) current desc.
2) APIC updates notification register.
3) Last: Assert Interrupt.

APIC disables interrupts on Rx channel.

Sending Packet

[Figure: same layout as the previous slides, with an IP Lookup Table consulted by the driver and IP code; the MSR buffer at indx holds the IP hdr and IP data; cells leave the APIC.]

1) Allocate Tx desc and bind to Rx desc and buffer.
2) a) Write to current desc’s next index.
   b) Write to resume Tx channel register.
3) APIC sends (reads) packet and interrupts when done.


SPC Software Architecture

[Figure: SPC kernel software architecture, ingress and egress paths between APIC interfaces. Labels: APIC Specific Driver Code (interrupt); IP processing (insert/process shim); Classifier (Exact Match, General Match); Route Lookup (FIPL, Simple); Plugin Environment (plugins); Ingress/Egress ?; DQ Service with VOQ 0 ... VOQ 7 (APIC TX Qs: DQ adjusts VOQ pacing), DQ Reports, Read DQ Report Cells, Broadcast Report; DRR Service with queues SP1 ... SPN feeding paced APIC TX queues (Sub Port 0 ... Sub Port 3); CP command processor and debug message (commands, command reply, debug messages); periodic callback interrupt (D sec).]

• DQ Service handler(): read cells, set pacing, broadcast report.
• DRR Service handler(): send budget per interval.

SPC Data Path - Simplified View

[Figure: simplified data path (Ingress/Egress ?). Labels: Frame/Buffer and IP Processing; Flow Classifier/(channel map); NM Filter; Route Lookup (Shim, FIPL, Simple, cache); Plugin Environment (plugins); DQ/In Queuing; DRR/Out Queuing.]

SPC Input (Ingress) Processing

[Figure: ingress pipeline. Labels: APIC; APIC Specific Driver Code (interrupt); IP Processing: Insert InterPort Shim; Flow Classifier/(channel map); NM Filter; Route Lookup (FIPL, Simple); Plugin Environment with PCU Framework (Local Resource Manager and PCU Interface) managing W.1, X.1, X.2, Y.1, Z.1, Z.2; IP Options; Replace IntraShim with InterShim, update trailer and IP header; SP1 ... SP4; Distributed Queuing with VOQ 0 ... VOQ 7 (APIC TX Qs: DQ adjusts VOQ pacing); DQ Reports; Read DQ Report Cells; Broadcast Report; periodic callback interrupt (D sec).]

• Distributed Queuing callback: read cells, set pacing, broadcast report.

Input (Ingress) Processing

[Figure: the ingress pipeline of the previous slide, annotated with the notes below.]

• APIC RX: the AAL5 frame (IP dgram, padding, trailer) is written into an MSR Buffer (2KB) at the Rx offset.
• Insert Shim: the buffer then holds shim, IP, padding, trailer.
• Intra/Inter Port Shim fields: Flags, Not Used, Stream Identifier, Input VIN, Output VIN.
• Intra Port Shim Flags: AFNR OPUK X.
• VIN Format: PN (10 bits), SPI (6 bits).
• General Match Filter: linear search of filter 1 ... filter 10 using the 5-tuple {src_addr, dst_addr, src_port, dst_port, proto}. A match maps a flow to one or more plugin instances (i1 ... i5). Search, then invoke instance handler.
• Exact Match Classifier: a hash of the IP header fields (Source Address, Destination Address, Source Port, Destination Port) indexes the Flow Table. Hash field widths and offsets are configurable: msr/msr_classify.h. The route is cached in the flow entry; if none, call IP lookup (fipl/simple).
• Set input and output VIN in Shim, calculate AAL5 length, decrement IP TTL, calculate IP header checksum. Place in APIC TX queue.

[Figure: AAL5 frame layout. IP header (Version, H-length, TOS, Total length, Identification, Flags, Fragment offset, TTL, Protocol, Header checksum, Source Address, Destination Address); IP data (transport header and transport data); AAL5 padding (0 - 40 bytes); trailer (8 Bytes): CPCS-UU (0), CPCS-UU (0), Length (IP packet + LLC/SNAP), CRC (APIC calculates and sets).]
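The VIN format above splits a 16-bit value into a 10-bit PN and a 6-bit SPI. A minimal packing sketch; placing PN in the upper bits is an assumption (the slide gives only the field widths), and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define VIN_PN_BITS  10u
#define VIN_SPI_BITS 6u
#define VIN_PN_MASK  ((1u << VIN_PN_BITS) - 1u)   /* 0x3FF */
#define VIN_SPI_MASK ((1u << VIN_SPI_BITS) - 1u)  /* 0x3F  */

/* Combine PN and SPI into one 16-bit VIN (assumed order: PN high, SPI low). */
uint16_t vin_pack(unsigned pn, unsigned spi)
{
    assert(pn <= VIN_PN_MASK && spi <= VIN_SPI_MASK);
    return (uint16_t)((pn << VIN_SPI_BITS) | spi);
}

/* Extract the 10-bit PN field. */
unsigned vin_pn(uint16_t vin)
{
    return ((unsigned)vin >> VIN_SPI_BITS) & VIN_PN_MASK;
}

/* Extract the 6-bit SPI field. */
unsigned vin_spi(uint16_t vin)
{
    return (unsigned)vin & VIN_SPI_MASK;
}
```

With this layout, the input and output VINs stored in the shim can be built and inspected without touching the other shim fields.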

Output Port (Egress) Processing

[Figure: egress pipeline. Labels: APIC; APIC Specific Driver Code (interrupt); IP processing: process shim; Classifier; Flow Classifier/(channel map); NM Filter; Determine Out VC; Plugin Environment with PCU Framework (Local Resource Manager and PCU Interface) managing W.1, X.1, X.2, Y.1, Z.1, Z.2; IP Options; Remove Shim, update AAL5 trailer and IP header; DRR Service with queues SP1 ... SPN; paced APIC TX queues (Sub Port 0 ... Sub Port 3); DQ report: Tx queue lengths; periodic callback interrupt (D sec).]

• DRR Service handler: send budget per flow.

Output Port (Egress) Processing

[Figure: the egress pipeline of the previous slide, annotated with the notes below.]

• APIC RX: the AAL5 frame is written into an MSR Buffer (2KB) at the Rx offset: shim, IP, padding, trailer.
• Verify Shim and adjust buffer and header references.
• General and Exact match classifier same as ingress, except route is obtained from output VIN in Shim.
• Remove Shim for TX (TX offset; buffer contents: shim, IP, padding, trailer):
  – Adjust buffer
  – Update trailer
  – Update IP hdr
• Place in DRR queue for this flow (referenced by flow entry).
• Every D sec the DRR handler is executed. It sends up to MAX bytes per period (minus backlog), sharing available BW among the active flows.
• APIC Output channels are paced such that their sum is the effective link bandwidth.
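The periodic DRR handler can be pictured as deficit round robin with a per-invocation byte budget. The sketch below is a generic DRR loop, not the MSR kernel's implementation: the queue representation, the per-flow quantum and the function name are assumptions, and the "minus backlog" adjustment mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define DRR_MAX_PKTS 8

/* One flow's queue, kept as a ring of packet lengths (bytes). */
struct drr_flow {
    size_t pkt_len[DRR_MAX_PKTS];
    size_t head;     /* index of the next packet to send */
    size_t count;    /* packets currently queued */
    size_t quantum;  /* credit added per round, must be > 0 */
    size_t deficit;  /* unused credit carried between rounds */
};

/*
 * One handler invocation: serve the flows round robin until the budget
 * (the MAX bytes for this period) blocks every backlogged flow or all
 * queues are empty. Returns the number of bytes sent.
 */
size_t drr_run(struct drr_flow *flows, size_t nflows, size_t budget)
{
    size_t sent = 0;
    int retry = 1;

    while (retry) {
        retry = 0;
        for (size_t i = 0; i < nflows; i++) {
            struct drr_flow *f = &flows[i];

            if (f->count == 0) {      /* idle flows keep no credit */
                f->deficit = 0;
                continue;
            }
            assert(f->quantum > 0);
            f->deficit += f->quantum;
            while (f->count > 0) {
                size_t len = f->pkt_len[f->head];

                if (len > budget - sent)
                    break;            /* period budget exhausted for this packet */
                if (len > f->deficit) {
                    retry = 1;        /* needs more credit: try another round */
                    break;
                }
                f->deficit -= len;
                sent += len;
                f->head = (f->head + 1) % DRR_MAX_PKTS;
                f->count--;
            }
            if (f->count == 0)
                f->deficit = 0;
        }
    }
    return sent;
}
```

With two flows of quantum 500, queues of {300, 300, 300} and {500, 500} bytes, and a budget of 1200, the call sends 1100 bytes and leaves one packet queued in each flow for the next period.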

What about Ethernet?

[Figure: network containing Host 1, Host 2, ..., Host N, an MSR, an Ethernet Switch, and three Routers.]

GigE Link Interface - Egress (In development)

[Figure: frames from the FPX/SPC (IP header, data, AAL5 trailer) leave toward the next hop or end station as Ethernet frames (Ethernet header, IP header, data).]

VC assignment:
• 64 = SP0: to ES
• 65 = SP1, 66 = SP2, ..., 64+N = SPN: to NH

NH Table (4 entries), VC -> IP: VC1 -> IP1, ..., VC4 -> IP4
ARP Table (M entries), IP -> MAC: IP1 -> MAC1, ..., IPM -> MACM

• If VC != 64, look up VC in the NH table; it returns the IP used for the ARP lookup (support N = 4).
• If VC = 64, look up the IP destination address in the packet header.
• Add Ethernet header using destination address from ARP table. Add our Ethernet source address.
• Maintain ARP table by snooping, sending ARPs and responding to ARP broadcasts.
• Software creates NH table at boot time.
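The egress lookup above is a two-step table walk: VC to next-hop IP (skipped for VC 64), then IP to MAC. A minimal sketch with linear tables; the struct layouts and function names are hypothetical, since the slide marks this interface as still in development:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_VC 64u   /* VC 64 = SP0: destination is an end station */
#define NH_ENTRIES 4u    /* NH table size from the slide */

struct arp_entry { uint32_t ip; uint8_t mac[6]; };
struct nh_entry  { uint32_t vc; uint32_t ip; };

/* Linear ARP lookup; returns the MAC for ip or NULL when unresolved. */
const uint8_t *arp_lookup(const struct arp_entry *arp, size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        if (arp[i].ip == ip)
            return arp[i].mac;
    return NULL;
}

/*
 * Choose the destination MAC for a frame arriving from the FPX/SPC.
 * VC 64: use the packet's own destination IP.
 * Other VCs: map the VC to a next-hop IP through the NH table first.
 */
const uint8_t *egress_dst_mac(const struct nh_entry *nh,
                              const struct arp_entry *arp, size_t n_arp,
                              uint32_t vc, uint32_t pkt_dst_ip)
{
    uint32_t ip = pkt_dst_ip;

    assert(nh != NULL && arp != NULL);
    if (vc != DEFAULT_VC) {
        size_t i = 0;

        while (i < NH_ENTRIES && nh[i].vc != vc)
            i++;
        if (i == NH_ENTRIES)
            return NULL;   /* no next hop configured for this VC */
        ip = nh[i].ip;
    }
    return arp_lookup(arp, n_arp, ip);
}
```

A NULL result is where a real implementation would presumably send an ARP request before the frame can be forwarded.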

GigE Link Interface - Ingress (In development)

[Figure: Ethernet frames from the next hop or end station (Ethernet header, IP header, data) are forwarded to the FPX/SPC as AAL5 frames (IP header, data, AAL5 trailer).]

ARP Table (M entries), IP -> MAC: IP1 -> MAC1, ..., IPM -> MACM
64 = SP0: to FPX/SPC

Processing:
If source MAC in table then verify, else add
If broadcast and ARP, process ARP
else if broadcast and IP broadcast, goto Deliver
else if multicast and IP multicast, goto Deliver
else if not our destination MAC address, drop
else if IP unicast, Deliver
Deliver:
  Remove Ethernet Header
  Encapsulate in AAL5 frame
  Send to switch on default VC (VC = 64)


Packet Classification & Plugins

• Classification provides an opportunity to bind flows to registered plugin instances.
• General classifier - Network Management
  – classification using 5-tuple
    • <saddr, sport, daddr, dport, proto>
    • Prefix match on address, exact match on port and proto
    • 0 is a wildcard for all fields
  – input and output ports
  – filters added/removed via the command facility
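The general classifier's matching rule (prefix match on addresses, exact match on ports and protocol, 0 as wildcard) can be sketched as a linear search. The filter layout and names below are assumptions for illustration; in particular, an address wildcard is modelled as a prefix length of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet fields used by the classifier, in host byte order. */
struct five_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Hypothetical filter: address prefixes plus exact port/proto values. */
struct gm_filter {
    uint32_t saddr, daddr;  /* prefix values */
    uint8_t  slen, dlen;    /* prefix lengths 0..32; 0 matches any address */
    uint16_t sport, dport;  /* 0 = wildcard */
    uint8_t  proto;         /* 0 = wildcard */
};

/* Network mask for a prefix length; length 0 yields an all-zero mask. */
static uint32_t prefix_mask(uint8_t len)
{
    assert(len <= 32);
    return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32u - len));
}

/* Nonzero when the tuple satisfies every field of the filter. */
static int gm_match(const struct gm_filter *f, const struct five_tuple *t)
{
    return ((t->saddr ^ f->saddr) & prefix_mask(f->slen)) == 0
        && ((t->daddr ^ f->daddr) & prefix_mask(f->dlen)) == 0
        && (f->sport == 0 || f->sport == t->sport)
        && (f->dport == 0 || f->dport == t->dport)
        && (f->proto == 0 || f->proto == t->proto);
}

/* Linear search: index of the first matching filter, or -1 if none. */
int gm_first_match(const struct gm_filter *tbl, size_t n,
                   const struct five_tuple *t)
{
    for (size_t i = 0; i < n; i++)
        if (gm_match(&tbl[i], t))
            return (int)i;
    return -1;
}
```

Returning the first match corresponds to one possible search policy; a Last or All policy would keep scanning instead of returning early.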

Flow Bound to a Plugin

[Figure: the simplified SPC data path (Frame/Buffer and IP Processing; Flow Classifier/(channel map); NM Filter; Route Lookup (Shim, FIPL, Simple, cache); Plugin Environment; DQ/In Queuing; DRR/Out Queuing; Ingress/Egress ?) with a flow bound to a plugin instance.]

instance->handle_packet(instance, packet, flags)

Call packet handler for bound instance with pointer to IP packet (struct ip *).

[Figure: AAL5 Frame. Shim; pkt (struct ip *); IP header (Version, H-len, TOS, Total length, Identification, flags, Fragment offset, TTL, protocol, Header checksum, Source Address, Destination Address, Options ??); IP data (transport header and transport data); AAL5 padding (0 - 40 bytes); CPCS-UU (0), CPCS-UU (0), Length (IP packet + LLC/SNAP), CRC.]

handle_packet(inst, pkt, flags)
{
    /* Plugin may read and/or
     * modify content but not
     * delete it unless COPY.
     * On return the framework
     * forwards packet */
    ...
    return;
}

General Match Classifier: linear search of {src_addr, dst_addr, src_port, dst_port, proto} over Rule 1 ... Rule 10, each mapping to instances i1 ... i5. Search, then invoke instance handler.
• General Classifier options: {First, Last, All}
• Rule Actions: {Deny, Permit, Active}
• Rule flags: {All, Copy, Stop}
• Send packet to exact match classifier.

Exact Match Classifier: Hash{src_addr, dst_addr, src_port, dst_port}, then linear search for flow spec in the Flow Table.
• Exact Match Classifier options: None
• Rule Actions: {Deny, Permit, Active, Reserve}
• Rule flags: {Pinned, Idle, Remove}
• Flow entry to plugin has a one-to-one relationship (Instance 1 {Active}).

Exact Match: active processing same as general match. The AAL5 length and IP header checksum are calculated so the plugin does not have to perform these operations.

General Match Classifier Notes

[Figure: Rule 1 ... Rule 10, each mapping to instances i1 ... i5. Search, then invoke instance handler.]

General Match Classifier: Linear search of {src_addr, dst_addr, src_port, dst_port, proto}

• General Classifier options: {First, Last, All}
• Rule Actions: {Deny, Permit, Active}
• Rule flags: {All, Copy, Stop}

Exact Match Classifier Notes

[Figure: hash into the Flow Table; a flow entry bound to Instance 1 {Active}.]

Flow entry to plugin has a one-to-one relationship.

Exact Match Classifier: Hash{src_addr, dst_addr, src_port, dst_port}, then linear search for flow spec.

• Exact Match Classifier options: None
• Rule Actions: {Deny, Permit, Active, Reserve}
• Rule flags: {Pinned, Idle, Remove}
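An exact-match lookup of this kind hashes the flow fields to pick a bucket, then walks that bucket's chain comparing the full flow spec. The sketch below uses a chained hash table with an XOR-fold hash; the hash function, table size and structure names are assumptions, since the real field widths and offsets are configured in msr/msr_classify.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_BUCKETS 256u

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct flow_entry {
    struct flow_key    key;
    void              *instance;  /* at most one bound plugin instance */
    struct flow_entry *next;      /* bucket chain */
};

struct flow_table {
    struct flow_entry *bucket[FLOW_BUCKETS];
};

/* Hypothetical hash: XOR-fold the addresses and ports into 8 bits. */
static unsigned flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ k->daddr ^ (((uint32_t)k->sport << 16) | k->dport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h & (FLOW_BUCKETS - 1u);
}

static int flow_key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr
        && a->sport == b->sport && a->dport == b->dport
        && a->proto == b->proto;
}

/* Hash to a bucket, then search its chain linearly for the flow spec. */
struct flow_entry *flow_lookup(const struct flow_table *t,
                               const struct flow_key *k)
{
    struct flow_entry *e = t->bucket[flow_hash(k)];

    while (e != NULL && !flow_key_eq(&e->key, k))
        e = e->next;
    return e;
}

/* Link a caller-owned entry at the head of its bucket chain. */
void flow_insert(struct flow_table *t, struct flow_entry *e)
{
    unsigned b;

    assert(t != NULL && e != NULL);
    b = flow_hash(&e->key);
    e->next = t->bucket[b];
    t->bucket[b] = e;
}
```

The single `instance` pointer reflects the one-to-one relationship between a flow entry and a plugin instance.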

Active Processing Environment

[Figure: General/Exact Match Classifier rules (Rule N, Rule P) bound to plugin instances of Class A “plugin x”, Class B “plugin y” and Class C “plugin z”. Instance labels: Instance 1 {Active}; Instance 2 {Active, All}; Instance 1 {Deny}; Instance 1 {Active}.]

• Plugin instance maps to at most one rule/filter.
• General classifier: rule maps to at most 5 instances.
• Exact match classifier: rule maps to at most 1 instance.

Creating an Instance

Class A: classid = 100
inst_t *create_instance(class_t *, inst_id)

create_instance() is called by the PCU framework in response to receiving a command. It creates the class instance and returns a reference to the instance.

Instance of Class A - (Base Class extended by Developer)

<Fields defined by the Base Class>
class_t *class
inst_t *next
inst_id id
fid_t bound_fid
void (*handle_packet) (inst_t *, ip_t *, flag32_t);
void (*bind_instance) (inst_t *);
void (*unbind_instance) (inst_t *);
void (*free_instance) (inst_t *);
int (*handle_msg) (inst_t *, buf_t *, flag8_t, seq_t, len_t *)
<Class Specific Data>
...

struct my_inst {
    inst_t base;
    subclass defs
};

Plugin Class Specific Interface

• All plugins belong to a class. At run time a class (i.e. plugin) must be instantiated before it can be referenced.
• Plugin is passed its instance pointer (like C++) as the first argument.
• Developer may extend the base class (struct rp_instance) to include additional fields which are local to each instance.
• Plugin developer must implement the following methods:
  – void (*handle_packet)(struct rp_instance *, struct ip *, u_int32_t);
  – void (*bind_instance)(struct rp_instance *);
  – void (*unbind_instance)(struct rp_instance *);
  – void (*free_instance)(struct rp_instance *);
  – int (*handle_msg)(struct rp_instance *, void *, u_int8_t, u_int8_t, u_int8_t);
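To make the "base class extended by the developer" idea concrete, here is a minimal packet-counting plugin. It is a sketch only: `struct rp_instance` below is a stand-in holding just the five methods listed above (the framework's real definition has more fields), `uint32_t`/`uint8_t` replace the BSD `u_int32_t`/`u_int8_t` spellings, and `count_create_instance` with no arguments is a hypothetical constructor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct ip;  /* opaque here; the kernel supplies the real definition */

/* Stand-in for the framework's base class (methods from the slide only). */
struct rp_instance {
    void (*handle_packet)(struct rp_instance *, struct ip *, uint32_t);
    void (*bind_instance)(struct rp_instance *);
    void (*unbind_instance)(struct rp_instance *);
    void (*free_instance)(struct rp_instance *);
    int  (*handle_msg)(struct rp_instance *, void *, uint8_t, uint8_t, uint8_t);
};

/* Developer subclass: the base must be the first member so that a
 * struct rp_instance * can be cast back to struct count_inst *. */
struct count_inst {
    struct rp_instance base;
    unsigned long      packets;  /* per-instance state */
    int                bound;
};

static void count_handle_packet(struct rp_instance *self, struct ip *pkt,
                                uint32_t flags)
{
    struct count_inst *ci = (struct count_inst *)self;

    (void)pkt;
    (void)flags;
    ci->packets++;  /* read-only plugin: count and let the framework forward */
}

static void count_bind(struct rp_instance *self)
{
    ((struct count_inst *)self)->bound = 1;
}

static void count_unbind(struct rp_instance *self)
{
    ((struct count_inst *)self)->bound = 0;
}

static void count_free(struct rp_instance *self)
{
    free((struct count_inst *)self);
}

static int count_handle_msg(struct rp_instance *self, void *buf,
                            uint8_t a, uint8_t b, uint8_t c)
{
    (void)self; (void)buf; (void)a; (void)b; (void)c;
    return 0;  /* no messages handled in this sketch */
}

/* Allocate an instance and fill in its method table. */
struct rp_instance *count_create_instance(void)
{
    struct count_inst *ci = calloc(1, sizeof(*ci));

    if (ci == NULL)
        return NULL;
    ci->base.handle_packet   = count_handle_packet;
    ci->base.bind_instance   = count_bind;
    ci->base.unbind_instance = count_unbind;
    ci->base.free_instance   = count_free;
    ci->base.handle_msg      = count_handle_msg;
    return &ci->base;
}
```

The framework would then invoke `inst->handle_packet(inst, pkt, flags)` for every packet of the bound flow, passing the instance pointer as the first argument.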

Plugin Framework Enhancements

• Integrated with Command framework
  – Send command cells to PCU:
    • create instance, free instance, bind instance to filter, unbind instance
  – Send command cells to particular plugin instances
  – Send command cells to plugin base class
• Enhanced interface to address limitations noticed in crossbow:
  – instance access to: plugin class, instance id, filter id
  – pcu reports describing any loaded classes, instances and filters


Command Facility Highlights

• Overview
• High level description - Application Layer
• MSR Command Interface Overview
• Cell format and field definitions
• Example

Definitions

• Session: Open connection between the CP and a specific SPC. Intended to represent open connections and command state.
• Transaction: Represents a complete command. A transaction terminates when either an EOF is received by the CP or an error occurs.
• EOF: End of File is returned to the CP when the last bit of command data is returned, or in response to a Cancel message (or when an error occurs).

Overview - Cmd Interface on CP

• Synchronous Request/Response protocol
• Timeout can be specified as well as the number of retries - Per session option
  – Essentially provides a reliable service
  – Issue: if no reply, cmd/reply msg lost in port, channel or CP. Retries may be a bad thing.
• Address - MSR Port and Command
  – <MSR_Port, MSR_Command>
• Message destination - Callback function within the Port’s kernel (implements command)

Command Interface on CP

• Types of messages:
  – New Command, Get Next set of reply data Command, Cancel Command
  – Error Reply, EOF Reply, Continued Reply
• Message Identifiers - Only requires a sequence number initialized to 0 for each New Command:
  – One sending entity on CP,
  – One outstanding command for each port,
  – Ports send exactly one reply msg per command msg,
  – Command must fit within one cell,
  – Replies may span multiple cells.

Command Interface on Port

• Callback function registered with MSR kernel and called under 3 cases:
  – New Command
    • Flags = Command; Sequence = 0; Length = valid bytes in buffer; Buffer = application data
  – Next Command
    • Flags = Command | Next; Sequence = previous+1; Length = valid bytes in buffer; Buffer = application data
  – Cancel Command
    • Flags = Command | Cancel; Sequence = previous+1; Length = 0; Buffer contains no valid data

Command Interface on Port

• Callback function must:
  – Read from/Write to supplied buffer
  – Set length = Bytes written to buffer (in/out param)
  – Indicate if an error occurred (return -1)
  – Indicate whether more data exists (return 0 => EOF, return > 0 => Not EOF, return < 0 => ERROR | EOF)
• Framework:
  – generates reply message using same Command value and Sequence number.
  – sets flags indicating status (EOF, Error etc)
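A port-side callback that follows the contract above might look like the following sketch. It returns a 100-byte synthetic report in 44-byte chunks selected by the sequence number. The flag bit values, the callback signature and the report contents are assumptions made for illustration; the real definitions live in the MSR headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_DATA_MAX  44u    /* data bytes available in one command cell */
#define CMD_F_COMMAND 0x01u  /* hypothetical flag encodings */
#define CMD_F_NEXT    0x02u
#define CMD_F_CANCEL  0x04u
#define REPORT_LEN    100u   /* size of the synthetic report */

/*
 * Returns 0 => EOF, > 0 => more data remains, < 0 => error.
 * *len is set to the number of bytes written to buf.
 */
int report_cmd(uint8_t *buf, uint8_t flags, uint8_t seq, uint8_t *len)
{
    size_t off, n;

    assert(buf != NULL && len != NULL);
    *len = 0;
    if (!(flags & CMD_F_COMMAND))
        return -1;               /* not a command message */
    if (flags & CMD_F_CANCEL)
        return 0;                /* nothing to send; close the transaction */

    off = (size_t)seq * CMD_DATA_MAX;
    if (off >= REPORT_LEN)
        return -1;               /* sequence number past the end of the data */

    n = REPORT_LEN - off;
    if (n > CMD_DATA_MAX)
        n = CMD_DATA_MAX;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(off + i);   /* byte value = position in report */
    *len = (uint8_t)n;
    return off + n < REPORT_LEN ? 1 : 0;
}
```

Deriving the chunk offset from the sequence number keeps the callback stateless, so a retried request with the same sequence number simply reproduces the same chunk.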

Failure Modes

• Library support for lost messages:
  – if (timeout > 0, Retries > 0), then CP API library will re-send with RETRY flag set.
  – if (timeout > 0, Retries = 0 or all retries failed), then API library returns error to application.
  – if (timeout = 0 - No Timeout), then send operation blocks indefinitely.
• Lost Command message
  – if (timeout > 0 and retries > 0), CP resends command; same sequence number but RETRY flag set. Command buffer and flags passed to callback fn.

Failure Modes

• Lost Reply message
  – if no retries, Any issues?
  – if retries then CP resends
    • New Command - Port knows this is a duplicate command (RETRY flag). Application responsible for handling retries. If this is an issue, unique message ids can be used. In the extreme case, use a history (last reply message).
    • Next Command - Port receives Command w/Sequence > 0, w/RETRY flag. Passed to application, which chooses the correct course of action. The intent is to ensure there are no holes in the reply data received by the CP.
    • Cancel message - same as Next command.

Possible Enhancements

• Support asynchronous messaging:
  – Multiple outstanding commands per port
  – Asynchronous I/O on CP
  – Speed up boot process and dynamic configuration
  – Facilitates implementing port monitoring (ping or heartbeat) for fault detection and recovery.
  – Two methods for reporting results:
    • upcall - function registered by application is called when results arrive
    • poll - application periodically polls library for results.
• Support Broadcast and/or Multicast

MSR Command Layer

• Simple messaging facility optimized for MSR.
• Command message (CP sends):
  – Sent by CP to a specific MSR port (unicast)
  – Must fit within one AAL0 cell.
  – Message header includes:
    • protocol version
    • Command
    • Sequence number
    • flags
  – Application data follows header
  – Library implements Request/Reply protocol.

MSR Command Layer

• Reply Message (Port sends):
  – Port must send reply message in response to a Command message.
  – Reply message Header:
    • version and sequence number: same as command msg.
    • Includes application data and flags indicating if command was successful and if more data exists (EOF).
  – Application registers command specific callback function at port.
  – Callback function must conform to specified interface.

MSR Command Overview

• Command Protocol description
  – Control Processor sends command messages to a specific port and expects to receive a reply message indicating either Success or Failure. This is termed a Command Cycle.
  – There is the notion of a Command Transaction, which may include one or more command cycles. A command transaction is terminated when the target (port) responds with a reply msg containing an EOF.

MSR Command Overview

• Command Protocol description, continued
  – CP processing of Reply msg depends on EOF flag:
    • If EOF is set then no further reply data is available and the command transaction is closed.
    • If EOF is not set then there is remaining data and the command transaction is still open.
  – If remaining data (Not EOF), then CP must follow with either a Next or Cancel command message.
    • Sequence number indicates the “chunk” of data to be returned.
    • Command indicates the message’s destination
    • sequence number = previous + 1
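On the CP side, the transaction rules above reduce to a small state machine driven by each reply. A minimal sketch, assuming hypothetical reply flag bits and a hypothetical `cp_txn` structure; the real CP library API is not shown in these slides.

```c
#include <assert.h>
#include <stdint.h>

#define REPLY_F_EOF   0x01u  /* hypothetical flag encodings */
#define REPLY_F_ERROR 0x02u

/* What the CP should do after processing a reply. */
enum cp_next { CP_DONE, CP_FAILED, CP_SEND_NEXT };

struct cp_txn {
    uint8_t command;  /* destination command, constant for the transaction */
    uint8_t seq;      /* sequence number of the outstanding message */
    int     open;     /* nonzero while the transaction is open */
};

/* Start a transaction: the New Command always carries sequence 0. */
void cp_txn_begin(struct cp_txn *t, uint8_t command)
{
    t->command = command;
    t->seq = 0;
    t->open = 1;
}

/* Process one reply and decide what to send next. */
enum cp_next cp_txn_on_reply(struct cp_txn *t, uint8_t reply_seq,
                             uint8_t reply_flags)
{
    assert(t->open);
    if ((reply_flags & REPLY_F_ERROR) || reply_seq != t->seq) {
        t->open = 0;   /* error, or reply does not echo our sequence number */
        t->seq = 0;
        return CP_FAILED;
    }
    if (reply_flags & REPLY_F_EOF) {
        t->open = 0;   /* last chunk received: transaction closed */
        t->seq = 0;
        return CP_DONE;
    }
    t->seq++;          /* more data: Next (or Cancel) uses previous + 1 */
    return CP_SEND_NEXT;
}
```

After `CP_SEND_NEXT`, the caller would send either a Next or a Cancel message carrying `t->seq`.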

APIC Cell Format

[Figure: APIC cell layout with bit positions 0, 8, 16, 24 marked. Header rows as extracted: cid, ld, cc, pout, x, pin; cl, gfc, vpi, vci, pti; x (all bits). MSR Command Message (cell payload): ver, length, command/status, sequence number, flags.]

• Cell payload contains the MSR Command.
• Command header is 4 Bytes, leaving 44 Bytes for sub-commands and data.

ATM/APIC Header

• pin (Ports-In) - Port the cell arrived on
  – Tx: not used (set to 000b)
  – Rx: port cell arrived on (is the below correct?)
    • 001 Port 0, 010 Port 1, 100 Port 2, etc.
• pout (Ports-Out) - Set of output ports.
  – Tx: Command library sets:
    • 001 Fiber/Link, 010 Ribbon/Switch, 011 Both
    • 101 Loopback MV0, 110 Loopback MV1
  – Rx: Set by VCXT, see pin above.

ATM/APIC Cell Format

• cc (Control Cell Indicator) - Not used, set to 0b
• ld (Low Delay) - Not used, set to 0b.
  – Should we use low delay?
• cid (Connection Identifier) - set to vci value.
• gfc (Generic Flow Control) - set to 0000b.


ATM/APIC Cell Format

[Figure: header field layout: pin, pout, cc, ld, cid; gfc, vpi, vci, pti, cl]

• vpi (Virtual Path Identifier) - Set to 0x0.
• vci (Virtual Circuit Identifier) - Equal to cid.
  – See presentation on MSR configurations for a complete list of VCI assignments.
• pti (Payload Type) - Set to 000b (data cell)
• cl (Cell Loss Priority) - Set to 0b (High Priority)
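With gfc, vpi, pti and cl all zero, the ATM header word for a command cell reduces to the VCI shifted into place. The sketch below packs the first 32 bits of a standard ATM UNI header (GFC 4 bits, VPI 8, VCI 16, PTI 3, CLP 1). It assumes the APIC sees the header in that standard bit order; the function names and the VCI used in the usage note are illustrative, not real MSR assignments.

```c
#include <assert.h>
#include <stdint.h>

/* First 32 bits of a standard ATM UNI cell header:
 * GFC(4) | VPI(8) | VCI(16) | PTI(3) | CLP(1). */
static uint32_t atm_uni_header(uint8_t gfc, uint8_t vpi, uint16_t vci,
                               uint8_t pti, uint8_t clp)
{
    assert(gfc <= 0xF && pti <= 0x7 && clp <= 0x1);
    return ((uint32_t)gfc << 28) | ((uint32_t)vpi << 20) |
           ((uint32_t)vci << 4)  | ((uint32_t)pti << 1)  | (uint32_t)clp;
}

/* Header for an MSR command cell as described on the slides:
 * gfc = 0000b, vpi = 0x0, pti = 000b, cl = 0b, so only the VCI varies. */
static uint32_t msr_cmd_cell_header(uint16_t vci)
{
    return atm_uni_header(0x0, 0x0, vci, 0x0, 0x0);
}
```

For example, msr_cmd_cell_header(0x40) yields 0x00000400, since the VCI field starts at bit 4.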


MSR Command Header

• Version (2 bits) - Protocol version. Allows for at most 4 versions. Current version set to 0.
  – Field width was a trade-off with the length field.
• Length (6 bits) - Number of valid data bytes.
  – 0 <= Length <= 44, so 6 bits are sufficient.
  – This field is set indirectly by the application or command implementation. The CP library and kernel interfaces let applications pass a buffer pointer and indicate the number of valid data bytes.

[Figure: header fields: ver | length | command/status | sequence number | flags]
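Since Version and Length share one byte, packing and unpacking them is a couple of shifts and masks. The sketch below assumes the version occupies the two high-order bits of that byte, which follows the field order in the figure but is not stated on the slide. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_VERSION   0u   /* current protocol version per the slide */
#define MSR_MAX_DATA  44u  /* largest valid Length value */

/* Pack version (2 bits) and length (6 bits) into the first header byte.
 * Assumption: version sits in the two high-order bits. */
static uint8_t msr_pack_ver_len(unsigned ver, unsigned len)
{
    assert(ver <= 3u);            /* 2 bits: at most 4 versions */
    assert(len <= MSR_MAX_DATA);  /* 0 <= Length <= 44 */
    return (uint8_t)((ver << 6) | len);
}

static unsigned msr_ver(uint8_t b) { return b >> 6; }
static unsigned msr_len(uint8_t b) { return b & 0x3Fu; }

/* Six bits can encode 0..63, so a received length still needs a range check. */
static int msr_len_valid(uint8_t b)
{
    return msr_len(b) <= MSR_MAX_DATA;
}
```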


MSR Command Header

• Command/Status (8 bits) - CP inserts command value, SPC/port inserts status information.
  – Valid Commands are listed in $SYS/msr/msr_ctl.h, also see $MSR/utils/command/*.{c,h}
  – Library API on CP accepts Command as argument.
  – Implementation in kernel: array of function pointers, uses Command as index
  – Reply msg Status indicates success or an error code (Upcall, ATM, Cmd Invalid, Cmd Not Implemented, or Other Cmd Error).

[Figure: header fields: ver | length | command/status | sequence number | flags]
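The kernel-side dispatch described above (an array of function pointers indexed by Command) might look roughly like the following. This is a sketch under assumptions: the handler signature mirrors the callback shown on the MSR Kernel API slide with an assumed int return type, and the command numbers, status values and msr_ctl_noop handler are hypothetical placeholders rather than entries from msr_ctl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status names follow the slide; the numeric values are hypothetical. */
enum msr_status {
    MSR_OK = 0,
    MSR_ERR_CMD_INVALID,
    MSR_ERR_CMD_NOT_IMPL
};

/* Handler type: same parameters as the msr_ctl_<cmd> callbacks. */
typedef int (*msr_ctl_fn)(void *buf, uint8_t flags, uint8_t seq, uint8_t *dlen);

/* Hypothetical command that replies with no data. */
static int msr_ctl_noop(void *buf, uint8_t flags, uint8_t seq, uint8_t *dlen)
{
    (void)buf; (void)flags; (void)seq;
    *dlen = 0;
    return MSR_OK;
}

/* Hypothetical command numbers. */
enum { CMD_NOOP = 0, CMD_RESERVED = 1, CMD_COUNT = 2 };

/* Dispatch table: the Command byte is used directly as the index. */
static const msr_ctl_fn msr_ctl_table[CMD_COUNT] = {
    [CMD_NOOP]     = msr_ctl_noop,
    [CMD_RESERVED] = NULL,  /* slot defined but not implemented */
};

static int msr_ctl_dispatch(uint8_t cmd, void *buf, uint8_t flags,
                            uint8_t seq, uint8_t *dlen)
{
    assert(dlen != NULL);
    if (cmd >= CMD_COUNT)
        return MSR_ERR_CMD_INVALID;   /* index outside the table */
    if (msr_ctl_table[cmd] == NULL)
        return MSR_ERR_CMD_NOT_IMPL;  /* known command, no handler */
    return msr_ctl_table[cmd](buf, flags, seq, dlen);
}
```

Checking the index before the lookup is what lets the port answer with Cmd Invalid or Cmd Not Implemented instead of jumping through a bad pointer.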


MSR Command Header

• Sequence Number (8 bits) - Primarily of use to applications.
  – When a command message is first sent, sequence = 0.
  – If the reply does not include an EOF flag, the CP increments sequence by one for each subsequent command message.
  – When EOF is received, the Command Transaction is complete and the sequence number is reset to 0.

[Figure: header fields: ver | length | command/status | sequence number | flags]


MSR Command Header

• Flags (8 bits) - Bit field, valid flags are:
  – Invalid - flag = 0, should not occur
  – CMD - cell contains a valid command from CP
  – REPLY - cell contains reply from Port
  – ERROR - Reply only, error processing on Port
  – EOF - No reply data remains, end of cmd transaction
  – NEXT - get next set of reply data
  – CANCEL - cancel current cmd transaction
  – RETRY - set if the CP resends a command after it was lost

[Figure: header fields: ver | length | command/status | sequence number | flags]


CP Library API
• Library API for application on CP:
  – int sendcmd(int sid, int cmd, char *data, int flags, int *dlen)
    • sid = session id,
    • cmd = command to execute on port,
    • data = buffer pointer,
    • flags =
      – RETRY (reply timeout),
      – CANCEL (cancel current command),
      – NEXT (get next set of reply data)
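An application-side command transaction built on this call could be sketched as below. Several things here are assumptions, since the slides do not specify them: that dlen returns the number of reply bytes, that the return value distinguishes EOF from more data, the flag values, and that the library tracks the sequence number itself (sendcmd has no sequence argument). The real sendcmd() is replaced by a function pointer so the loop can be exercised against the stand-in fake_sendcmd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values and return codes; not from the MSR headers. */
enum { F_CMD = 0x01, F_NEXT = 0x02 };
enum { RC_EOF = 0, RC_MORE = 1 };

/* Same parameter list as sendcmd() on the slide. */
typedef int (*sendcmd_fn)(int sid, int cmd, char *data, int flags, int *dlen);

/* Run one command transaction, appending each reply chunk to out.
 * Returns the number of bytes collected, or -1 on error. */
static int msr_transaction(sendcmd_fn send, int sid, int cmd,
                           char *out, size_t outcap)
{
    char buf[44];           /* one cell's worth of reply data */
    size_t total = 0;
    int flags = F_CMD;      /* first command cycle of the transaction */

    for (;;) {
        int dlen = 0;
        int rc = send(sid, cmd, buf, flags, &dlen);
        if (rc < 0 || dlen < 0 || dlen > (int)sizeof buf ||
            (size_t)dlen > outcap - total)
            return -1;      /* a real caller would also send CANCEL here */
        memcpy(out + total, buf, (size_t)dlen);
        total += (size_t)dlen;
        if (rc == RC_EOF)
            return (int)total;  /* EOF: transaction closed */
        flags = F_NEXT;         /* still open: ask for the next chunk */
    }
}

/* Stand-in port that serves a fixed 100-byte reply in 44-byte chunks. */
static int fake_calls;

static int fake_sendcmd(int sid, int cmd, char *data, int flags, int *dlen)
{
    static int offset;
    int n;

    (void)sid; (void)cmd;
    if (flags == F_CMD)
        offset = 0;         /* a new command restarts the reply */
    n = 100 - offset;
    if (n > 44)
        n = 44;
    memset(data, 'x', (size_t)n);
    offset += n;
    *dlen = n;
    fake_calls++;
    return offset == 100 ? RC_EOF : RC_MORE;
}
```

Calling msr_transaction(fake_sendcmd, sid, cmd, out, sizeof out) with a large enough buffer takes three command cycles (44 + 44 + 12 bytes) before the stand-in reports EOF.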


MSR Kernel API
• MSR kernel interface: $SYS/msr/msr_ctl.{h,c}
• Callback function signature:
  – msr_ctl_<cmd> (void *buf, u_int8_t flags, u_int8_t seq, u_int8_t *dlen)
  – buf = command buffer w/application data,
  – flags = CMD, NEXT, RETRY or CANCEL,
  – seq = sequence number indicating reply data set, and
  – dlen = input/output parameter, data length in bytes.
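A callback with this signature that returns its reply in 44-byte chunks selected by seq might be sketched as follows. The return type and return codes are assumptions (the slide does not show how a handler signals EOF), as is treating the incoming *dlen as the buffer capacity. The handler name msr_ctl_stats, the helper msr_serve_chunk and the data it serves are hypothetical, and CANCEL handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSR_MAX_DATA 44u  /* data bytes per cell */

/* Hypothetical return codes for a handler. */
enum { H_ERR = -1, H_EOF = 0, H_MORE = 1 };

/* Copy the chunk of src selected by seq into buf.
 * On input *dlen is taken as the capacity of buf (assumption);
 * on output it holds the number of valid data bytes. */
static int msr_serve_chunk(const uint8_t *src, size_t len,
                           void *buf, uint8_t seq, uint8_t *dlen)
{
    size_t off = (size_t)seq * MSR_MAX_DATA;
    size_t n;

    assert(buf != NULL && dlen != NULL);
    if (off > len)
        return H_ERR;      /* sequence number past the end of the data */
    n = len - off;
    if (n > MSR_MAX_DATA)
        n = MSR_MAX_DATA;
    if (*dlen < n)
        return H_ERR;      /* caller's buffer is too small */
    memcpy(buf, src + off, n);
    *dlen = (uint8_t)n;
    return (off + n == len) ? H_EOF : H_MORE;
}

/* Hypothetical data a command might report. */
static uint8_t stats[100];

static int msr_ctl_stats(void *buf, uint8_t flags, uint8_t seq, uint8_t *dlen)
{
    (void)flags;  /* CMD, NEXT and RETRY are all resolved through seq here */
    return msr_serve_chunk(stats, sizeof stats, buf, seq, dlen);
}
```

Deriving the offset from seq alone keeps the handler stateless, so a RETRY carrying the same sequence number returns the same chunk again.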


Kernel State Diagram

[Figure: kernel state diagram. Labels recoverable from the diagram: Idle, Command, Closed, Next, Retry, Cancel, EOF, Proto Error.]


CP Library State Diagram

[Figure: CP library state diagram. Labels recoverable from the diagram: Idle, Wait (for reply), Closed, Open Session, Command, Next, Retry (annotated "Result of a timeout"), EOF, Protocol Error.]


Example Sending Cmd to Port

[Figure: the CP attached to the wugs switch with ports P0 to P7. Each port is drawn with an SPC/FPX, a DQ module and a Next/PrevHop link; the subnets shown are 192.168.200.X through 192.168.207.X, and the addresses 192.168.203.2 and 192.168.202.2 are also labelled.]

Steps shown in the figure:
• CP: sendcmd(); create plugin instance: port id = 0, PluginID = 200
• Cell sent to the port: cell hdr, cmd, data
• Port (msr_ctl): look up sub-command, perform function call, then report results
• Port: reply(); plugin instance created: Status, Instance ID
• CP: report command completion status to application.