VHDL design in Liberouter accelerating cards
Jan Kořenek
Accelerating cards
• Combo6 card
  - Connected to PCI bus.
  - Routing and filtering functionality.
• Interface card
  - Connected to Combo6 card.
  - Drives the PHYTERs
  - Buffers
  - Probably some routing functionality blocks.
HW resources
• Combo6
  - FPGA Virtex-II XCV2-3000
  - DDR SDRAM
  - 3 x SSRAM
  - CAM
• Interface card
  - 2 x FPGA Virtex-II XCV2-1000
  - 2 x SSRAM
  - 4 x PHYTER
[Figure: board-level diagram. Combo6 card: XCV2-3000 Virtex-II FPGA with CAM, SDRAM, three SSRAMs, and a PLX bridge to the PCI bus. Interface card: two XCV2-1000 Virtex-II FPGAs with two SSRAMs, four PHYTERs, and the IO interfaces.]
VHDL design entities
[Figure: block diagram of the VHDL design across the interface card and the Combo6 card. Blocks: PHYTERs, PHYTER control, buffers, header field extractors, Look-up processor, FIFOs, Replicator, Priority queue, Edit engines, DRAM scheduler, and Address decoder, which connects the PCI bus to and from all blocks.]
VHDL design (ifc. card)
PHYTER control
• Necessary PHYTER initialization
• Serial access to internal registers
  - Speed can be slow.
  - Registers are driven by SW.
  - Serializes data from the PCI bus and deserializes data to it.
[Figure: PHYTER control block with signals MDIO, MDC_OUT, MDC_IN, PHY_ADR, REG_ADR, CLK_IN, RDY, DATA, and R/W.]
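The serial register access can be illustrated with a small software model. This is only a sketch: it assumes the management frame layout of IEEE 802.3 Clause 22 (preamble, start, opcode, PHY address, register address, turnaround, data), which the slides do not state explicitly. The function name `mdio_frame` is hypothetical; `phy_adr`, `reg_adr`, and `data` mirror the PHY_ADR, REG_ADR, and DATA signals.

```python
def mdio_frame(phy_adr, reg_adr, write, data=0):
    """Return the bits (MSB first) the controller shifts out on MDIO.

    Assumption: IEEE 802.3 Clause 22 frame layout.
    """
    if not (0 <= phy_adr < 32 and 0 <= reg_adr < 32):
        raise ValueError("PHY_ADR and REG_ADR are 5-bit fields")
    if not 0 <= data < (1 << 16):
        raise ValueError("DATA is a 16-bit field")

    def bits(value, width):
        # Most significant bit first.
        return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

    frame = [1] * 32                       # preamble
    frame += [0, 1]                        # start of frame
    frame += [0, 1] if write else [1, 0]   # opcode: 01 = write, 10 = read
    frame += bits(phy_adr, 5)
    frame += bits(reg_adr, 5)
    if write:
        frame += [1, 0]                    # turnaround driven by the controller
        frame += bits(data, 16)
    # On a read the controller releases MDIO here; the PHY drives the
    # turnaround and the 16 data bits, so nothing more is shifted out.
    return frame
```

A write produces 64 bits and a read 46 bits from the controller side, so a slow MDC clock is sufficient for register access driven by software.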
Input and output buffer
• The header field extractor assumes a Virtex-II Pro chip and its RocketIO transceivers.
• Necessary functionality
  - Elastic buffer that stores one or more packets.
  - Status signals: Full, Half Full, ...
  - CRC computation
• Not implemented yet
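A behavioral sketch of the buffer described above may help when writing the VHDL. It is an assumption-laden model, not the planned implementation: the class name `ElasticBuffer`, the byte-based capacity, and the choice of CRC-32 (as used for the Ethernet frame check sequence) are all hypothetical, since the slides only say "Compute CRC".

```python
import zlib
from collections import deque


class ElasticBuffer:
    """Software model of a packet buffer with Full and Half Full flags."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.packets = deque()
        self.used = 0

    @property
    def full(self):
        return self.used >= self.capacity

    @property
    def half_full(self):
        return self.used * 2 >= self.capacity

    def push(self, packet):
        """Store a whole packet; refuse it if it does not fit."""
        if self.used + len(packet) > self.capacity:
            return False
        # Assumption: CRC-32 is computed as the packet enters the buffer.
        self.packets.append((packet, zlib.crc32(packet)))
        self.used += len(packet)
        return True

    def pop(self):
        """Return the oldest packet together with its CRC."""
        packet, crc = self.packets.popleft()
        self.used -= len(packet)
        return packet, crc
```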
VHDL design – HFE
Header Field Extractor
• Small universal processor (nano-processor).
• Analyzes the packet header and retrieves the information needed for routing and filtering.
• Stores this information in the unified header structure.
• Sends the unified header to the Look-up processor.
• Sends the packet to the DRAM scheduler.
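What the extractor computes can be sketched in software. The example below assumes an Ethernet frame carrying an IPv6 packet and a made-up unified header layout; the real unified header fields are not given in the slides, so the dictionary keys and the function name `extract_unified_header` are hypothetical.

```python
import struct

ETHERTYPE_IPV6 = 0x86DD


def extract_unified_header(frame):
    """Pull routing and filtering fields out of an Ethernet + IPv6 frame."""
    if len(frame) < 14 + 40:
        raise ValueError("frame too short for Ethernet + IPv6 headers")
    dst_mac, src_mac, ethertype = struct.unpack_from("!6s6sH", frame, 0)
    if ethertype != ETHERTYPE_IPV6:
        raise ValueError("only IPv6 is handled in this sketch")
    # IPv6 fixed header: version/class/flow (32 bits), payload length,
    # next header, hop limit, then 16-byte source and destination addresses.
    first_word, payload_len, next_header, hop_limit = struct.unpack_from(
        "!IHBB", frame, 14)
    return {
        "src_mac": src_mac,
        "dst_mac": dst_mac,
        "version": first_word >> 28,
        "payload_len": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src_ip": frame[22:38],
        "dst_ip": frame[38:54],
    }
```

In the hardware design this result would go to the Look-up processor while the packet itself goes to the DRAM scheduler.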
Header Field Extractor implementation
[Figure: HFE implementation. HFE_CORE (CPU) with instruction memory in BlockRAM (ADDR, INSTR), external registers (ADDR, DATA), and a finite state machine (REQs, ACKs, WENs, STATE signals). External signals: UH_ADDRESS, UH_DATA, DRAM DATA, CONTROL, and PACKET DATA.]
HFE – main components
• Processor core and instruction memory
• External register set: gives the processor core access to its environment (packet data input, DRAM, …)
• Finite state machine: handles DRAM and unified header (UH) communication and can work independently of the core (intelligent peripherals)
• Others: counters, … mapped into the register set
Processor core overview
• Simple RISC core with 16-bit data processing
  - Also allows 8-, 4-, and 1-bit operations
• Fast loop and jump support (no wait cycles)
• Arithmetic operations are reduced to addition and subtraction
• Everything is mapped into one memory space (inputs, outputs, control, RAM); every instruction can access any register, I/O port, or RAM location
  - Program memory and the stack are not visible in this space (Harvard architecture), which is faster and more stable
• Two-stage pipeline: decode and execute phases
Processor core structure
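[Figure: processor core structure. HFE_CORE contains an instruction address generation unit (IADDR, DSTADDR) with a stack, a control and decode unit (INSTR, control signals), a data address generation unit (SRC_ADDR, DST_ADDR), a constants generator, internal registers, RAM, multiplexers (MX), an ALU with a "Z" signal, pipeline registers (CLK), and DIN/DOUT ports.]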
Current state
• Processor core and most peripherals are fully implemented
• Working frequency is about 60 MHz; some optimizations are needed
• About 600 CLBs occupied (4.2 %)
• VHDL simulations with real packet data on input
Future performance improvements
• Discard the 16-bit adder: we don't need addition and subtraction
• Optimize buses
• No indirect addressing
• More clock cycles for some instructions
• …
VHDL design – LUP
Look-up processor
• Why do we need this?
  - Recognize the unified header and select output interfaces, packet priority, and packet editing.
  - Apply firewall rules
• Function description
  - Load the unified header from the input FIFO
  - Do the match and retrieve information
  - Send the information to the Replicator
Look-up abstraction
Block structure
CAM block
• Does the match in CAM memory
  - Select part of the unified header (16 registers)
  - Load the registers into the buffer (necessary for full CAM performance)
  - Do the match in CAM
  - Retrieve the matched address and pass it to the processing unit
• Mutual exclusion on the unified header FIFO.
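The CAM steps above can be modeled in a few lines. This sketch assumes 16-bit unified header registers and a ternary (value plus mask) CAM that returns the lowest matching address; neither detail is confirmed by the slides, and the names `build_key` and `cam_match` are hypothetical.

```python
def build_key(uh_registers, selected):
    """Concatenate the selected 16-bit unified header registers into one key."""
    key = 0
    for index in selected:
        key = (key << 16) | (uh_registers[index] & 0xFFFF)
    return key


def cam_match(entries, key):
    """Return the address of the first entry matching key, or None.

    entries is a list of (value, mask) pairs; mask bits set to 0 are
    "don't care" (assumption: ternary matching).
    """
    for address, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return address
    return None
```

The returned address would then serve as the program start address for the processing unit.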
Processing unit
• Simple processor.
• The program address is obtained from the CAM block.
• Supported instructions:
  - TAB: new program counter value created from the unified header.
  - Jxxx: test lower and upper bounds.
  - EXE: last program instruction; contains information for the next blocks.
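A toy interpreter shows how the three instruction kinds could cooperate. Everything about the encoding is an assumption made for illustration: instructions are Python tuples, `JRANGE` stands in for one member of the Jxxx family, and TAB adds a unified header field to a base address. The slides do not define the real instruction formats.

```python
def run_lookup(program, start, uh, max_steps=1000):
    """Run a look-up program from `start` until an EXE instruction.

    Hypothetical encodings:
      ("TAB", field, base)              pc = base + uh[field]
      ("JRANGE", field, lo, hi, target) jump if lo <= uh[field] <= hi
      ("EXE", result)                   stop and return result
    """
    pc = start
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "TAB":
            _, field, base = op
            pc = base + uh[field]
        elif op[0] == "JRANGE":
            _, field, lo, hi, target = op
            pc = target if lo <= uh[field] <= hi else pc + 1
        elif op[0] == "EXE":
            return op[1]
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    raise RuntimeError("program did not reach EXE")


# Example program: well-known ports are dispatched by protocol number.
PROGRAM = [
    ("JRANGE", "dst_port", 0, 1023, 2),  # 0
    ("EXE", "default"),                  # 1
    ("TAB", "proto", 3),                 # 2
    ("EXE", "proto0"),                   # 3
    ("EXE", "proto1"),                   # 4
]
```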
Current state
• HW design
• Implementation in VHDL
  - Behavioral simulations
  - Post place and route simulations of the processing unit
VHDL design – Replicator
Replicator
• Block that replicates the matched data structure to the Edit engine input queues
• Function description
  - Get the input data structure from the Match Engine or SW
  - Load the appropriate number of replication data structures from BlockRAM and send them, together with the packet address, to the Priority queues
  - Increment the reference count of the DRAM allocation block
  - Update statistics
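The function description above maps onto a short software model. The record layout is not given in the slides, so the field names (`packet_addr`, `table_index`, `queue`, `edit`) and the function name `replicate` are hypothetical, as is the choice to add one reference per generated copy.

```python
def replicate(match_record, replication_table, priority_queues, ref_counts, stats):
    """Fan one matched record out to the priority queues.

    match_record:      {"packet_addr": int, "table_index": int}
    replication_table: table_index -> list of {"queue": int, "edit": ...}
    """
    packet_addr = match_record["packet_addr"]
    entries = replication_table[match_record["table_index"]]
    for entry in entries:
        # Each copy carries the packet address plus its own edit information.
        priority_queues[entry["queue"]].append(
            {"packet_addr": packet_addr, "edit": entry["edit"]})
    # Assumption: one reference to the DRAM block per copy sent out.
    ref_counts[packet_addr] = ref_counts.get(packet_addr, 0) + len(entries)
    stats["records"] += len(entries)
    return len(entries)
```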
Block diagram
Data structure
VHDL design – PQ
Priority queues
• Records for every interface are sorted by priority and passed to the Edit Engine.
• Records are temporarily stored in SSRAM memory.
• Replication on the interface
  - Why isn't this done by the Replicator?
    - Uses less memory
    - Simplifies the Replicator
Architecture
• Two memory components
  - 16x64b DESC: queue descriptions and status
    - Generates SSRAM addresses
    - Helps in searching for the current queue
  - 32x4b ASGN: assigns every queue to an interface (EE or SW)
    - Provides information for the SEARCH block
[Figure: DESC memory holds Start, WritePtr, ReadPtr, and Length for queues Q0 to Q15; ASGN memory assigns queues (Q0, Q1, Q2, ..., Q15) to interfaces such as Edit engine 1.]
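The two memories can be modeled as follows. This is a sketch under stated assumptions: each queue is a ring buffer inside a flat SSRAM array, a lower queue number means higher priority, and a `count` field is added to the descriptor to track occupancy. The slides name only Start, WritePtr, ReadPtr, and Length; the class and method names are hypothetical.

```python
class PriorityQueues:
    """Model of the DESC and ASGN memories in front of a flat SSRAM."""

    def __init__(self, num_queues=16, queue_len=4):
        self.ssram = [None] * (num_queues * queue_len)
        self.desc = [
            {"start": q * queue_len, "write": 0, "read": 0,
             "length": queue_len, "count": 0}
            for q in range(num_queues)
        ]
        self.asgn = [0] * num_queues  # queue number -> interface number

    def put(self, queue, record):
        d = self.desc[queue]
        if d["count"] == d["length"]:
            return False  # queue is full
        self.ssram[d["start"] + d["write"]] = record  # SSRAM address from DESC
        d["write"] = (d["write"] + 1) % d["length"]
        d["count"] += 1
        return True

    def get(self, interface):
        """Search for the first non-empty queue assigned to the interface."""
        for queue, d in enumerate(self.desc):
            if self.asgn[queue] == interface and d["count"]:
                record = self.ssram[d["start"] + d["read"]]
                d["read"] = (d["read"] + 1) % d["length"]
                d["count"] -= 1
                return record
        return None
```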
Block structure
State diagram
Design critical points
• Searching for the current queue runs concurrently with passing records to the Edit Engine; this requires synchronization and a dual-port DESC memory
• Different behavior of the SW and Edit Engine interfaces (SW has no WB state)
• Designing a suitable pipeline
• Current state: implementation in VHDL
VHDL design – Edit engine
Edit engine
• Block that creates the output stream for outgoing packets
  - Inserts new L2 header
  - Decrements Hop Limit
  - Routing Header Options
  - Encapsulate/Decapsulate IPv6 packet
  - etc.
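Two of the edits listed for the Edit engine, inserting a new L2 header and decrementing the Hop Limit, can be shown on a raw IPv6 packet. This is an illustrative sketch only; the function name `edit_packet` is hypothetical, and the discard rule follows the IPv6 specification (a packet whose Hop Limit would reach zero is not forwarded).

```python
HOP_LIMIT_OFFSET = 7  # byte offset of Hop Limit in the IPv6 fixed header


def edit_packet(ipv6_packet, new_l2_header):
    """Return the outgoing frame, or None if the packet must be discarded."""
    if len(ipv6_packet) < 40:
        raise ValueError("IPv6 fixed header is 40 bytes")
    hop_limit = ipv6_packet[HOP_LIMIT_OFFSET]
    if hop_limit <= 1:
        return None  # would be decremented to zero: do not forward
    edited = bytearray(ipv6_packet)
    edited[HOP_LIMIT_OFFSET] = hop_limit - 1
    # IPv6 has no header checksum, so no further fix-up is needed here.
    return bytes(new_l2_header) + bytes(edited)
```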
Input data
Block Diagram
Instructions
• Instructions for data sending and modification
  - Send data of a given size: SPDP, SPEP, SAPB, SAPC
  - Send data to a reference position: SPDU, SPDE
• Control instructions
  - MARK, NXTO, LDEN, LDOP
Current state
VHDL design – DRAM scheduler
DRAM scheduler
• DDR SDRAM memory control.
• Loads and stores packets in DDR SDRAM memory.
• Memory is divided into fixed-length blocks.
• Free block control.
• Bottleneck of the design: we need maximal memory speed.
• Three different types of interfaces: HFE, Replicator, and Edit engine.
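Fixed-length blocks with free block control suggest a free list combined with per-block reference counts. The model below is a sketch under that assumption; the class name `BlockAllocator` and its methods are hypothetical, and the mapping of methods to the HFE, Replicator, and Edit engine interfaces is only one plausible reading of the slides.

```python
class BlockAllocator:
    """Free list of fixed-length DRAM blocks with reference counting."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.refs = [0] * num_blocks

    def allocate(self):
        """Take a free block for a newly stored packet (HFE side)."""
        if not self.free:
            return None
        block = self.free.pop()
        self.refs[block] = 1
        return block

    def add_reference(self, block, count=1):
        """Record additional users of the block (Replicator side)."""
        if self.refs[block] == 0:
            raise ValueError("block is not allocated")
        self.refs[block] += count

    def release(self, block):
        """Drop one reference (Edit engine side); free the block at zero."""
        if self.refs[block] == 0:
            raise ValueError("block is not allocated")
        self.refs[block] -= 1
        if self.refs[block] == 0:
            self.free.append(block)
```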
Block structure
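[Figure: DRAM scheduler block structure. HFE1 to HFE4 attach through the HFE interface and EE1 to EE4 through the EE interface, with BlockRAM buffers (32B) on the data paths; a REP interface serves the Replicator. Scheduler_core contains time slots control, address control, and low level DDR SDRAM control, which drives the DDR SDRAM memory. Data and control paths are marked separately.]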
Core structure
• Low level SDRAM control
  - Communication with SDRAM
• Time slots control
  - Time sharing strategy
  - Every component has a slot
• Address control
  - Number of references for every block address.
[Figure: Scheduler_core consisting of time slots control, address control, and low level DDR SDRAM control.]
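The time sharing strategy, where every component owns a slot, can be sketched as a fixed rotation. The slot order and the rule that an unused slot stays idle are assumptions for illustration; the function name `run_slots` is hypothetical.

```python
def run_slots(slot_order, pending, num_slots):
    """Serve memory requests in fixed time slots.

    slot_order: component names, one per slot in the rotation
    pending:    component name -> list of queued requests (consumed in order)
    Returns a list of (slot, owner, request); request is None for idle slots.
    """
    served = []
    for slot in range(num_slots):
        owner = slot_order[slot % len(slot_order)]
        queue = pending.get(owner)
        # Assumption: a slot is never given to another component.
        request = queue.pop(0) if queue else None
        served.append((slot, owner, request))
    return served
```

A fixed rotation gives each interface a guaranteed share of memory bandwidth at the cost of idle slots when a component has nothing to do.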
Low level DDR SDRAM control
• DRAM control
  - Load and store data
• DCM: clock generation and phase shifting
• Command generator
  - Memory initialization
  - Auto refresh and read/write cycles
• Data path
  - Time transforms (data are delayed)
[Figure: low level DDR SDRAM control with signals clk_in, read, write, data_to_ddr, and data_from_ddr; two DCMs, a command generator, and a data path drive the command and data lines of the DDR SDRAM.]
Current state
• DDR SDRAM test implemented
• HFE interface
  - Implemented, but no simulation
• Low level scheduler
  - Implemented
  - Behavioral simulation
• Other blocks
  - Specified but not implemented
VHDL design – Address decoder
Address decoder and local bus
• Address decoder
  - Hierarchical address space
  - Chip select for the next level
• Local bus
  - Connects all blocks to the PLX (PCI bus)
  - Multiplexed address and data, 16 bits
  - Long wires, so wait cycles are necessary
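Hierarchical decoding with a chip select per level can be illustrated as follows. The address map, the number of select bits per level, and the block names are invented for the example; only the 16-bit width and the idea of passing a chip select to the next level come from the slides.

```python
def decode(address, node, width=16):
    """Walk the address hierarchy; return (block name or None, local offset).

    node is either a block name (leaf) or a dict
    {"select_bits": n, "children": {select_value: node}}.
    """
    while isinstance(node, dict):
        width -= node["select_bits"]
        select = address >> width          # top bits act as chip select
        address &= (1 << width) - 1        # remaining bits go to next level
        node = node["children"].get(select)
    return node, address


# Hypothetical address map with two levels.
ADDRESS_SPACE = {
    "select_bits": 2,
    "children": {
        0: "phyter_control",
        1: {"select_bits": 2, "children": {0: "hfe0", 1: "hfe1"}},
    },
}
```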
Current state of VHDL design
[Figure: the design block diagram (interface card and Combo6 card) annotated with the implementation phase of each block.]
Next steps
• Finish implementation of all blocks
• Design completion (concurrently)
• Functional testing and bug fixing
• Move some blocks to the interface card
• Add new features and improve performance.
The End
Thank you for your attention