Page 1: Vm Vhdl Design

VHDL design in Liberouter accelerating cards

Jan Kořenek [email protected]

Page 2: Vm Vhdl Design

Accelerating cards

• Combo6 card
  ◦ Connected to the PCI bus.
  ◦ Routing and filtering functionality.
• Interface card
  ◦ Connected to the Combo6 card.
  ◦ Drives the PHYTERs.
  ◦ Buffers.
  ◦ Probably some routing functionality blocks.

Page 3: Vm Vhdl Design

HW resources

• Combo6
  ◦ FPGA Virtex-II XC2V3000
  ◦ DDR SDRAM
  ◦ 3 x SSRAM
  ◦ CAM
• Interface card
  ◦ 2 x FPGA Virtex-II XC2V1000
  ◦ 2 x SSRAM
  ◦ 4 x PHYTER

[Figure: Combo6 card (Virtex-II XC2V3000, CAM, SDRAM, three SSRAMs, PLX bridge to the PCI bus) and interface card (two Virtex-II XC2V1000, two SSRAMs, four PHYTERs, IO interfaces).]

Page 4: Vm Vhdl Design

VHDL design entities

[Figure: block diagram of the design entities on the interface card and the Combo6 card: PHYTERs, buffers, four header field extractors, look-up processor, replicator, priority queue, FIFOs, four edit engines, DRAM scheduler, PHYTER control, and the address decoder connected to and from all blocks and to the PCI bus.]

Page 5: Vm Vhdl Design

VHDL design (interface card)

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the interface card.]

Page 6: Vm Vhdl Design

PHYTER control

• Performs the necessary PHYTER initialization.
• Serial access to internal registers
  ◦ Speed can be slow.
  ◦ Registers are driven by SW.
  ◦ Serializes data coming from the PCI bus and deserializes data going to it.

[Figure: PHYTER control block with ports MDIO, MDC_OUT, MDC_IN, PHY_ADR, REG_ADR, CLK_IN, RDY, DATA and R/W.]
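The serial register access can be illustrated by the bit sequence of one management write. This is a minimal sketch that assumes the standard IEEE 802.3 Clause 22 MDIO frame layout; the slides do not state the frame format, and the function names are hypothetical. The arguments correspond to the PHY_ADR, REG_ADR and DATA ports.

```python
def to_bits(value, width):
    """Return value as a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def mdio_write_frame(phy_adr, reg_adr, data):
    """Build the 64-bit Clause 22 write frame (assumed format) shifted out on MDIO."""
    for name, value, width in (("phy_adr", phy_adr, 5),
                               ("reg_adr", reg_adr, 5),
                               ("data", data, 16)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    frame = [1] * 32              # preamble: 32 ones
    frame += [0, 1]               # start of frame
    frame += [0, 1]               # opcode: write (a read uses 1, 0)
    frame += to_bits(phy_adr, 5)  # PHY address
    frame += to_bits(reg_adr, 5)  # register address
    frame += [1, 0]               # turnaround for a write
    frame += to_bits(data, 16)    # register value, MSB first
    return frame


frame = mdio_write_frame(phy_adr=0x01, reg_adr=0x00, data=0x8000)
print(len(frame))  # 64
```

One bit is shifted out per MDC cycle, which is why a slow clock is acceptable for this block.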

Page 7: Vm Vhdl Design

Input and output buffer

• The header field extractor assumes a Virtex-II Pro chip and its RocketIO transceivers.
• Necessary functionality
  ◦ Elastic buffer that stores one or more packets.
  ◦ Status signals: Full, Half Full, ...
  ◦ CRC computation.
• Not implemented yet.
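A behavioral sketch of the elastic buffer and its status signals may help when writing a testbench. The class name, the byte granularity and the exact thresholds are assumptions; the slides only name the Full and Half Full signals.

```python
from collections import deque


class ElasticBuffer:
    """Byte FIFO with Full and Half Full status flags (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = deque()

    @property
    def full(self):
        return len(self.data) >= self.capacity

    @property
    def half_full(self):
        # Assumed threshold: at least half of the capacity is occupied.
        return 2 * len(self.data) >= self.capacity

    def write(self, byte):
        """Store one byte; return False when the buffer is full."""
        if self.full:
            return False
        self.data.append(byte)
        return True

    def read(self):
        """Return the oldest byte, or None when the buffer is empty."""
        return self.data.popleft() if self.data else None


buf = ElasticBuffer(capacity=8)
for b in b"\x01\x02\x03\x04":
    buf.write(b)
print(buf.half_full, buf.full)  # True False
```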

Page 8: Vm Vhdl Design

VHDL design – HFE

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the header field extractor (HFE).]

Page 9: Vm Vhdl Design

Header Field Extractor

• Small universal processor (nano-processor).
• Analyzes the packet header and retrieves the information needed for routing and filtering.
• Stores this information in the unified header structure.
• Sends the unified header to the Look-up processor.
• Sends the packet to the DRAM scheduler.
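The extraction step can be sketched in software. The field set of `UnifiedHeader` below is hypothetical, since the slides do not give the unified header layout; the sketch also assumes an untagged Ethernet frame carrying IPv6.

```python
import struct
from dataclasses import dataclass

ETH_HEADER_LEN = 14
IPV6_HEADER_LEN = 40
ETHERTYPE_IPV6 = 0x86DD


@dataclass
class UnifiedHeader:
    """Hypothetical unified header; the real layout is not given in the slides."""
    next_header: int
    hop_limit: int
    src: bytes
    dst: bytes


def extract_unified_header(frame):
    """Pull the routing-relevant IPv6 fields out of an Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN + IPV6_HEADER_LEN:
        raise ValueError("frame too short")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV6:
        raise ValueError("not an IPv6 frame")
    ip = ETH_HEADER_LEN
    # Next Header and Hop Limit are bytes 6 and 7 of the IPv6 header.
    next_header, hop_limit = struct.unpack_from("!BB", frame, ip + 6)
    return UnifiedHeader(next_header, hop_limit,
                         frame[ip + 8:ip + 24], frame[ip + 24:ip + 40])


eth = bytes(12) + b"\x86\xdd"
ipv6 = (struct.pack("!IHBB", 0x60000000, 0, 59, 64)
        + bytes(15) + b"\x01" + bytes(15) + b"\x02")
uh = extract_unified_header(eth + ipv6)
print(uh.next_header, uh.hop_limit)  # 59 64
```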

Page 10: Vm Vhdl Design

Header Field Extractor implementation

[Figure: HFE implementation. HFE_CORE (CPU) with instruction memory in BlockRAM (ADDR, INSTR), external registers (ADDR, DATA) and a finite state machine (REQs, ACKs, WENs, STATE signals); external connections UH_ADDRESS, UH_DATA, DRAM DATA, CONTROL and PACKET DATA.]

Page 11: Vm Vhdl Design

HFE – main components

Processor core and instruction memory

External register set – gives the processor core access to its environment (packet data input, DRAM, ...)

Finite State Machine – handles DRAM and unified header (UH) communication; may work independently of the core (intelligent peripheral)

Others – counters, ... mapped into the register set

Page 12: Vm Vhdl Design

Processor core overview

Simple RISC core with 16-bit data processing

8-, 4- and 1-bit operations are also supported

Fast loop and jump support (no wait cycles)

Arithmetic operations are reduced to addition and subtraction

Everything is mapped into one memory space (inputs, outputs, control, RAM); every instruction can access any register, I/O port or RAM location

Program memory and the stack are not visible to the program (Harvard architecture), which makes the core faster and more stable

Two-stage pipeline – decode and execute phases
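The single data address space can be modeled in a few lines. The address ranges and the two-operand instruction form below are hypothetical; the slides only state that registers, I/O ports and RAM share one space and that arithmetic is 16-bit addition and subtraction.

```python
MASK16 = 0xFFFF


class DataSpace:
    """One data address space shared by registers, I/O ports and RAM."""

    # Hypothetical address ranges, chosen only for illustration.
    REGS = range(0x00, 0x10)
    PORTS = range(0x10, 0x20)
    RAM = range(0x20, 0x100)

    def __init__(self):
        self.cells = [0] * 0x100

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value & MASK16  # keep results 16 bits wide


def add(space, dst, src):
    """dst <- dst + src, wrapping at 16 bits."""
    space.write(dst, space.read(dst) + space.read(src))


def sub(space, dst, src):
    """dst <- dst - src, wrapping at 16 bits."""
    space.write(dst, space.read(dst) - space.read(src))


space = DataSpace()
space.write(0x00, 2)        # a register
space.write(0x10, 0xFFFF)   # an I/O port
add(space, 0x00, 0x10)      # same instruction form for any operand kind
print(space.read(0x00))     # 1
```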

Page 13: Vm Vhdl Design

Processor core structure

[Figure: HFE_CORE structure: control and decode unit (INSTR, control signals), instruction address generation unit with stack (IADDR, DSTADDR), data address generation unit (SRC_ADDR, DST_ADDR), constants generator, internal registers, RAM, ALU with "Z" flag, multiplexers on DIN and DOUT, and pipeline registers clocked by CLK.]

Page 14: Vm Vhdl Design

Current state

• Processor core and most peripherals are fully implemented.
• Working frequency is about 60 MHz; some optimization is still needed.
• About 600 CLBs occupied (4.2 %).
• Verified in VHDL simulations with real packet data on input.

Page 15: Vm Vhdl Design

Future performance improvements

• Discard the 16-bit adder: addition and subtraction are not needed.
• Optimize buses.
• Remove indirect addressing.
• Allow more clock cycles for some instructions.
• …

Page 16: Vm Vhdl Design

VHDL design – LUP

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the look-up processor (LUP).]

Page 17: Vm Vhdl Design

Look-up processor

• Why do we need it?
  ◦ To recognize the unified header and select the output interfaces, packet priority and packet editing.
  ◦ To apply firewall rules.
• Function description
  ◦ Load the unified header from the input FIFO.
  ◦ Perform the match and retrieve the information.
  ◦ Send the information to the Replicator.

Page 18: Vm Vhdl Design

Look-up abstraction

Page 19: Vm Vhdl Design

Block structure

Page 20: Vm Vhdl Design

CAM block

• Performs the match in CAM memory
  ◦ Select part of the unified header (16 registers).
  ◦ Load the registers into the buffer; this is necessary for full CAM performance.
  ◦ Perform the match in the CAM.
  ◦ Retrieve the matched address and pass it to the processing unit.
• Mutual exclusion on the unified header FIFO.
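The match step can be modeled as a search that returns the address of the matching entry. The sketch assumes ternary entries (value plus mask) and lowest-address priority; the slides only say that a CAM is used, and a real CAM compares all entries in parallel rather than in a loop.

```python
def cam_match(entries, key):
    """Return the address of the first entry matching key, or None.

    entries is a list of (value, mask) pairs; a mask bit of 1 means
    "compare this bit", 0 means "don't care" (assumed ternary behavior).
    """
    for address, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return address
    return None


entries = [
    (0x20010DB8, 0xFFFFFFFF),  # exact match
    (0x20010000, 0xFFFF0000),  # prefix match on the upper 16 bits
    (0x00000000, 0x00000000),  # default entry, matches everything
]
print(cam_match(entries, 0x2001ABCD))  # 1
```

The returned address is what the processing unit would use as its program start address.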

Page 21: Vm Vhdl Design

Processing unit

• Simple processor.
• The program address is obtained from the CAM block.
• Supported instructions
  ◦ TAB – new program counter value created from the unified header.
  ◦ Jxxx – test lower and upper bounds.
  ◦ EXE – last instruction of a program; contains information for the following blocks.
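A toy interpreter shows how the three instruction kinds could cooperate. The tuple encoding, the field names and the `JIN` mnemonic (one possible member of the Jxxx family) are hypothetical; the slides give only the instruction names and their one-line descriptions.

```python
def run(program, start, uh, max_steps=64):
    """Execute a look-up program from address start until EXE is reached."""
    pc = start
    for _ in range(max_steps):
        op = program[pc]
        if op[0] == "TAB":
            # New program counter computed from a unified header field.
            _, field, base = op
            pc = base + uh[field]
        elif op[0] == "JIN":
            # Jump when the field lies within [low, high], else fall through.
            _, field, low, high, target = op
            pc = target if low <= uh[field] <= high else pc + 1
        elif op[0] == "EXE":
            # Last instruction: its payload is the result for the next blocks.
            return op[1]
        else:
            raise ValueError(f"unknown instruction {op[0]}")
    raise RuntimeError("program did not reach EXE")


program = [
    ("JIN", "port", 0, 1023, 2),                 # 0
    ("EXE", {"action": "forward", "iface": 1}),  # 1
    ("TAB", "cls", 3),                           # 2: pc = 3 + uh["cls"]
    ("EXE", {"action": "drop"}),                 # 3
    ("EXE", {"action": "forward", "iface": 2}),  # 4
]
print(run(program, 0, {"port": 80, "cls": 1}))
```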

Page 22: Vm Vhdl Design

Current state

• HW design
• Implementation in VHDL
• Behavioral simulations
• Post place-and-route simulations of the processing unit

Page 23: Vm Vhdl Design

VHDL design – Replicator

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the Replicator.]

Page 24: Vm Vhdl Design

Replicator

• Block that replicates the matched data structure to the Edit engine input queues
• Function description
  ◦ Get the input data structure from the Match Engine or SW.
  ◦ Load the appropriate number of replication data structures from BlockRAM and send them, together with the packet address, to the Priority queues.
  ◦ Increment the reference count of the DRAM allocation block.
  ◦ Update the statistics.
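The function description can be sketched as follows. All field names (`rule`, `packet_addr`, `queue`, `edit_program`) and the dictionary-based tables are hypothetical stand-ins for the data structures that the slides do not spell out.

```python
from collections import defaultdict, deque


def replicate(match, replication_table, priority_queues, ref_counts, stats):
    """Fan one matched packet out to the priority queues."""
    records = replication_table[match["rule"]]   # stands in for BlockRAM
    for record in records:
        # Each copy carries the packet address plus its own edit data.
        priority_queues[record["queue"]].append(
            (match["packet_addr"], record["edit_program"]))
        # One more reference to the DRAM block holding the packet.
        ref_counts[match["packet_addr"]] += 1
    stats["packets"] += 1
    stats["copies"] += len(records)
    return len(records)


replication_table = {
    7: [{"queue": 0, "edit_program": 12},
        {"queue": 5, "edit_program": 30}],
}
priority_queues = defaultdict(deque)
ref_counts = defaultdict(int)
stats = defaultdict(int)
copies = replicate({"rule": 7, "packet_addr": 0x4000},
                   replication_table, priority_queues, ref_counts, stats)
print(copies, ref_counts[0x4000])  # 2 2
```

Only the packet address is copied, so the packet itself stays in DRAM once, however many interfaces it leaves on.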

Page 25: Vm Vhdl Design

Block diagram

Page 26: Vm Vhdl Design

Data structure

Page 27: Vm Vhdl Design

VHDL design – PQ

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the priority queues (PQ).]

Page 28: Vm Vhdl Design

Priority queues

• Records for every interface are sorted by priority and passed to the Edit Engine.
• Records are temporarily stored in SSRAM.
• Replication on the interface
  ◦ Why isn't this done by the Replicator?
  ◦ It uses less memory.
  ◦ It simplifies the Replicator.

Page 29: Vm Vhdl Design

Architecture

• Two memory components
  ◦ DESC (16 x 64 b): queue description and status
    – Generates SSRAM addresses
    – Helps in the search for the current queue
  ◦ ASGN (32 x 4 b): assigns every queue to an interface (EE or SW)
    – Provides information for the SEARCH block

[Figure: ASGN lists the queues (Q0 to Q15) assigned to an interface such as Edit engine 1; each DESC entry for Q0 to Q15 holds Start, WritePtr, ReadPtr and Length.]
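The role of a DESC entry in SSRAM address generation and queue search can be sketched as below. The field names come from the figure; the circular-buffer semantics, the empty and full conditions and the per-interface queue order are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Desc:
    """One DESC entry: a circular queue region in SSRAM (assumed semantics)."""
    start: int
    length: int
    write_ptr: int = 0
    read_ptr: int = 0

    def empty(self):
        return self.read_ptr == self.write_ptr

    def full(self):
        # One slot stays unused so that full and empty can be told apart.
        return (self.write_ptr + 1) % self.length == self.read_ptr

    def push_address(self):
        """Return the SSRAM address for the next record to store."""
        if self.full():
            raise OverflowError("queue full")
        addr = self.start + self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.length
        return addr

    def pop_address(self):
        """Return the SSRAM address of the oldest stored record."""
        if self.empty():
            raise IndexError("queue empty")
        addr = self.start + self.read_ptr
        self.read_ptr = (self.read_ptr + 1) % self.length
        return addr


def search(desc, queues):
    """Return the first non-empty queue in the given priority order."""
    for q in queues:
        if not desc[q].empty():
            return q
    return None


desc = [Desc(start=q * 256, length=256) for q in range(16)]
ee1_queues = [15, 1, 0]            # hypothetical ASGN content for one interface
stored_at = desc[1].push_address()
print(stored_at, search(desc, ee1_queues))  # 256 1
```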

Page 30: Vm Vhdl Design

Block structure

Page 31: Vm Vhdl Design

State diagram

Page 32: Vm Vhdl Design

Design critical points

• Searching for the current queue runs concurrently with passing records to the Edit Engine; this requires synchronization and a dual-port DESC memory.
• The SW and Edit Engine interfaces behave differently (the SW interface has no WB state).
• Designing a suitable pipeline.
• Current state – implementation in VHDL.

Page 33: Vm Vhdl Design

VHDL design – Edit engine

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the Edit engine.]

Page 34: Vm Vhdl Design

Edit engine

• Block that creates the output stream for outgoing packets
  ◦ Inserts a new L2 header
  ◦ Decrements Hop Limit
  ◦ Routing header options
  ◦ Encapsulates/decapsulates IPv6 packets
  ◦ etc.
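One of the listed edits, the Hop Limit decrement, is simple enough to show directly. The function name and the drop convention (returning None) are hypothetical; the byte offset follows the IPv6 header format, which has no header checksum to update.

```python
IPV6_HEADER_LEN = 40
HOP_LIMIT_OFFSET = 7  # byte offset of Hop Limit within the IPv6 header


def decrement_hop_limit(ipv6_packet):
    """Return a copy with Hop Limit decremented, or None if it would reach zero."""
    if len(ipv6_packet) < IPV6_HEADER_LEN:
        raise ValueError("packet shorter than an IPv6 header")
    hop_limit = ipv6_packet[HOP_LIMIT_OFFSET]
    if hop_limit <= 1:
        return None  # the packet must not be forwarded any further
    out = bytearray(ipv6_packet)
    out[HOP_LIMIT_OFFSET] = hop_limit - 1
    return bytes(out)


packet = bytes([0x60, 0, 0, 0, 0, 0, 59, 64]) + bytes(32)
edited = decrement_hop_limit(packet)
print(edited[HOP_LIMIT_OFFSET])  # 63
```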

Page 35: Vm Vhdl Design

Input data

Page 36: Vm Vhdl Design

Block Diagram

Page 37: Vm Vhdl Design

Instructions

• Instructions for sending and modifying data
  ◦ Send data of a given size: SPDP, SPEP, SAPB, SAPC
  ◦ Send data to a reference position: SPDU, SPDE
• Control instructions: MARK, NXTO, LDEN, LDOP

Page 38: Vm Vhdl Design

Current state

Page 39: Vm Vhdl Design

VHDL design – DRAM scheduler

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the DRAM scheduler.]

Page 40: Vm Vhdl Design

DRAM scheduler

• DDR SDRAM memory control.
• Loads and stores packets in DDR SDRAM memory.
• Memory is divided into fixed-length blocks.
• Free block management.
• Bottleneck of the design: maximum memory speed is needed.
• Three different types of interfaces: HFE, Replicator and Edit engine.
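Fixed-length blocks with free block management can be sketched as a free list plus a reference count per block. The block size, the class name and the method names are hypothetical; the reference count mirrors the one that the Replicator increments for every copy of a packet.

```python
class BlockAllocator:
    """Fixed-length block allocator with per-block reference counts."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        # Reversed so that pop() hands out block 0 first.
        self.free = list(range(num_blocks - 1, -1, -1))
        self.refs = [0] * num_blocks

    def allocate(self):
        """Return the address of a free block, or None when memory is full."""
        if not self.free:
            return None
        block = self.free.pop()
        self.refs[block] = 1
        return block * self.block_size

    def add_reference(self, addr):
        self.refs[addr // self.block_size] += 1

    def release(self, addr):
        """Drop one reference; the block becomes free when none remain."""
        block = addr // self.block_size
        if self.refs[block] == 0:
            raise ValueError("block is not allocated")
        self.refs[block] -= 1
        if self.refs[block] == 0:
            self.free.append(block)


alloc = BlockAllocator(num_blocks=4, block_size=2048)
addr = alloc.allocate()      # packet stored by an HFE
alloc.add_reference(addr)    # packet replicated to a second interface
alloc.release(addr)          # first copy sent
print(len(alloc.free))       # 3, the block is still referenced
```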

Page 41: Vm Vhdl Design

Block structure

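[Figure: DRAM scheduler block structure. HFE1 to HFE4 connect through the HFE interface, EE1 to EE4 through the EE interface and the Replicator through the REP interface, using BlockRAM buffers with 32 B entries; the Scheduler_core contains time slots control, address control and low-level DDR SDRAM control, which drives the DDR SDRAM memory. Data and control paths are drawn separately.]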

Page 42: Vm Vhdl Design

Core structure

• Low-level SDRAM control
  ◦ Communication with the SDRAM
• Time slots control
  ◦ Time-sharing strategy
  ◦ Every component has its own slot
• Address control
  ◦ Number of references for every block address

[Figure: Scheduler_core containing address control, time slots control and low-level DDR SDRAM control.]
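The time-sharing strategy can be illustrated with a fixed slot rotation. The slot order, the component names in it and the rule that an idle owner simply wastes its slot are assumptions; the slides only state that every component has a slot.

```python
from itertools import cycle


def schedule(slot_order, requests, num_slots):
    """Grant memory access slot by slot in a fixed rotation.

    requests maps a component name to its number of pending transfers.
    A slot whose owner has nothing pending stays unused (None).
    """
    pending = dict(requests)  # do not modify the caller's dictionary
    grants = []
    owners = cycle(slot_order)
    for _ in range(num_slots):
        owner = next(owners)
        if pending.get(owner, 0) > 0:
            pending[owner] -= 1
            grants.append(owner)
        else:
            grants.append(None)
    return grants


slot_order = ["HFE1", "HFE2", "REP", "EE1"]
grants = schedule(slot_order, {"HFE1": 2, "EE1": 1}, num_slots=8)
print(grants)
```

A fixed rotation gives every component a guaranteed share of memory bandwidth, at the cost of idle slots when a component has nothing to transfer.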

Page 43: Vm Vhdl Design

Low level DDR SDRAM control

• DRAM control
  ◦ Load and store data
  ◦ DCM – clock generation and phase shifting
• Command generator
  ◦ Memory initialization
  ◦ Auto refresh and read/write cycles
• Data path
  ◦ Time transforms (data are delayed)

[Figure: low-level DDR SDRAM control block with ports clk_in, read, write, data_to_ddr and data_from_ddr; inside, two DCMs, a command generator and a data path drive the command and data lines of the DDR SDRAM.]

Page 44: Vm Vhdl Design

Current state

• DDR SDRAM test implemented
• HFE interface
  ◦ Implemented, but not simulated
• Low-level scheduler
  ◦ Implemented
  ◦ Behavioral simulation
• Other blocks
  ◦ Specified but not implemented

Page 45: Vm Vhdl Design

VHDL design – Address decoder

[Figure: the block diagram from the "VHDL design entities" slide, repeated here for the address decoder.]

Page 46: Vm Vhdl Design

Address decoder and local bus

• Address decoder
  ◦ Hierarchical address space
  ◦ Chip select for the next level
• Local bus
  ◦ Connects all blocks to the PLX (PCI bus)
  ◦ Multiplexed address and data, 16 bits
  ◦ Long wires, so wait cycles are necessary
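Hierarchical decoding can be sketched as slicing the address into one field per level, each field acting as the chip select for the next level. The 16-bit address width and the 4-bit fields in the example are assumptions; the slides do not give the actual address map.

```python
ADDRESS_WIDTH = 16  # assumed to equal the width of the local bus


def decode(address, level_widths):
    """Split an address into per-level chip-select indices and a local offset.

    level_widths lists the number of address bits consumed at each level,
    most significant level first.
    """
    if sum(level_widths) > ADDRESS_WIDTH:
        raise ValueError("levels use more bits than the address has")
    selects = []
    remaining = ADDRESS_WIDTH
    for width in level_widths:
        remaining -= width
        selects.append((address >> remaining) & ((1 << width) - 1))
    offset = address & ((1 << remaining) - 1)
    return selects, offset


selects, offset = decode(0x3A14, [4, 4])
print(selects, offset)  # [3, 10] 20
```

Each level only has to look at its own field, so a block can be moved or added without changing the decoders below it.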

Page 47: Vm Vhdl Design

Current state of VHDL design

[Figure: the block diagram from the "VHDL design entities" slide, with a legend "Block phase" indicating the implementation phase of each block.]

Page 48: Vm Vhdl Design

Next steps

• Finish the implementation of all blocks.
• Complete the design (concurrently).
• Test functionality and fix bugs.
• Move some blocks to the interface card.
• Add new features and improve performance.

Page 49: Vm Vhdl Design

The End

Thank you for your attention