Advanced Buses

8/8/2019 Advanced Buses

1/44

AMBA Multi-layer AHB

Enables parallel access paths between multiple masters and slaves
Fully compatible with AHB wrappers
It is a topology (not protocol) evolution
Pure combinational matrix (scales poorly)

[Figure: Master1 and Master2 connect over AHB to the interconnect matrix, which connects to the slaves]


2/44

Multi-Layer AHB implementation

The matrix is completely flexible and can be adapted
MUXes are point arbitration stages
AHB layer can be AHB-lite: single master, no req/grant, no split/retry
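As a rough illustration of MUXes acting as arbitration points, the sketch below gives each slave port its own arbiter. Requests aimed at different slaves proceed in parallel, while requests to the same slave are serialized. The function name `arbitrate`, the fixed-priority policy and the master/slave labels are assumptions made for illustration; the slides do not specify an arbitration scheme.

```python
def arbitrate(requests, priority):
    """Grant at most one master per slave port in a single cycle.

    requests: dict mapping master name -> slave name it wants to access
    priority: list of master names, highest priority first (assumed policy)
    Returns (grants, stalled): grants maps slave -> winning master,
    stalled lists the masters that must retry in a later cycle.
    """
    grants = {}
    stalled = []
    for master in priority:
        slave = requests.get(master)
        if slave is None:
            continue  # this master is idle in this cycle
        if slave in grants:
            stalled.append(master)  # slave port already taken this cycle
        else:
            grants[slave] = master
    return grants, stalled


# Different slaves: both masters are served in the same cycle.
parallel = arbitrate({"Master1": "Slave1", "Master2": "Slave2"},
                     ["Master1", "Master2"])
# Same slave: the lower-priority master has to wait.
conflict = arbitrate({"Master1": "Slave1", "Master2": "Slave1"},
                     ["Master1", "Master2"])
```

A shared bus corresponds to the degenerate case in which every request competes for one single port.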


3/44


4/44

Hierarchical systems

Slaves accessed only by masters on a given layer can be made local to the layer


5/44

Multiple slaves

Multiple slaves appear as a single slave to the matrix
Combine low-bandwidth slaves
Group slaves accessed only by one master (e.g. DMA controller)

Alternatively, a slave can be an AHB-to-APB bridge, thus allowing connection to multiple low-bandwidth slaves


6/44

Multiple masters per layer

Combine masters that have low bandwidth requirements


7/44

Putting it all together

Interconnect matrix and Slave4 are used for across-layer communication


8/44

Dual port slaves

Common for off-chip SDRAM controllers
Master1: bandwidth-limited, high-priority traffic with low latency requirements
Master2: default traffic


9/44

Traffic mismatches

[Figure: bar chart of execution time (axis from 0 to 7,000,000) for the Shared, Bridging and MultiLayer topologies, with and without semaphore synchronization. Annotations: "more than 2x" and "Lower speedup!"]

Benchmark: independent tasks (matrix multiply), with and without semaphore synchronization, on 8 processors (small cache)

Traffic mismatches degrade topology evolution benefits


10/44

Crossbars

Application-level speedup at the cost of increased complexity in crossbar logic
Scales poorly: area and delay scale with N^2
Impractical beyond 10x10!
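The quadratic growth can be made concrete by counting crosspoints. This is only a first-order sketch: the helper name `crosspoints` is hypothetical, and real area and delay also depend on bus width and technology, which are not modelled here.

```python
def crosspoints(n_masters, n_slaves):
    """Number of master-to-slave connection points in a full crossbar."""
    return n_masters * n_slaves


# Doubling an N x N crossbar quadruples the number of crosspoints.
for n in (2, 4, 8, 10, 20):
    print(f"{n}x{n}: {crosspoints(n, n)} crosspoints")
```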


11/44

STBus

On-chip interconnect solution by STM
Multiple outstanding transactions with out-of-order completion
Type 1-3: increasing complexity (and performance)
Supports packets (request and response)
Support for protection, caches, locking
Deployed in a number of large-scale SoCs in STM


12/44

Transaction mapping

Transaction level
Packet level: split the transaction into a request and response packet pair
Cell level: break each packet down into a number of tokens (cells) depending on bus width
Signal level: physical encoding (e.g., req/gnt handshaking to transfer a cell)

E.g., 32-bit STBus, LD8 transaction: 1 request packet and 1 response packet; 1 request cell and 2 response cells
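The cell counts in the LD8 example follow from dividing the payload by the bus width. The sketch below reproduces that arithmetic; `cells_needed` is a hypothetical helper, and it assumes that a packet without data (such as a load request carrying only opcode and address) still occupies one cell.

```python
import math


def cells_needed(payload_bytes, bus_width_bits):
    """Cells required to carry a packet on a bus of the given width.

    Assumption: a packet with no data payload still takes one cell.
    """
    bus_bytes = bus_width_bits // 8
    return max(1, math.ceil(payload_bytes / bus_bytes))


# LD8 (load 8 bytes) on a 32-bit bus.
ld8_request_cells = cells_needed(0, 32)   # request carries no data
ld8_response_cells = cells_needed(8, 32)  # 8 bytes over a 4-byte bus
```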


13/44

Type 1-2-3

Equivalent to AHB functionality


14/44

Topology Shared Bus

Low performance, low cost


15/44

Topology Full Crossbar

High performance, high wiring complexity and cost


16/44

Read on STBus


17/44

Analysis: Protocol differences

AMBA

STBUS



18/44

Protocol matching

STBus at work

[Figure: example STBus system. Legend: STBus node, upsize converter, downsize converter, frequency converter. Block labels: LMI, LX, IPTG (many instances), IP 1, IP 2, IP 3, IP 5, VLIW, off-chip memory controller. Node labels: Type 2, 128 bit; T2, 166 MHz; T3, 64 bit, 166 MHz; T3, 64 bit, 250 MHz; Type 3]


19/44

Critical overview

Protocol is not fully transaction-centric: cannot connect initiator to target directly
Packets are atomic on the interconnect: cannot initiate nor receive multiple packets at the same time
Large data transfers may starve other initiators
Complex bridge engineering: bridges are protocol specific


20/44

AMBA 3.0 (AMBA AXI)

High bandwidth, low latency designs
High frequency operation
Flexibility in the implementation
Backward compatible with AHB and APB
Burst-based transactions with only first address issued
Address information can be issued ahead of actual data transfer
Multiple outstanding addresses
Out-of-order transaction completion
Easy addition of register stages for timing closure


21/44

Topology Partial Crossbar


22/44

Design paradigm change

[Figure: masters and slaves, as initiator and target, attach to the communication architecture through AXI interfaces]

Point-to-point interface specification
Independent of the details of the communication architecture
Communication architecture can freely evolve
Transaction-based specification of the interface
Open Core Protocol (OCP) is another example of this paradigm


23/44

Internal data lanes

[Figure: masters and slaves connected through an AXI crossbar, and masters and slaves connected through an AXI shared bus]

Most systems use one of three interconnect approaches:
- Shared address and data buses
- Shared address buses and multiple data buses
- Multilayer, with multiple address and data buses


24/44

Channel-based Architecture

Five groups of signals:
Read Address: AR signal name prefix
Read Data: R signal name prefix
Write Address: AW signal name prefix
Write Data: W signal name prefix
Write Response: B signal name prefix

[Figure: the five channels: read address, read data, write address, write data, write response]

Channels are independent and asynchronous with respect to each other
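The prefix convention makes it possible to tell the channel from a signal name alone. A minimal sketch follows, with `channel_of` as a hypothetical helper; it handles only channel signals, so global signals such as clock and reset are outside its scope.

```python
# Two-letter prefixes come first so that "AWVALID" is not mistaken for a
# "W" signal and "ARADDR" is not mistaken for an "R" signal.
CHANNEL_PREFIXES = [
    ("AR", "Read Address"),
    ("AW", "Write Address"),
    ("R", "Read Data"),
    ("W", "Write Data"),
    ("B", "Write Response"),
]


def channel_of(signal):
    """Return the channel name for a channel signal such as 'AWVALID'."""
    for prefix, channel in CHANNEL_PREFIXES:
        if signal.startswith(prefix):
            return channel
    raise ValueError(f"not a channel signal: {signal}")
```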


25/44

Read transaction

Single address for burst transfers


26/44

Write transaction


27/44

Channels - One way flow

Write address channel: AWVALID, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWPROT, AWCACHE, AWID, AWREADY
Write data channel: WVALID, WLAST, WDATA, WSTRB, WID, WREADY
Read data channel: RVALID, RLAST, RDATA, RRESP, RID, RREADY
Write response channel: BVALID, BRESP, BID, BREADY

Channel: a set of unidirectional information signals
Valid/Ready handshake mechanism
READY is the only return signal
Valid: source IF has valid data/control signals
Ready: destination IF is ready to accept data
Last: indicates last word of a burst transaction
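The handshake rule can be sketched as a cycle-by-cycle model in which a word moves only in a cycle where VALID and READY are both high. The function `run_channel` and its simplifications are assumptions for illustration: the source asserts VALID whenever it still has data, and the destination's READY pattern is given as input.

```python
def run_channel(words, ready_pattern):
    """Simulate one channel; return (cycle, word) for each transfer.

    words: data the source wants to send, in order
    ready_pattern: per-cycle READY values driven by the destination
    """
    pending = list(words)
    log = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)  # source keeps VALID high while it has data
        if valid and ready:    # transfer happens only when both are high
            log.append((cycle, pending.pop(0)))
    return log


# The destination stalls in cycles 0 and 3, so the source holds its data.
log = run_channel(["D0", "D1", "D2"], [0, 1, 1, 0, 1])
```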


28/44

Burst support

Variable-length bursts, from 1 to 16 data transfers per burst
Bursts with a transfer size of 8-1024 bits
Wrapping, incrementing and non-incrementing bursts
Atomic operations, using locked accesses
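Because only the first address is issued, each later address in a burst is derived from the burst type. The sketch below computes the per-transfer addresses for the three burst types; the function name and the labels "FIXED", "INCR" and "WRAP" are chosen for illustration, and the start address is assumed to be aligned to the transfer size.

```python
def burst_addresses(start, size_bytes, length, kind):
    """Addresses of each transfer in a burst (start assumed size-aligned).

    kind: "FIXED" (non-incrementing), "INCR" (incrementing) or "WRAP"
    """
    if not 1 <= length <= 16:
        raise ValueError("burst length must be between 1 and 16 transfers")
    if kind == "FIXED":
        return [start] * length
    if kind == "INCR":
        return [start + i * size_bytes for i in range(length)]
    if kind == "WRAP":
        total = size_bytes * length      # bytes covered by the burst
        base = (start // total) * total  # lower wrap boundary
        return [base + (start - base + i * size_bytes) % total
                for i in range(length)]
    raise ValueError(f"unknown burst type: {kind}")


# 4 transfers of 4 bytes starting at 0x38 wrap inside the 0x30-0x3F window.
wrap = burst_addresses(0x38, 4, 4, "WRAP")
incr = burst_addresses(0x38, 4, 4, "INCR")
```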


29/44

AMBA 2.0 AHB Burst

[Figure: timing diagram. ADDRESS row: A11 A12 A13 A14, A21 A22 A23. DATA row: D11 D12 D13 D14, D21 D22 D23, D31]

AHB Burst

Address and Data are locked together

Two pipeline stages

HREADY controls pipeline operation


30/44

AXI - One Address for Burst

[Figure: timing diagram. ADDRESS row: A11, A21. DATA row: D11 D12 D13 D14, D21 D22 D23, D31]

AXI Burst

One Address for entire burst



31/44

AXI - Outstanding Transactions

[Figure: timing diagram. ADDRESS row: A11, A21. DATA row: D11 D12 D13 D14, D21 D22 D23, D31]

AXI Burst

One Address for entire burst

Allows multiple outstanding addresses



32/44

Problem: slow slave

[Figure: timing diagram. ADDRESS row: A11, A21, A31. DATA row: D11 D12]

If one slave is very slow, all data is held up.


33/44

Out-of-Order Completion

[Figure: timing diagram. ADDRESS row: A11, A21. DATA row: D21 D22 D23 returned ahead of D11 D12 D13 D14, then D31]

Out-of-order completion allowed
Fast slaves may return data ahead of slow slaves
Each transaction has an ID attached (given by the master IF)
Channels have ID signals: AID, RID, etc.
Transactions with the same ID must be ordered
The interconnect in a multi-master system must append another tag to the ID to make each master's ID unique
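One common way to make IDs unique across masters is to place the master port number in extra upper bits of the ID. The sketch below assumes that scheme; `extend_id`, `split_id` and the 4-bit ID width are hypothetical choices, since the slides do not say how the tag is encoded.

```python
def extend_id(master_port, txn_id, id_bits):
    """Prefix a master's transaction ID with its port number."""
    if not 0 <= txn_id < (1 << id_bits):
        raise ValueError("transaction ID does not fit in id_bits")
    return (master_port << id_bits) | txn_id


def split_id(extended_id, id_bits):
    """Recover (master_port, txn_id) so a response can be routed back."""
    return extended_id >> id_bits, extended_id & ((1 << id_bits) - 1)


# Two masters both use ID 3; the slave sees two distinct IDs.
id_from_m0 = extend_id(0, 3, id_bits=4)
id_from_m1 = extend_id(1, 3, id_bits=4)
```

On the return path the interconnect strips the tag again, so each master only ever sees the IDs it issued.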


34/44

AXI - Data Interleaving

[Figure: timing diagram. ADDRESS row: A11, A21. DATA row: the transfers D11 to D14, D21 to D23 and D31, with D31 interleaved between D13 and D14]

Returned data can even be interleaved

Gives maximum use of data bus

Note - Data within a burst is always in order
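The ordering rule can be stated as a simple check: transfers from different bursts may be mixed freely on the data bus, but the transfers of any one burst must appear in ascending order. The sketch below assumes each transfer is tagged with a transaction ID and its position inside the burst; the function name and the tuple format are hypothetical.

```python
def in_order_within_bursts(beats):
    """Check that each burst's transfers arrive in order.

    beats: list of (txn_id, beat_index) in the order seen on the data bus
    """
    next_beat = {}
    for txn_id, beat in beats:
        if beat != next_beat.get(txn_id, 0):
            return False  # a transfer of this burst arrived out of order
        next_beat[txn_id] = beat + 1
    return True


# Bursts 1 and 2 are interleaved, yet each one is internally ordered.
ok = in_order_within_bursts([(2, 0), (2, 1), (1, 0), (2, 2), (1, 1)])
# Second transfer of burst 1 arrives before the first: not allowed.
bad = in_order_within_bursts([(1, 1), (1, 0)])
```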


35/44

Burst read

Valid high until ready high

The valid-ready handshake regulates data transfer


36/44

Overlapping burst read

Address of second burst anticipated


37/44

Burst write



38/44

Register slices for max frequency

Channels are asynchronous
Register slices can be applied across any channel
Allows maximum frequency of operation by changing delay into latency
Allows system topology to be matched to performance requirements

[Figure: register slice across the write data channel signals WVALID, WREADY, WID, WDATA, WSTRB and WLAST]
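The trade of delay for latency can be shown with idealized numbers. In the sketch below the 8 ns path delay is a made-up figure, the slice is assumed to split the path into equal segments, and register setup and clock-to-output overheads are ignored.

```python
def max_frequency_mhz(path_delay_ns):
    """Highest clock frequency for which the path fits in one cycle."""
    return 1000.0 / path_delay_ns


def with_register_slices(path_delay_ns, slices=1):
    """Idealized effect of inserting register slices on a long path.

    Returns (new max frequency in MHz, extra cycles of latency).
    """
    segment_ns = path_delay_ns / (slices + 1)
    return max_frequency_mhz(segment_ns), slices


before = max_frequency_mhz(8.0)                 # single long path
after, extra_cycles = with_register_slices(8.0)  # one slice in the middle
```

Under these assumptions the clock rate doubles while every transfer over that channel takes one more cycle.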


39/44

Comparison

Memories configured with 2 wait states

[Figure: bus traces for AHB, STBUS low buf, STBUS high buf and AXI, with the following annotations]

Impossible to hide the latency of arbitration and of the slave response
The complex arbitration interleaves the transactions
A new request is started while the response is still being processed
Multiple requests are started while the responses are being processed


40/44

Scalability

Highly parallel benchmark (no slave bottlenecks)

[Figure: two bar charts of relative execution time for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores. Panels: 256 B cache (high bus traffic) and 1 kB cache (low bus traffic)]


41/44

Scalability

[Figure: two bar charts for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores. Panels: interconnect usage efficiency (0 to 100%) and interconnect busy (0 to 100%)]

Increasing contention: AXI, STBus show 80%+ efficiency, AHB < 50%

Saturation of shared bus architectures


42/44

Networks-on-Chip (NoCs)

Same paradigm as Wide Area Networks and large-scale multiprocessors

[Figure: NoC in which IP cores (masters and slaves) attach through network interfaces (NI) to a network of switches. A packet consists of HEADER, PAYLOAD and TAIL and is carried as a sequence of FLITs]

Clean separation at session layer: the core issues end-to-end transactions, the network deals with lower-level issues
Modularity at HW level: only 2 building blocks, network interface and switch
Physical design aware: path segmentation, regular routing
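The packetization done by a network interface can be sketched as follows. The flit layout used here (one head flit carrying the destination, body flits carrying payload chunks, one tail flit) and the function names are assumptions for illustration; the slides only show that a packet has a header, payload and tail and is split into flits.

```python
def packetize(dest, payload, flit_bytes):
    """Split a payload into flits: one head, N body, one tail (assumed)."""
    flits = [("HEAD", dest)]
    for i in range(0, len(payload), flit_bytes):
        flits.append(("BODY", payload[i:i + flit_bytes]))
    flits.append(("TAIL", None))
    return flits


def depacketize(flits):
    """Rebuild (dest, payload) at the receiving network interface."""
    dest = flits[0][1]
    payload = b"".join(chunk for kind, chunk in flits if kind == "BODY")
    return dest, payload


# An 8-byte payload over 4-byte flits: head, two body flits, tail.
flits = packetize(3, b"abcdefgh", flit_bytes=4)
```

The IP core only sees the end-to-end transaction; switches forward flits without knowing what the payload means.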


43/44

Shared buses vs NoCs: NoC pros

- Each integrated IP core adds bus load capacitance
+ Only point-to-point one-way links are used

- Bus timing problems in deep sub-micron designs
+ Better suited for GALS paradigm

- Arbiter delay grows with number of masters. Instance-specific arbiter
+ Distributed routing decisions. Reinstantiable switches

- Bus bandwidth is shared among all masters
+ Bandwidth scales with network dimension


44/44

Shared buses vs NoCs: NoC cons

+ After bus is granted, bus access latency is null
- Unpredictable latency due to network congestion problems

+ Very low silicon cost
- High area cost

+ Simple bus-IP core interface
- Network-IP core interface can be very complex (e.g. packetization, ...)

+ Design guidelines are well known
- New design paradigm
