tkazuta
8/8/2019 Advanced Buses
AMBA Multi-layer AHB
- Enables parallel access paths between multiple masters and slaves
- Fully compatible with AHB wrappers
- It is a topology (not protocol) evolution
- Pure combinational matrix (scales poorly)
[Figure: Master1 and Master2 connect over AHB through the interconnect matrix to the slaves.]
Multi-Layer AHB implementation
- The matrix is completely flexible and can be adapted
- MUXes are point arbitration stages
- AHB layer can be AHB-lite: single master, no req/grant, no split/retry
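The effect of having one arbitration point per slave can be sketched with a small cycle-level model. This is only an illustration under stated assumptions: every transfer takes one cycle, each slave-side MUX uses fixed priority by master name, and the function names `matrix_cycles` and `shared_bus_cycles` are hypothetical, not taken from any AMBA deliverable.

```python
from collections import deque


def matrix_cycles(requests):
    """Cycles needed on a multi-layer matrix (sketch, not a real AHB model).

    requests maps a master name to the list of slave indices it accesses,
    in issue order. Each slave has its own arbitration point, so masters
    targeting different slaves proceed in the same cycle.
    """
    queues = {master: deque(targets) for master, targets in requests.items()}
    cycles = 0
    while any(queues.values()):
        granted_slaves = set()
        # Assumption: fixed priority, lowest master name wins a contended slave.
        for master in sorted(queues):
            queue = queues[master]
            if queue and queue[0] not in granted_slaves:
                granted_slaves.add(queue.popleft())
        cycles += 1
    return cycles


def shared_bus_cycles(requests):
    """Cycles needed on a single shared bus: one transfer per cycle in total."""
    return sum(len(targets) for targets in requests.values())


disjoint = {"M1": [0, 0, 0, 0], "M2": [1, 1, 1, 1]}
contended = {"M1": [0, 0], "M2": [0, 0]}
print(matrix_cycles(disjoint), shared_bus_cycles(disjoint))    # 4 8
print(matrix_cycles(contended), shared_bus_cycles(contended))  # 4 4
```

When the two masters target different slaves the matrix halves the cycle count; when they contend for the same slave it degenerates to shared-bus behavior.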
Hierarchical systems
Slaves accessed only by masters on a given layer can
be made local to the layer
Multiple slaves
Multiple slaves appear as a single slave to the matrix:
- combine low-bandwidth slaves
- group slaves accessed only by one master (e.g. DMA controller)
Alternatively, a slave can be an AHB-to-APB bridge, thus allowing connection to multiple low-bandwidth slaves
Multiple masters per layer
Combine masters that have low bandwidth requirements
Putting it all together
Interconnect matrix and Slave4 are used for cross-layer communication
Dual port slaves
- Common for off-chip SDRAM controllers
- Master1: bandwidth-limited, high-priority traffic with low latency requirements
- Master2: default traffic
Traffic mismatches
[Figure: bar chart of execution time (axis 0 to 7,000,000, unit not given) for the Shared, Bridging, and MultiLayer topologies, with and without semaphore synchronization. Chart annotations: "more than 2x" and "Lower speedup!".]
- Independent tasks (matrix multiply)
- With and without semaphore synchronization
- 8 processors (small cache)
Traffic mismatches degrade topology evolution benefits
Crossbars
Application-level speedup at the cost of increased complexity in crossbar logic
Scales poorly: area and delay scale with N^2
Impractical beyond 10x10!
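The quadratic growth can be made concrete by counting crosspoints. This is a rough sketch that assumes the number of master-slave crosspoints is a usable proxy for crossbar area; `crosspoints` is a hypothetical helper name.

```python
def crosspoints(masters, slaves):
    """Number of master-to-slave connection points in a full crossbar."""
    return masters * slaves


for n in (2, 4, 8, 10, 16):
    # Doubling N quadruples the crosspoint count.
    print(f"{n}x{n} crossbar: {crosspoints(n, n)} crosspoints")
```

At the 10x10 limit quoted above, the full crossbar already needs 100 crosspoints, against 4 for a 2x2.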
STBus
- On-chip interconnect solution by STM
- Multiple outstanding transactions with out-of-order completion
- Type 1-3: increasing complexity (and performance)
- Supports packets (request and response)
- Support for protection, caches, locking
- Deployed in a number of large-scale SoCs in STM
Transaction mapping
[Figure: a transaction maps to a request packet and a response packet.]
Levels: transaction, packet, cell, signal
- Transaction to packet: split the transaction into a request and response packet pair
- Packet to cell: break each packet down into a number of tokens depending on bus width
- Cell to signal: physical encoding (e.g., req/gnt handshaking to transfer a cell)
E.g., 32-bit STBus, LD8 transaction: 1 request packet, 1 response packet; 1 request cell, 2 response cells
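The LD8 example can be reproduced with a little arithmetic. The sketch below assumes that a load request carries only opcode and address and therefore fits in one cell, and that the response needs one cell per bus-width chunk of data; `cells_needed` and `load_cells` are hypothetical helper names, not STBus terminology.

```python
def cells_needed(payload_bytes, bus_width_bits):
    """Cells required to move payload_bytes over a bus of the given width."""
    bus_bytes = bus_width_bits // 8
    # Ceiling division; even an empty payload occupies one cell.
    return max(1, -(-payload_bytes // bus_bytes))


def load_cells(load_bytes, bus_width_bits):
    """(request cells, response cells) for a load of load_bytes bytes."""
    request_cells = 1  # assumption: opcode + address fit in a single cell
    response_cells = cells_needed(load_bytes, bus_width_bits)
    return request_cells, response_cells


print(load_cells(8, 32))  # LD8 on a 32-bit bus: (1, 2)
print(load_cells(8, 64))  # the same load on a 64-bit bus: (1, 1)
```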
Type 1-2-3
Equivalent to AHB functionality
Topology Shared Bus
Low performance, low cost
Topology Full Crossbar
High performance, high wiring complexity and cost
Read on STBus
Analysis: Protocol differences
AMBA
STBUS
Protocol matching
[Figure: "STBus at work". IP traffic generators (IPTG), IP blocks, and a VLIW core are connected through STBus nodes, upsize and downsize converters, and frequency converters to the LMI, the LX, and the off-chip memory controller. Link labels in the figure: Type 2, 128 bit; T2, 166 MHz; T3, 64 bit, 166 MHz; T3, 64 bit, 250 MHz; Type 3.]
Critical overview
- Protocol is not fully transaction-centric
  - Cannot connect initiator to target directly
- Packets are atomic on the interconnect
  - Cannot initiate nor receive multiple packets at the same time
  - Large data transfers may starve other initiators
- Complex bridge engineering
  - Bridges are protocol specific
AMBA 3.0 (AMBA AXI)
- High-bandwidth, low-latency designs
- High-frequency operation
- Flexibility in the implementation
- Backward compatible with AHB and APB
- Burst-based transactions with only the first address issued
- Address information can be issued ahead of the actual data transfer
- Multiple outstanding addresses
- Out-of-order transaction completion
- Easy addition of register stages for timing closure
Topology Partial Crossbar
Design paradigm change
[Figure: masters and slaves (initiators and targets) attach to the communication architecture through AXI interfaces.]
- Point-to-point interface specification
  - Independent of the details of the communication architecture
  - Communication architecture can freely evolve
- Transaction-based specification of the interface
- Open Core Protocol (OCP) is another example of this paradigm
Internal data lanes
[Figure: masters and slaves connected through an AXI crossbar and through an AXI shared bus.]
Most systems use one of three interconnect approaches:
- Shared address and data buses
- Shared address buses and multiple data buses
- Multilayer, with multiple address and data buses
Channel-based Architecture
Five groups of signals:
- Read Address: "AR" signal name prefix
- Read Data: "R" signal name prefix
- Write Address: "AW" signal name prefix
- Write Data: "W" signal name prefix
- Write Response: "B" signal name prefix
[Figure: the five channels: read address, read data, write address, write data, write response.]
Channels are independent and asynchronous with respect to each other
Read transaction
Single address for burst transfers
Write transaction
Channels - One way flow
Signals per channel, as shown in the figure:
- Write address: AWVALID, AWREADY, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWPROT, AWCACHE, AWID
- Write data: WVALID, WREADY, WLAST, WDATA, WSTRB, WID
- Read data: RVALID, RREADY, RLAST, RDATA, RRESP, RID
- Write response: BVALID, BREADY, BRESP, BID
Channel: a set of unidirectional information signals
Valid/Ready handshake mechanism
READY is the only return signal
- Valid: source IF has valid data/control signals
- Ready: destination IF is ready to accept data
- Last: indicates last word of a burst transaction
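The valid/ready handshake can be illustrated with a minimal cycle-by-cycle model. It is a sketch under simplifying assumptions: the source always has the next word available, so VALID stays high until the word is accepted, and the destination's READY is given as a per-cycle pattern. The name `handshake_transfer` is hypothetical.

```python
def handshake_transfer(words, ready_pattern):
    """Return (cycle, word) pairs for every completed transfer.

    A word moves only in a cycle where VALID (driven by the source) and
    READY (driven by the destination) are both high. While READY is low,
    the source keeps VALID high and holds the same word.
    """
    transfers = []
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        valid = index < len(words)  # source has a word to offer
        if valid and ready:
            transfers.append((cycle, words[index]))
            index += 1
    return transfers


# READY is low in cycles 0 and 3, so D0 and D2 each wait one cycle.
print(handshake_transfer(["D0", "D1", "D2"], [False, True, True, False, True]))
# [(1, 'D0'), (2, 'D1'), (4, 'D2')]
```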
Burst support
- Variable-length bursts, from 1 to 16 data transfers per burst
- Bursts with a transfer size of 8-1024 bits
- Wrapping, incrementing and non-incrementing bursts
- Atomic operations, using locked accesses
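The three burst types differ only in how the address of each beat is derived from the single start address. The sketch below uses the AXI burst-type names FIXED (non-incrementing), INCR, and WRAP, and assumes a start address aligned to the transfer size; the function name `burst_addresses` is hypothetical.

```python
def burst_addresses(start, beats, size_bytes, burst_type):
    """Per-beat addresses of a burst (sketch, aligned start assumed)."""
    if not 1 <= beats <= 16:
        raise ValueError("burst length must be 1 to 16 transfers")
    if start % size_bytes:
        raise ValueError("this sketch assumes a size-aligned start address")
    if burst_type == "FIXED":
        # Non-incrementing: every beat targets the same address (e.g. a FIFO).
        return [start] * beats
    if burst_type == "INCR":
        return [start + i * size_bytes for i in range(beats)]
    if burst_type == "WRAP":
        if beats not in (2, 4, 8, 16):
            raise ValueError("wrapping bursts use 2, 4, 8 or 16 transfers")
        total = beats * size_bytes          # bytes covered by the burst
        base = (start // total) * total     # lower wrap boundary
        return [base + (start - base + i * size_bytes) % total
                for i in range(beats)]
    raise ValueError(f"unknown burst type: {burst_type}")


print([hex(a) for a in burst_addresses(0x100, 4, 4, "INCR")])
# ['0x100', '0x104', '0x108', '0x10c']
print([hex(a) for a in burst_addresses(0x38, 4, 4, "WRAP")])
# ['0x38', '0x3c', '0x30', '0x34']
```

The wrapping case is the pattern typically used for cache line fills: the critical word is fetched first and the rest of the line follows.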
AMBA 2.0 AHB Burst
[Figure: AHB burst timing. ADDRESS row: A11 A12 A13 A14, A21 A22 A23. DATA row: D11 D12 D13 D14, D21 D22 D23, D31.]
AHB Burst
Address and Data are locked together
Two pipeline stages
HREADY controls pipeline operation
AXI - One Address for Burst
[Figure: AXI burst timing. ADDRESS row: A11, A21. DATA row: D11 D12 D13 D14, D21 D22 D23, D31.]
AXI Burst
One Address for entire burst
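The saving on the address path is easy to count. The sketch below assumes bursts of 4, 3, and 1 beats, matching the data beats in the timing figures; `address_transfers` is a hypothetical helper name.

```python
def address_transfers(burst_lengths, one_address_per_burst):
    """Number of address transfers needed for a sequence of bursts."""
    if one_address_per_burst:
        return len(burst_lengths)   # AXI style: one address per burst
    return sum(burst_lengths)       # AHB style: one address per data beat


bursts = [4, 3, 1]  # beats per burst
print(address_transfers(bursts, one_address_per_burst=False))  # 8
print(address_transfers(bursts, one_address_per_burst=True))   # 3
```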
AXI - Outstanding Transactions
[Figure: AXI burst timing with outstanding addresses. ADDRESS row: A11, A21. DATA row: D11 D12 D13 D14, D21 D22 D23, D31.]
AXI Burst
One Address for entire burst
Allows multiple outstanding addresses
Problem: Slow slave
[Figure: timing with a slow slave. ADDRESS row: A11, A21, A31. DATA row: D11 D12.]
If one slave is very slow, all data is held up.
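The cost of waiting for a slow slave can be sketched by comparing in-order return with return in ready order. The model assumes single-beat responses and a data channel that delivers one response per cycle; `completion_cycles` and the example ready times are hypothetical.

```python
def completion_cycles(ready_cycles, in_order):
    """Cycle at which each request's data is delivered.

    ready_cycles[i] is the cycle at which the slave for request i has its
    data available; requests are listed in issue order.
    """
    if in_order:
        order = list(range(len(ready_cycles)))
    else:
        # Deliver in the order the data becomes available.
        order = sorted(range(len(ready_cycles)), key=lambda i: ready_cycles[i])
    done = [0] * len(ready_cycles)
    previous = -1
    for i in order:
        previous = max(ready_cycles[i], previous + 1)
        done[i] = previous
    return done


ready = [20, 2, 3]  # the first request targets a slow slave
print(completion_cycles(ready, in_order=True))   # [20, 21, 22]
print(completion_cycles(ready, in_order=False))  # [20, 2, 3]
```

With in-order return the two fast responses are stuck behind the slow one; the slow request itself finishes at the same time either way.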
Out-of-Order Completion
[Figure: out-of-order completion timing. ADDRESS row: A11, A21. DATA row: D21 D22 D23, D11 D12 D13 D14, D31.]
- Out-of-order completion allowed
- Fast slaves may return data ahead of slow slaves
- Each transaction has an ID attached (given by the master IF)
- Channels have ID signals: AID, RID, etc.
- Transactions with the same ID must be ordered
- The interconnect in a multi-master system must append another tag to the ID to make each master's ID unique
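One common way to make IDs unique across masters is to place the master port number in extra upper bits of the ID. The sketch below assumes this encoding and a 4-bit master-side ID width; `extend_id` and `split_id` are hypothetical names, and a real interconnect may encode the tag differently.

```python
def extend_id(master_index, master_id, id_bits):
    """ID seen by the slave: master port number placed above the original ID."""
    if not 0 <= master_id < (1 << id_bits):
        raise ValueError("master ID does not fit in id_bits")
    return (master_index << id_bits) | master_id


def split_id(extended_id, id_bits):
    """Recover (master port, original ID) so the response can be routed back."""
    return extended_id >> id_bits, extended_id & ((1 << id_bits) - 1)


ID_BITS = 4  # assumed width of the ID on the master side
# Two masters both use ID 3; the slave sees two distinct IDs.
print(extend_id(0, 3, ID_BITS), extend_id(1, 3, ID_BITS))  # 3 19
print(split_id(19, ID_BITS))                               # (1, 3)
```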
AXI - Data Interleaving
[Figure: AXI data interleaving timing. ADDRESS row: A11, A21. DATA row, in arrival order: D21 D22 D23 D11 D12 D13 D31 D14.]
Returned data can even be interleaved
Gives maximum use of data bus
Note - Data within a burst is always in order
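The ordering rule, interleaving across bursts but strict order within each burst, can be written as a small checker. In this sketch each data beat is a pair (transaction number, beat number), so D21 becomes (2, 1); `beats_in_order` is a hypothetical name.

```python
def beats_in_order(data_beats):
    """True if the beats of every transaction arrive in increasing order.

    data_beats lists (transaction, beat) pairs in arrival order, with beat
    numbers starting at 1. Beats of different transactions may interleave.
    """
    expected = {}
    for transaction, beat in data_beats:
        if beat != expected.get(transaction, 1):
            return False
        expected[transaction] = beat + 1
    return True


# Arrival order D21 D22 D23 D11 D12 D13 D31 D14: interleaved but legal.
legal = [(2, 1), (2, 2), (2, 3), (1, 1), (1, 2), (1, 3), (3, 1), (1, 4)]
# D12 arriving before D11 breaks the order within burst 1.
illegal = [(1, 2), (1, 1)]
print(beats_in_order(legal), beats_in_order(illegal))  # True False
```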
Burst read
Valid high until ready high
The valid-ready handshake regulates data transfer
Overlapping burst read
Address of the second burst is issued early, before the first burst completes
Burst write
Register slices for max frequency
- Channels are asynchronous
- Register slices can be applied across any channel
- Allows maximum frequency of operation by changing delay into latency
- Allows system topology to be matched to performance requirements
[Figure: a register slice on the write data channel (WVALID, WREADY, WID, WDATA, WSTRB, WLAST).]
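"Changing delay into latency" can be shown with a back-of-the-envelope calculation. The sketch assumes the clock period is set by the slowest combinational stage and ignores register setup and clock-to-output overhead; the 4 ns path and the name `max_frequency_mhz` are made up for illustration.

```python
def max_frequency_mhz(stage_delays_ns):
    """Highest clock frequency allowed by the slowest pipeline stage."""
    return 1000.0 / max(stage_delays_ns)


unsliced = [4.0]      # one 4 ns channel path, no register slice
sliced = [2.0, 2.0]   # the same path cut in half by a register slice

for name, stages in (("unsliced", unsliced), ("sliced", sliced)):
    print(name, max_frequency_mhz(stages), "MHz,", len(stages), "cycle(s) of latency")
# unsliced 250.0 MHz, 1 cycle(s) of latency
# sliced 500.0 MHz, 2 cycle(s) of latency
```

The slice doubles the achievable clock frequency in this example at the price of one extra cycle on that channel.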
Comparison
Memories configured with 2 wait states
[Figure: waveform comparison of AHB, STBUS low buf, STBUS high buf, and AXI.]
Figure annotations:
- Impossible to hide the latency of arbitration and of the slave response
- The complex arbitration interleaves the transactions
- A new request is started while the response is still being processed
- Multiple requests are started while the responses are being processed
Scalability
Highly parallel benchmark (no slave bottlenecks)
[Figure: two bar charts of relative execution time for AHB, AXI, STBus, and STBus (B) with 2, 4, 6, and 8 cores. Panels: 256 B cache (high bus traffic) and 1 kB cache (low bus traffic). One vertical axis runs from 0% to 110%, the other from 0% to 180%.]
Scalability
[Figure: two bar charts for AHB, AXI, STBus, and STBus (B) with 2, 4, 6, and 8 cores. Panels: interconnect usage efficiency and interconnect busy, both on a 0% to 100% axis.]
Increasing contention: AXI, STBus show 80%+ efficiency, AHB < 50%
Saturation of shared bus architectures
Networks-on-Chip (NoCs)
Same paradigm as Wide Area Networks and large-scale multiprocessors
[Figure: IP cores (masters and slaves) attach through network interfaces (NI) to a NoC built from switches. A packet consists of a header, a payload, and a tail, and is split into flits.]
- Clean separation at session layer
  - Core issues end-to-end transactions
  - Network deals with lower-level issues
- Modularity at HW level
  - Only 2 building blocks: network interface and switch
- Physical design aware
  - Path segmentation
  - Regular routing
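The packetization done by a network interface can be sketched as follows. The flit layout is an assumption made for illustration: one head flit carrying the destination, body flits carrying payload chunks, and a tail flit carrying the last chunk. The names `packetize` and `depacketize` are hypothetical and do not correspond to any particular NoC.

```python
def packetize(destination, payload, flit_bytes):
    """Split a payload into a HEAD flit, BODY flits and a closing TAIL flit."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    chunks = [payload[i:i + flit_bytes]
              for i in range(0, len(payload), flit_bytes)]
    flits = [("HEAD", destination)]          # routing information travels first
    flits += [("BODY", chunk) for chunk in chunks[:-1]]
    flits.append(("TAIL", chunks[-1] if chunks else b""))  # closes the packet
    return flits


def depacketize(flits):
    """Rebuild (destination, payload) at the receiving network interface."""
    destination = flits[0][1]
    payload = b"".join(data for _, data in flits[1:])
    return destination, payload


flits = packetize(3, b"ABCDEFGHIJ", 4)
print(flits)
# [('HEAD', 3), ('BODY', b'ABCD'), ('BODY', b'EFGH'), ('TAIL', b'IJ')]
print(depacketize(flits))  # (3, b'ABCDEFGHIJ')
```

The IP core only sees the end-to-end transaction; splitting into flits and reassembly stay inside the network interface.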
Shared buses vs NoCs: NoC pros
- Each integrated IP core adds bus load capacitance
+ Only point-to-point one-way links are used
- Bus timing problems in deep sub-micron designs
+ Better suited for GALS paradigm
- Arbiter delay grows with the number of masters. Instance-specific arbiter
+ Distributed routing decisions. Reinstantiable switches
- Bus bandwidth is shared among all masters
+ Aggregate bandwidth scales with network dimension
Shared buses vs NoCs: NoC cons
+ After bus is granted, bus access latency is null
- Unpredictable latency due to network congestion problems
+ Very low silicon cost
- High area cost
+ Simple bus-IP core interface
- Network-IP core interface can be very complex (e.g. packetization, ...)
+ Design guidelines are well known
- New design paradigm