Advanced Buses

  • 8/8/2019 Advanced Buses

    1/44

    AMBA Multi-layer AHB

    - Enables parallel access paths between multiple masters and slaves
    - Fully compatible with AHB wrappers
    - It is a topology (not protocol) evolution
    - Pure combinational matrix (scales poorly)

    [Figure: Master1 and Master2, each on its own AHB layer, connected through the interconnect matrix to the slaves.]

  • 8/8/2019 Advanced Buses

    2/44

    Multi-Layer AHB implementation

    - The matrix is completely flexible and can be adapted
    - MUXes are point arbitration stages
    - AHB layer can be AHB-lite: single master, no req/grant, no split/retry

  • 8/8/2019 Advanced Buses

    3/44

  • 8/8/2019 Advanced Buses

    4/44

    Hierarchical systems

    Slaves accessed only by masters on a given layer can be made local to the layer

  • 8/8/2019 Advanced Buses

    5/44

    Multiple slaves

    - Multiple slaves appear as a single slave to the matrix
        - Combine low-bandwidth slaves
        - Group slaves accessed only by one master (e.g. DMA controller)
    - Alternatively, a slave can be an AHB-to-APB bridge, thus allowing connection to multiple low-bandwidth slaves

  • 8/8/2019 Advanced Buses

    6/44

    Multiple masters per layer

    Combine masters that have low bandwidth requirements

  • 8/8/2019 Advanced Buses

    7/44

    Putting it all together

    Interconnect matrix and Slave4 are used for across-layer communication

  • 8/8/2019 Advanced Buses

    8/44

    Dual port slaves

    - Common for off-chip SDRAM controllers
    - Master1: bandwidth-limited, high-priority traffic with low latency requirements
    - Master2: default traffic

  • 8/8/2019 Advanced Buses

    9/44

    Traffic mismatches

    [Figure: bar chart of execution time (axis from 0 to 7,000,000) for the Shared, Bridging and MultiLayer topologies, with and without semaphore synchronization. Annotations: "more than 2x" and "Lower speedup!"]

    - Independent tasks (matrix multiply)
    - 8 processors (small cache)
    - Traffic mismatches degrade topology evolution benefits

  • 8/8/2019 Advanced Buses

    10/44

    Crossbars

    - Application-level speedup at the cost of increased complexity in crossbar logic
    - Scales poorly: area and delay scale with N^2
    - Impractical beyond 10x10!
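
    The quadratic growth can be made concrete with a small count. This is only a sketch: it assumes one crosspoint per master/slave pair in a full crossbar and one tap per agent on a shared bus, and the function names are illustrative rather than taken from the slides.

```python
def crossbar_crosspoints(masters: int, slaves: int) -> int:
    """Crosspoints in a full crossbar: one per (master, slave) pair."""
    return masters * slaves


def shared_bus_taps(masters: int, slaves: int) -> int:
    """Connections on a single shared bus: one per attached agent."""
    return masters + slaves


if __name__ == "__main__":
    # Compare how the two counts grow for square N x N systems.
    for n in (2, 4, 8, 10, 16):
        print(n, shared_bus_taps(n, n), crossbar_crosspoints(n, n))
```

    Doubling N doubles the shared-bus count but quadruples the crossbar count, which is the scaling problem the slide points at.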

  • 8/8/2019 Advanced Buses

    11/44

    STBus

    - On-chip interconnect solution by STM
    - Multiple outstanding transactions with out-of-order completion
    - Type 1-3: increasing complexity (and performance)
    - Supports packets (request and response)
    - Support for protection, caches, locking
    - Deployed in a number of large-scale SoCs in STM

  • 8/8/2019 Advanced Buses

    12/44

    Transaction mapping

    [Figure: a transaction mapped onto a request packet and a response packet.]

    - Transaction level to packet level: split the transaction into a request and response packet pair
    - Packet level to cell level: break each packet down into a number of tokens (cells), depending on bus width
    - Signal level: physical encoding (e.g., req/gnt handshaking to transfer a cell)

    E.g., on a 32-bit STBus, an LD8 transaction is 1 request packet and 1 response packet: 1 request cell, 2 response cells.
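
    The LD8 example can be reproduced with a short calculation. This is a sketch under stated assumptions: a packet always occupies at least one cell, the request packet of a load carries no data payload, and `cells_for_payload` is a hypothetical helper, not an STBus definition.

```python
import math


def cells_for_payload(payload_bytes: int, bus_width_bits: int) -> int:
    """Cells needed to carry a payload on a bus of the given width.

    Assumption: every packet needs at least one cell, even with no data.
    """
    bus_bytes = bus_width_bits // 8
    return max(1, math.ceil(payload_bytes / bus_bytes))


if __name__ == "__main__":
    # LD8 (8-byte load) on a 32-bit bus.
    request_cells = cells_for_payload(0, 32)   # request carries no data
    response_cells = cells_for_payload(8, 32)  # 8 bytes over 4-byte cells
    print(request_cells, response_cells)
```

    On a 64-bit bus the same load would need a single response cell, which is why the cell count depends on bus width.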

  • 8/8/2019 Advanced Buses

    13/44

    Type 1-2-3

    Equivalent to AHB functionality

  • 8/8/2019 Advanced Buses

    14/44

    Topology Shared Bus

    Low performance, low cost

  • 8/8/2019 Advanced Buses

    15/44

    Topology Full Crossbar

    High performance, high wiring complexity and cost

  • 8/8/2019 Advanced Buses

    16/44

    Read on STBus

  • 8/8/2019 Advanced Buses

    17/44

    Analysis: Protocol differences

    AMBA

    STBUS


  • 8/8/2019 Advanced Buses

    18/44

    STBus at work

    [Figure: example STBus platform. IP blocks (IP 1, IP 2, IP 3, IP 5) and traffic generators (IPTG) attach to STBus nodes; upsize, downsize and frequency converters perform protocol matching between segments. Segment labels: Type 2 128 bit; T2 166 MHz; 64 bit T3 166 MHz; 64 bit T3 250 MHz. Other labels: LMI, LX, VLIW, Type 3, off-chip memory controller.]

  • 8/8/2019 Advanced Buses

    19/44

    Critical overview

    - Protocol is not fully transaction-centric
        - Cannot connect initiator to target directly
    - Packets are atomic on the interconnect
        - Cannot initiate nor receive multiple packets at the same time
        - Large data transfers may starve other initiators
    - Complex bridge engineering
        - Bridges are protocol specific

  • 8/8/2019 Advanced Buses

    20/44

    AMBA 3.0 (AMBA AXI)

    - High-bandwidth, low-latency designs
    - High-frequency operation
    - Flexibility in the implementation
    - Backward compatible with AHB and APB
    - Burst-based transactions with only the first address issued
    - Address information can be issued ahead of the actual data transfer
    - Multiple outstanding addresses
    - Out-of-order transaction completion
    - Easy addition of register stages for timing closure

  • 8/8/2019 Advanced Buses

    21/44

    Topology Partial Crossbar

  • 8/8/2019 Advanced Buses

    22/44

    Design paradigm change

    [Figure: masters and slaves attached directly to one another, contrasted with an initiator and a target that each reach the communication architecture through an AXI interface.]

    - Point-to-point interface specification
        - Independent of the details of the communication architecture
        - Communication architecture can freely evolve
    - Transaction-based specification of the interface
    - Open Core Protocol (OCP) is another example of this paradigm

  • 8/8/2019 Advanced Buses

    23/44

    Internal data lanes

    [Figure: masters and slaves connected through an AXI crossbar, and masters and slaves connected through an AXI shared bus.]

    Most systems use one of three interconnect approaches:
    - Shared address and data buses
    - Shared address buses and multiple data buses
    - Multilayer, with multiple address and data buses

  • 8/8/2019 Advanced Buses

    24/44

    Channel-based Architecture

    Five groups of signals:
    - Read Address: AR signal name prefix
    - Read Data: R signal name prefix
    - Write Address: AW signal name prefix
    - Write Data: W signal name prefix
    - Write Response: B signal name prefix

    [Figure: the read address, read data, write address, write data and write response channels.]

    Channels are independent and asynchronous with respect to each other

  • 8/8/2019 Advanced Buses

    25/44

    Read transaction

    Single address for burst transfers

  • 8/8/2019 Advanced Buses

    26/44

    Write transaction

  • 8/8/2019 Advanced Buses

    27/44

    Channels - One way flow

    - Write address channel: AWVALID, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWPROT, AWCACHE, AWID, AWREADY
    - Write data channel: WVALID, WLAST, WDATA, WSTRB, WID, WREADY
    - Read data channel: RVALID, RLAST, RDATA, RRESP, RID, RREADY
    - Write response channel: BVALID, BRESP, BID, BREADY

    - Channel: a set of unidirectional information signals
    - Valid/Ready handshake mechanism
    - READY is the only return signal
    - Valid: source IF has valid data/control signals
    - Ready: destination IF is ready to accept data
    - Last: indicates last word of a burst transaction
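
    The valid/ready handshake can be illustrated with a cycle-by-cycle trace. This is a minimal sketch, not a model of any particular channel: the traces and the `run_handshake` helper are made up for illustration, and a transfer is assumed to happen in exactly those cycles where VALID and READY are both high.

```python
def run_handshake(valid_trace, ready_trace, data_trace):
    """Return (cycle, data) for every cycle in which VALID and READY are both high."""
    transfers = []
    for cycle, (valid, ready, data) in enumerate(
        zip(valid_trace, ready_trace, data_trace)
    ):
        if valid and ready:
            transfers.append((cycle, data))
    return transfers


if __name__ == "__main__":
    # The source keeps VALID (and the data) stable until READY is seen.
    valid = [1, 1, 1, 0, 1]
    ready = [0, 0, 1, 1, 1]
    data = ["D0", "D0", "D0", None, "D1"]
    print(run_handshake(valid, ready, data))
```

    In this trace D0 is accepted in cycle 2 and D1 in cycle 4; READY is the only signal that flows back from the destination to the source.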

  • 8/8/2019 Advanced Buses

    28/44

    Burst support

    - Variable-length bursts, from 1 to 16 data transfers per burst
    - Bursts with a transfer size of 8-1024 bits
    - Wrapping, incrementing and non-incrementing bursts
    - Atomic operations, using locked accesses
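
    How the three burst types generate per-beat addresses can be sketched as follows. The rules below reflect the usual reading of the AXI burst types (FIXED for non-incrementing, INCR for incrementing, WRAP for wrapping); the function name and the string labels are illustrative assumptions, and the protocol specification remains the authority on corner cases.

```python
def burst_addresses(start, beats, size_bytes, burst_type):
    """Per-beat addresses for a burst of `beats` transfers of `size_bytes` each."""
    if burst_type == "FIXED":
        # Non-incrementing: every beat targets the same address.
        return [start] * beats
    if burst_type == "INCR":
        # First beat uses the start address; later beats are size-aligned.
        aligned = (start // size_bytes) * size_bytes
        return [start] + [aligned + n * size_bytes for n in range(1, beats)]
    if burst_type == "WRAP":
        if start % size_bytes != 0 or beats not in (2, 4, 8, 16):
            raise ValueError("WRAP needs an aligned start and 2, 4, 8 or 16 beats")
        total = size_bytes * beats            # bytes covered by the whole burst
        base = (start // total) * total       # lower wrap boundary
        return [base + (start - base + n * size_bytes) % total for n in range(beats)]
    raise ValueError(f"unknown burst type: {burst_type}")


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x38, 4, 4, "WRAP")])
```

    For a 4-beat, 4-byte wrapping burst starting at 0x38, the addresses run 0x38, 0x3c, then wrap to 0x30 and 0x34.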

  • 8/8/2019 Advanced Buses

    29/44

    AMBA 2.0 AHB Burst

    [Timing diagram: ADDRESS carries A11 A12 A13 A14, then A21 A22 A23; DATA carries D11 D12 D13 D14, then D21 D22 D23, then D31.]

    - AHB burst: address and data are locked together
    - Two pipeline stages
    - HREADY controls pipeline operation

  • 8/8/2019 Advanced Buses

    30/44

    AXI - One Address for Burst

    [Timing diagram: ADDRESS carries A11, then A21; DATA carries D11 D12 D13 D14, then D21 D22 D23, then D31.]

    - AXI burst: one address for entire burst

  • 8/8/2019 Advanced Buses

    31/44

    AXI - Outstanding Transactions

    [Timing diagram: ADDRESS carries A11, then A21; DATA carries D11 D12 D13 D14, then D21 D22 D23, then D31.]

    - AXI burst: one address for entire burst
    - Allows multiple outstanding addresses
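
    The difference in address traffic between the two burst styles can be counted directly. This is an idealized sketch that ignores wait states and arbitration; the burst lengths 4, 3 and 1 are read off the diagram labels (D11 to D14, D21 to D23, D31), and `address_transfers` is a hypothetical helper.

```python
def address_transfers(burst_lengths, one_address_per_burst):
    """Number of address transfers needed for the given bursts."""
    if one_address_per_burst:
        # AXI style: a single address describes the whole burst.
        return len(burst_lengths)
    # AHB style: an address accompanies every data beat.
    return sum(burst_lengths)


if __name__ == "__main__":
    bursts = [4, 3, 1]
    print("AHB-style:", address_transfers(bursts, False))
    print("AXI-style:", address_transfers(bursts, True))
```

    With fewer address transfers per burst, the address channel is free earlier, which is what lets a master post further addresses while data is still in flight.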

  • 8/8/2019 Advanced Buses

    32/44

    Problem: Slow slave

    [Timing diagram: ADDRESS carries A11, A21, A31; DATA carries D11 D12.]

    If one slave is very slow, all data is held up.

  • 8/8/2019 Advanced Buses

    33/44

    Out-of-Order Completion

    [Timing diagram: ADDRESS carries A11, then A21; DATA carries D21 D22 D23 ahead of D11 D12 D13 D14, followed by D31.]

    - Out-of-order completion allowed
    - Fast slaves may return data ahead of slow slaves
    - Each transaction has an ID attached (given by the master IF)
    - Channels have ID signals: AID, RID, etc.
    - Transactions with the same ID must be ordered
    - The interconnect in a multi-master system must append another tag to the ID to make each master's ID unique
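
    One way an interconnect could make IDs unique across masters is to place the master's port index above the master-local ID bits. This is only a sketch of that idea: the bit layout, the `extend_id` and `split_id` names and the 4-bit ID width are assumptions for illustration, not something the slides specify.

```python
def extend_id(master_index: int, local_id: int, id_bits: int) -> int:
    """Prefix a master-local transaction ID with the master's port index."""
    if not 0 <= local_id < (1 << id_bits):
        raise ValueError("local_id does not fit in id_bits")
    return (master_index << id_bits) | local_id


def split_id(global_id: int, id_bits: int):
    """Recover (master_index, local_id) so a response can be routed back."""
    return global_id >> id_bits, global_id & ((1 << id_bits) - 1)


if __name__ == "__main__":
    # Two masters both use local ID 3; the extended IDs no longer collide.
    a = extend_id(0, 3, id_bits=4)
    b = extend_id(1, 3, id_bits=4)
    print(a, b, split_id(b, id_bits=4))
```

    On the return path the interconnect strips the tag again, so each master only ever sees its own local IDs.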

  • 8/8/2019 Advanced Buses

    34/44

    AXI - Data Interleaving

    [Timing diagram: ADDRESS carries A11 and A21; DATA carries the beats D11 to D14, D21 to D23 and D31, with beats of different bursts interleaved.]

    - Returned data can even be interleaved
    - Gives maximum use of data bus
    - Note: data within a burst is always in order
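
    Because each beat carries its transaction ID, a receiver can separate an interleaved stream back into bursts. The sketch below assumes a made-up stream of (ID, beat) pairs; the specific interleaving order is hypothetical and `demux_by_id` is an illustrative name.

```python
from collections import defaultdict


def demux_by_id(stream):
    """Group interleaved (txn_id, beat) pairs into one ordered list per ID."""
    bursts = defaultdict(list)
    for txn_id, beat in stream:
        # Arrival order is kept, so beats of one burst stay in order.
        bursts[txn_id].append(beat)
    return dict(bursts)


if __name__ == "__main__":
    stream = [
        (1, "D11"), (2, "D21"), (1, "D12"), (2, "D22"),
        (2, "D23"), (1, "D13"), (3, "D31"), (1, "D14"),
    ]
    print(demux_by_id(stream))
```

    The per-ID lists come out as D11 to D14, D21 to D23 and D31, matching the rule that data within a burst is never reordered even when bursts are interleaved.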

  • 8/8/2019 Advanced Buses

    35/44

    Burst read

    - Valid high until ready high
    - The valid-ready handshake regulates data transfer

  • 8/8/2019 Advanced Buses

    36/44

    Overlapping burst read

    Address of the second burst is issued early, before the first burst has completed

  • 8/8/2019 Advanced Buses

    37/44

    Burst write


  • 8/8/2019 Advanced Buses

    38/44

    Register slices for max frequency

    - Channels are asynchronous
    - Register slices can be applied across any channel
    - Allows maximum frequency of operation by changing delay into latency
    - Allows system topology to be matched to performance requirements

    [Figure: a register slice on the write data channel signals WVALID, WREADY, WID, WDATA, WSTRB and WLAST.]
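
    The trade of delay for latency can be shown with simple arithmetic. The delay figures below are hypothetical, chosen only to illustrate the idea that the clock is limited by the slowest register-to-register stage while each extra slice adds one cycle of latency.

```python
def max_frequency_mhz(stage_delays_ns):
    """Clock limit set by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)


def latency_cycles(stage_delays_ns):
    """One clock cycle per pipeline stage on the path."""
    return len(stage_delays_ns)


if __name__ == "__main__":
    unsliced = [4.0]       # one long 4 ns path (assumed figure)
    sliced = [2.0, 2.0]    # same path cut in half by a register slice
    print(max_frequency_mhz(unsliced), latency_cycles(unsliced))
    print(max_frequency_mhz(sliced), latency_cycles(sliced))
```

    In this toy case the slice doubles the achievable clock from 250 MHz to 500 MHz at the cost of one additional cycle on that channel.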

  • 8/8/2019 Advanced Buses

    39/44

    Comparison

    Memories configured with 2 wait states

    - AHB: arbitration latency and slave response latency cannot be hidden
    - STBUS low buf: the complex arbitration interleaves transactions
    - STBUS high buf: a new request starts while the response is still being processed
    - AXI: multiple requests are started while responses are being processed

  • 8/8/2019 Advanced Buses

    40/44

    Scalability

    Highly parallel benchmark (no slave bottlenecks)

    [Figure: two bar charts of relative execution time for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores. Panels: 256 B cache (high bus traffic) and 1 kB cache (low bus traffic). One axis runs from 0% to 110%, the other from 0% to 180%.]

  • 8/8/2019 Advanced Buses

    41/44

    Scalability

    [Figure: two bar charts for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores: interconnect usage efficiency and interconnect busy, each on a 0% to 100% axis.]

    - Increasing contention: AXI and STBus show 80%+ efficiency, AHB < 50%
    - Saturation of shared bus architectures

  • 8/8/2019 Advanced Buses

    42/44

    Networks-on-Chip (NoCs)

    Same paradigm as Wide Area Networks and large-scale multiprocessors

    [Figure: IP core masters and IP core slaves, each attached through a network interface (NI) to a NoC built from switches. A packet consists of HEADER, PAYLOAD and TAIL and is split into FLITs.]

    - Clean separation at session layer
        - Core issues end-to-end transactions
        - Network deals with lower-level issues
    - Modularity at HW level
        - Only 2 building blocks: network interface and switch
    - Physical design aware
        - Path segmentation
        - Regular routing
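
    The packet-to-flit step performed by a network interface can be sketched as below. This is an assumption-laden illustration: the byte-level layout (header, then payload, then tail), the zero padding of the last flit and the `packetize` name are all hypothetical, since the slides only show that a packet is split into flits.

```python
def packetize(header: bytes, payload: bytes, tail: bytes, flit_bytes: int) -> list:
    """Split header + payload + tail into fixed-size flits, zero-padding the last."""
    packet = header + payload + tail
    if not packet:
        return []
    flits = [packet[i:i + flit_bytes] for i in range(0, len(packet), flit_bytes)]
    flits[-1] = flits[-1].ljust(flit_bytes, b"\x00")  # pad the final flit
    return flits


if __name__ == "__main__":
    flits = packetize(b"H", b"PAYLOAD!", b"T", flit_bytes=4)
    print(flits)
```

    The core only sees the end-to-end transaction; slicing into flits, and reassembly at the far side, stays inside the network interface.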

  • 8/8/2019 Advanced Buses

    43/44

    Shared buses vs NoCs: NoC pros

    ("-" marks a shared-bus drawback, "+" the corresponding NoC advantage)

    - Each integrated IP core adds bus load capacitance
    + Only point-to-point one-way links are used

    - Bus timing problems in deep sub-micron designs
    + Better suited for GALS paradigm

    - Arbiter delay grows with the number of masters. Instance-specific arbiter
    + Distributed routing decisions. Reinstantiable switches

    - Bus bandwidth is shared among all masters
    + Bandwidth scales with network dimension

  • 8/8/2019 Advanced Buses

    44/44

    Shared buses vs NoCs: NoC cons

    ("+" marks a shared-bus advantage, "-" the corresponding NoC drawback)

    + After bus is granted, bus access latency is null
    - Unpredictable latency due to network congestion problems

    + Very low silicon cost
    - High area cost

    + Simple bus-IP core interface
    - Network-IP core interface can be very complex (e.g. packetization)

    + Design guidelines are well known
    - New design paradigm