Using Clos Switches in Area Efficient Asynchronous SDM Routers Wei Song and Doug Edwards Advanced Processor Technologies Group The University of Manchester

Source: wsong83.github.io/presentation/ukef20110704.pdf

Contents

• Asynchronous Network-on-Chip

• Spatial division multiplexing (SDM)

• 2-stage Clos switch

– Motivation

– Clos

– 2-stage Clos switch

• SDM router using Clos switches

– Structure

– Performance

03/07/2011, Advanced Processor Technologies Group, The School of Computer Science

Asynchronous Network-on-Chip

[Figure: Network-on-Chip diagram. Labels: Processor, Cache/Memory, IPs/Function Blocks, Processing Element (PE), Network Interface, Router.]

Network-on-Chip (NoC) or on-chip network is the state-of-the-art on-chip communication structure for multiprocessor systems.

GALS: globally asynchronous and locally synchronous.

Basic Router Structure

[Figure: basic router with three input buffers, a switch and an allocator; inputs Ain, Bin, Cin and outputs Aout, Bout, Cout. Timing example (time slots 1 to 9): Ain carries HA DA DA TA, Bin carries HB DB DB TB, and Cout carries HA DA DA TA HB DB DB TB.]

Flow control: the algorithm used to allocate resources in a router to multiple frames.

Wormhole: frames are divided into flits. The header flit contains the target address and is used to reserve a path. The other flits simply follow that path.
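A minimal sketch of the wormhole behaviour described on this slide, assuming two four-flit frames (header H, data D, tail T) that both request the same output. The function name `wormhole_output`, the flit encoding and the fixed-priority choice between waiting headers are illustrative assumptions, not details from the slides.

```python
from collections import deque

def wormhole_output(frames):
    """Serialise frames that compete for one shared output port.

    frames: dict mapping input name -> list of (kind, frame_id) flits,
    with kind in {"H", "D", "T"} (hypothetical encoding).
    Returns the flit sequence observed on the output.
    """
    queues = {name: deque(flits) for name, flits in frames.items()}
    owner = None  # input that currently holds the reserved path
    out = []
    while any(queues.values()):
        if owner is None:
            # A header flit at the front of a queue reserves the path.
            # Assumption: ties are broken by input name (fixed priority).
            for name in sorted(queues):
                if queues[name] and queues[name][0][0] == "H":
                    owner = name
                    break
        kind, frame_id = queues[owner].popleft()
        out.append(kind + frame_id)
        if kind == "T":  # the tail flit releases the path
            owner = None
    return out

frames = {
    "Ain": [("H", "A"), ("D", "A"), ("D", "A"), ("T", "A")],
    "Bin": [("H", "B"), ("D", "B"), ("D", "B"), ("T", "B")],
}
print(" ".join(wormhole_output(frames)))  # prints "HA DA DA TA HB DB DB TB"
```

Frame B cannot start until frame A's tail flit has passed, which matches the Cout sequence in the timing example.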

Head-of-Line (HOL)

[Figure: two neighbouring routers, R1 and R2, with N, W, E and S ports, carrying frames A, B and C.]

The spread of blocking: if frame B is blocked by frame A, frame B may in turn block frame C, even though C is not directly related to A.

Spatial Division Multiplexing

[Figure: SDM router with input ports, output ports, a switch and an allocator. A virtual circuit runs from the input buffer of the virtual circuit, through the switch, to the output buffer of the virtual circuit.]

SDM vs. Wormhole

[Figure: average frame latency (ns) versus injected traffic (MByte/Node/s) for the wormhole and SDM routers.]

              WH        SDM
Input Buf.    14,303    21,995
Output Buf.    5,935     6,000
Crossbar       4,356    21,744
Arbiters         772    22,208
Overall       25,366    71,956

The area of crossbars:
WH: P*P*W
SDM: MP*MP*W/M = MP*P*W
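The two crossbar expressions can be checked numerically. The sketch below assumes P is the number of router ports, W the data width in bits and M the number of virtual circuits per port; the slide does not define these symbols, and the example values (P = 5, W = 32, M = 4) are illustrative only.

```python
def crossbar_area_wh(P, W):
    """Wormhole crossbar: P inputs by P outputs, each W bits wide."""
    return P * P * W

def crossbar_area_sdm(P, W, M):
    """SDM crossbar: M*P inputs by M*P outputs, each W/M bits wide.

    (M*P) * (M*P) * W / M simplifies to M * P * P * W, so integer
    division by M is exact here.
    """
    return (M * P) * (M * P) * W // M

# Assumed example values, not taken from the slides.
print(crossbar_area_wh(5, 32), crossbar_area_sdm(5, 32, 4))  # prints "800 3200"
```

Under this model the SDM crossbar is M times the size of the wormhole crossbar for the same port count and total data width.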

[7] W. Song and D. Edwards. “Asynchronous spatial division multiplexing router,” Microprocessors and Microsystems, 35(2), 85-97, 2011.

Clos Switch - Motivation

• The problems of SDM

– High-radix crossbars

– Large crossbar and switch allocator

• Clos networks are the optimal switch structure

• Problems to solve

– Dynamic configuration [11]

– Optimal structure for SDM router (this paper)

Clos Networks

[Figure: three-stage Clos network. k input modules IM(1)..IM(k), each n×m; m central modules CM(1)..CM(m), each k×k; k output modules OM(1)..OM(k), each m×n. Input ports IP(i,1)..IP(i,n), output ports OP(j,1)..OP(j,n), links LI(1,1)..LI(k,m) and LO(1,1)..LO(m,k).]

IP/OP: input/output port
IM: input module
CM: central module
OM: output module
n: number of IPs in IM
k: number of IMs
m: number of CMs
N = kn: the total number of IPs

When m >= n, the switch is rearrangeably non-blocking.
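A short sketch of how the Clos parameters translate into switch size, using the module dimensions from the legend (IM: n×m, CM: k×k, OM: m×n). Counting crosspoints as a proxy for area is an assumption, and the function names are hypothetical. The classification uses the standard Clos results: m >= n gives a rearrangeably non-blocking switch, and m >= 2n - 1 gives a strictly non-blocking one. The example values n = 4, k = 8, m = 4 are chosen for illustration.

```python
def clos_crosspoints(n, k, m):
    """Crosspoints in a three-stage Clos network with k IMs, m CMs, k OMs."""
    input_stage = k * (n * m)    # k input modules, each n x m
    central_stage = m * (k * k)  # m central modules, each k x k
    output_stage = k * (m * n)   # k output modules, each m x n
    return input_stage + central_stage + output_stage

def crossbar_crosspoints(n, k):
    """Crosspoints in a single N x N crossbar with N = k * n ports."""
    N = k * n
    return N * N

def blocking_class(n, m):
    """Classify the switch by the number of central modules m."""
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"

# Assumed example: n = 4, k = 8, m = 4, so N = 32.
print(clos_crosspoints(4, 8, 4), crossbar_crosspoints(4, 8))  # prints "512 1024"
print(blocking_class(4, 4))  # prints "rearrangeably non-blocking"
```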

Clos vs. Crossbar

Asynchronous Clos Scheduler

[Figure: asynchronous Clos scheduler. Input module schedulers IMSCH1..IMSCH8, central module schedulers CMSCH1..CMSCH4 and output module schedulers OMSCH1..OMSCH8. Each IMSCHi takes requests req(i,1)..req(i,4). Block labels: IRG, OMRICB, CMRICB, IMD, OMRCCB, CMD; configuration outputs imcfg and cmcfg.]

[11] W. Song and D. Edwards. “An asynchronous routing algorithm for Clos networks,” In Proc. of International Conference on Application of Concurrency to System Design, 2010, 67-76.

Async vs. Sync Algorithm

[Figure: throughput versus injected traffic for the synchronous scheduler and the asynchronous scheduler [11].]

2-stage Clos Switch

[Figure: 2-stage Clos switch. Five M×M input modules IM(s), IM(w), IM(n), IM(e), IM(l), each with input ports IP(x,1)..IP(x,M), and M central modules CM(1)..CM(M), each 5×5, driving output ports OP(x,1)..OP(x,M) for x in {s, w, n, e, l}.]

Benefits:
1. 2-stage
1.a less latency
1.b smaller
1.c simpler scheduler
2. CMs can be further simplified
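To make the "smaller" benefit concrete, the sketch below counts crosspoints for the 2-stage structure on this slide (five M×M input modules plus M central modules of 5×5) and for a single 5M×5M crossbar. Using crosspoint count as a stand-in for area is an assumption, and the function names are hypothetical.

```python
def two_stage_clos_crosspoints(M, ports=5):
    """Crosspoints in the 2-stage Clos switch: `ports` IMs and M CMs."""
    input_modules = ports * (M * M)        # one M x M IM per direction
    central_modules = M * (ports * ports)  # M CMs, each ports x ports
    return input_modules + central_modules

def sdm_crossbar_crosspoints(M, ports=5):
    """Crosspoints in one (M*ports) x (M*ports) crossbar."""
    return (M * ports) ** 2

# M = 4 virtual circuits per port.
print(two_stage_clos_crosspoints(4), sdm_crossbar_crosspoints(4))  # prints "180 400"
```

With M = 4 the 2-stage Clos switch needs 180 crosspoints against 400 for the crossbar, before any further simplification of the central modules.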

Simplified CM

[Figure: simplified central module. Inputs Si, Wi, Ni, Ei, Li; outputs So, Wo, No, Eo, Lo. Sixteen configuration signals: cfg EL, NL, WL, SL; cfg LE, WE; cfg WN, SN, EN, LN; cfg LW, EW; cfg ES, LS, NS, WS.]
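The sixteen cfg labels on this slide are consistent with dimension-ordered (XY) routing without U-turns, if each label "cfg XY" is read as "input X connected to output Y". Both the reading of the labels and the routing rule are assumptions; the slide does not state them. Under those assumptions, a minimal sketch reproduces the label set:

```python
PORTS = ["S", "W", "N", "E", "L"]
OPPOSITE = {"S": "N", "N": "S", "W": "E", "E": "W"}

def allowed_outputs(inp):
    """Outputs reachable from input port `inp` under XY routing (assumed)."""
    if inp == "L":
        # Injected locally: may leave in any direction.
        return {"S", "W", "N", "E"}
    if inp in ("W", "E"):
        # Travelling along X: go straight, turn into Y, or eject.
        return {OPPOSITE[inp], "N", "S", "L"}
    # Travelling along Y: go straight or eject, never turn back into X.
    return {OPPOSITE[inp], "L"}

connections = {i + o for i in PORTS for o in allowed_outputs(i)}

# Labels as they appear on the slide.
slide_cfg = {
    "EL", "NL", "WL", "SL", "LE", "WE", "WN", "SN",
    "EN", "LN", "LW", "EW", "ES", "LS", "NS", "WS",
}

print(len(connections), connections == slide_cfg)  # prints "16 True"
```

A full 5×5 central module would need 25 connections, so this reading leaves 16 of them.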

Clos vs. Crossbar

2-stage Clos Scheduler

[Figure: 2-stage Clos scheduler. Input module schedulers IMSCHS, IMSCHN, ..., IMSCHL and central module schedulers CMSCH0..CMSCHM-1. Routing requests rt_r(S,0)..rt_r(S,M-1), rt_r(N,0)..rt_r(N,M-1), ..., rt_r(L,0)..rt_r(L,M-1). Block labels: IRG0..IRGM-1, CMRICB, IMD, CMD; configuration output imcfg. The 3-stage scheduler figure from the "Asynchronous Clos Scheduler" slide is repeated on this slide.]

A New SDM-Clos Router

[Figure: SDM-Clos router. Input buffers for South In, West In, North In, East In and Local In feed a 2-stage Clos switch (five M×M modules and 5×5 central modules) controlled by a Clos scheduler. Output buffers drive South Out, West Out, North Out, East Out and Local Out.]

Area Breakdown

Relative area (WH = 1): WH: 1; SDM: 3.9; SDM-Clos: 1.7
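A quick arithmetic check of these three figures, assuming they are router areas normalised to the wormhole router (the slide gives only the numbers):

```python
# Figures from the slide; the "normalised area" reading is an assumption.
relative_area = {"WH": 1.0, "SDM": 3.9, "SDM-Clos": 1.7}

ratio = relative_area["SDM-Clos"] / relative_area["SDM"]
saving = 1 - ratio
print(f"SDM-Clos / SDM = {ratio:.2f}, saving = {saving:.0%}")
```

The SDM-Clos figure is a little under half of the SDM figure.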

Speed Performance

Evaluation: MPEG-4

[Figure: MPEG-4 decoder mapped onto a 4×3 mesh of processing elements PE(0,0) to PE(3,2), each attached to a router R. Cores: Video Output, SDRAM0, SDRAM1, SDRAM2, Audio DSP, Audio Output, Upsampling, MCE, Padding, Media CPU, BAB, Scaling, context calc., 3D GFX rasterization, iScan, AC/DC, iQuant, iDCT, RISC CPU. The bandwidth annotations on the edges and links are omitted.]

Network Performance

Throughput requirement: ~3,400 MByte/s

SDM-Clos: small latency overhead; low energy consumption; half of the area

Conclusion

• SDM improves throughput

• Clos switch can reduce the area overhead

• A new 2-stage Clos switch

– Half area (4 virtual circuits)

– Small latency overhead

– Less energy consumption

Thanks

http://opencores.org/project,async_sdm_noc
