66
A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. Mishra Chita R. Das Onur Mutlu

A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

Embed Size (px)

Citation preview

Page 1: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

A Heterogeneous Multiple Network-On-Chip Design:

An Application-Aware Approach

Asit K. Mishra Chita R. DasOnur Mutlu

Page 2: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

2

Executive summary• Problem: Current day NoC designs are agnostic to application requirements and

are provisioned for the general case or worst case. Applications have widely differing demands from the network

• Our goal: To design a NoC that can satisfy the diverse dynamic performance requirements of applications

• Observation: Applications can be divided into two general classes in terms of their requirements from the network: bandwidth-sensitive and latency-sensitive

- Not all applications are equally sensitive to bandwidth and latency

• Key idea: Design two NoC - Each sub-network customized for either BW or LAT sensitive applications - Propose metrics to classify applications as BW or LAT sensitive - Prioritize applications’ packets within the sub-networks based on their sensitivity

• Network design: BW optimized network has wider link width but operates at a lower frequency and LAT optimized network has narrow link width but operates at a higher frequency

• Results: Our proposal is significantly better when compared to an iso-resource monolithic network (5%/3% weighted/instruction throughput improvement and 31% energy reduction)

Page 3: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

3

• Channel bandwidth affects network latency, throughput and energy/power

• Increase in channel BW leads to- Reduction in packet serialization- Increase in router power

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

Page 4: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

4

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

Simulation settings:

• 8x8 multi-hop packet based mesh network

• Each node in the network has an OoO processor (2GHz), private L1 cache and a router (2GHz) • Shared 1MB per core shared L2

• 6VC/PC, 2 stage router

Page 5: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

5

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gcc

pvra

y

tont

o

libq

gobm

k

asta

r

milc

hmm

er

swim

sjbb sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0

1

2

3

4

5

6

7

64b links 128b links 256b links 512b links

IT (

norm

. to

64b

links

)

Page 6: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

6

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gcc

pvra

y

tont

o

libq

gobm

k

asta

r

milc

hmm

er

swim

sjbb sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0

1

2

3

4

5

6

7

64b links 128b links 256b links 512b links

IT (

norm

. to

64b

links

)

1. 18/30 (21/36 total) applications’ performance is agnostic to channel BW (8x BW inc. → less than 2x performance inc.)

Page 7: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

7

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gcc

pvra

y

tont

o

libq

gobm

k

asta

r

milc

hmm

er

swim

sjbb sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0

1

2

3

4

5

6

7

64b links 128b links 256b links 512b links

IT (

norm

. to

64b

links

)

1. 18/30 (21/36 total) applications’ performance is agnostic to channel BW (8x BW inc. → less than 2x performance inc.)

2. 12/30 (15/36 total) applications’ performance scale with increase in channel BW (8x BW inc. → at least 2x performance inc.)

Page 8: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

8

• Reduction in router latency (by increasing frequency)- Reduction in packet latency- Increase in router power consumption

Impact of network latency on application performance

Resource requirements of various applications - II

Page 9: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

9

Resource requirements of various applications - II

Simulation settings:

• … same as last experiment

• 128b links

• Added dummy stages (2-cycle and 4-cycle ) to each router

Impact of network latency on application performance

Page 10: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

10

Resource requirements of various applications - II

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq g

asta

r

milc h

swim

sjbb sa

p

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0.5

0.6

0.7

0.8

0.9

1.0

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

Impact of network latency on application performance

Page 11: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

11

Resource requirements of various applications - II

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq g

asta

r

milc h

swim

sjbb sa

p

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0.5

0.6

0.7

0.8

0.9

1.0

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

1. 18/30 (21/36 total) applications’ performance is sensitive to network latency (3x latency reduction → at least 25% performance improvement)

Impact of network latency on application performance

Page 12: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

12

Resource requirements of various applications - II

appl

u

wrf art

deal

sjen

g

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq g

asta

r

milc h

swim

sjbb sa

p

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0.5

0.6

0.7

0.8

0.9

1.0

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

2. 12/30 (15/36 total) applications’ performance is marginally sensitive to network latency (3x latency increase → less than 15% performance improvement)

1. 18/30 (21/36 total) applications’ performance is sensitive to network latency (3x latency reduction → at least 25% performance improvement)

Impact of network latency on application performance

Page 13: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

13

a

wrf art

deal s ba g

h264 gc

c p t

libq g

asta

r

milc h

sjbb sa

p x s

bzip

lbm

sjas

sopl

x

cact

s o

mcf

0.5

0.7

0.9

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

Application-aware approach to designing multiple NoCs a

pp

lu wrf

art

de

al

sje

ng

ba g

h2

64

gcc

pvr

ay

ton

to

libq g

ast

ar

milc h

swim

sjb

b

sap

xala

n s

bzi

p

lbm

sja

s

sop

lx

cact

s o

mcf

01234567

64b links 128b links 256b links 512b links

IT (

no

rm. t

o 6

4b

lin

ks)

Page 14: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

14

a

wrf art

deal s ba g

h264 gc

c p t

libq g

asta

r

milc h

sjbb sa

p x s

bzip

lbm

sjas

sopl

x

cact

s o

mcf

0.5

0.7

0.9

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

Application-aware approach to designing multiple NoCs

Based on the observations:

1. Applications can be classified into distinct classes: typically LAT/BW sensitive2. LAT sensitive applications can benefit from low network latency3. BW sensitive applications can benefit from high network bandwidth4. Not all applications are equally sensitive to either LAT or BW5. Monolithic network cannot optimize both classes simultaneously

ap

plu wrf

art

de

al

sje

ng

ba g

h2

64

gcc

pvr

ay

ton

to

libq g

ast

ar

milc h

swim

sjb

b

sap

xala

n s

bzi

p

lbm

sja

s

sop

lx

cact

s o

mcf

01234567

64b links 128b links 256b links 512b links

IT (

no

rm. t

o 6

4b

lin

ks)

Page 15: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

15

a

wrf art

deal s ba g

h264 gc

c p t

libq g

asta

r

milc h

sjbb sa

p x s

bzip

lbm

sjas

sopl

x

cact

s o

mcf

0.5

0.7

0.9

1.12-cycle router 4-cycle router 6-cycle router

IT (n

orm

. to

2-cy

cle

rout

er)

Application-aware approach to designing multiple NoCs

Solution

Two NoCs where each (sub)network is optimized for either LAT or BW sensitive applications

ap

plu wrf

art

de

al

sje

ng

ba g

h2

64

gcc

pvr

ay

ton

to

libq g

ast

ar

milc h

swim

sjb

b

sap

xala

n s

bzi

p

lbm

sja

s

sop

lx

cact

s o

mcf

01234567

64b links 128b links 256b links 512b links

IT (

no

rm. t

o 6

4b

lin

ks)

Page 16: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

16

Design methodology

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

Page 17: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

17

Design methodology

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

1

Identify LAT/BW sensitive applications- Proposes a novel dynamic application classification scheme

1

Page 18: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

18

Design methodology

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

1

Identify LAT/BW sensitive applications- Proposes a novel dynamic application classification scheme

1

2 Design sub-networks based on applications’ demand- This network architecture is better than a monolithic iso-resource

network

2

Page 19: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

19

Design methodology

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

1

Identify LAT/BW sensitive applications- Proposes a novel dynamic application classification scheme

1

2 Design sub-networks based on applications’ demand- This network architecture is better than a monolithic iso-resource

network

2

DE

MU

X

Page 20: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

20

Design: Dynamic classification of applications

Network episode Compute episode

time

Application life cycleO

uts

tan

din

g n

etw

ork

p

acke

ts

Page 21: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

21

Design: Dynamic classification of applications

Network episode Compute episode

time

Application life cycle

• App. has at least one outstanding packet• Processor is likely stalling → low IPC

Ou

tsta

nd

ing

net

wo

rk

pac

kets

Page 22: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

22

Design: Dynamic classification of applications

Network episode Compute episode

time

Application life cycle

• App. has at least one outstanding packet• Processor is likely stalling → low IPC

• App. has no outstanding packet• High IPC

Ou

tsta

nd

ing

net

wo

rk

pac

kets

Page 23: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

23

Design: Dynamic classification of applications

Network episode Compute episode

Ep

iso

de

he

igh

t

Episode length time

Application life cycle

• App. has at least one outstanding packet• Processor is likely stalling → low IPC

• App. has no outstanding packet• High IPC

Episode length = Number of consecutive cycles there are net. packets

Episode height = Avg. number of L1 packets injected during an episode

Ou

tsta

nd

ing

net

wo

rk

pac

kets

Page 24: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

24

Design: Dynamic classification of applications

Network episode Compute episode

time

Application life cycle

• App. has at least one outstanding packet• Processor is likely stalling → low IPC

• App. has no outstanding packet• High IPC

Ou

tsta

nd

ing

net

wo

rk

pac

kets

Short episode ht.: Low MLP, each request is critical (LAT sensitive)Tall episode ht.: High MLP (BW sensitive)

Short episode len.: Packets are very critical (LAT sensitive)Long episode len.: Latency tolerant (could be de-prioritized)

Episode length Ep

iso

de

he

igh

t

Page 25: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

25

Classification and ranking

Classification Length Long Medium Short

Tall gems, mcf sphinx, lbm, cactus, xalan sjeng, tonto

Height Medium omnetpp, apsiocean, sjbb, sap, bzip,

sjas, soplex, tpc

applu, perl, barnes, gromacs, namd, calculix,

gcc, povray, h264, gobmk, hmmer, astar

Short leslie art, libq, milc, swim wrf, deal

Classification: LAT/BW

Page 26: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

26

Classification and ranking

Classification: LAT/BW

Ranking Length Long Medium Short

High Rank-4 Rank-2 Rank-1Height Medium Rank-3 Rank-2 Rank-2

Short Rank-4 Rank-3 Rank-1

Ranking: Sensitivity to LAT/BW

Classification Length Long Medium Short

Tall gems, mcf sphinx, lbm, cactus, xalan sjeng, tonto

Height Medium omnetpp, apsiocean, sjbb, sap, bzip,

sjas, soplex, tpc

applu, perl, barnes, gromacs, namd, calculix,

gcc, povray, h264, gobmk, hmmer, astar

Short leslie art, libq, milc, swim wrf, deal

Page 27: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

27

Network design

1N-128

Page 28: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

28

Network design

1N-128 2N-64x256-ST(Steering)

Page 29: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

29

Network design

1N-128 2N-64x256-ST(Steering)

2N-64x256-ST-RK(Steering+Ranking)

Page 30: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

30

Network design

1N-128 2N-64x256-ST(Steering)

2N-64x256-ST-RK(Steering+Ranking)

2N-64x256-ST-RK(FS)(Steering+Ranking and

Frequency Scaling)

Page 31: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

31

Network design

1N-128

1N-256

2N-64x256-ST(Steering)

2N-64x256-ST-RK(Steering+Ranking)

2N-64x256-ST-RK(FS)(Steering+Ranking and

Frequency Scaling)

1N-512 (High BW)2N-128X128

1N-320(iso-BW)

1N-320(FS)(iso-resource)

Page 32: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

32

Analysis

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Performance (25 WL with 50% BW and 50% LAT)

Page 33: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

33

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

Page 34: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

34

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

+18%

Performance (25 WL with 50% BW and 50% LAT)

Page 35: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

35

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

+7%+18%

Page 36: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

36

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

+5%+7%

+18%

Page 37: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

37

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

5%+5%

+7%+18%

Page 38: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

38

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

w. 2%5%

+5%+7%

+18%

Page 39: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

39

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

w. 2% w. 2%5%

+5%+7%

+18%

Page 40: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

40

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

w. 2% w. 2%5%

+5%+7%

+18%

1N-128

1N-256

2N-128x128

1N-512

2N-64x256-ST

2N-64x256-ST+RK(no FS)

2N-64x256-ST+RK(FS)

1N-320(no FS)

1N-320 (FS)

0

0.4

0.8

1.2

1.6

2

Nor

mal

ized

ene

rgy

Energy (25 WL with 50% BW and 50% LAT)

- 47%- 59%

Page 41: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

41

1N-128

1N-256

2N-128

x128

1N-512

2N-64x

256-S

T

2N-64x

256-S

T+RK(no

FS)

2N-64x

256-S

T+RK(FS)

1N-320

(no F

S)

1N-320

(FS)

0

10

20

30

40

50

60

We

igh

ted

sp

ee

du

p

Analysis

Performance (25 WL with 50% BW and 50% LAT)

w. 2% w. 2%5%

+5%+7%

+18%

1N-128

1N-256

2N-128x128

1N-512

2N-64x256-ST

2N-64x256-ST+RK(no FS)

2N-64x256-ST+RK(FS)

1N-320(no FS)

1N-320 (FS)

0

0.4

0.8

1.2

1.6

2

Nor

mal

ized

ene

rgy

Energy (25 WL with 50% BW and 50% LAT)

- 47%- 59%

Best EDP across all designs

Page 42: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

42

Conclusions• Problem: Current day NoC designs are agnostic to application requirements and

are provisioned for the general case or worst case. Applications have widely differing demands from the network

• Our goal: To design a NoC that can satisfy the diverse dynamic performance requirements of applications

• Observation: Applications can be divided into two general classes in terms of their requirements from the network: bandwidth-sensitive and latency-sensitive

- Not all applications are equally sensitive to bandwidth and latency

• Key idea: Design two NoC - Each sub-network customized for either BW or LAT sensitive applications - Propose metrics to classify applications as BW or LAT sensitive - Prioritize applications’ packets within the sub-networks based on their sensitivity

• Network design: BW optimized network has wider link width but operates at a lower frequency and LAT optimized network has narrow link width but operates at a higher frequency

• Results: Our proposal is significantly better when compared to an iso-resource monolithic network (5%/3% weighted/instruction throughput improvement and 31% energy reduction)

Page 43: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

43

Thank you

Q?Asit Mishra

[email protected]

Page 44: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

44

Backup Slides . . .

Page 45: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

45

Other metrics considered for application classification

0

20

40

60

80

100

120

0

200

400

600

800

1000

1200

1400

1600

1800

2000

L1MPKI L2MPKI Slack

L1

/L2

MP

KI

Sla

ck (

in c

ycle

s)

Page 46: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

46

Analysis of network episode length and height

appl

u

wrf art

deal

sjeng

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq

gobm

k

asta

r

milc h

swim

sjbb

sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

02468

101214

Avg.

epi

sode

hei

ght

(net

wor

k pa

cket

s)

appl

u

wrf art

deal

sjeng

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq

gobm

k

asta

r

milc

hmm

er

swim

sjbb

sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0100020003000400050006000

Avg

. epi

sode

leng

th

(in c

ycle

s)

Short length/height

Medium length/height

Long length/High height

0.3M10K

0.4M18K

Page 47: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

47

Analysis of network episode length and height

appl

u

wrf art

deal

sjeng

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq

gobm

k

asta

r

milc h

swim

sjbb

sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

02468

101214

Avg.

epi

sode

hei

ght

(net

wor

k pa

cket

s)

appl

u

wrf art

deal

sjeng

barn

es

grm

cs

nam

d

h264 gc

c

pvra

y

tont

o

libq

gobm

k

asta

r

milc

hmm

er

swim

sjbb

sap

xala

n

sphn

x

bzip

lbm

sjas

sopl

x

cact

s

omne

t

gem

s

mcf

0100020003000400050006000

Avg

. epi

sode

leng

th

(in c

ycle

s)

Short length/height

Medium length/height

Long length/High height

Based on performance scaling sensitivity to bandwidth and frequency

0.3M10K

0.4M18K

Page 48: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

48

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Page 49: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

49

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Page 50: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

50

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Page 51: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

51

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Page 52: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

52

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Why 9 clusters?

0 5 10 15 20 25 30 350

25

50

75

100

125

150

175

200

225

Number of clusters

Wit

hin

gro

up

su

m o

f s

qu

are

s

Page 53: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

53

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Why 9 clusters?

0 5 10 15 20 25 30 350

25

50

75

100

125

150

175

200

225

Number of clusters

Wit

hin

gro

up

su

m o

f s

qu

are

s 13x

Page 54: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

54

Empirical results to support the classification

SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications

Why 9 clusters?

0 5 10 15 20 25 30 350

25

50

75

100

125

150

175

200

225

Number of clusters

Wit

hin

gro

up

su

m o

f s

qu

are

s

Page 55: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

55

Analysis with varying workload combinations

0% BANDWIDTH 100% LATENCY

25% BAND-WIDTH 75% LATENCY

50% BAND-WIDTH 50% LATENCY

75% BAND-WIDTH 25% LATENCY

100% BAND-WIDTH 0% LATENCY

0.7

0.9

1.1

1.3

1.5

WS IT

WS

and

IT

(no

rm.

to 1

N-1

28 n

et.)

Page 56: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

56

Comparison to prior works

1N

-12

8-S

TC

1N

-12

8-S

T+

RK

2N

-12

8x1

28

-LD

-BA

L

2N

-64

x25

6-W

-LD

-BA

L

2N

-64

x25

6-S

T+

RK

...

0.8

1.0

1.2

1.4WS IT

WS

an

d IT

(n

orm

. to

1N

-12

8 n

et.)

Page 57: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

57

Dynamic steering of packets

ap

plu art

ba

rne

s

na

md

gcc libq

ast

ar

hm

me

r

lesl

ie

calc

ulix

sje

ng

sap

sph

nx

lbm

sop

lx

om

ne

t

mcf

oce

an

0%

20%

40%

60%

80%

100%

Latency-optimized network

% p

ack

ets

in s

ub

-ne

two

rk

Page 58: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

58

Design: Putting it all together

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

MU

X

Page 59: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

59

Design: Putting it all together

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

MU

X

Classify applications based on sensitivity to network BW/LAT

Page 60: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

60

Design: Putting it all together

Episode LEN/HT

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

MU

X

Classify applications based on sensitivity to network BW/LAT

Use network episode length/height to dynamically

identify apps

Page 61: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

61

Design: Putting it all together

Episode LEN/HT

Design LAT/BW optimized networks

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

MU

X

Classify applications based on sensitivity to network BW/LAT

Use network episode length/height to dynamically

identify apps

Page 62: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

62

Design: Putting it all together

Classify applications based on sensitivity to network BW/LAT

Episode LEN/HT

Design LAT/BW optimized networks

Processors/L1$ Network L2$ and Mem. Controllers

Logical view of a multicore processor

MU

X

Use network episode length/height to dynamically

identify apps

Prioritization within networks

Page 63: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

63

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicores

Page 64: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

64

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicores

Small core

GPGPUs

Accelerators/ ASIC

Latency critical

Throughput (BW) critical

Throughput (BW) critical

Latency critical (real-

time constraints)

Big core

Page 65: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

65

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicores

Small core

GPGPUs

Accelerators/ ASIC

Latency critical

Big core

Providing all these guarantees in one network is hard

Multiple networks: each customized for one metric

Throughput (BW) critical

Throughput (BW) critical

Latency critical (real-

time constraints)

Page 66: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu

66

Summary

• A NoC paradigm based on top-down approach (application demand/requirement analysis)

• An efficient design paradigm for future heterogeneous multicore m/c

Latency

Throughput

Local communication

Long haul comm.

Power

1 cycle/ bufferless/Faster

routers

1 cycle/ high bandwidth

Power efficient links/DVFS router

Hybrid/ fewer connectivity

network

Butterfly/express channels

MU

XShare 2D space or 3D layers

Episode LEN/HT/??