Power Efficient Cache Coherence Craig Saldanha Mikko Lipasti



Page 1: Power Efficient Cache Coherence

Power Efficient Cache Coherence

Craig Saldanha, Mikko Lipasti

Page 2: Power Efficient Cache Coherence

Motivation

Power consumption becoming a serious design constraint.
Market demand for faster and more complex servers.
Complex coherence protocol and interconnect.
f_clock + Complexity = P_interconnect

Page 3: Power Efficient Cache Coherence

Motivation

Traditional power saving methodologies ineffective.
Minimize number of transaction packets.
At the end-points: Jetty.
At the source: Serial Snoop.

Page 4: Power Efficient Cache Coherence

OUTLINE

Overview of snoop-based protocols and opportunity for power savings
Latency and power consumption of parallel snooping techniques
Serial Snooping
Results
Conclusions
Future Work

Page 5: Power Efficient Cache Coherence

Snoop Based Coherence

Local node: tag lookup, bus arbitration, snoop transmission.
Remote node: tag lookup, snoop response, combination of responses, data fetch, data transmit.

[Figure: processors P1 to P4 with tag array and data array, and memory.]

Page 6: Power Efficient Cache Coherence

Degrees of Speculation

3 degrees of freedom to speculate: snooping, data fetch, data transmit.

Parallel Snoop, Spec DFetch, Spec DXmit
Parallel Snoop, Spec DFetch, Non-Spec DXmit
Parallel Snoop, Non-Spec DFetch, Non-Spec DXmit
Serial Snoop, Spec DFetch, Spec DXmit
Serial Snoop, Spec DFetch, Non-Spec DXmit
Serial Snoop, Non-Spec DFetch, Non-Spec DXmit

[Figure: timelines of the Snoop, DFetch, and D-Xmit phases for each combination.]

Page 7: Power Efficient Cache Coherence

Latency and Power assumptions

Consider only load misses.
Tree of point-to-point connections.
Latency to traverse a link: 1 bus cycle (7 ns)
Tag lookup: 1 bus cycle (7 ns)
D-Fetch: 2 bus cycles (14 ns)
DRAM access: 10 bus cycles (70 ns)

[Figure: tree interconnect with labels MEMORY, Root Node, Backplane, Board1, Switch1, Switch2, and P1 to P4.]
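The timing assumptions on this slide can be written down as a small lookup table. This is a minimal sketch: the dictionary keys and the helper name `latency_ns` are illustrative, not taken from the slides.

```python
# Timing assumptions from the slide, expressed in bus cycles.
BUS_CYCLE_NS = 7  # one bus cycle is 7 ns, per the slide

CYCLES = {
    "link": 1,        # traverse one point-to-point link
    "tag_lookup": 1,  # tag array lookup
    "d_fetch": 2,     # data array fetch (D-Fetch)
    "dram": 10,       # DRAM access
}


def latency_ns(component: str, count: int = 1) -> int:
    """Latency in nanoseconds of `count` back-to-back uses of a component."""
    return CYCLES[component] * count * BUS_CYCLE_NS


if __name__ == "__main__":
    for name in CYCLES:
        print(f"{name}: {latency_ns(name)} ns")
```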

Page 8: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7
Power: Plink

Page 9: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14
Power: Plink+Pswitch

Page 10: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14, 21
Power: 2Plink+Pswitch

Page 11: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14, 21, 28
Power: 2Plink+2Pswitch

Page 12: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14, 21, 28, 35
Power: 5Plink+2Pswitch

Page 13: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14, 21, 28, 35, 42
Power: 5Plink+4Pswitch

Page 14: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Snoop broadcast
Latency (ns): 0, 7, 14, 21, 28, 35, 42, 49
Power: 8Plink+4Pswitch
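The broadcast slides above add one 7 ns step at a time and grow the power expression with each step. A minimal sketch of that accumulation follows. The per-step component counts are read off the differences between successive slides, and the names `BROADCAST_STEPS`, `accumulate`, and `fmt` are illustrative. That a switch traversal costs one bus cycle is implied by the timeline rather than stated in the assumptions.

```python
from collections import Counter

STEP_NS = 7  # each step in the slides advances the timeline by one bus cycle

# (component, number of components driven in this step), read off the slides.
BROADCAST_STEPS = [
    ("Plink", 1), ("Pswitch", 1), ("Plink", 1), ("Pswitch", 1),
    ("Plink", 3), ("Pswitch", 2), ("Plink", 3),
]


def accumulate(steps):
    """Return [(time_ns, cumulative component counts)] after each step."""
    total = Counter()
    timeline = []
    for i, (component, count) in enumerate(steps, start=1):
        total[component] += count
        timeline.append((i * STEP_NS, total.copy()))
    return timeline


def fmt(counts):
    """Render counts in the slides' notation, e.g. 2Plink+Pswitch."""
    return "+".join(f"{n if n > 1 else ''}{name}" for name, n in counts.items())


if __name__ == "__main__":
    for time_ns, counts in accumulate(BROADCAST_STEPS):
        print(f"t={time_ns:2d} ns  {fmt(counts)}")
```

The last line printed is the 49 ns point with 8Plink+4Pswitch, matching the final broadcast slide.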

Page 15: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Memory access: data fetch
Latency (ns): 35, 91, 105
Power: Pmem

Page 16: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Memory access: data transmit
Latency (ns): 35, 91, 105, 140
Power: Pmem+3Plink+2Pswitch

Page 17: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Remote node: tag lookup
Latency (ns): 49, 56
Power: 3Ptag

Page 18: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Remote node: snoop response
Latency (ns): 49, 56, 105
Power: 3Ptag+3*(Pswitch+2Plink)+Plink+2Pswitch+2Plink

Page 19: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Remote node: data fetch
Latency (ns): 49, 63
Power: 3Pcache

Page 20: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

Remote node: data transmit
Latency (ns): 49, 63, 112
Power: 3Ptag+3*(4Plink+3Pswitch)
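The timing marks on the memory and remote-node slides above are consistent with the stated latency assumptions. A minimal sketch of that arithmetic follows. The hop counts (5 hops from memory to the requester, 7 hops from a remote node) and the one-cycle cost per switch are assumptions inferred from the 35 ns and 49 ns gaps in the timelines, and the function names are illustrative.

```python
BUS_CYCLE_NS = 7
HOP_NS = BUS_CYCLE_NS             # assumed: a link or a switch traversal costs one bus cycle
TAG_LOOKUP_NS = 1 * BUS_CYCLE_NS
D_FETCH_NS = 2 * BUS_CYCLE_NS
DRAM_NS = 10 * BUS_CYCLE_NS

SNOOP_AT_MEMORY_NS = 35  # time at which the broadcast reaches memory (from the slides)
SNOOP_AT_REMOTE_NS = 49  # time at which the broadcast reaches the remote nodes


def memory_path():
    """Return (data fetched, data delivered) times for the memory path."""
    fetched = SNOOP_AT_MEMORY_NS + DRAM_NS
    delivered = fetched + 5 * HOP_NS  # 3 links + 2 switches back to the requester
    return fetched, delivered


def remote_path():
    """Return (tag done, response combined, fetch done, data delivered) times."""
    tag_done = SNOOP_AT_REMOTE_NS + TAG_LOOKUP_NS
    response_done = tag_done + 7 * HOP_NS      # hop count assumed from the 56 -> 105 gap
    fetch_done = SNOOP_AT_REMOTE_NS + D_FETCH_NS
    data_delivered = fetch_done + 7 * HOP_NS   # 4 links + 3 switches
    return tag_done, response_done, fetch_done, data_delivered


if __name__ == "__main__":
    print("memory path:", memory_path())
    print("remote path:", remote_path())
```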

Page 21: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

[Figure: PS/SF/ST timeline (ns). Local node: TL, snoop broadcast (0 to 49). Memory: memory access (marks at 35, 91, 105), data transmit (105 to 140). Remote node: TL (49 to 56), RSP + CMB (56 to 105), DF (49 to 63), data transmit (63 to 112).]

Power:
Remote node supplies the data: 29Plink+18Pswitch+3Ptag+3Pcache+Pmem
Memory supplies the data: 20Plink+11Pswitch+3Ptag+3Pcache+Pmem
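The "memory supplies the data" total appears to be the sum of the per-phase power terms from the preceding PS/SF/ST slides. The sketch below checks that reading. It is an interpretation rather than something the slides state, and it does not attempt to reproduce the remote-supplied total. The variable names are illustrative.

```python
from collections import Counter

# Per-phase terms, copied from the PS/SF/ST slides.
snoop_broadcast = Counter(Plink=8, Pswitch=4)
memory_fetch_and_xmit = Counter(Pmem=1, Plink=3, Pswitch=2)
remote_tag_lookup = Counter(Ptag=3)
# 3*(Pswitch+2Plink) + Plink + 2Pswitch + 2Plink, leaving out the 3Ptag
# that the response slide carries over from the tag lookup.
snoop_response = Counter(Plink=3 * 2 + 1 + 2, Pswitch=3 * 1 + 2)
remote_data_fetch = Counter(Pcache=3)  # speculative fetch at all three remote nodes

memory_supplies = (
    snoop_broadcast
    + memory_fetch_and_xmit
    + remote_tag_lookup
    + snoop_response
    + remote_data_fetch
)

if __name__ == "__main__":
    print(dict(memory_supplies))
```

The result is 20 links, 11 switches, 3 tag lookups, 3 cache fetches, and 1 memory access, which agrees with the slide's expression for the memory-supplied case.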

Page 22: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Non-Speculative Transmit (PS/SF/NT)

[Figure: PS/SF/NT timeline (ns). Local node: TL, snoop broadcast (0 to 49). Memory: memory access (marks at 35, 91, 105), data transmit (105 to 140). Remote node: TL (49 to 56), RSP + CMB (56 to 105), DF (49 to 63), data transmit (105 to 154).]

Power:
Remote node supplies the data: 21Plink+12Pswitch+3Ptag+Pcache+Pmem
Memory supplies the data: 20Plink+11Pswitch+3Ptag+3Pcache+Pmem

Page 23: Power Efficient Cache Coherence

Parallel Snoop Non-Speculative Fetch Non-Speculative Transmit (PS/NF/NT)

[Figure: PS/NF/NT timeline (ns). Local node: TL, snoop broadcast (0 to 49). Remote node: TL (from 49), RSP + CMB (to 105), DF (105 to 119), data transmit (119 to 168). Memory: memory access (91 to 161), data transmit (161 to 196).]

Power:
Remote node supplies the data: 21Plink+12Pswitch+3Ptag+Pcache
Memory supplies the data: 20Plink+11Pswitch+3Ptag+Pmem
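To compare the three parallel-snoop variants, the power expressions from the slides can be evaluated against per-component costs. In the sketch below the coefficients are copied from the slides, but the `UNIT_COST` values are hypothetical placeholders (all set to 1.0) because the slides give no per-component power figures. The names `SCHEMES` and `total_cost` are illustrative.

```python
# Coefficients copied from the PS/SF/ST, PS/SF/NT, and PS/NF/NT slides.
SCHEMES = {
    "PS/SF/ST": {
        "remote": dict(Plink=29, Pswitch=18, Ptag=3, Pcache=3, Pmem=1),
        "memory": dict(Plink=20, Pswitch=11, Ptag=3, Pcache=3, Pmem=1),
    },
    "PS/SF/NT": {
        "remote": dict(Plink=21, Pswitch=12, Ptag=3, Pcache=1, Pmem=1),
        "memory": dict(Plink=20, Pswitch=11, Ptag=3, Pcache=3, Pmem=1),
    },
    "PS/NF/NT": {
        "remote": dict(Plink=21, Pswitch=12, Ptag=3, Pcache=1),
        "memory": dict(Plink=20, Pswitch=11, Ptag=3, Pmem=1),
    },
}

# HYPOTHETICAL relative costs: placeholders only, not values from the slides.
UNIT_COST = {"Plink": 1.0, "Pswitch": 1.0, "Ptag": 1.0, "Pcache": 1.0, "Pmem": 1.0}


def total_cost(coeffs, unit_cost=UNIT_COST):
    """Weighted sum of component counts under the given per-component costs."""
    return sum(n * unit_cost[name] for name, n in coeffs.items())


if __name__ == "__main__":
    for scheme, cases in SCHEMES.items():
        for supplier, coeffs in cases.items():
            print(f"{scheme} ({supplier} supplies): {total_cost(coeffs)}")
```

Substituting measured or modelled costs for `UNIT_COST` changes the totals; with equal placeholder costs the output only counts component activations.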

Page 24: Power Efficient Cache Coherence

Serial Snooping

Avoids speculative transmission of snoop packets.

Page 25: Power Efficient Cache Coherence

Serial Snooping

Avoids speculative transmission of snoop packets.
Check the nearest neighbor.
Data supplied with minimum latency and power.

Page 26: Power Efficient Cache Coherence

Serial Snooping

Forward snoop to next level.

Page 27: Power Efficient Cache Coherence

Serial Snooping

Forward snoop to next level.

Page 28: Power Efficient Cache Coherence

Serial Snooping

Search other half of tree.

Page 29: Power Efficient Cache Coherence

Serial Snooping

Search other half of tree.
Search leaf nodes serially.

Page 30: Power Efficient Cache Coherence

Serial Snooping

Search other half of tree.
Search leaf nodes serially.

Page 31: Power Efficient Cache Coherence

Serial Snooping: Features

Latency to satisfy a request depends on distance from the requestor.
Data resident at the nearest neighbor is supplied with the lowest latency and power.
Requests are visible to the memory controller only at the root node.
Latency is adversely affected when the requested data is present at the farthest node.
Worst-case power consumption is still less than with parallel snooping.
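The search order described on the serial snooping slides (nearest neighbor first, then the other half of the tree, leaf by leaf) can be sketched as follows. The two-switch topology is taken from the figure labels. The function names are illustrative, and treating memory as the fallback after every cache misses is a simplification, since the deck also considers speculative memory access.

```python
# Assumed topology from the figure labels: two switches under a root node.
TREE = {"Switch1": ["P1", "P2"], "Switch2": ["P3", "P4"]}


def serial_snoop_order(requester, tree=TREE):
    """Order in which caches are probed: same-switch neighbors, then the other half."""
    local = next(sw for sw, leaves in tree.items() if requester in leaves)
    order = [p for p in tree[local] if p != requester]
    for sw, leaves in tree.items():
        if sw != local:
            order.extend(leaves)  # leaves of the other half are searched serially
    return order


def find_supplier(requester, holders, tree=TREE):
    """Return the first node in snoop order that holds the line, else 'MEMORY'."""
    for node in serial_snoop_order(requester, tree):
        if node in holders:
            return node
    return "MEMORY"


if __name__ == "__main__":
    print(serial_snoop_order("P1"))
    print(find_supplier("P1", {"P3", "P4"}))
    print(find_supplier("P1", set()))
```

Because the search stops at the first holder, a line cached at the nearest neighbor never generates snoop traffic beyond the local switch, which is the source of the power saving the slide describes.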

Page 32: Power Efficient Cache Coherence

Serial Snooping: Request satisfied by nearest node

[Figure: timeline (ns). P1: SNP (0 to 21). P2: TL (21 to 28), RSP (28 to 49); DF (21 to 35), data transmit (35 to 56).]

Power:
Xmit snoop: 2Plink + Pswitch
P2 tag access and snoop response: Ptag + 2Plink + Pswitch
P2 data fetch and xmit: Pcache + 2Plink + Pswitch
Ptotal = 6Plink + 3Pswitch + Ptag + Pcache
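The nearest-node total is the sum of the three power terms listed on the slide, and the timing marks follow from a three-hop path (link, switch, link) between P1 and P2. A minimal sketch of both checks, assuming one bus cycle per hop; the variable names are illustrative.

```python
from collections import Counter

# Power terms as listed on the slide.
xmit_snoop = Counter(Plink=2, Pswitch=1)
tag_and_response = Counter(Ptag=1, Plink=2, Pswitch=1)
fetch_and_xmit = Counter(Pcache=1, Plink=2, Pswitch=1)
p_total = xmit_snoop + tag_and_response + fetch_and_xmit

# Timing, assuming each link or switch traversal costs one 7 ns bus cycle.
HOP_NS = 7
PATH_HOPS = 3                                       # link + switch + link between P1 and P2
snoop_arrives = PATH_HOPS * HOP_NS                  # snoop reaches P2
tag_done = snoop_arrives + 1 * HOP_NS               # 1-cycle tag lookup
response_back = tag_done + PATH_HOPS * HOP_NS       # snoop response returns to P1
fetch_done = snoop_arrives + 2 * HOP_NS             # 2-cycle data fetch
data_back = fetch_done + PATH_HOPS * HOP_NS         # data arrives at P1

if __name__ == "__main__":
    print(dict(p_total))
    print(snoop_arrives, tag_done, response_back, fetch_done, data_back)
```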

Page 33: Power Efficient Cache Coherence

Serial Snooping: Request satisfied by next-nearest neighbor

[Figure: timeline (ns). P1: SNP (0 to 21). P2: TL (21 to 28), RSP (28 to 49); Xmit (marks at 21, 63, 77). P3: TL (77 to 84), Xmit (84 to 133); DF (77 to 91), Xmit (91 to 140). Memory: memory access (63 to 133), Xmit (133 to 168).]

Power: 16Plink+10Pswitch+2Ptag+Pcache

Page 34: Power Efficient Cache Coherence

Serial Snooping: Request satisfied by farthest node

[Figure: timeline (ns) with rows P1, P2, P3, P4, Spec-Memory, and Non-Spec-Memory, showing Xmit, TL, RSP, DF, and Memory Access phases. P1: Xmit (0 to 21). P2: TL (21 to 28), RSP (28 to 49).]

Power:
Remote node supplies the data: 18Plink+11Pswitch+3Ptag+Pcache
If memory supplies the data: 17Plink+10Pswitch+3Ptag+Pmem

Page 35: Power Efficient Cache Coherence

RESULTS : Load Miss Distributions

[Figure: Load Miss Distribution. Y-axis: % of load misses satisfied (0 to 45). Series: Nearest Neighbour, Next Nearest Node, Farthest Node, Memory. Benchmarks: raytrace, specweb, specjbb, tpc-w.]

Page 36: Power Efficient Cache Coherence

RESULTS: Average Latencies to satisfy load misses

[Figure: Average latencies to satisfy load misses. Axis: avg. latencies (ns), 0 to 200. Configurations: SSNFNT, SSSFNT, SSSFST, PSNFNT, PSSFNT, PSSFST. Benchmarks: raytrace, specjbb, specweb, tpc-w.]

Page 37: Power Efficient Cache Coherence

RESULTS: Relative Power Savings

[Figure: Factors contributing to power consumption. Axis: 0 to 1.2. Configurations: PS/SF/ST, SS/SF/ST, PS/SF/NT, SS/SF/NT, PS/NF/NT, SS/NF/NT. Components: Plink, Psw, Ptag, Pcache, Pmem.]

Page 38: Power Efficient Cache Coherence

CONCLUSIONS

Reducing degree of speculation has potential for significant power savings.
Performance degradation is minimal for the set of benchmarks studied.
Serial snooping with speculative memory fetch provides optimal latency and power consumption.

Page 39: Power Efficient Cache Coherence

Future Work

Develop detailed execution-driven power model.
Explore different interconnect topologies.
Examine the viability of adaptive mechanisms for protocol policy.

Page 40: Power Efficient Cache Coherence

Serial Snooping

Nearest neighbor
Latency (ns): 0, 7, 14, 21
Power: 2Plink+Pswitch

Page 41: Power Efficient Cache Coherence

Questions

Page 43: Power Efficient Cache Coherence

Parallel Snoop Speculative Fetch Speculative Transmit (PS/SF/ST)

[Figure: partial PS/SF/ST timeline (ns). Local node: TL, snoop broadcast (0 to 49). Memory: memory access (marks at 35, 91).]