Drowsy Caches: Simple Techniques for Reducing Leakage Power

Krisztián Flautner, Nam Sung Kim, Steve Martin, David Blaauw, Trevor Mudge

web.eecs.umich.edu/~manowar/publications/drowsy-talk.pdf

Motivation

[Figure: normalized leakage power vs. minimum gate length (0 to 0.2 µm) at 25 °C, 50 °C, 75 °C and 105 °C]

• On-chip caches
  – responsible for 15%–20% of the total power
  – leakage power can exceed 50% of total cache power, according to our projection using Berkeley Predictive Models
• Ever-increasing leakage power
  – as feature size shrinks
• Vt scales down
  – exponential increase in leakage power
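The exponential dependence on Vt comes from the standard subthreshold-conduction model, in which leakage current scales roughly as exp(−Vt / (n·kT/q)). The sketch below evaluates only that exponential term to show how quickly leakage grows as Vt drops; the slope factor n = 1.5 and the example threshold voltages are illustrative assumptions, not values from the projection above.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_c: float) -> float:
    """kT/q in volts at the given temperature in degrees Celsius."""
    return BOLTZMANN * (temp_c + 273.15) / ELEMENTARY_CHARGE


def leakage_ratio(vt: float, vt_ref: float, n: float = 1.5, temp_c: float = 25.0) -> float:
    """Subthreshold leakage at threshold `vt` relative to threshold `vt_ref`.

    Uses I_sub ~ exp(-Vt / (n * kT/q)) with every other factor held fixed,
    so only the exponential Vt term is modeled. n is an assumed slope factor.
    """
    return math.exp((vt_ref - vt) / (n * thermal_voltage(temp_c)))


if __name__ == "__main__":
    # Lowering Vt by 100 mV at 25 C raises leakage by roughly an order of magnitude.
    print(f"{leakage_ratio(0.30, 0.40):.1f}x")
```

Real devices add further effects (DIBL, temperature dependence of Vt), so this is only a first-order illustration of the trend.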

Processor power trends

• Based on ITRS roadmap and transistor count estimates.
• The total power in this projection is not realistically attainable.

[Figure: projected power consumption (W, 0 to 1000) by processor generation (Pentium II, Pentium III, Pentium 4, One Gen, Two Gen, Three Gen), split into dynamic power and leakage power]

An observation about data caches

• L1 data caches
  – Working set: fraction of cache lines accessed in a time window.
  – Window size = 2000 cycles.
  – Only a small fraction of lines are accessed in a window.

[Figure: working set (0%–50% of cache lines) for crafty, vortex, bzip, vpr, mcf, parser, gcc, facerec, equake and mesa; series: working set of the current window, and of the current + 1, 8 and 32 previous windows]
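The working-set metric can be computed from an access trace as sketched below. The trace format (pairs of cycle number and cache-line index) and the function name are assumptions for illustration; the `history` parameter plays the role of the "+1, 8, and 32 previous windows" series.

```python
from typing import Iterable, List, Set, Tuple


def working_set_fractions(
    trace: Iterable[Tuple[int, int]],
    num_lines: int,
    window: int = 2000,
    history: int = 0,
) -> List[float]:
    """Fraction of cache lines touched in each window.

    trace:   (cycle, line_index) pairs, in any order (hypothetical format).
    history: number of preceding windows whose accesses are also counted.
    """
    per_window: List[Set[int]] = []
    for cycle, line in trace:
        idx = cycle // window
        while len(per_window) <= idx:  # create empty windows as needed
            per_window.append(set())
        per_window[idx].add(line)

    fractions = []
    for i in range(len(per_window)):
        touched: Set[int] = set()
        for j in range(max(0, i - history), i + 1):
            touched |= per_window[j]
        fractions.append(len(touched) / num_lines)
    return fractions


if __name__ == "__main__":
    trace = [(0, 1), (10, 2), (1999, 1), (2000, 3), (4500, 3)]
    print(working_set_fractions(trace, num_lines=8))             # [0.25, 0.125, 0.125]
    print(working_set_fractions(trace, num_lines=8, history=1))  # [0.25, 0.375, 0.125]
```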

The Drowsy Cache approach

• Optimize across the circuit-microarchitecture boundary:
  – Use of the appropriate circuit technique enables simplified microarchitectural control.
• Requirement: state preservation in low-leakage mode.

Instead of being sophisticated about predicting the working set, reduce the penalty for being wrong.

Algorithm:
• Periodically put all lines in the cache into drowsy mode.
• When a line is accessed, wake it up.
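The simple policy needs only a global cycle counter. The sketch below simulates it on a hypothetical access trace: every `window` cycles all lines become drowsy, and an access to a drowsy line wakes it at a cost of `wakeup_cycles`. The trace format, the one-cycle default penalty, and counting a line as awake from the cycle it is accessed are assumptions; the model ignores misses and tags.

```python
from typing import Iterable, Set, Tuple


def simulate_simple_policy(
    accesses: Iterable[Tuple[int, int]],
    num_lines: int,
    total_cycles: int,
    window: int = 4000,
    wakeup_cycles: int = 1,
) -> Tuple[float, int]:
    """Return (average drowsy fraction, total extra wake-up cycles).

    accesses: (cycle, line_index) pairs with 0 <= cycle < total_cycles.
    All lines start drowsy; every `window` cycles the global counter
    puts every line back into drowsy mode.
    """
    awake: Set[int] = set()
    awake_line_cycles = 0  # sum over cycles of the number of awake lines
    extra_cycles = 0
    last = 0               # cycle up to which awake_line_cycles is accounted
    next_reset = window

    def advance(to: int) -> None:
        nonlocal awake_line_cycles, last, next_reset
        while next_reset <= to:  # global drowsy signal fires
            awake_line_cycles += len(awake) * (next_reset - last)
            last = next_reset
            awake.clear()
            next_reset += window
        awake_line_cycles += len(awake) * (to - last)
        last = to

    for cycle, line in sorted(accesses):
        advance(cycle)
        if line not in awake:    # drowsy line: wake it up before the access
            awake.add(line)
            extra_cycles += wakeup_cycles
    advance(total_cycles)

    drowsy_fraction = 1.0 - awake_line_cycles / (num_lines * total_cycles)
    return drowsy_fraction, extra_cycles


if __name__ == "__main__":
    frac, extra = simulate_simple_policy(
        [(0, 0), (100, 0), (200, 1), (4100, 0)], num_lines=4, total_cycles=8000)
    print(f"drowsy fraction {frac:.3f}, extra cycles {extra}")  # 0.634, 3
```

The only state is the global counter and one drowsy bit per line, so no per-line counters or predictors are needed.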

Access control flow – Awake tags

[Diagram: access flow with awake tags. Hit: awake tag match → line wake-up → line access. Miss: awake tag miss → replacement from memory, with line wake-up]

• An access to a drowsy line, hit or miss, adds at most 1 cycle of latency
• Access to an awake line is not penalized
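With awake tags the tag comparison is never delayed, so the only penalty is waking the target line. The sketch below tallies that penalty over a hypothetical stream of accesses; the one-cycle wake-up cost matches the bound stated above, while the function names and the (tag_hit, line_drowsy) input format are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WAKEUP_CYCLES = 1  # assumed wake-up cost: the "at most 1 cycle" bound above


def extra_latency(line_drowsy: bool) -> int:
    """Extra cycles for one access with awake tags.

    Hit or miss does not matter: the tags are always awake, so only a
    drowsy target line adds latency; an awake line is not penalized.
    """
    return WAKEUP_CYCLES if line_drowsy else 0


def tally(accesses: Iterable[Tuple[bool, bool]]) -> Tuple[Dict[str, int], int]:
    """accesses: (tag_hit, line_drowsy) pairs. Returns (outcome counts, extra cycles)."""
    counts: Counter = Counter()
    extra = 0
    for tag_hit, line_drowsy in accesses:
        outcome = ("drowsy " if line_drowsy else "awake ") + ("hit" if tag_hit else "miss")
        counts[outcome] += 1
        extra += extra_latency(line_drowsy)
    return dict(counts), extra


if __name__ == "__main__":
    stream = [(True, False), (True, True), (False, True), (True, False)]
    print(tally(stream))
```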

Access control flow – Drowsy tags

• Drowsy tags implementation is more complicated
• Is the complexity worth it?
  – Tags use about 7% of data bits (32-bit address)
  – Only small incremental leakage reduction
• Worst case: 3 cycles of extra latency

[Diagram: access flow with drowsy tags. The tags are woken up first (tag wake-up); then, as with awake tags, a hit goes line wake-up → line access and a miss goes to replacement from memory with line wake-up; unneeded tags and lines go back to drowsy]
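The 7% figure can be checked with a little arithmetic. The sketch below computes tag bits as a fraction of data bits for a 32-bit address; the cache geometries in the example (32 KB with 32-byte lines, direct-mapped to 4-way) are assumptions for illustration, and valid, dirty and drowsy bits are ignored.

```python
def tag_overhead(cache_bytes: int, line_bytes: int, ways: int, address_bits: int = 32) -> float:
    """Tag bits per line divided by data bits per line (powers of two assumed)."""
    sets = cache_bytes // (line_bytes * ways)
    offset_bits = (line_bytes - 1).bit_length()  # log2(line_bytes)
    index_bits = (sets - 1).bit_length()         # log2(sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits / (line_bytes * 8)


if __name__ == "__main__":
    for ways in (1, 2, 4):
        print(f"32 KB, 32 B lines, {ways}-way: {tag_overhead(32 * 1024, 32, ways):.1%}")
```

These geometries give 6.6%, 7.0% and 7.4%, all close to 7%, which is why making the tags drowsy as well buys only a small additional leakage reduction.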

Low-leakage circuit techniques

| Circuit | Pros | Cons |
|---|---|---|
| Gated-VDD | Largest leakage reduction; fast mode switching; easy implementation | Loses cell state |
| ABB-MTCMOS | Retains cell state | Slow mode switching |
| DVS | Retains cell state; fast mode switching; more power reduction than ABB | More susceptible to SEU (single-event upset) noise |

Drowsy memory using DVS

• Low supply voltage for inactive memory cells
  – Low voltage reduces leakage current too!
  – Quadratic reduction in leakage power

[Diagram: memory cell leakage path, with separate supply voltages for normal mode and drowsy mode]

$P_{\text{leak}}\downarrow \;=\; I_{\text{leak}}\downarrow \times V_{DD}\downarrow$

Leakage reduction using DVS

• High-Vt devices for access transistors
  – reduce leakage power
  – increase access time of cache
• Right trade-off point
  – 91% leakage reduction
  – 6% cycle time increase

[Figure: performance (85%–100%) vs. leakage reduction (76%–94%), curves labeled 0.2V, 0.25V, 0.3V and 0.35V; projections for 0.07µm process]

Drowsy cache line architecture

[Diagram: a drowsy cache line. The row decoder drives the SRAM word line through a word line driver and a word line gate; a per-line drowsy bit is set by the drowsy signal and reset by wake up; a voltage controller switches the line's power line between VDD (1V) and VDDLow (0.3V)]
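The per-line control in the diagram can be modeled as a small state machine: the global drowsy signal sets the drowsy bit, a wake-up resets it, and the voltage controller selects VDD or VDDLow from that bit. The sketch below is a behavioral model under those assumptions; the class and method names are hypothetical, and it assumes the word line gate keeps a drowsy line from being accessed, so an access must wake the line first.

```python
from dataclasses import dataclass
from typing import Tuple

VDD = 1.0      # normal supply (V), from the diagram
VDD_LOW = 0.3  # drowsy supply (V), from the diagram


@dataclass
class DrowsyLine:
    """Behavioral model of one drowsy cache line (hypothetical sketch)."""
    data: int = 0
    drowsy_bit: bool = False

    @property
    def supply_voltage(self) -> float:
        # Voltage controller: the drowsy bit selects the power-line voltage.
        return VDD_LOW if self.drowsy_bit else VDD

    def set_drowsy(self) -> None:
        # Global drowsy signal sets the bit; the cell keeps its state at VDD_LOW.
        self.drowsy_bit = True

    def access(self) -> Tuple[int, int]:
        """Return (data, wake-up cycles). The word line stays gated while drowsy."""
        wake_cycles = 0
        if self.drowsy_bit:
            self.drowsy_bit = False  # wake up (reset) before enabling the word line
            wake_cycles = 1          # assumed one-cycle wake-up
        return self.data, wake_cycles


if __name__ == "__main__":
    line = DrowsyLine(data=42)
    line.set_drowsy()
    print(line.supply_voltage, line.access(), line.supply_voltage, line.access())
```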

Energy reduction

• Projections for 0.07µm process
• High leakage: lines have to be powered up when accessed.
• Drowsy circuit
  – Without high-Vt device (in SRAM): 6x leakage reduction, no access delay.
  – With high-Vt device: 10x leakage reduction, 6% access time increase.

[Figure: energy breakdown (0%–100%) of a regular cache (dynamic, leakage) vs. a drowsy cache (dynamic, high leakage, drowsy)]
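A first-order model helps show why 6x versus 10x matters less than it sounds. If a fraction f of lines is drowsy and drowsy lines leak r times less, normalized leakage is (1 − f) + f/r, and the floor with zero drowsy leakage is 1 − f. This model is an assumption that ignores wake-up energy and run-time increase; the drowsy fraction of 0.9 in the example is illustrative, not a measured value.

```python
def normalized_leakage(drowsy_fraction: float, reduction: float) -> float:
    """Leakage relative to an always-awake cache.

    Awake lines leak at the full rate; drowsy lines leak 1/reduction as much.
    reduction=float("inf") gives the floor with zero drowsy-mode leakage.
    """
    return (1.0 - drowsy_fraction) + drowsy_fraction / reduction


if __name__ == "__main__":
    f = 0.9  # illustrative drowsy fraction
    for r in (6.0, 10.0, float("inf")):
        print(f"{r:>4}x: {normalized_leakage(f, r):.3f}")
```

In this example, moving from 6x to 10x lowers normalized leakage from 0.25 to 0.19 against a floor of 0.10, so any further improvement can recover at most the remaining 0.09.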

1 cycle vs. 2 cycle wake up

• Fast wake-up is important, but easy to accomplish
  – Cache access time: 0.57ns (for 0.07µm, from CACTI using a 0.18µm baseline).
  – Speed depends on voltage controller size: 64 x Leff: 0.28ns (half cycle at 4 GHz); 32 x Leff: 0.42ns; 16 x Leff: 0.77ns.
• The impact of drowsy tags is quite similar to that of a double-cycle wake-up.

[Figure: 1 cycle vs. 2 cycle wake-up. Drowsy fraction (70%–100%) vs. run-time increase (0%–2.2%) for ammp00, applu00, apsi00, art00, bzip200, crafty00, eon00, equake00, facerec00, fma3d00, galgel00, gap00, gcc00, gzip00, lucas00, mcf00, mesa00, mgrid00, parser00, sixtrack00, swim00, twolf00, vortex00, vpr00 and wupwise00; simple policy, awake tags, 4000-cycle window]

Policy comparison

[Figure: noaccess vs. simple policy. Drowsy fraction (70%–100%) vs. run-time increase (0%–1.4%) for the benchmark set ammp00 through wupwise00, with labeled points for applu, art, crafty, eon, facerec, galgel, gap, gcc, gzip, lucas, mgrid, parser, sixtrack, twolf and vortex; series: simple 2000, simple 4000, noaccess 4000. Setup: 1-cycle wake-up, awake tags; simple policy with 2000- and 4000-cycle windows, noaccess policy with a 2000-cycle window]

Energy reduction

• Theoretical minimum assumes zero leakage in drowsy mode
• Total energy reduction within 0.1 of theoretical minimum
  – Diminishing returns for better leakage reduction techniques
• The figures below assume 6x leakage reduction; 10x is possible with small additional run-time impact

| | Normalized Total Energy: DVS | Normalized Total Energy: Theoretical min. | Normalized Leakage Energy: DVS | Normalized Leakage Energy: Theoretical min. | Run-time increase |
|---|---|---|---|---|---|
| Awake tags | 0.46 | 0.35 | 0.29 | 0.15 | 0.41% |
| Drowsy tags | 0.42 | 0.31 | 0.24 | 0.09 | 0.84% |

More than 50% total energy reduction, more than 70% leakage energy reduction
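The headline reductions follow directly from the table. The sketch below recomputes them and adds a normalized energy-delay product, taking delay as 1 plus the run-time increase; that energy-delay metric is an illustrative choice, not one reported in the table.

```python
# Values from the table: (total DVS, total min., leakage DVS, leakage min., run-time increase)
RESULTS = {
    "Awake tags": (0.46, 0.35, 0.29, 0.15, 0.0041),
    "Drowsy tags": (0.42, 0.31, 0.24, 0.09, 0.0084),
}


def summarize(total: float, total_min: float, leak: float,
              leak_min: float, runtime: float) -> dict:
    """Reductions relative to a regular cache, plus an illustrative energy-delay product."""
    return {
        "total energy reduction": 1.0 - total,
        "leakage energy reduction": 1.0 - leak,
        "normalized energy-delay": total * (1.0 + runtime),
    }


if __name__ == "__main__":
    for name, row in RESULTS.items():
        stats = summarize(*row)
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

Both configurations clear the 50% total and 70% leakage thresholds; drowsy tags roughly double the run-time increase in exchange for a slightly lower energy-delay product under this simple metric.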

Conclusions

• Simple circuit technique
  – Needs high-Vt transistors and a low-Vdd supply
• Simple architecture
  – No need to keep counter/predictor state for each line
  – A periodic global counter asserts the drowsy signal
  – Window size (for the periodic drowsy transition) depends on the core: ~4000 cycles gives a good energy-delay trade-off
• The technique also works well on in-order processors
  – Memory subsystem is already latency tolerant
• The drowsy circuit is good enough
  – Diminishing returns on further leakage reduction
  – Focus is again on dynamic energy