Hot Caches, Cool Techniques: Online Tuning of Highly Configurable Caches for Reduced Energy Consumption

Ann Gordon-Ross
Department of Computer Science and Engineering
University of California, Riverside
Frank Vahid – PhD Advisor

This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation


Introduction

- Much research is devoted to reducing power consumption in mobile embedded devices
  - Increased battery life
  - Decreased cooling requirements

Introduction

- The cache hierarchy consumes a lot of power
- We can use configurable caches to reduce power consumption
- However, configuring/tuning the cache is very difficult
  - Many parameters lead to a very large design space
- In this talk, I describe research that addresses the problem of quickly tuning highly configurable caches
  - Efficient heuristics for increasingly complex configurable cache hierarchies
  - Feedback-control system for online cache tuning

Cache Power Consumption

- Memory access: 50% of embedded processor’s system power
- Caches are power hungry
  - ARM920T (Segars 01)
  - M*CORE (Lee/Moyer/Arends 99)
- Thus, caches are a good candidate for optimizations

[Figure: processor, L1 cache, L2 cache, and main memory; annotated with 53%]

Reducing Cache Energy Consumption

- Research shows that different applications have different cache requirements (Zhang ’04)
  - Depending on the working set of the application, the application may require different values for cache parameters:
    - Total size
    - Line size (block size)
    - Associativity
- Cache parameters that don’t match an application’s behavior can waste over 40% of energy (Balasubramonian ’00, Zhang ’03)

Excess Cache Energy Consumption

- Size
  - Excess fetch and static energy if too large
  - Excess thrashing energy if too small (thrashing to next level of memory); stall cycles = excess energy
- Line size
  - Excess fetch energy if line size too large (excess energy fetching unused data)
  - Excess stall energy if line size too small (stalls fetching from next level of memory); stall cycles = excess energy

[Figure: cache diagrams marking the working set, the fetched data, and the resulting excess energy]

Excess Cache Energy Consumption

- Associativity
  - Excess fetch energy per access if too high (excess energy checking unused ways)
  - Excess miss energy if too low, with decreased performance
- Configurable caches allow cache parameter values to be varied or tuned, thus specializing the cache to the needs of an application

[Figure: cache diagram marking the working set and the excess energy spent checking unused ways]

Configurable Caches

- Soft cores: designer-specified cache parameters
  - ARM, MIPS, Tensilica

[Figure: processor HDL with a specialized cache goes to fab, producing a chip with a specialized cache]

Configurable Caches

- Even hard processors contain configurable caches
  - Specialized software instructions can change cache parameters
  - Specialized hardware enables the cache to be configured at startup or in system during runtime
  - Motorola M*CORE – Malik ISLPED’00, Albonesi MICRO’00, Zhang ISCA’03

[Figure: a tunable cache with tuning hardware, built from four 2 KB banks with a 16 byte physical line size and a configurable line size. Way concatenation: 8 KB, 4-way base cache; 8 KB, 2-way; 8 KB, direct-mapped. Way shutdown: 4 KB, 2-way; 2 KB, direct-mapped.]
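The way concatenation and way shutdown options in the figure can be enumerated with a short sketch. This is only an illustration under stated assumptions: the restriction to power-of-two bank counts and associativities, and the 16/32/64 byte line sizes, are assumptions rather than details given on the slide, and `enumerate_configs` is a hypothetical helper name.

```python
BANK_KB = 2                 # each physical bank holds 2 KB, as in the figure
NUM_BANKS = 4               # the 8 KB, 4-way base cache uses four banks
LINE_SIZES = (16, 32, 64)   # bytes; assumed multiples of the 16 byte physical line


def enumerate_configs():
    """Return (size_kb, associativity, line_size) tuples for the assumed cache."""
    configs = []
    active = 1
    while active <= NUM_BANKS:          # way shutdown: 1, 2, or 4 active banks
        size_kb = active * BANK_KB
        assoc = 1
        while assoc <= active:          # way concatenation: merge banks into fewer ways
            for line in LINE_SIZES:
                configs.append((size_kb, assoc, line))
            assoc *= 2
        active *= 2
    return configs


if __name__ == "__main__":
    for cfg in enumerate_configs():
        print("%d KB, %d-way, %d byte lines" % cfg)
```

Under these assumptions the enumeration yields 18 configurations, which agrees with the 18 configurations per cache quoted later in the talk for Zhang's configurable cache. A 2 KB cache can only be direct-mapped here because only one bank remains active.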

Cache Tuning

- However, configurable caches are relatively new
  - Designers are provided with configurable caches but are not told how to determine the best cache configuration
- Cache tuning is the process of determining the appropriate cache parameters for an application
- Cache tuning is very difficult: hundreds to tens of thousands of different configurations

Cache Tuning Difficulties

- Simulation method
  - Simulate the possible cache configurations and choose the lowest energy configuration
  - Realistic input stimulus is difficult to model
  - A few seconds of real execution may take days or weeks to simulate
- Prediction method
  - Examine the code to arrive at the chosen config

[Figure: microprocessor, L1 cache, L2 cache, and main memory, with both caches marked TUNE; plot of energy versus possible cache configurations]

Cache Tuning Difficulties

- Runtime tuning
  - Runtime tuning allows for adaptation to new software and new operating environments
  - Exhaustive exploration can unnecessarily expand the high-energy tuning time

[Figure: a tunable cache with tuning hardware and a downloaded application; plot of energy versus time from system startup, showing the cache tuning period]

Cache Tuning Difficulties

- Heuristic tuning method
  - Design space: hundreds to tens of thousands of configurations; the goal is the lowest energy
- Existing heuristics do not address the complexities of tuning a highly configurable cache consisting of tens of thousands of different configurations

[Figure: four plots comparing the exhaustive method with the heuristic method. Simulation-based approach: energy versus possible cache configurations. Runtime-based approach: energy versus time from system startup.]

Outline

- Develop an efficient tuning heuristic for a highly configurable two-level cache hierarchy
  - Developed using a simulation-based environment, but applicable to a dynamic tuning environment
  - 62% energy savings on average
- Current research
  - Feedback-control system for online cache tuning

Challenge for Two Level Cache Tuning Heuristic Development

- Current methods
  - Single level configuration: L1 (size, line size, associativity); tens of configurations
  - Two-level configuration: tens of L1/L2 hierarchy configurations
- Our two-level cache tuning goal (30 configs per cache)
  - Two-level configuration with separate L2 caches: instruction (I) and data (D) caches at both L1 and L2, each with configurable size, line size, and associativity; 30*30 + 30*30 = 1800 configs
  - Two-level configuration with a unified second level of cache: instruction and data L1 caches plus one L2 cache, each with configurable size, line size, and associativity; 30*30*30 = 27,000 configs
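The two design-space counts quoted on this slide follow from simple counting, sketched below. The function names are hypothetical; the only input taken from the slide is the figure of 30 configurations per cache.

```python
CONFIGS_PER_CACHE = 30  # figure quoted on the slide


def separate_l2_space(n=CONFIGS_PER_CACHE):
    """Instruction hierarchy (L1 x L2) plus data hierarchy (L1 x L2).

    The two hierarchies do not share a cache, so their spaces add.
    """
    return n * n + n * n


def unified_l2_space(n=CONFIGS_PER_CACHE):
    """Instruction L1 x data L1 x unified L2.

    All three caches interact through the shared L2, so the spaces multiply.
    """
    return n * n * n


if __name__ == "__main__":
    print(separate_l2_space())  # 1800
    print(unified_l2_space())   # 27000
```

The jump from addition to multiplication is the reason a unified second level makes exhaustive exploration so much more expensive.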

Single Level Tuning Heuristic

[Figure: Zhang’s configurable cache (microprocessor, L1 I$ and D$, tuner, main memory): 18 configurations per cache, independently tuned. Our extended configurable cache (L1 and L2 I$ and D$ with a tuner): 216 configurations per cache hierarchy, with a tuning dependency between L1 and L2.]

- Impact-ordered heuristics have been shown effective in previous tuning efforts (Zhang’03)
  - Tune parameters in order of energy impact, highest impact first
    - i.e., vary each parameter while holding the others fixed and measure the change
    - Impact order for cache: 1. total size, 2. line size, 3. associativity
  - Search parameters from smallest to largest
    - Minimizes flushing in a dynamic environment
  - Tune instruction cache, then tune data cache
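An impact-ordered search of the kind described above might look like the following sketch. It is not the tuner from the cited work: the rule of stopping a parameter at the first non-improving value, the toy energy function, and all names are assumptions made for illustration, and the sketch ignores any size/associativity restrictions of a real configurable cache.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def tune_impact_ordered(
    energy: Callable[[Config], float],
    options: List[Tuple[str, List[int]]],
) -> Tuple[Config, int]:
    """Tune parameters one at a time, in the order given (highest impact first).

    Each parameter is searched from its smallest value upward while the
    others are held fixed; the search of a parameter stops as soon as a
    larger value fails to reduce energy (an assumed stopping rule).
    Returns the chosen configuration and the number of energy evaluations.
    """
    cfg = {name: values[0] for name, values in options}
    best = energy(cfg)
    evaluations = 1
    for name, values in options:
        for value in values[1:]:
            trial = dict(cfg)
            trial[name] = value
            trial_energy = energy(trial)
            evaluations += 1
            if trial_energy >= best:
                break  # energy stopped improving; keep the previous value
            cfg, best = trial, trial_energy
    return cfg, evaluations


def toy_energy(cfg: Config) -> float:
    """Hypothetical energy model with a minimum at 4 KB, 32 byte lines, 2-way."""
    return (
        (cfg["size_kb"] - 4) ** 2
        + (cfg["line_bytes"] - 32) ** 2 / 256
        + (cfg["assoc"] - 2) ** 2
    )


if __name__ == "__main__":
    order = [
        ("size_kb", [2, 4, 8]),        # 1. total size
        ("line_bytes", [16, 32, 64]),  # 2. line size
        ("assoc", [1, 2, 4]),          # 3. associativity
    ]
    print(tune_impact_ordered(toy_energy, order))
```

On this toy model the search evaluates 7 of the 27 combinations, which illustrates how ordering by impact prunes the space.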

First Heuristic – Tune Levels One-at-a-Time

- Tune each cache using the impact-ordered heuristic for one-level cache tuning
- Tune L1, then L2
  - Initial L2: 64 KByte, 4-way, 64 byte line size
  - For the best L1 configuration, tune the L2 cache

[Figure: microprocessor, L1 cache, L2 cache, main memory]

Results of First Heuristic

- Base cache configuration
  - Level 1: 8 KByte, 4-way, 32 byte line size
  - Level 2: 64 KByte, 4-way, 64 byte line size

[Figure: bar chart of energy consumption normalized to the base cache configuration (y-axis 0 to 1.2, with a base line), comparing the first heuristic with optimal for g721, rawcaudio, pegwit, AIFFTR01, AIFIRF01, BITMNP01, IDCTRN01, PNTRCH01, TTSPRK01, and the average. Annotations: "32% vs 53%"; "Worse than base cache".]

Interlacing Heuristic

- The first heuristic did not find the optimal in most cases
  - Sometimes 200% or 300% worse
- Conclusion: the two levels should not be explored separately
  - Too much interdependence among L1 and L2 cache parameters, which is not addressed with Zhang’s method
  - L2 cache performance depends on how much and what misses in the L1 cache
- To more fully explore the dependencies between the two levels, we interlaced the exploration of the level one and level two caches
- Interlacing performed better than the initial heuristic, but there was still much room for improvement

Interlaced exploration of the instruction cache hierarchy:

1. Tune L1 size
2. Tune L2 size
3. Tune L1 line size
4. Tune L2 line size
5. Tune L1 associativity
6. Tune L2 associativity

Do the same for the data cache hierarchy

[Figure: microprocessor, L1 and L2 I$ and D$, tuner, and main memory]
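The difference between tuning one level at a time and interlacing the two levels is only the order in which (level, parameter) pairs are visited. The sketch below generates both orders; the function names are hypothetical and the parameter names simply mirror the slide.

```python
PARAMETERS = ("size", "line size", "associativity")  # impact order from the talk
LEVELS = ("L1", "L2")


def level_at_a_time_order(levels=LEVELS, parameters=PARAMETERS):
    """First heuristic: finish every parameter of L1 before touching L2."""
    return [(level, param) for level in levels for param in parameters]


def interlaced_order(levels=LEVELS, parameters=PARAMETERS):
    """Interlaced heuristic: tune one parameter across both levels, then the next."""
    return [(level, param) for param in parameters for level in levels]


if __name__ == "__main__":
    for step, (level, param) in enumerate(interlaced_order(), start=1):
        print("%d. Tune %s %s" % (step, level, param))
```

Each step in either order could then be searched with a one-parameter sweep; the interlaced order lets an L2 choice be re-examined soon after the L1 parameter that influences it has changed.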

Final Heuristic: Interlaced with Local Search

- Some cases were still sub-optimal; these were manually examined
  - Limitation of the configurable cache architecture: certain associativities were not possible for some sizes
  - Determined that a small local search was needed to overcome the limitation
- Final heuristic: the Two Level Cache Tuner (TCaT)

TCaT Results

[Figure: bar chart of energy consumption normalized to the base cache configuration (y-axis 0 to 1.2, with a base line), comparing the first heuristic, TCaT, and optimal for g721, rawcaudio, pegwit, AIFFTR01, AIFIRF01, BITMNP01, IDCTRN01, PNTRCH01, TTSPRK01, and the average. Annotation: 53% energy savings, near optimal.]

Extending the TCaT - Exploring a Unified Second Level of Cache

- Unified second level caches are standard in desktop computers and are becoming increasingly popular in embedded microprocessors
- Current cache tuning heuristics do not directly apply due to the added circular dependency
  - A change in any cache affects the performance of all other caches in the hierarchy

[Figure: microprocessor, L1 I$ and D$, unified L2 cache (U$), tuner, and main memory]

Level Two Cache Configurability

- For maximum configurability, the level two cache utilized the Motorola M*CORE style way management
- In addition, the L2 cache offers the same line size configurability as in the L1 caches
- Design space explodes to 18,000 configurations

[Figure: a traditional 4-way unified level two cache (four U-ways) beside a Motorola M*CORE way management cache with four configurable ways (Cfg Way), each usable as an I-way, D-way, or U-way; sample way management L2 caches mixing I-ways, D-ways, and U-ways]

Alternating Cache Exploration with Additive Way Tuning (ACE-AWT)

- Tune level one sizes (I and D)
- Tune level two size
- Tune level one line sizes (I and D)
- Tune level two line size
- Tune level one associativities (I and D)
- Tune level two associativity

The level two size and associativity steps are difficult because changing size and changing associativity are synonymous in a way management style cache

Way Management

[Figure: ways drawn as I-way, D-way, and U-way blocks. Increase L2 size: an 8Kb, 1-way configuration grows to one of several 16Kb, 2-way configurations, depending on the type of way added. Decrease L2 associativity: a 24Kb, 3-way configuration (I-way, D-way, U-way) shrinks to one of several 16Kb, 2-way configurations, depending on the type of way removed.]

ACE-AWT First Phase – L2 Size Exploration

1. Start with an empty L2 cache as the current L2 config
2. Add one of each way type (I-way, D-way, U-way) to the current L2 config, resulting in 3 candidate configs
3. Simulate each candidate and select the minimum energy config
4. Compare the minimum energy config with the current L2 config
   - If decrease in energy: the minimum energy config becomes the current L2 config; repeat from step 2
   - If increase in energy, or if the cache is at max size: DONE, giving the selected L2 cfg
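The size exploration flow above can be sketched as a greedy loop. This is a minimal sketch, not the published tuner: the `energy` callback stands in for a simulation run, `toy_energy` is a made-up model, the four-way limit is an assumed maximum size, and keeping the current configuration when energy increases is an assumed reading of the flow chart.

```python
from typing import Callable, Tuple

Ways = Tuple[int, int, int]  # counts of (I-ways, D-ways, U-ways) in the L2 cache


def awt_size_phase(energy: Callable[[Ways], float], max_ways: int = 4) -> Ways:
    """Grow the L2 cache one way at a time while energy keeps decreasing."""
    current: Ways = (0, 0, 0)            # start with an empty L2 cache
    current_energy = energy(current)
    while sum(current) < max_ways:       # stop when the cache is at max size
        # Add one of each way type, giving 3 candidate configurations.
        candidates = [
            tuple(n + (1 if i == k else 0) for i, n in enumerate(current))
            for k in range(3)
        ]
        best = min(candidates, key=energy)
        best_energy = energy(best)
        if best_energy >= current_energy:
            break                        # energy increased: done
        current, current_energy = best, best_energy
    return current


def toy_energy(ways: Ways) -> float:
    """Hypothetical model: miss energy falls with capacity, static energy rises."""
    i, d, u = ways
    return 12.0 / (1 + i + d + 2 * u) + 1.5 * (i + d + u)


if __name__ == "__main__":
    print(awt_size_phase(toy_energy))
```

In a real tuner each call to `energy` would be a simulation or a measured interval, so results would normally be cached rather than recomputed as this sketch does.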

ACE-AWT Fine Tuning Phase – Associativity Exploration

1. Start with the current cache configuration as the current L2 cfg
2. Size and availability permitting, try 3 way additions (+ I-way, + D-way, + U-way) and 3 way removals (- I-way, - D-way, - U-way), resulting in 6 candidate configs
3. Simulate each candidate and select the minimum energy config
4. Compare the minimum energy config with the current L2 cfg
   - If decrease in energy: the minimum energy config becomes the current L2 cfg; repeat from step 2
   - If increase in energy, or if there is no new configuration to explore: DONE, giving the selected L2 cfg
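The fine tuning flow above can be sketched in the same style as a local search over single-way additions and removals. Again this is only an illustration: `toy_energy` is a made-up model, the four-way limit and the rule that at least one way must remain are assumptions standing in for "size and availability permitting", and the `explored` set is one assumed way of detecting that no new configuration is left.

```python
from typing import Callable, Tuple

Ways = Tuple[int, int, int]  # counts of (I-ways, D-ways, U-ways) in the L2 cache


def awt_fine_tune(energy: Callable[[Ways], float], start: Ways, max_ways: int = 4) -> Ways:
    """Try adding or removing one way of each type until energy stops improving."""
    current = start
    current_energy = energy(current)
    explored = {current}
    while True:
        candidates = []
        for k in range(3):                  # I-way, D-way, U-way
            for delta in (+1, -1):          # addition, then removal
                ways = list(current)
                ways[k] += delta
                cand = tuple(ways)
                permitted = ways[k] >= 0 and 1 <= sum(ways) <= max_ways
                if permitted and cand not in explored:
                    candidates.append(cand)
        if not candidates:
            break                           # no new configuration to explore
        explored.update(candidates)
        best = min(candidates, key=energy)
        best_energy = energy(best)
        if best_energy >= current_energy:
            break                           # energy increased: done
        current, current_energy = best, best_energy
    return current


def toy_energy(ways: Ways) -> float:
    """Hypothetical model: miss energy falls with capacity, static energy rises."""
    i, d, u = ways
    return 12.0 / (1 + i + d + 2 * u) + 1.5 * (i + d + u)


if __name__ == "__main__":
    print(awt_fine_tune(toy_energy, start=(1, 1, 1)))
```

Because a removal changes size and associativity together, this phase can undo an earlier way addition that looked good only before the other parameters were tuned.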

Results

- Heuristic achieved near optimal results (when optimal computed)
  - 62% energy savings compared to base cache
  - Yet only searched 0.2% of the search space
- Key to previous heuristics
  - Combined a proven space pruning method (impact-ordering of parameters) with architecture-specific knowledge, giving highly efficient and effective results

[Figure: bar chart of energy consumption normalized to the base cache configuration (y-axis 0.0 to 1.0, with a base line), comparing ACE-AWT with optimal for A2TIME01, BaseFP01, CACHEB01, CANRDR01, IIRFLT01, MATRIX01, PUWMOD01, RSPEED01, TBLOOK01, AIFFTR01, AIIFFT01, AIFIRF01, BITMNP01, IDCTRN01, PNTRCH01, TTSPRK01, bcnt, bilv, binary, blit, brev, g3fax, matmul, pocsag, ps-jpeg, ucbqsort, v42, and the average]

Outline

- Develop an efficient tuning heuristic for a highly configurable two-level cache hierarchy
  - Developed using a simulation-based environment, but applicable to a dynamic tuning environment
  - 62% energy savings on average
- Current research
  - Feedback-control system for online cache tuning

Online Cache Tuning

- Reconfigure the cache dynamically to adapt to different phases of program execution or different applications in a multi-application environment

[Figure: energy consumption versus time for base cache energy, an application-tuned cache, and a phase-tuned cache, with the points where the cache changes marked]

Online Cache Tuning Challenges

- Need a good tuning interval
  - Tuning interval is the time between invocations of the tuning hardware
  - Should closely match the phase interval: the length of time the system executes between phase changes

[Figure: three plots of energy consumption versus time against base cache energy. The first marks the phase interval. The second shows a tuning interval that is too short, with excess tuning energy on top of the runtime energy. The third shows a tuning interval that is too long, with energy wasted in a suboptimal configuration.]

Previous Online Cache Tuning

- Largely ad hoc
  - Fixed tuning interval
  - Inspect counters and adjust cache
  - Search very small configuration space (≈ 4)
  - Limited tuning overhead
  - Adjusted tuning thresholds
- Do not analyze the chosen tuning interval
  - None attempted to tune the tuning interval

Periodic System

[Figure: online tuning energy normalized to base (y-axis 0 to 1.4) versus tuning interval (1 to 20 million cycles), with the phase interval fixed at 10 million cycles; regions marked "tuning interval too short" and "tuning interval too long"]

- Energy savings = 32%: severely penalized if phase interval is not precisely followed
- Energy savings = 28%: penalty is acceptable
- Goal: tuning interval should be 1/2 of the phase interval

Online Algorithms

- Need to determine tuning interval while system is executing
  - Online algorithms process data piecemeal and are unable to view the entire dataset
  - Online tuner must be able to determine the tuning interval based on current and past events, with no knowledge of the future

Feedback Control System

[Figure: block diagram of a feedback control loop. The set-point (goal) supplies the reference input $r_t$ to the error detector (∑), which outputs the measured error. The controller computes the input to the plant, $u_t = F(x_t)$. The actuator (device to manipulate plant) applies it to the plant (system under control), which is subject to disturbances. A sensor feeds the plant output back to the error detector.]

Difficulty: set-points are typically fixed values. We want minimization of energy, which makes developing the control system much more difficult.

Online Cache Tuner

- Goal: adjust tuning interval to match phase interval
- Observe change in energy due to tuning
  - Compare energy before and after tuning
  - If there is a change, then the tuning interval is too long and a phase change was missed
  - If there is no change, then the tuning interval is too short

Online Cache Tuner - Feedback Control System

[Figure: the generic feedback control loop (set-point, error detector, controller, actuator, plant, sensor, disturbances) mapped onto the online cache tuner]

- Plant: microprocessor with its cache ($)
- Set-point: minimize energy
- Actuator: cache tuner
- Controller: activates the cache tuner on the tuning interval
- Sensor: energy model, fed by the miss rate
- Error detector: stores the previous energy and produces %∆E
- Disturbances: phase changes

Controller Logic

- Based on attack/decay online algorithm
  - Increase tuning interval slowly to avoid overshooting
  - Decrease tuning interval quickly to avoid wasted energy
- Draw on fuzzy logic to stabilize tuning interval
  - Change tuning interval based on how close or far the system is to being stable
  - 2-part equation

Controller Logic

[Figure: change to tuning interval (∆TI, y-axis) versus %∆E (x-axis, 0 to 100%). The line starts at U for %∆E = 0, passes through 1.0 at PoS (stable system), and falls to D at 100%.]

- Small energy change: tunes too frequently, increase interval
- Large energy change: tunes too infrequently, decrease interval

With x = %∆E and y = ∆TI:

If %∆E < PoS:

$$y = \frac{1 - U}{PoS}\,x + U$$

If %∆E >= PoS:

$$y = \frac{D - 1}{1 - PoS}\,x + 1 - \frac{D - 1}{1 - PoS}\,PoS$$

- %∆E is averaged over the last W measurements to eliminate erratic behavior
- Determine U, D, PoS and W through experimentation
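The two-part equation and the averaging window can be sketched as a small controller. Several things here are assumptions rather than details from the slides: ∆TI is treated as a multiplicative factor on the tuning interval, %∆E is expressed as a fraction between 0 and 1, and the default values of U, D, PoS, and W are placeholders (the talk determines them through experimentation). The class and function names are hypothetical.

```python
from collections import deque


def interval_scale(pct_delta_e, U, D, PoS):
    """Piecewise-linear factor y for a fractional energy change x in [0, 1].

    y = U at x = 0, y = 1 at x = PoS, and y = D at x = 1.
    """
    if pct_delta_e < PoS:
        return (1 - U) / PoS * pct_delta_e + U
    slope = (D - 1) / (1 - PoS)
    return slope * pct_delta_e + 1 - slope * PoS


class TuningIntervalController:
    """Adjusts the tuning interval from the energy change observed at each tuning."""

    def __init__(self, interval, U=1.1, D=0.5, PoS=0.1, W=4):
        # U > 1 grows the interval slowly; D < 1 shrinks it quickly (placeholders).
        self.interval = interval
        self.U, self.D, self.PoS = U, D, PoS
        self.history = deque(maxlen=W)   # last W values of the energy change

    def update(self, energy_before, energy_after):
        """Record one tuning result and return the new tuning interval."""
        pct = min(abs(energy_after - energy_before) / energy_before, 1.0)
        self.history.append(pct)
        averaged = sum(self.history) / len(self.history)
        self.interval *= interval_scale(averaged, self.U, self.D, self.PoS)
        return self.interval


if __name__ == "__main__":
    controller = TuningIntervalController(interval=1_000_000)
    print(controller.update(100.0, 100.0))  # no change in energy: interval grows
    print(controller.update(100.0, 200.0))  # large change in energy: interval shrinks
```

Choosing U close to 1 and D well below 1 gives the slow increase and fast decrease that the attack/decay description calls for.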

Tracking Interval Length Over Time

[Figure: phase interval and tuning interval in cycles (y-axis 0 to 12,000,000) versus execution time (x-axis 1 to 2001, in units of 10k cycles). Annotation: tuning interval oscillates near 1/2 of the phase interval.]

Online Cache Tuner Energy Savings

[Figure: bar chart of normalized energy (y-axis 0 to 2, with a base line) comparing the optimal tuner, a tuning interval fixed at 1/2 of the phase interval, and the variable tuning interval for the application pairs ps-jpeg/v42, blit/g721Dec, binary/pocsag, jpegEnc/jpegDec, bcnt/epic, pegwitDec/g3fax, fir/bilv, ucbqsort/brev, matmul/mpegDec, pegwitEnc/rawcaudio, and the average]

- 29% energy savings, within 8% of optimal
- Observed similar results for less periodic systems

Conclusions

- Developed a very efficient cache tuning heuristic for a highly configurable cache
  - Offers 18,000 different cache configurations
  - 62% energy savings in the cache hierarchy while only searching 0.2% of the search space
  - Key: combination of efficient heuristic method with knowledge of architecture features
- Developed a feedback control system for online cache tuning
  - 29% energy savings on average, 8% from optimal
  - Key: application of control theory to online cache tuning
  - Continuing work for more random systems

Future Work

- Dynamic optimizations in a multi-core environment
  - Cache hierarchy: some levels may be shared
  - Dynamic load distribution
  - Dynamic per-core shutdown or voltage reduction for reduced power consumption
  - Etc.: many single-core optimizations can be non-trivially applied to a multi-core environment
  - Dynamic tuning enables energy savings with no extra designer effort; suitable for standard binary situations, changing environment situations, etc.
- Other multi-core issues
  - Ease development for a multi-core system
    - Designer writes an application without specialization for multi-core and the application is transparently mapped to a multi-core system
  - Architectural support for debugging, i.e., shared resources

Publications

Journal Papers

- Frequent Loop Detection Using Non-Intrusive On-Chip Hardware. A. Gordon-Ross, F. Vahid. IEEE Transactions on Computers, Best of the 2003 MICRO and CASES conferences special issue (Special Issue: Embedded Systems, Microarchitecture, and Compilation Techniques, in Memory of B. Ramakrishna (Bob) Rau), Vol. 54, Issue 10, Oct. 2005, pp. 1203-1215.
- Tiny Instruction Caches For Low Power Embedded Systems. A. Gordon-Ross, S. Cotterell, F. Vahid. ACM Transactions on Embedded Computing Systems, Vol. 2, Issue 4, Nov. 2003, pp. 449-481.
- Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example. A. Gordon-Ross, S. Cotterell, F. Vahid. IEEE Computer Architecture Letters, Vol. I, January 2002.

Conference Papers

- A One-Shot Configurable-Cache Tuner for Improved Energy and Performance. A. Gordon-Ross, P. Viana, F. Vahid, W. Najjar, E. Barros. IEEE/ACM DATE, April 2007.
- Configurable Cache Subsetting for Fast Cache Tuning. P. Viana, A. Gordon-Ross, E. Keogh, E. Barros, F. Vahid. IEEE DAC, July 2006.
- Fast Configurable-Cache Tuning with a Unified Second-Level Cache. A. Gordon-Ross, F. Vahid, N. Dutt. IEEE/ACM ISLPED, August 2005.
- A First Look at the Interplay of Code Reordering and Configurable Caches. A. Gordon-Ross, F. Vahid, N. Dutt. ACM GLSVLSI, April 2005.
- Automatic Tuning of Two-Level Caches to Embedded Applications. A. Gordon-Ross, F. Vahid, N. Dutt. IEEE/ACM DATE, February 2004.
- Frequent Loop Detection Using Non-Intrusive On-Chip Hardware. A. Gordon-Ross, F. Vahid. IEEE/ACM CASES, October 2003.
- Dynamic Loop Caching Meets Preloaded Loop Caching -- A Hybrid Approach. A. Gordon-Ross, F. Vahid. IEEE ICCD, September 2002.
- A Self-Optimizing Embedded Microprocessor using a Loop Table for Low Power. F. Vahid, A. Gordon-Ross. IEEE/ACM ISLPED, August 2001.