Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection Hamid Noori , Maziar Goudarzi , Koji Inoue , and Kazuaki Murakami Speaker: Tohru Ishihara Institute of Systems & Information Technologies/KYUSHU, Japan Kyushu University, Japan


Improving Energy Efficiency of Configurable Caches via Temperature-Aware Configuration Selection

Hamid Noori†, Maziar Goudarzi‡, Koji Inoue‡, and Kazuaki Murakami‡

Speaker: Tohru Ishihara‡

†Institute of Systems & Information Technologies/KYUSHU, Japan
‡Kyushu University, Japan


ISVLSI2008@Montpellier, France | Kyushu University

Outline

Background
Motivation
Problem Definition
Proposed Approach
  Architecture
  Reconfiguration Flow
Experimental Results
Conclusions



Background (1/2)

[Figure: Dynamic energy per cache access (nJ) versus technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K. Vdd: 180nm = 1.66V, 100nm = 1.125V, 70nm = 0.9V. Dynamic energy is temperature independent.]

[Figure: Leakage power of a cache memory (mW) versus technology (180nm, 100nm, 70nm) for cache sizes 32K, 16K, 8K, 4K, 2K, and 1K. Vdd: 180nm = 1.66V, 100nm = 1.125V, 70nm = 0.9V. Temperature: 100°C.]


Background (2/2)

[Figure: Leakage power of a 32KB cache (mW) versus temperature (0°C, 20°C, 40°C, 60°C, 80°C, 100°C) for 180nm, 100nm, and 70nm technologies. Vdd: 180nm = 1.66V, 100nm = 1.125V, 70nm = 0.9V.]



Motivational Example (1/3)

Execution time is technology and temperature independent.

[Figure: Number of execution clock cycles (K) versus instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort.]


Motivational Example (2/3)

Technology: 70nm, Vdd: 0.9V

[Figure: Total static energy for executing a program. Static energy (mJ) versus cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K), one series per temperature: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C.]

[Figure: Total dynamic energy for executing a program. Dynamic energy (mJ) versus cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K). Dynamic energy is temperature independent.]


Motivational Example (3/3)

Technology: 70nm, Vdd: 0.9V

[Figure: Total energy (mJ) versus instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort, one series per temperature: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C. The minimum-energy cache size is marked on the plot.]



Problem Definition (1/3)

Objective function: total memory energy
  Cache dynamic energy
  Cache static energy
  Off-chip memory access energy
  Energy consumption during processor stall

[Figure: CPU with instruction cache (I-$) and data cache (D-$), connected to main memory.]


Problem Definition (2/3)

energy_memory(C, Temp, Tech) = energy_dynamic(C, Tech) + energy_static(C, Temp, Tech)   (1)

energy_dynamic(C, Tech) = cache_accesses(C) * energy_cache_access(C, Tech) + cache_misses(C) * energy_miss(C, Tech)   (2)

energy_miss(C, Tech) = energy_off_chip_stall + energy_cache_block_refill(C, Tech)   (3)

energy_static(C, Temp, Tech) = executed_clock_cycles(C) * clock_period * leakage_power(C, Temp, Tech)   (4)
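Equations (1) to (4) can be written out as a small Python sketch. The `CacheProfile` container, its field names, and the way leakage power is stored per temperature are assumptions made for illustration; the slides only give the formulas, and the technology dependence is assumed to be already folded into the per-configuration numbers.

```python
from dataclasses import dataclass


@dataclass
class CacheProfile:
    """Numbers for one cache configuration C (hypothetical container)."""
    cache_accesses: int               # cache_accesses(C)
    cache_misses: int                 # cache_misses(C)
    executed_clock_cycles: int        # executed_clock_cycles(C)
    energy_cache_access: float        # energy per access for C
    energy_cache_block_refill: float  # energy per block refill for C
    leakage_power: dict               # temperature -> leakage power for C


def energy_miss(p, energy_off_chip_stall):
    # Eq. (3): off-chip access and stall energy plus block refill energy
    return energy_off_chip_stall + p.energy_cache_block_refill


def energy_dynamic(p, energy_off_chip_stall):
    # Eq. (2): access energy plus miss energy, independent of temperature
    return (p.cache_accesses * p.energy_cache_access
            + p.cache_misses * energy_miss(p, energy_off_chip_stall))


def energy_static(p, temp, clock_period):
    # Eq. (4): execution time multiplied by leakage power at `temp`
    return p.executed_clock_cycles * clock_period * p.leakage_power[temp]


def energy_memory(p, temp, clock_period, energy_off_chip_stall):
    # Eq. (1): total memory energy for configuration C at temperature `temp`
    return (energy_dynamic(p, energy_off_chip_stall)
            + energy_static(p, temp, clock_period))
```

Only the static term takes the temperature as an argument, which is why the minimum-energy configuration can shift as the chip heats up.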


Problem Definition (3/3)

“For a given application, processor architecture, technology, and valid configurations of the configurable cache, find a valid cache configuration that results in minimum energy consumption at a specific temperature over the entire execution of the given application.”



Architecture

TACC
  BCC (proposed by Zhang et al. [1])
    Cache size (way shutdown)
    Number of ways (way concatenation)
    Line size
  Thermal sensor
  Accessible port for reading the thermal sensor

[1] C. Zhang, F. Vahid, and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.


Reconfiguration Flow

Evaluation phase (offline)
  Inputs:
    Static and dynamic power for different cache configurations and temperatures for the target technology
    Execution time and number of hits and misses for different cache configurations, obtained by running the application on an ISS
  Determine the lowest-energy cache configuration for different target temperatures
  Fill the lookup table of the configurable cache with the proper configuration for each temperature

Reconfiguration phase (online)
  Detect the current temperature
  Use the lookup table and load the proper configuration for the current temperature
  Execute the application
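The two phases of this flow can be sketched in Python as follows. The function names, the dictionary layout, and the nearest-temperature rule in the online step are assumptions for illustration; the slides do not say how a sensor reading that falls between two tabulated temperatures is mapped to a table entry.

```python
def build_lookup_table(energy_by_config, temperatures):
    """Offline evaluation phase (sketch).

    energy_by_config maps a configuration to {temperature: total energy}.
    Returns {temperature: lowest-energy configuration}.
    """
    return {
        t: min(energy_by_config, key=lambda cfg: energy_by_config[cfg][t])
        for t in temperatures
    }


def select_configuration(lookup_table, current_temp):
    """Online reconfiguration phase (sketch).

    Assumption: use the entry for the nearest tabulated temperature.
    """
    nearest = min(lookup_table, key=lambda t: abs(t - current_temp))
    return lookup_table[nearest]


# Made-up energies for two configurations written as
# (cache size, line size in bytes, number of ways).
energies = {
    ("16K", 32, 2): {0: 10.0, 100: 50.0},
    ("4K", 32, 1): {0: 20.0, 100: 30.0},
}
table = build_lookup_table(energies, [0, 100])
config_now = select_configuration(table, current_temp=80)
```

The expensive search over configurations happens once, offline; at run time the cost is a sensor read and a table lookup before the application starts.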



Experimental Setup (1/2)

MiBench
SimpleScalar
  Cache hit: one clock cycle
  Cache miss: 100 clock cycles
  Clock frequency of the base processor: 200 MHz
CACTI 4.2
  Target technology: 70nm (Vdd = 0.9V)
BCC (16KB)
  16KB (4-, 2-, 1-way)
  8KB (2- and 1-way)
  4KB (1-way)
  The line size for each of the configurations can be 8, 16, or 32 bytes.
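As a quick check on the size of the search space, the configurations listed on this slide can be enumerated in a few lines of Python. The constant and function names are hypothetical; the size, way, and line-size options and the 200 MHz clock are taken from the slide.

```python
from itertools import product

# Size (KB) -> supported associativities, as listed for the 16KB BCC.
WAYS_BY_SIZE_KB = {16: (4, 2, 1), 8: (2, 1), 4: (1,)}
LINE_SIZES_BYTES = (8, 16, 32)

# Clock period of the 200 MHz base processor, in seconds.
CLOCK_PERIOD_S = 1 / 200e6


def valid_configurations():
    """Return every (size_kb, line_size_bytes, ways) combination."""
    return [
        (size_kb, line, ways)
        for size_kb, way_options in WAYS_BY_SIZE_KB.items()
        for ways, line in product(way_options, LINE_SIZES_BYTES)
    ]


configs = valid_configurations()
print(len(configs))  # 6 size/way pairs x 3 line sizes = 18
```

With only 18 candidates per cache, evaluating every configuration at every target temperature in the offline phase is cheap.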


Experimental Setup (2/2)

Base Configurable Cache (BCC)
  It has the same architecture as proposed by Zhang et al. [1]
  It supports a limited set of configurations
  It is configured for each application for the corner case (i.e. leakage at 100°C)

Temperature-Aware Configurable Cache (TACC)
  TACC is configured for each execution of an application, considering the chip temperature at that time

[1] C. Zhang, F. Vahid, and W. Najjar, “A Highly Configurable Cache Architecture for Embedded Systems,” ACM Trans. on Embedded Computing Systems, vol. 4, no. 2, May 2005.


Energy & Performance Evaluation

Energy Saving (%) = (energy_BCC_temp100 - energy_TACC) / energy_BCC_temp100 × 100

Performance Enhancement (%) = (exec_time_BCC - exec_time_TACC) / exec_time_BCC × 100
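The two evaluation metrics on this slide amount to relative differences against the BCC baseline. A minimal Python sketch, with hypothetical function and parameter names and made-up input values:

```python
def energy_saving_percent(energy_bcc_temp100, energy_tacc):
    """Energy saved by TACC relative to the BCC baseline, in percent."""
    return (energy_bcc_temp100 - energy_tacc) / energy_bcc_temp100 * 100


def performance_enhancement_percent(exec_time_bcc, exec_time_tacc):
    """Execution-time reduction of TACC relative to BCC, in percent."""
    return (exec_time_bcc - exec_time_tacc) / exec_time_bcc * 100


# Made-up numbers, only to show the arithmetic.
saving = energy_saving_percent(200.0, 150.0)                # 25.0
speedup = performance_enhancement_percent(100.0, 50.0)      # 50.0
```

A positive value means TACC consumed less energy, or ran for fewer cycles, than the BCC configured for the 100°C corner case.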


Data and Instruction Cache

Each entry lists cache size, line size (bytes), and number of ways.

D$    | qsort      | djpeg      | lame       | dijkstra   | patricia   | sha        | adpcm      | crc        | fft
0°C   | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
20°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
40°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 4 | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 4
60°C  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 8K, 32, 2  | 8K, 32, 2
80°C  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 16, 1  | 4K, 32, 1  | 8K, 32, 2
100°C | 4K, 32, 1  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 4K, 32, 1  | 4K, 32, 1  | 4K, 32, 1  | 8K, 32, 2

I$    | basicmath  | qsort      | djpeg      | lame       | dijkstra   | blowfish   | rijndael   | gsm        | fft
0°C   | 16K, 8, 4  | 16K, 8, 4  | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 16, 4 | 8K, 32, 1
20°C  | 16K, 16, 4 | 16K, 16, 4 | 16K, 32, 1 | 16K, 32, 2 | 16K, 32, 1 | 16K, 16, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
40°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 16K, 32, 2 | 8K, 32, 1
60°C  | 16K, 16, 4 | 16K, 16, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 16K, 32, 1 | 8K, 32, 2  | 8K, 32, 1
80°C  | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 1 | 4K, 32, 1  | 8K, 32, 1
100°C | 16K, 32, 4 | 16K, 32, 4 | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 8K, 32, 2  | 16K, 32, 2 | 4K, 32, 1  | 8K, 32, 1


Energy Saving

Technology: 70nm, Vdd: 0.9V
BCC & TACC Max. Size = 16KB

[Figure: Energy saving (%) per benchmark (basicmath, qsort, susan, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, rijndael, sha, gsm, adpcm, crc, fft), plus average-DC, max-IC, and average-IC, at operation temperatures 0°C, 20°C, and 60°C.]


Performance Enhancement

Technology: 70nm, Vdd: 0.9V
BCC & TACC Max. Size = 16KB

[Figure: Performance enhancement (%) per benchmark (basicmath, qsort, susan, cjpeg, djpeg, lame, dijkstra, patricia, blowfish, rijndael, sha, gsm, adpcm, crc, fft), plus average-DC, max-IC, and average-IC, at operation temperatures 0°C, 20°C, and 60°C.]



Conclusions

1. Temperature-aware cache configuration matters more for finer technologies. For the instruction cache, energy consumption is reduced by up to 61% (17% on average) in 70nm technology.

2. The data cache is more easily affected by temperature than the instruction cache. With a configurable data cache, up to 77% (36% on average) of the energy can be saved in 70nm technology.

3. TACC improves performance by up to 28% (5% on average) for the instruction cache and by up to 17% (8.1% on average) for the data cache.


Thank you for your attention

Please ask any questions to [email protected]


Backup slides


Technology: 180nm, Vdd: 1.66V

[Figure: Total energy (mJ) versus instruction cache size (128K, 64K, 32K, 16K, 8K, 4K, 2K, 1K) for qsort, one series per temperature: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C.]


Technology | Metric            | ARM7TDMI | ARM966E-S
130nm      | Power consumption | 7.98 mW  | 62.5 mW
130nm      | Frequency         | 133 MHz  | 250 MHz
90nm       | Power consumption | 7.08 mW  | 51.7 mW
90nm       | Frequency         | 236 MHz  | 470 MHz