Dynamic Tag-Check Omission: A Low-Power Instruction Cache Architecture Exploiting Execution Footprints K. Inoue, V. Moshnyaga, and K. Murakami

Dynamic Tag-Check Omission: A Low-Power Instruction Cache ...koji.inoue/paper/2002/PACS02slides.pdf · Breakdown of Cache Energy # of words in a Subbank-entry (Total # of Subbanks)


Dynamic Tag-Check Omission: A Low-Power Instruction Cache Architecture

Exploiting Execution Footprints

K. Inoue, V. Moshnyaga, and K. Murakami

Introduction

Power consumed in on-chip caches:

• DEC 21164 CPU*: 25%
• StrongARM SA-110 CPU*: 43%
• Bipolar ECL CPU**: 50%

Increase in cache size

* Kamble et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97
** Jouppi et al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor”, IEEE Journal of Solid-State Circuits, 1993

Breakdown of Cache Energy

Energy consumed in cache: Edecode + Esram + Eio, where Esram = Etag + Edata

[Figure: breakdown of Esram per access (y-axis, 0% to 100%) into Data (bit-lines), Tag (bit-lines), and Others. X-axis: # of words in a subbank-entry (total # of subbanks): 8(1), 4(2), 2(4), 1(8). Word: 64 bits; cache size: 64 KB; line size: 64 B.]

This calculation is based on Kamble et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97.

Cache Subbanking

[Diagram labels: Tag Memory, Data Memory (Cache Lines), Subbank.]
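The energy decomposition on this slide can be written out explicitly. The first line restates the slide; the second line is only a rough sketch that is not from the slides. It assumes a hypothetical fraction $r$ of cache accesses skip the tag check, that $E_{tag}$ is avoided entirely on those accesses, and that any bookkeeping overhead is ignored.

```latex
\begin{align}
E_{cache} &= E_{decode} + E_{sram} + E_{io}, \qquad E_{sram} = E_{tag} + E_{data} \\
E_{sram}^{omit} &\approx (1 - r)\,E_{tag} + E_{data}
  \quad \text{(assumption: a fraction } r \text{ of tag checks is omitted)}
\end{align}
```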

History-Based Tag-Comparison (HBTC) Instruction Cache – Motivation –

The hit rate of the instruction cache (I-$) is quite HIGH!
Most tag comparisons result in a HIT.

Conventional I-$
Performs a tag comparison on EVERY cache access, even though some instructions obviously exist in the cache.
Dissipates energy unnecessarily!

HBTC I-$
OMITS the tag comparison for some cache accesses when it knows that the corresponding instructions exist in the cache.
Eliminates the energy consumed by the redundant tag checks!

Can We Know The Existence of Instructions in the I-Cache without Tag-Comparison?

YES!

Consider:
• The target instruction has been referenced before, and
• No cache miss has occurred since the previous reference of the target instruction.

We know that the target instruction exists in the cache!

But, How?

Exploit Execution Footprints Provided by the BTB!

1. Execute an instruction block A at time T. Leave the execution footprint in the corresponding BTB entry.
(2. If a cache miss occurs, then erase all the footprints.)
3. Execute the instruction block A at time T+X. If the footprint is detected in the BTB, then omit the tag comparisons for all the instructions in block A!
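The three footprint steps can be illustrated with a minimal Python sketch. Everything here is hypothetical and simplified: `FootprintTable`, `simulate`, and the `(block_name, caused_cache_miss)` trace format are names invented for illustration, footprints are kept per block name rather than in real BTB entries, and the sketch assumes a block that suffers a miss leaves no footprint.

```python
class FootprintTable:
    """Hypothetical model: one execution-footprint bit per instruction block."""

    def __init__(self):
        self._footprints = set()

    def leave_footprint(self, block):
        # Step 1: executing a block leaves its footprint.
        self._footprints.add(block)

    def erase_all(self):
        # Step 2: a cache miss erases every footprint.
        self._footprints.clear()

    def has_footprint(self, block):
        # Step 3: a detected footprint allows tag checks to be omitted.
        return block in self._footprints


def simulate(trace):
    """Return, for each executed block, whether its tag checks could be omitted.

    trace: list of (block_name, caused_cache_miss) pairs (assumed format).
    """
    table = FootprintTable()
    decisions = []
    for block, caused_miss in trace:
        omit = table.has_footprint(block)
        if omit and caused_miss:
            # A block with a valid footprint cannot miss under this model.
            raise ValueError("inconsistent trace: miss in a footprinted block")
        decisions.append(omit)
        if caused_miss:
            table.erase_all()           # assumption: the missing block leaves no footprint
        else:
            table.leave_footprint(block)
    return decisions


if __name__ == "__main__":
    trace = [("A", False), ("B", False), ("A", False),
             ("C", True), ("A", False), ("A", False)]
    print(simulate(trace))  # [False, False, True, False, False, True]
```

The second visit to A skips its tag checks, the miss in C wipes every footprint, and A has to be traced once more before its checks can be skipped again.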

HBTC I-$ Architecture

[Block diagram. Labels:
• BTB: Branch Inst. Addr., Target Address, EFT (Execution Footprint for Target), EFF (EF for Fall-through)
• PBAreg: Branch Inst. Addr., Pred. Result; Address for EF writing
• Mode Controller: Branch Prediction Result in, Tag-check control out
• PC, I-$, Target Address, Not-taken, Target Instruction Block]

HBTC I-$ Operation

Normal Mode (NM): w/ tag checks
Omitting Mode (OM): w/o tag checks
Tracing Mode (TM): w/ tag checks (on a BTB hit, the EF addressed by the PBAreg is set to 1)

Mode Transition

[State diagram with states NM, OM, and TM, built up step by step over several slides:
• BTB hit: read EFT and EFF.
• EF==1: go to OM.
• EF==0: go to TM; the PC and Pred.-result are written to the PBAreg.
• BTB hit in TM: the instruction block between the previous BTB hit and the current BTB hit has been traced. Validate the EF pointed to by the PBAreg!
• Cache miss!: leave OM or TM.
• GOtoNM: I-cache miss, BTB replacement, RAS address, or branch misprediction. All EFs are invalidated!]
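The mode transitions on these slides can be sketched as a small state machine. This is a hedged software model, not the hardware design. `ModeController` and its method names are hypothetical. The order of operations on a BTB hit (read the footprint first, then validate the entry held in the PBAreg) is an assumption. So is the choice to clear footprints only on a cache miss or BTB replacement, since the slides do not spell out whether the other GOtoNM events also clear them. Registration of new BTB entries is not modeled.

```python
from enum import Enum


class Mode(Enum):
    NM = "normal"      # with tag checks
    OM = "omitting"    # without tag checks
    TM = "tracing"     # with tag checks, footprints are being recorded


class ModeController:
    """Hypothetical software model of the NM/OM/TM mode controller."""

    def __init__(self):
        self.mode = Mode.NM
        self.ef = {}      # (branch_addr, taken) -> True once the footprint is valid
        self.pba = None   # models the PBAreg: (branch_addr, taken) or None

    def tag_check_needed(self):
        return self.mode is not Mode.OM

    def on_btb_hit(self, branch_addr, taken):
        key = (branch_addr, taken)            # taken selects EFT, not-taken selects EFF
        footprint = self.ef.get(key, False)   # read before any write (assumption)
        if self.mode is Mode.TM and self.pba is not None:
            self.ef[self.pba] = True          # validate the EF pointed to by the PBAreg
        if footprint:
            self.mode, self.pba = Mode.OM, None   # EF==1: omit tag checks
        else:
            self.mode, self.pba = Mode.TM, key    # EF==0: trace the next block

    def on_cache_miss_or_btb_replacement(self):
        self.ef.clear()                       # all EFs are invalidated
        self.mode, self.pba = Mode.NM, None

    def on_ras_or_misprediction(self):
        # Assumption: these events return to NM without clearing footprints.
        self.mode, self.pba = Mode.NM, None


if __name__ == "__main__":
    ctrl = ModeController()
    for _ in range(3):
        ctrl.on_btb_hit("C", True)
        print(ctrl.mode.name, ctrl.tag_check_needed())
```

With a loop branch that is always taken, this model spends two BTB hits in TM before the third hit finds the footprint set and switches to OM.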

Operation Example

[Animated figure, built up over several slides.

Execution Flow: instruction blocks Top, A, B, C, D, F, with branches "Branch to F", "Branch to A", and "Branch to A"; iteration count 1 to 5.

State of BTB: the Branch Target Buffer holds entries for Branch-C and Branch-D, each with EFT and EFF bits (values shown graphically).

Time (Iteration Count – Address of Branch), PBAreg (Adr:T/N), and MODE (NM, OM, TM):
• Initial: PBAreg --:--
• 1-C: PBAreg --:--, Omitting Mode
• 2-C: PBAreg C:N, Tracing Mode
• 2-D: PBAreg D:T, Tracing Mode
• 3-C: PBAreg --:--, Omitting Mode
• 3-D: PBAreg --:--, Omitting Mode

Callouts on the execution flow mark where tag checks are being performed ("Performing!") or omitted ("Omitting!").]

Performance/Energy Overhead

Performance
• BTB access conflict (1 stall cycle)
  - for execution footprint writing
  - for execution footprint invalidation

Energy
• Energy for execution-footprint access
  - for reading (every BTB look-up)
  - for writing (BTB hit in Tracing Mode)
  - for invalidating (cache miss or BTB replacement)

Evaluation – Environment –

• Cache energy model based on Kamble [97]; includes BTB access overhead
• SimpleScalar simulator: 16 KB I-cache (partitioned into 4 subbanks), 32 B block, 2-bit bimod predictor, 4-way 2K-entry BTB
• Benchmarks: 6 from SPECint, 4 from MediaBench

Evaluation – Tag-Check Counts –

[Figure: normalized tag-comparison count (y-axis) for ITC, HBTC, and Comb. on 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, adpcm(e), adpcm(d), mpeg2(e), mpeg2(d).]

• The ITC approach works well for all programs.
• The HBTC approach works well for media programs.
• The hybrid approach makes significant reductions!

Evaluation – Cache Energy –

[Figure: normalized cache energy (y-axis, 0 to 1), broken down into Ebtbadd, Eoutput, Etag, and Edata, on 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, adpcm(e), adpcm(d), mpeg2(e), mpeg2(d).]

• The BTB energy overhead does not have a large impact.
• For media programs, the Ecache saving is more than 15%.

Evaluation – Execution Time –

[Figure: normalized execution time (y-axis, 0.9 to 1.04) on 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, adpcm(e), adpcm(d), mpeg2(e), mpeg2(d).]

• For media programs, the performance overhead is trivial.
• For others, the performance degradation might not be acceptable.

Evaluation – Invalidation Penalty –

[Figure: normalized execution time (y-axis, 0 to 3.5) versus invalidation penalty [clock cycles] (x-axis: 1, 2, 4, 8, 16, 32), one curve each for 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, adpcm(e), adpcm(d), mpeg2(e), mpeg2(d).]

• If the penalty is equal to or smaller than 4 cycles, the performance overhead is trivial.
• If the penalty is greater than 4 cycles, the performance overhead is serious.

Conclusions

History-Based Tag-Comparison Instruction Cache
1. Exploits execution footprints recorded in the BTB.
2. Reduces the tag-comparison count by 95% (adpcm(d)).
3. Achieves a 17% cache energy saving!

Future work
• Analyze energy consumption based on chip design.

Backup Slides (History-Based Tag-Comparison Cache)

Outline

1. Introduction
2. History-Based Tag-Comparison Cache
   • Motivation
   • Mechanism
   • Architecture
   • Operation
3. Evaluations
4. Conclusions

Low Power Caches – Reducing both Etag and Edata –

Adding a small L0 cache (Processor, L0 Cache, L1 Cache)
• Filter Cache
• S-Cache
• Block Buffering

Dividing cache module
• MDM Cache

Multiple accessing
• MRU Cache
• Hash-Rehash Cache

[Diagram: Sequential Way-Access across way0, way1, way2, way3.]

Low Power Caches – Reducing Edata –

Dividing cache module
• Cache Sub-Banking

Accessing sequentially
• Phased Cache
• Pipelined Cache

[Diagram labels: Tag, Line, Hit!, Miss!, Replace.]

Breakdown of Esram

[Figure: two panels showing the breakdown of energy (y-axis, 0% to 100%) into Esram_data_bit, Esram_tag_bit, and Esram_others. X-axis: # of words in a subbank (total # of subbanks): 8(1), 4(2), 2(4), 1(8).
• 32-bit CPU: CS = 32 KB, LS = 32 B
• 64-bit CPU: CS = 64 KB, LS = 64 B
CS: Cache Size; LS: Line Size]

This calculation is based on Kamble et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97.

Operation Example

[Animated figure, built up over several slides.

Execution Flow: instruction blocks Top, A, B, C, D, F, with branches "Branch to F", "Branch to A", and "Branch to A"; iteration count 1 to 7.

State of BTB: Branch Target Buffer entries, each with EFT and EFF bits (values shown graphically), and MODE (NM, OM, TM).

Time, PBAreg (Adr:T/N), and BTB contents:
• Initial: PBAreg --:--; BTB: Branch-C
• C-1: PBAreg C:T; BTB: Branch-C
• C-2: PBAreg C:T; BTB: Branch-C
• C-3: PBAreg --:-- (shown as C:T in the later builds); BTB: Branch-C
• C-4: PBAreg C:N; BTB: Branch-C. EFF is selected!
• D-4: PBAreg D:T; BTB: Branch-C, Branch-D (new entry). Registration!
• C-5: PBAreg --:--; BTB: Branch-C, Branch-D. EFF of Branch-C
• D-5: PBAreg --:--; BTB: Branch-C, Branch-D
• C-6: PBAreg --:--; BTB: Branch-C, Branch-D
• D-6: PBAreg --:--; BTB: Branch-C, Branch-D
• B-7: PBAreg B:T; BTB: Branch-C, Branch-D, Branch-B

Callouts on the execution flow mark where tag checks are being performed ("Performing!") or omitted ("Omitting!").]

Conventional Tag-Check Scheme

The hit rate of the instruction cache (I-$) is quite HIGH!
Most tag comparisons result in a MATCH.

Conventional I-$
Performs a tag check on EVERY cache access, even though some instructions obviously exist in the cache.
Wastes energy unnecessarily!

[Figure: average # of references per stable-time (y-axis) versus cache-line address (x-axis, 0 to 511) for 132.ijpeg on a 16 KB direct-mapped cache with 32 B lines.]
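The cache-line address range of 0 to 511 in this figure follows from the cache geometry, which a short calculation can confirm. The helper below is a hypothetical sketch: `cache_geometry` is an invented name, and the 32-bit address width used to derive the tag width is an assumption that the slides do not state.

```python
def cache_geometry(cache_bytes, line_bytes, ways=1, addr_bits=32):
    """Return (lines, offset_bits, index_bits, tag_bits) for a power-of-two cache.

    addr_bits=32 is an assumed address width, not taken from the slides.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return lines, offset_bits, index_bits, tag_bits


if __name__ == "__main__":
    # 16 KB direct-mapped cache with 32 B lines, as in the figure.
    print(cache_geometry(16 * 1024, 32))  # (512, 5, 9, 18)
```

The 512 lines match the 0 to 511 range on the x-axis. Under the assumed 32-bit address, each conventional access would compare an 18-bit tag.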

History-Based Tag-Check (HBTC) Scheme (1/2)

• An instruction has been executed at least once.
• No cache miss has occurred since the last reference of the instruction.

We know that the instruction exists in the cache now!

How can we detect these conditions at run time?