Exploiting Compressed Block Size as an Indicator of Future Reuse
Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch


Executive Summary

•  In a compressed cache, compressed block size is an additional dimension
•  Problem: How can we maximize cache performance by utilizing both block reuse and compressed size?
•  Observations:
   –  The importance of a cache block depends on its compressed size
   –  Block size is indicative of reuse behavior in some applications
•  Key Idea: Use compressed block size in making cache replacement and insertion decisions
•  Results:
   –  Higher performance (9%/11% on average for 1/2-core)
   –  Lower energy (12% on average)

Potential for Data Compression

[Figure: compression ratio (0.5 to 2.0) versus latency for FPC, C-Pack, BDI, and SC2]

Algorithm                             Reference                     Latency
Frequent Pattern Compression (FPC)    [Alameldeen+, ISCA'04]        5 cycles
C-Pack                                [Chen+, Trans. on VLSI'10]    9 cycles
Base-Delta-Immediate (BDI)            [Pekhimenko+, PACT'12]        1-2 cycles
Statistical Compression (SC2)         [Arelakis+, ISCA'14]          8-14 cycles

•  All these works showed significant performance improvements
•  All use the LRU replacement policy
•  Can we do better?

Size-Aware Cache Management

1. Compressed block size matters
2. Compressed block size varies
3. Compressed block size can indicate reuse

#1: Compressed Block Size Matters

[Figure: hit/miss timelines for the access stream X, A, Y, B, C (a mix of large and small blocks) under Belady's OPT replacement and under size-aware replacement; the size-aware timeline shows saved cycles]

Size-aware policies could yield fewer misses

#2: Compressed Block Size Varies

[Figure: cache coverage (%) by compressed block size in bytes (bins 0-7, 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, 56-63, 64) for mcf, lbm, astar, tpch2, calculix, h264ref, wrf, povray, and gcc; BDI compression algorithm [Pekhimenko+, PACT'12]]

Compressed block size varies within and between applications

#3: Block Size Can Indicate Reuse

[Figure: reuse distance (number of memory accesses, 0 to 6000) versus block size in bytes (1, 8, 16, 20, 24, 34, 36, 40, 64) for bzip2]

Different sizes have different dominant reuse distances

Code Example to Support Intuition

    int A[N];          // small indices: compressible, long reuse
    double B[16];      // FP coefficients: incompressible, short reuse
    double sum = 0.0;  // accumulator (not declared on the original slide)

    for (int i = 0; i < N; i++) {
        int idx = A[i];
        for (int j = 0; j < N; j++) {
            sum += B[(idx + j) % 16];
        }
    }

Compressed size can be an indicator of reuse distance
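To make the "compressible" label on A concrete, here is a minimal sketch of a base-plus-delta check in the spirit of BDI [Pekhimenko+, PACT'12]. It is a simplification and not the full algorithm: it tries a single encoding (one 4-byte base plus sixteen 1-byte deltas), and the function name `simple_base_delta_size` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 64
#define WORDS_PER_BLOCK 16  /* sixteen 4-byte values per 64-byte block */

/* Returns the compressed size in bytes of one 64-byte block of int32
 * values under a single base+delta encoding (4-byte base, 1-byte deltas),
 * or BLOCK_BYTES if any delta does not fit in one signed byte. */
unsigned simple_base_delta_size(const int32_t v[WORDS_PER_BLOCK]) {
    assert(v != NULL);
    int32_t base = v[0];
    for (size_t i = 0; i < WORDS_PER_BLOCK; i++) {
        int64_t delta = (int64_t)v[i] - (int64_t)base;
        if (delta < INT8_MIN || delta > INT8_MAX) {
            return BLOCK_BYTES;  /* stored uncompressed */
        }
    }
    return 4 + WORDS_PER_BLOCK;  /* base + 16 one-byte deltas = 20 bytes */
}
```

A block of small, similar indices (as in A) passes this check and shrinks to 20 bytes, while a block containing one very different value stays at 64 bytes.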


Outline

•  Motivation
•  Key Observations
•  Key Ideas:
   –  Minimal-Value Eviction (MVE)
   –  Size-based Insertion Policy (SIP)
•  Evaluation
•  Conclusion

Compression-Aware Management Policies (CAMP)

CAMP has two components:
•  MVE: Minimal-Value Eviction
•  SIP: Size-based Insertion Policy

Minimal-Value Eviction (MVE): Observations

•  Define a value (importance) of the block to the cache
•  Evict the block with the lowest value
•  Importance of the block depends on both the likelihood of reference and the compressed block size

[Figure: a cache set (Block #0, Block #1, ..., Block #N) ordered from highest priority (MRU) to lowest priority (LRU), shown next to the same set ordered from highest value to lowest value]

Minimal-Value Eviction (MVE): Key Idea

$$\text{Value} = \frac{\text{Probability of reuse}}{\text{Compressed block size}}$$

Probability of reuse:
•  Can be determined in many different ways
•  Our implementation is based on the re-reference interval prediction value (RRIP [Jaleel+, ISCA'10])
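The value function can be sketched as a victim-selection routine. This is an illustration under stated assumptions, not the authors' implementation: the reuse-probability proxy `RRPV_MAX + 1 - rrpv`, the 3-bit RRPV width, and all type and function names are assumptions made for this example. Values are compared by cross-multiplication so that no division is needed.

```c
#include <assert.h>
#include <stddef.h>

#define RRPV_MAX 7u  /* assumed 3-bit re-reference prediction value */

typedef struct {
    int valid;
    unsigned rrpv;  /* 0 = re-reference expected soon, RRPV_MAX = distant */
    unsigned size;  /* compressed block size in bytes (1..64) */
} block_t;

/* True if value(a) < value(b), where value = p / size and
 * p = RRPV_MAX + 1 - rrpv is an assumed proxy for reuse probability.
 * p_a / s_a < p_b / s_b is rewritten as p_a * s_b < p_b * s_a. */
static int value_less(const block_t *a, const block_t *b) {
    unsigned pa = RRPV_MAX + 1u - a->rrpv;
    unsigned pb = RRPV_MAX + 1u - b->rrpv;
    return pa * b->size < pb * a->size;
}

/* Returns the index of the valid block with the minimal value,
 * or -1 if the set holds no valid block. Ties keep the lower index. */
int mve_pick_victim(const block_t *set, size_t ways) {
    int victim = -1;
    for (size_t i = 0; i < ways; i++) {
        if (!set[i].valid) {
            continue;
        }
        assert(set[i].rrpv <= RRPV_MAX && set[i].size > 0);
        if (victim < 0 || value_less(&set[i], &set[(size_t)victim])) {
            victim = (int)i;
        }
    }
    return victim;
}
```

With this proxy, a 64-byte block with near-term predicted reuse (value 8/64) can still lose to an 8-byte block with a more distant prediction (value 2/8), which is the size-aware behavior the slide describes.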


Size-based Insertion Policy (SIP): Observations

•  Sometimes there is a relation between the compressed block size and reuse distance
•  This relation can be detected through the compressed block size
•  Minimal overhead to track this relation (compressed block information is already part of the design)

[Figure: a data structure links compressed block size and reuse distance]

Size-based Insertion Policy (SIP): Key Idea

•  Insert blocks of a certain size with higher priority if this improves cache miss rate
•  Use dynamic set sampling [Qureshi, ISCA'06] to detect which size to insert with higher priority

[Figure: main tags (sets A through I) and auxiliary tags for sampled sets, each assigned a size (set A: 8B, set D: 64B, set F: 8B, set I: 64B); misses in the main and auxiliary copies of the 8B sets move the counter CTR8B in opposite directions (+/-); the counter decides the policy for steady state]
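The sampling mechanism can be sketched with one signed counter per size bin. The direction of the updates is an assumption here (a miss in the sampled main-tag set increments, a miss in the auxiliary set that prioritizes that size decrements), as are the 8-byte bin width and every name in the code.

```c
#include <assert.h>

#define NUM_SIZE_BINS 8  /* assumed: sizes 1..64 bytes in 8-byte bins */

typedef struct {
    int ctr[NUM_SIZE_BINS];  /* one training counter per size bin */
} sip_state_t;

/* Maps a compressed size in bytes to its bin: 1-8 -> 0, ..., 57-64 -> 7. */
int sip_size_bin(unsigned size_bytes) {
    assert(size_bytes >= 1 && size_bytes <= 64);
    return (int)((size_bytes - 1) / 8);
}

/* Miss in a sampled main-tag set that uses the baseline insertion policy. */
void sip_on_main_miss(sip_state_t *s, int bin) {
    assert(bin >= 0 && bin < NUM_SIZE_BINS);
    s->ctr[bin]++;
}

/* Miss in the auxiliary copy that inserts this size bin with priority. */
void sip_on_aux_miss(sip_state_t *s, int bin) {
    assert(bin >= 0 && bin < NUM_SIZE_BINS);
    s->ctr[bin]--;
}

/* Steady state: prioritize a bin only if the auxiliary copy missed less. */
int sip_prioritize(const sip_state_t *s, int bin) {
    assert(bin >= 0 && bin < NUM_SIZE_BINS);
    return s->ctr[bin] > 0;
}
```

After a training phase, `sip_prioritize` would be consulted on every insertion to choose between the high-priority and the default insertion position.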


Methodology

•  Simulator
   –  x86 event-driven simulator (MemSim [Seshadri+, PACT'12])
•  Workloads
   –  SPEC2006 benchmarks, TPC, Apache web server
   –  1-4 core simulations for 1 billion representative instructions
•  System Parameters
   –  L1/L2/L3 cache latencies from CACTI
   –  BDI (1-cycle decompression) [Pekhimenko+, PACT'12]
   –  4GHz, x86 in-order core, cache size (1MB - 16MB)

Evaluated Cache Management Policies

Design   Description
LRU      Baseline LRU policy (size-unaware)
RRIP     Re-reference Interval Prediction [Jaleel+, ISCA'10] (size-unaware)
ECM      Effective Capacity Maximizer [Baek+, HPCA'13] (size-aware)
CAMP     Compression-Aware Management Policies (MVE + SIP)

Size-Aware Replacement

•  Effective Capacity Maximizer (ECM) [Baek+, HPCA'13]
   –  Inserts "big" blocks with lower priority
   –  Uses a heuristic to define the threshold
•  Shortcomings
   –  Coarse-grained
   –  No relation between block size and reuse
   –  Not easily applicable to other cache organizations

CAMP Single-Core

SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: normalized IPC (0.8 to 1.3) of RRIP, ECM, MVE, SIP, and CAMP for astar, cactusADM, gcc, gobmk, h264ref, soplex, zeusmp, bzip2, GemsFDTD, gromacs, leslie3d, perlbench, sjeng, sphinx3, xalancbmk, milc, omnetpp, libquantum, lbm, mcf, tpch2, tpch6, apache, GMeanIntense, and GMeanAll; callouts: 31%, 8%, 5%]

CAMP improves performance over LRU, RRIP, ECM

Multi-Core Results

•  Classification based on the compressed block size distribution
   –  Homogeneous (1 or 2 dominant block sizes)
   –  Heterogeneous (more than 2 dominant sizes)
•  We form three different 2-core workload groups (20 workloads in each):
   –  Homo-Homo
   –  Homo-Hetero
   –  Hetero-Hetero

Multi-Core Results (2)

[Figure: normalized weighted speedup (0.95 to 1.10) of RRIP, ECM, MVE, SIP, and CAMP for the Homo-Homo, Homo-Hetero, and Hetero-Hetero groups and their GeoMean]

•  CAMP outperforms LRU, RRIP and ECM policies
•  Higher benefits with higher heterogeneity in size

Effect on Memory Subsystem Energy

[Figure: normalized energy (0.7 to 1.2) of RRIP, ECM, and CAMP for astar, cactusADM, gcc, gobmk, h264ref, soplex, zeusmp, bzip2, GemsFDTD, gromacs, leslie3d, perlbench, sjeng, sphinx3, xalancbmk, milc, omnetpp, libquantum, lbm, mcf, tpch2, tpch6, apache, GMeanIntense, and GMeanAll; callout: 5%]

CAMP reduces energy consumption in the memory subsystem

L1/L2 caches, DRAM, NoC, compression/decompression

CAMP for Global Replacement

CAMP with V-Way [Qureshi+, ISCA'05] cache design

G-CAMP has two components:
•  G-MVE: Global Minimal-Value Eviction
   –  Reuse Replacement instead of RRIP
•  G-SIP: Global Size-based Insertion Policy
   –  Set Dueling instead of Set Sampling

G-CAMP Single-Core

SPEC2006, databases, web workloads, 2MB L2 cache

[Figure: normalized IPC (0.8 to 1.3) of RRIP, V-Way, G-MVE, G-SIP, and G-CAMP for astar, cactusADM, gcc, gobmk, h264ref, soplex, zeusmp, bzip2, GemsFDTD, gromacs, leslie3d, perlbench, sjeng, sphinx3, xalancbmk, milc, omnetpp, libquantum, lbm, mcf, tpch2, tpch6, apache, GMeanIntense, and GMeanAll; callouts: 63%-97%, 14%, 9%]

G-CAMP improves performance over LRU, RRIP, V-Way

Storage Bit Count Overhead

Over uncompressed cache with LRU

Designs (with BDI)   LRU     CAMP    V-Way    G-CAMP
tag-entry            +14b    +14b    +15b     +19b
data-entry           +0b     +0b     +16b     +32b
total                +9%     +11%    +12.5%   +17%

Callouts: 2% (CAMP relative to LRU) and 4.5% (G-CAMP relative to V-Way)

Cache Size Sensitivity

[Figure: normalized IPC (1.0 to 1.6) of LRU, RRIP, ECM, CAMP, V-Way, and G-CAMP for cache sizes of 1M, 2M, 4M, 8M, and 16M]

•  CAMP/G-CAMP outperforms prior policies for all cache sizes
•  G-CAMP outperforms LRU with 2X cache sizes

Other Results and Analyses in the Paper

•  Block size distribution for different applications
•  Sensitivity to the compression algorithm
•  Comparison with uncompressed cache
•  Effect on cache capacity
•  SIP vs. PC-based replacement policies
•  More details on multi-core results

Conclusion

•  In a compressed cache, compressed block size is an additional dimension
•  Problem: How can we maximize cache performance by utilizing both block reuse and compressed size?
•  Observations:
   –  The importance of a cache block depends on its compressed size
   –  Block size is indicative of reuse behavior in some applications
•  Key Idea: Use compressed block size in making cache replacement and insertion decisions
•  Two techniques: Minimal-Value Eviction and Size-based Insertion Policy
•  Results:
   –  Higher performance (9%/11% on average for 1/2-core)
   –  Lower energy (12% on average)


Backup Slides

Potential for Data Compression

Compression algorithms for on-chip caches:
•  Frequent Pattern Compression (FPC) [Alameldeen+, ISCA'04]
   –  performance: +15%, comp. ratio: 1.25-1.75
•  C-Pack [Chen+, Trans. on VLSI'10]
   –  comp. ratio: 1.64
•  Base-Delta-Immediate (BDI) [Pekhimenko+, PACT'12]
   –  performance: +11.2%, comp. ratio: 1.53
•  Statistical Compression [Arelakis+, ISCA'14]
   –  performance: +11%, comp. ratio: 2.1

 

Potential for Data Compression (2)

•  Better compressed cache management
   –  Decoupled Compressed Cache (DCC) [MICRO'13]
   –  Skewed Compressed Cache [MICRO'14]

#3: Block Size Can Indicate Reuse

[Figure: reuse behavior by block size in bytes for soplex]

Different sizes have different dominant reuse distances

Conventional Cache Compression

Conventional 2-way cache with 32-byte cache lines
•  Tag storage: Tag0 and Tag1 per set (Way0, Way1)
•  Data storage: Data0 and Data1 per set (Way0, Way1), 32 bytes each

Compressed 4-way cache with 8-byte segmented data
•  Tag storage: Tag0 through Tag3 per set (Way0 through Way3), each with compression encoding bits (C)
•  Data storage: segments S0 through S7 per set, 8 bytes each
•  Twice as many tags
•  Tags map to multiple adjacent segments
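The segmented organization above can be sketched as a small bookkeeping model. The constants follow the slide (4 tags and eight 8-byte segments per set, 32-byte uncompressed lines); the struct layout and function names are hypothetical, and the sketch assumes segments can be compacted, so only the total count of free segments matters.

```c
#include <assert.h>

#define SEGMENT_BYTES 8u
#define SEGMENTS_PER_SET 8u
#define TAGS_PER_SET 4
#define LINE_BYTES 32u

typedef struct {
    int valid;
    unsigned first_segment;  /* first of the adjacent segments the tag maps to */
    unsigned size_bytes;     /* compressed size, 1..LINE_BYTES */
} ctag_t;

/* Number of 8-byte segments a block of the given compressed size occupies. */
unsigned segments_needed(unsigned size_bytes) {
    assert(size_bytes >= 1 && size_bytes <= LINE_BYTES);
    return (size_bytes + SEGMENT_BYTES - 1) / SEGMENT_BYTES;
}

/* True if the set has a free tag and enough free segments for a new block. */
int set_can_hold(const ctag_t tags[TAGS_PER_SET], unsigned incoming_bytes) {
    unsigned used = 0;
    int free_tag = 0;
    for (int i = 0; i < TAGS_PER_SET; i++) {
        if (tags[i].valid) {
            used += segments_needed(tags[i].size_bytes);
        } else {
            free_tag = 1;
        }
    }
    return free_tag && used + segments_needed(incoming_bytes) <= SEGMENTS_PER_SET;
}
```

When `set_can_hold` returns 0, a replacement policy has to free either a tag or enough segments, which is where the compressed size of the victim starts to matter.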

CAMP = MVE + SIP

Minimal-Value Eviction (MVE)

$$\text{Value Function} = \frac{\text{Probability of reuse}}{\text{Compressed block size}}$$

–  Probability of reuse based on RRIP [ISCA'10]

Size-based Insertion Policy (SIP)
–  Dynamically prioritizes blocks based on their compressed size (e.g., insert into MRU position for LRU policy)

Overhead Analysis

CAMP and Global Replacement

CAMP with V-Way cache design

[Figure: V-Way organization with a tag store (Tag0 through Tag3 per set) and a separate data store (Data0, Data1, ...) of 64-byte entries divided into 8-byte segments; field labels shown: status, tag, fptr, comp, rptr, R0 through R3, v+s]

G-MVE

Two major changes:
•  Probability of reuse based on Reuse Replacement policy
   –  On insertion, a block's counter is set to zero
   –  On a hit, the block's counter is incremented by one, indicating its reuse
•  Global replacement using a pointer (PTR) to a reuse counter entry
   –  Starting at the entry PTR points to, the reuse counters of 64 valid data entries are scanned, decrementing each non-zero counter
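The counter rules and the 64-entry scan can be sketched as follows. The slide does not say how the victim is chosen among the scanned entries or how a counter maps to a reuse probability; this sketch assumes the value is `(counter + 1) / size`, taken before the decrement, and that the scanned entry with the smallest value is the victim. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SCAN_WINDOW 64  /* valid data entries examined per scan */

typedef struct {
    int valid;
    unsigned reuse_ctr;  /* 0 on insertion, +1 on every hit */
    unsigned size;       /* compressed size in bytes */
} gentry_t;

void gmve_on_insert(gentry_t *e, unsigned size) {
    assert(size > 0);
    e->valid = 1;
    e->reuse_ctr = 0;
    e->size = size;
}

void gmve_on_hit(gentry_t *e) {
    e->reuse_ctr++;
}

/* Scans up to SCAN_WINDOW valid entries starting at *ptr (wrapping around),
 * decrements each non-zero counter, and returns the index of the scanned
 * entry with the smallest assumed value, or -1 if none is valid.
 * *ptr is advanced past the scanned range. */
int gmve_scan(gentry_t *store, size_t n, size_t *ptr) {
    assert(n > 0 && *ptr < n);
    int victim = -1;
    unsigned best_p = 0, best_s = 1;
    size_t pos = *ptr, seen = 0;
    for (size_t step = 0; step < n && seen < SCAN_WINDOW; step++) {
        gentry_t *e = &store[pos];
        if (e->valid) {
            unsigned p = e->reuse_ctr + 1;  /* value numerator, pre-decrement */
            /* p / size < best_p / best_s, compared without division */
            if (victim < 0 || p * best_s < best_p * e->size) {
                victim = (int)pos;
                best_p = p;
                best_s = e->size;
            }
            if (e->reuse_ctr > 0) {
                e->reuse_ctr--;
            }
            seen++;
        }
        pos = (pos + 1) % n;
    }
    *ptr = pos;
    return victim;
}
```

Keeping the best numerator and size in local variables means every candidate is judged by its counter before the scan decremented it.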

G-SIP

•  Use set dueling instead of set sampling (SIP) to decide best insertion policy:
   –  Divide data-store into 8 regions and block sizes into 8 ranges
   –  During training, for each region, insert blocks of a different size range with higher priority
   –  Count cache misses per region
•  In steady state, insert blocks with sizes of top performing regions with higher priority in all regions
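The training and selection steps can be sketched as below. The slide does not define "top performing", so this sketch assumes a region qualifies when its miss count is below the average over all regions; the 8-byte range boundaries and all names are also assumptions.

```c
#include <assert.h>

#define NUM_REGIONS 8  /* data-store regions; also the number of size ranges */

typedef struct {
    unsigned long misses[NUM_REGIONS];  /* misses counted during training */
} gsip_t;

/* Maps a compressed size in bytes to a range: 1-8 -> 0, ..., 57-64 -> 7. */
int gsip_range_of(unsigned size_bytes) {
    assert(size_bytes >= 1 && size_bytes <= 64);
    return (int)((size_bytes - 1) / 8);
}

/* Training: region r inserts only size range r with higher priority. */
int gsip_training_prioritize(int region, unsigned size_bytes) {
    assert(region >= 0 && region < NUM_REGIONS);
    return gsip_range_of(size_bytes) == region;
}

void gsip_count_miss(gsip_t *g, int region) {
    assert(region >= 0 && region < NUM_REGIONS);
    g->misses[region]++;
}

/* Steady state: bit r of the result is set if size range r should be
 * inserted with higher priority everywhere. Assumed criterion: region r
 * saw fewer misses than the average region. */
unsigned gsip_select(const gsip_t *g) {
    unsigned long total = 0;
    for (int r = 0; r < NUM_REGIONS; r++) {
        total += g->misses[r];
    }
    unsigned mask = 0;
    for (int r = 0; r < NUM_REGIONS; r++) {
        if (g->misses[r] * NUM_REGIONS < total) {
            mask |= 1u << r;
        }
    }
    return mask;
}
```

If all regions miss equally often, the mask is empty and no size range is prioritized.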

Size-based Insertion Policy (SIP): Key Idea

•  Insert blocks of a certain size with higher priority if this improves cache miss rate

[Figure: a cache set holding blocks with compressed sizes of 16, 32, and 64 bytes, ordered from highest priority to lowest priority]

Evaluated Cache Management Policies

Design   Description
LRU      Baseline LRU policy
RRIP     Re-reference Interval Prediction [Jaleel+, ISCA'10]
ECM      Effective Capacity Maximizer [Baek+, HPCA'13]
V-Way    Variable-Way Cache [ISCA'05]
CAMP     Compression-Aware Management (Local)
G-CAMP   Compression-Aware Management (Global)

Performance and MPKI Correlation

Performance improvement / MPKI reduction

Mechanism   LRU              RRIP             ECM              V-Way
CAMP        8.1% / -13.3%    2.7% / -5.6%     2.1% / -5.9%     N/A
G-CAMP      14.0% / -21.9%   8.3% / -15.1%    7.7% / -15.3%    4.9% / -8.7%

CAMP performance improvement correlates with MPKI reduction

Multi-Core Results (2)

[Figure: normalized weighted speedup (0.95 to 1.20) of RRIP, V-Way, G-MVE, G-SIP, and G-CAMP for the Homo-Homo, Homo-Hetero, and Hetero-Hetero groups and their GeoMean]

G-CAMP outperforms LRU, RRIP and V-Way policies

Effect on Memory Subsystem Energy

[Figure: normalized energy (0.4 to 1.2) of RRIP, ECM, CAMP, V-Way, and G-CAMP for astar, cactusADM, gcc, gobmk, h264ref, soplex, zeusmp, bzip2, GemsFDTD, gromacs, leslie3d, perlbench, sjeng, sphinx3, xalancbmk, milc, omnetpp, libquantum, lbm, mcf, tpch2, tpch6, apache, GMeanIntense, and GMeanAll; callout: 15.1%]

CAMP/G-CAMP reduces energy consumption in the memory subsystem

L1/L2 caches, DRAM, NoC, compression/decompression