INTRODUCING A DATA SLIDING MECHANISM FOR COOPERATIVE CACHING IN MANY-CORES ARCHITECTURES

18TH INTERNATIONAL WORKSHOP ON HIGH-LEVEL PARALLEL PROGRAMMING MODELS AND SUPPORTIVE ENVIRONMENTS | Safae Dahmani, Loïc Cudennec, CEA LIST; Guy Gogniat, LabSTICC, University of Bretagne Sud

Boston, USA | MAY 20TH 2013


TOWARDS MANY-CORES

High-end applications:

•  Control Command
•  Scientific Computing
•  Data Security
•  Signal Processing
•  Financial Analytics


TOWARDS MANY-CORES


Multi-cores: multiple complex cores

•  Power and heat limits

Many-cores: hundreds of small cores

•  High compute throughput with low power consumption


TOWARDS MANY-CORES

TILERA processors: TILE-GX (72 cores), TILEPro (64 cores)

KALRAY MPPA (256 cores)

INTEL Xeon-Phi (60 cores)

STMicroelectronics, CEA STHORM (69 cores)


MANY-CORES CHALLENGES

The main issues for architects are:

•  Memory wall:
    •  There is an inherent cost of accessing distant memory (NUMA)
    •  Speed gap between processor and memory
•  Heterogeneous set of workloads:
    •  Massively parallel applications with different memory requirements
    •  Complex deployment on large NUMA platforms
•  Scalability


MEMORY HIERARCHY SINGLE CORE


Figure 1: Cache hierarchy for distributed architectures

[Figure: L1/Data and L1/Inst, then L2, L3, and Main Memory. Moving away from the core: more storage capacity, higher access latency, lower cost.]

•  Cache hierarchy organization:
    •  Number of levels
    •  Cache size
    •  Private/shared
•  Caching strategy:
    •  Replacement policy
    •  Data allocation
    •  Cache coherency


MEMORY HIERARCHY MANY-CORE

[Figure labels: Processor Cluster, Core (P) with L1, Shared L2, External Memory]


PRIVATE VS SHARED


Figure 2: Private/Shared caching

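[Figure: private organization, where P1 and P2 each have L1D, L1I, and their own L2; shared organization, where P1 and P2 each have L1D and L1I and share a single L2]

Private:

+  Scales easily
-  Unneeded data replication

Shared:

+  Avoids some data transfers
+  Higher storage capacity
-  Contention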


COOPERATIVE CACHING

Cooperative caching aggregates:

•  The low access latency of private caches
•  The high storage capacity of shared caches

Figure 3: Cooperative caching

[Figure labels: P1 and P2, each with L1D, L1I, and L2; Cooperative Zone]

•  Cooperative caching is mainly used in power-aware large-scale systems: wireless networking (MANET) and data storage systems (web servers).

COOPERATIVE CACHING TECHNIQUES

•  Hybrid Caching: static cache partitioning
•  Adaptive Cache Partitioning: dynamic cache partitioning
•  Elastic Cooperative Caching**: local cache partitioning

Figure 4: Hybrid Caching

Figure 5: Adaptive Caching

Figure 6: Elastic Cooperative Caching

[Figure labels: Private and Shared areas; Unified shared cache space; Cache Partitioning Unit; Spilled Blocks Allocator; P1 and P2, each with Shared and Private areas]

**E. Herrero, J. Gonzalez, and R. Canal, “Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors,”


STRESSED NEIGHBORHOOD CONTEXT

•  In the case of a highly stressed neighborhood:
    •  High amount of data reuse
    •  Concurrent accesses to the cooperative zone
•  Elastic Cooperative Caching:
    •  Cyclic cache partitioning
    •  Round-robin spilled blocks allocation

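To illustrate why round-robin spilling can struggle in a stressed neighborhood, here is a minimal sketch of an allocator that cycles through neighbors in a fixed order. The class name `RoundRobinSpiller`, the method `next_host`, and the neighbor labels are hypothetical. They are not taken from the Elastic Cooperative Caching design, which the slides describe only as "round robin".

```python
from itertools import cycle


class RoundRobinSpiller:
    """Hypothetical allocator: hosts spilled blocks on neighbors in cyclic order."""

    def __init__(self, neighbors):
        # cycle() repeats the neighbor list forever, in the order given.
        self._order = cycle(neighbors)

    def next_host(self):
        # The choice ignores how stressed each neighbor currently is.
        return next(self._order)


spiller = RoundRobinSpiller(["N", "E", "S", "W"])
hosts = [spiller.next_host() for _ in range(6)]
print(hosts)
```

Because the next host depends only on position in the cycle, a heavily loaded neighbor receives spilled blocks as often as an idle one.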

CONTRIBUTION: DATA SLIDING MECHANISM


Figure 7: Data Sliding Approach


DATA REPLACEMENT POLICY

•  Two types of counters are used:
    •  LHC: Local Hit Counter for private data accesses
    •  NHC[4]: one Neighbor Hit Counter per neighbor for shared data accesses
•  Priority-based replacement:
    •  LHC > sum(NHC): the shared LRU block is replaced
    •  LHC < sum(NHC): the private LRU block is replaced
•  Stressed neighborhood: Distance(LHC, sum(NHC)) < Threshold


Replacement policy: the data replacement policy is based on the access frequency of each data set (shared/private).

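[Figure: a cache with Shared and Private areas and its four neighbors (N, E, S, W); a local hit increments LHC (LHC++), and a hit from the west neighbor increments NHC[W] (NHC[W]++)]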
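The counter-based policy can be sketched as follows. This is an illustration only, not the authors' implementation: the class and method names are hypothetical, Distance is assumed to be the absolute difference, and the tie case LHC == sum(NHC), which the slide does not specify, is arbitrarily resolved towards the shared partition.

```python
from dataclasses import dataclass, field

NEIGHBORS = ("N", "E", "S", "W")


@dataclass
class HitCounters:
    lhc: int = 0  # Local Hit Counter: hits on private data
    # One Neighbor Hit Counter per neighbor: hits on shared data
    nhc: dict = field(default_factory=lambda: {n: 0 for n in NEIGHBORS})

    def record_local_hit(self):
        self.lhc += 1

    def record_neighbor_hit(self, neighbor):
        self.nhc[neighbor] += 1

    def victim_partition(self):
        """Return the partition whose LRU block is replaced."""
        shared_hits = sum(self.nhc.values())
        if self.lhc < shared_hits:
            return "private"
        # LHC > sum(NHC) per the slide; the tie case is an assumption.
        return "shared"

    def is_stressed(self, threshold):
        # Assumption: Distance(a, b) is the absolute difference |a - b|.
        return abs(self.lhc - sum(self.nhc.values())) < threshold


counters = HitCounters()
for _ in range(5):
    counters.record_local_hit()
counters.record_neighbor_hit("W")
counters.record_neighbor_hit("W")
print(counters.victim_partition(), counters.is_stressed(threshold=4))
```

In this example the local counter (5) exceeds the neighbor total (2), so the shared LRU block is the victim. The two values are close enough (distance 3) to count as stressed under the hypothetical threshold of 4.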


BEST NEIGHBOR SELECTOR POLICY

Best Neighbor policy: the host neighbor is the least stressed one ⇔ Min(NHC[4]).

**NHC: Neighbor Hit Counter


Figure 8: Data sliding through closest neighbors

→ A block's destination is selected according to the cache stress of each neighbor.
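A minimal sketch of the best neighbor selection, assuming the four counters are held in a dictionary keyed by direction. The function name `best_neighbor` and the sample values are hypothetical. How ties are broken is not stated on the slide; here the first minimum in dictionary order wins.

```python
def best_neighbor(nhc):
    """Return the neighbor with the smallest Neighbor Hit Counter."""
    # min() returns the first minimal key in iteration order (tie-break assumption).
    return min(nhc, key=nhc.get)


nhc = {"N": 7, "E": 3, "S": 9, "W": 3}
print(best_neighbor(nhc))  # prints "E"
```

Unlike a round-robin choice, the destination changes as the counters evolve, so a neighbor that becomes stressed stops receiving blocks.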

VALIDATION PROCESS: ANALYTICAL SIMULATION

•  The simulation tool calculates the number of messages per node induced by application accesses.
•  No timing is considered.

Figure 9: Validation Process Chain
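The slides do not describe the simulation tool's internals. As a rough idea of what counting messages per node without timing could look like, the sketch below tallies a list of (sender, receiver) events on a 4x4 grid. The event format, the choice to count a message at both endpoints, and the function names are all assumptions made for illustration.

```python
from collections import Counter

GRID = 4  # 4x4 tiled chip, as in the evaluation slides


def messages_per_node(events):
    """Count messages per node; each event is a (sender, receiver) pair."""
    counts = Counter()
    for sender, receiver in events:
        # Assumption: a message is counted once at each endpoint.
        counts[sender] += 1
        counts[receiver] += 1
    return counts


def as_grid(counts):
    """Lay the per-node counts out row by row; node id = row * GRID + column."""
    return [[counts[r * GRID + c] for c in range(GRID)] for r in range(GRID)]


# Hypothetical events between node ids 0..15; the order does not matter.
events = [(0, 1), (1, 5), (5, 1)]
for row in as_grid(messages_per_node(events)):
    print(row)
```

Since only counts are kept, reordering the events gives the same result, which matches the "no timing" simplification.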


GLOBAL TRAFFIC EVALUATION

Figure 11: Stressed neighborhood


GLOBAL TRAFFIC EVALUATION

[Figure: three plots of the number of messages across a 4x4 tiled chip, one each for the baseline protocol, the Elastic Cooperative Caching protocol, and Cooperative Caching with Data Sliding]


PRIORITY-BASED REPLACEMENT EVALUATION

[Figure: two plots of the number of messages, one for cyclic replacement and one for priority-based replacement]

•  Traffic is reduced by half
•  Lower off-chip eviction rate


CONCLUSION

•  Data Sliding Mechanism: a new approach to cooperative caching that allows data migration between neighbors in a highly stressed context.
•  Local/neighbor counters (a local view built from local and remote access frequencies) drive:
    •  The data replacement policy
    •  The best neighbor selector
•  Resulting improvements:
    •  On-chip communications reduced by half
    •  Improved cache miss rate (fewer data evictions)


PERSPECTIVES

•  Extension of data sliding from 1-hop to N-hop.
•  Allowing access to hosted blocks.

Figure 12: N-Hop Data Sliding

Figure 13: Shared blocks access
