INTRODUCING A DATA SLIDING MECHANISM FOR COOPERATIVE CACHING IN MANY-CORE ARCHITECTURES
18th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Safae Dahmani, Loïc Cudennec (CEA LIST); Guy Gogniat (LabSTICC, University of Bretagne Sud)
Boston, USA | May 20th, 2013
HIGH-END APPLICATIONS
• Control Command
• Scientific Computing
• Data Security
• Signal Processing
• Financial Analytics
TOWARDS MANY-CORES
Multi-cores: multiple complex cores
• Power and heat limits
Many-cores: hundreds of small cores
• High compute throughput with low power consumption
TOWARDS MANY-CORES
TILERA processors: TILE-GX (72 cores), TILEPro (64 cores)
KALRAY MPPA (256 cores)
INTEL Xeon Phi (60 cores)
STMicroelectronics, CEA STHORM (69 cores)
MANY-CORES CHALLENGES
The main architectural issues are:
Memory wall:
• Inherent cost of accessing distant memory (NUMA)
• Speed gap between processor and memory
Heterogeneous set of workloads:
• Massively parallel applications with different memory requirements
• Complex deployment on large NUMA platforms
Scalability
MEMORY HIERARCHY: SINGLE CORE
Figure 1: Cache hierarchy for distributed architectures (L1 data/instruction caches, L2, L3, main memory)
Going down the hierarchy: more storage capacity, higher access latency, lower cost
Cache hierarchy organization: number of levels, cache size, private/shared
Caching strategy: replacement policy, data allocation, cache coherency
MEMORY HIERARCHY: MANY-CORE
Figure 1: Cache hierarchy for distributed architectures (processor clusters with per-core L1 caches and a shared L2 per cluster, backed by external memory)
PRIVATE VS SHARED
Figure 2: Private/shared caching (per-core L1D/L1I with a private L2 per core vs. a single L2 shared by all cores)
Private: + scales easily; - unneeded data replication
Shared: + avoids some data transfers; + higher storage capacity; - contention
COOPERATIVE CACHING
Aggregates the low access latency of private caches with the high storage capacity of shared caches.
Cooperative caching is mainly used in power-aware large-scale systems: wireless networking (MANETs) and data storage systems (web servers).
Figure 3: Cooperative caching (the private L2 caches of neighboring cores form a cooperative zone)
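As a minimal illustration of the principle (a toy sketch, not the paper's protocol; the direct-mapped organization and all names are our assumptions), a core that misses locally first probes the caches in its cooperative zone before going off-chip:

```c
#include <stdbool.h>

#define NUM_NEIGHBORS 4   /* N, E, S, W on a 2D mesh */
#define CACHE_LINES   256

typedef struct cache {
    unsigned long tags[CACHE_LINES];       /* toy direct-mapped tag array */
    bool valid[CACHE_LINES];
    struct cache *neighbor[NUM_NEIGHBORS]; /* cooperative zone members */
} cache_t;

/* Toy probe of a single direct-mapped cache; returns true on hit. */
static bool cache_probe(const cache_t *c, unsigned long addr) {
    unsigned idx = addr % CACHE_LINES;
    return c->valid[idx] && c->tags[idx] == addr / CACHE_LINES;
}

/* On a local miss, probe the cooperative zone before paying the
 * off-chip (external memory) access cost. */
static bool cooperative_lookup(const cache_t *self, unsigned long addr) {
    if (cache_probe(self, addr))
        return true;                        /* local hit */
    for (int i = 0; i < NUM_NEIGHBORS; i++)
        if (self->neighbor[i] && cache_probe(self->neighbor[i], addr))
            return true;                    /* hit in a neighbor's private cache */
    return false;                           /* miss: fetch from external memory */
}
```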
" Hybrid Caching:
Static cache partitioning
" Adaptive Cache Partitioning Dynamic cache partitioning
" Elastic Cooperative Caching** Local cache partitioning
MAY 20TH 2013 | PAGE 10 Boston, USA | MAY 20th 2013
COOPERATIVE CACHING TECHNIQUES
Figure 4: Hybrid Caching (each cache is statically split into private and shared parts; the shared parts form a unified shared cache space)
Figure 5: Adaptive Caching (a Cache Partitioning Unit dynamically resizes the private/shared split)
Figure 6: Elastic Cooperative Caching (per-core private/shared partitions plus a Spilled Blocks Allocator)
**E. Herrero, J. Gonzalez, and R. Canal, “Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors,” ISCA 2010.
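To make the private/shared split concrete, here is a minimal hypothetical sketch (the way counts and all names are ours, not from the cited papers): hybrid caching fixes the split at design time, while adaptive and elastic schemes resize it at runtime.

```c
#define NUM_WAYS 8

typedef struct {
    int private_ways;  /* ways reserved for the local core */
    int shared_ways;   /* ways contributed to the shared/cooperative space */
} partition_t;

/* Hybrid caching: the split is fixed at design time. */
static const partition_t hybrid = { 4, 4 };

/* Adaptive/elastic caching: a controller (e.g. a Cache Partitioning
 * Unit) grows or shrinks the private part at runtime. */
static void repartition(partition_t *p, int new_private_ways) {
    if (new_private_ways < 0) new_private_ways = 0;
    if (new_private_ways > NUM_WAYS) new_private_ways = NUM_WAYS;
    p->private_ways = new_private_ways;
    p->shared_ways  = NUM_WAYS - new_private_ways;
}
```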
STRESSED NEIGHBORHOOD CONTEXT
In a highly stressed neighborhood:
• High amount of data reuse
• Concurrent accesses to the cooperative zone
Elastic Cooperative Caching reacts with cyclic cache partitioning and round-robin allocation of spilled blocks (sketched below).
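A minimal sketch of round-robin spilled-block allocation (the function name and cursor state are hypothetical, not from the Elastic Cooperative Caching paper):

```c
#define NUM_NEIGHBORS 4

/* Round-robin spilled-block allocation: evicted ("spilled") blocks are
 * handed to the neighbors in turn, regardless of how stressed each one
 * is. The data sliding mechanism replaces this with a stress-aware
 * choice (see the Best Neighbor selector later in this deck). */
typedef struct {
    int rr_cursor;  /* index of the next neighbor to receive a spill */
} spill_state_t;

static int next_spill_target(spill_state_t *s) {
    int target = s->rr_cursor;
    s->rr_cursor = (s->rr_cursor + 1) % NUM_NEIGHBORS;
    return target;  /* index into {N, E, S, W} */
}
```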
CONTRIBUTION: DATA SLIDING MECHANISM
Figure 7: Data Sliding Approach
DATA REPLACEMENT POLICY
Two types of counters are used:
• LHC: Local Hit Counter, for private data accesses
• NHC[4]: one Neighbor Hit Counter per neighbor, for shared data accesses
Priority-based replacement:
• LHC > sum(NHC): evict the shared LRU block
• LHC < sum(NHC): evict the private LRU block
Stressed neighborhood: distance(LHC, sum(NHC)) < threshold
The replacement policy is based on the access frequency to each data set (shared/private): a local access increments LHC, while an access on behalf of a neighbor increments that neighbor's counter (e.g., NHC[W] for the west neighbor).
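A minimal sketch of these counters and the two decisions they drive, assuming a 4-neighbor mesh (all type and function names are ours, not the paper's):

```c
#include <stdbool.h>

#define NUM_NEIGHBORS 4  /* N, E, S, W */

typedef struct {
    unsigned lhc;                 /* Local Hit Counter: hits on private data */
    unsigned nhc[NUM_NEIGHBORS];  /* Neighbor Hit Counters: hits on shared data */
} stress_counters_t;

static unsigned nhc_sum(const stress_counters_t *c) {
    unsigned s = 0;
    for (int i = 0; i < NUM_NEIGHBORS; i++)
        s += c->nhc[i];
    return s;
}

typedef enum { EVICT_SHARED_LRU, EVICT_PRIVATE_LRU } victim_t;

/* Priority-based replacement: evict from whichever partition is used
 * less, as measured by local vs. remote hit frequency. */
static victim_t select_victim(const stress_counters_t *c) {
    return (c->lhc > nhc_sum(c)) ? EVICT_SHARED_LRU : EVICT_PRIVATE_LRU;
}

/* The neighborhood counts as stressed when local and remote access
 * frequencies are close, i.e. their distance is below a threshold. */
static bool neighborhood_stressed(const stress_counters_t *c, unsigned threshold) {
    unsigned s = nhc_sum(c);
    unsigned d = (c->lhc > s) ? (c->lhc - s) : (s - c->lhc);
    return d < threshold;
}
```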
BEST NEIGHBOR SELECTOR POLICY
Best Neighbor policy: the host neighbor is the least stressed one, i.e. the neighbor with the minimum NHC value. (**NHC: Neighbor Hit Counter)
Figure 8: Data sliding through closest neighbors
The block's destination is selected according to the cache stress level of each neighbor.
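Under the same assumptions as the previous sketch, the selector reduces to an argmin over the neighbor counters:

```c
#define NUM_NEIGHBORS 4  /* N, E, S, W */

/* Best Neighbor selector: the destination of a sliding block is the
 * least stressed neighbor, i.e. the one with the minimum Neighbor Hit
 * Counter. */
static int best_neighbor(const unsigned nhc[NUM_NEIGHBORS]) {
    int best = 0;
    for (int i = 1; i < NUM_NEIGHBORS; i++)
        if (nhc[i] < nhc[best])
            best = i;
    return best;  /* index into {N, E, S, W} */
}
```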
" Simulation tool calculates messages/node induced by application access. " No timing is considered
MAY 20TH 2013 | PAGE 15 Boston, USA | MAY 20th 2013
Pour insérer une image : Menu « Insertion / Image »
ou Cliquer sur l’icône de la zone
image
VALIDATION PROCESS: ANALYTICAL SIMULATION
Figure 9: Validation Process Chain
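In the same spirit, a toy message-per-node counter (the 4x4 topology matches the evaluation below, but the X-Y routing rule and the access trace are illustrative assumptions, not the paper's exact model):

```c
#include <stdio.h>

#define W 4
#define H 4                    /* 4x4 tiled chip */
#define NODES (W * H)

static unsigned long msgs[NODES];

/* Count one message on each hop of a minimal X-Y route between tiles. */
static void account_transfer(int src, int dst) {
    int x = src % W, y = src / W;
    int dx = dst % W, dy = dst / W;
    while (x != dx) { x += (dx > x) ? 1 : -1; msgs[y * W + x]++; }
    while (y != dy) { y += (dy > y) ? 1 : -1; msgs[y * W + x]++; }
}

int main(void) {
    /* Hypothetical trace: node 5 repeatedly fetches a block hosted
     * by its east neighbor (node 6); no timing is modeled. */
    for (int i = 0; i < 1000; i++)
        account_transfer(6, 5);
    for (int n = 0; n < NODES; n++)
        printf("node %2d: %lu messages\n", n, msgs[n]);
    return 0;
}
```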
GLOBAL TRAFFIC EVALUATION
Figure 11: Stressed neighborhood: number of messages on a 4x4 tiled chip for the baseline protocol, the Elastic Cooperative Caching protocol, and Cooperative Caching with Data Sliding.
PRIORITY-BASED REPLACEMENT EVALUATION
Number of messages: cyclic replacement vs. priority-based replacement.
• Traffic is reduced by half
• Lower off-chip eviction rate
CONCLUSION
Data Sliding Mechanism: a new cooperative caching approach that allows data migration between neighbors in highly stressed contexts.
Local/neighbor counters (a local view built from local and remote access frequencies) drive:
• The data replacement policy
• The best neighbor selector
Resulting improvements:
• On-chip communications reduced by half
• Improved cache miss rate (fewer data evictions)
PERSPECTIVES
• Extension of data sliding from 1-hop to N-hop (Figure 12: N-Hop Data Sliding)
• Allowing access to hosted blocks (Figure 13: Shared blocks access)