INTRODUCING A DATA SLIDING MECHANISM FOR COOPERATIVE CACHING IN MANY-CORE ARCHITECTURES
18th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Safae Dahmani, Loïc Cudennec (CEA LIST); Guy Gogniat (LabSTICC, University of Bretagne Sud)
Boston, USA | May 20th, 2013
HIGH-END APPLICATIONS
• Control Command
• Scientific Computing
• Data Security
• Signal Processing
• Financial Analytics
TOWARDS MANY-CORES
Multi-cores: multiple complex cores
• Power and heat limits
Many-cores: hundreds of small cores
• High compute throughput with low power consumption
TOWARDS MANY-CORES
TILERA processors: TILE-GX (72 cores), TILEPro (64 cores)
KALRAY MPPA (256 cores)
INTEL Xeon Phi (60 cores)
STMicroelectronics, CEA STHORM (69 cores)
MANY-CORES CHALLENGES
The main architectural issues are:
Memory wall:
• Inherent cost of accessing distant memory (NUMA)
• Speed gap between processor and memory
Heterogeneous set of workloads:
• Massively parallel applications with different memory requirements
• Complex deployment on large NUMA platforms
Scalability
MEMORY HIERARCHY: SINGLE CORE
Figure 1: Cache hierarchy for distributed architectures (L1 data/instruction caches, L2, L3, main memory)
Going down the hierarchy: more storage capacity, higher access latency, lower cost
Cache hierarchy organization: number of levels, cache size, private/shared
Caching strategy: replacement policy, data allocation, cache coherency
MEMORY HIERARCHY: MANY-CORE
Figure 1: Cache hierarchy for distributed architectures (processor clusters with per-core L1 caches and a shared L2 per cluster, backed by external memory)
PRIVATE VS SHARED
Figure 2: Private/shared caching (per-core L1D/L1I with a private L2 per core vs. a single L2 shared by all cores)
Private: + scales easily; - unneeded data replication
Shared: + avoids some data transfers; + higher storage capacity; - contention
COOPERATIVE CACHING
Aggregates the low access latency of private caches with the high storage capacity of shared caches.
Cooperative caching is mainly used in power-aware large-scale systems: wireless networking (MANETs) and data storage systems (web servers).
Figure 3: Cooperative caching (the private L2 caches of neighboring cores form a cooperative zone)
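As a minimal illustration of the principle (a toy sketch, not the paper's protocol; the direct-mapped organization and all names are our assumptions), a core that misses locally first probes the caches in its cooperative zone before going off-chip:

```c
#include <stdbool.h>

#define NUM_NEIGHBORS 4   /* N, E, S, W on a 2D mesh */
#define CACHE_LINES   256

typedef struct cache {
    unsigned long tags[CACHE_LINES];       /* toy direct-mapped tag array */
    bool valid[CACHE_LINES];
    struct cache *neighbor[NUM_NEIGHBORS]; /* cooperative zone members */
} cache_t;

/* Toy probe of a single direct-mapped cache; returns true on hit. */
static bool cache_probe(const cache_t *c, unsigned long addr) {
    unsigned idx = addr % CACHE_LINES;
    return c->valid[idx] && c->tags[idx] == addr / CACHE_LINES;
}

/* On a local miss, probe the cooperative zone before paying the
 * off-chip (external memory) access cost. */
static bool cooperative_lookup(const cache_t *self, unsigned long addr) {
    if (cache_probe(self, addr))
        return true;                        /* local hit */
    for (int i = 0; i < NUM_NEIGHBORS; i++)
        if (self->neighbor[i] && cache_probe(self->neighbor[i], addr))
            return true;                    /* hit in a neighbor's private cache */
    return false;                           /* miss: fetch from external memory */
}
```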
" Hybrid Caching:
Static cache partitioning
" Adaptive Cache Partitioning Dynamic cache partitioning
" Elastic Cooperative Caching** Local cache partitioning
MAY 20TH 2013 | PAGE 10 Boston, USA | MAY 20th 2013
COOPERATIVE CACHING TECHNIQUES
Figure 4: Hybrid Caching (each cache is statically split into private and shared parts; the shared parts form a unified shared cache space)
Figure 5: Adaptive Caching (a Cache Partitioning Unit dynamically resizes the private/shared split)
Figure 6: Elastic Cooperative Caching (per-core private/shared partitions plus a Spilled Blocks Allocator)
**E. Herrero, J. Gonzalez, and R. Canal, “Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors,” ISCA 2010.
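To make the private/shared split concrete, here is a minimal hypothetical sketch (the way counts and all names are ours, not from the cited papers): hybrid caching fixes the split at design time, while adaptive and elastic schemes resize it at runtime.

```c
#define NUM_WAYS 8

typedef struct {
    int private_ways;  /* ways reserved for the local core */
    int shared_ways;   /* ways contributed to the shared/cooperative space */
} partition_t;

/* Hybrid caching: the split is fixed at design time. */
static const partition_t hybrid = { 4, 4 };

/* Adaptive/elastic caching: a controller (e.g. a Cache Partitioning
 * Unit) grows or shrinks the private part at runtime. */
static void repartition(partition_t *p, int new_private_ways) {
    if (new_private_ways < 0) new_private_ways = 0;
    if (new_private_ways > NUM_WAYS) new_private_ways = NUM_WAYS;
    p->private_ways = new_private_ways;
    p->shared_ways  = NUM_WAYS - new_private_ways;
}
```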
STRESSED NEIGHBORHOOD CONTEXT
In a highly stressed neighborhood:
• High amount of data reuse
• Concurrent accesses to the cooperative zone
Elastic Cooperative Caching reacts with cyclic cache partitioning and round-robin allocation of spilled blocks (sketched below).
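A minimal sketch of round-robin spilled-block allocation (the function name and cursor state are hypothetical, not from the Elastic Cooperative Caching paper):

```c
#define NUM_NEIGHBORS 4

/* Round-robin spilled-block allocation: evicted ("spilled") blocks are
 * handed to the neighbors in turn, regardless of how stressed each one
 * is. The data sliding mechanism replaces this with a stress-aware
 * choice (see the Best Neighbor selector later in this deck). */
typedef struct {
    int rr_cursor;  /* index of the next neighbor to receive a spill */
} spill_state_t;

static int next_spill_target(spill_state_t *s) {
    int target = s->rr_cursor;
    s->rr_cursor = (s->rr_cursor + 1) % NUM_NEIGHBORS;
    return target;  /* index into {N, E, S, W} */
}
```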
CONTRIBUTION: DATA SLIDING MECHANISM
Figure 7: Data Sliding Approach
DATA REPLACEMENT POLICY
Two types of counters are used:
• LHC: Local Hit Counter, for private data accesses
• NHC[4]: one Neighbor Hit Counter per neighbor, for shared data accesses
Priority-based replacement:
• LHC > sum(NHC): evict the shared LRU block
• LHC < sum(NHC): evict the private LRU block
Stressed neighborhood: distance(LHC, sum(NHC)) < threshold
The replacement policy is based on the access frequency to each data set (shared/private): a local access increments LHC, while an access on behalf of a neighbor increments that neighbor's counter (e.g., NHC[W] for the west neighbor).
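A minimal sketch of these counters and the two decisions they drive, assuming a 4-neighbor mesh (all type and function names are ours, not the paper's):

```c
#include <stdbool.h>

#define NUM_NEIGHBORS 4  /* N, E, S, W */

typedef struct {
    unsigned lhc;                 /* Local Hit Counter: hits on private data */
    unsigned nhc[NUM_NEIGHBORS];  /* Neighbor Hit Counters: hits on shared data */
} stress_counters_t;

static unsigned nhc_sum(const stress_counters_t *c) {
    unsigned s = 0;
    for (int i = 0; i < NUM_NEIGHBORS; i++)
        s += c->nhc[i];
    return s;
}

typedef enum { EVICT_SHARED_LRU, EVICT_PRIVATE_LRU } victim_t;

/* Priority-based replacement: evict from whichever partition is used
 * less, as measured by local vs. remote hit frequency. */
static victim_t select_victim(const stress_counters_t *c) {
    return (c->lhc > nhc_sum(c)) ? EVICT_SHARED_LRU : EVICT_PRIVATE_LRU;
}

/* The neighborhood counts as stressed when local and remote access
 * frequencies are close, i.e. their distance is below a threshold. */
static bool neighborhood_stressed(const stress_counters_t *c, unsigned threshold) {
    unsigned s = nhc_sum(c);
    unsigned d = (c->lhc > s) ? (c->lhc - s) : (s - c->lhc);
    return d < threshold;
}
```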
BEST NEIGHBOR SELECTOR POLICY
Best Neighbor policy: the host neighbor is the least stressed one, i.e. the neighbor with the minimum NHC value. (**NHC: Neighbor Hit Counter)
Figure 8: Data sliding through closest neighbors
The block's destination is selected according to the cache stress level of each neighbor.
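Under the same assumptions as the previous sketch, the selector reduces to an argmin over the neighbor counters:

```c
#define NUM_NEIGHBORS 4  /* N, E, S, W */

/* Best Neighbor selector: the destination of a sliding block is the
 * least stressed neighbor, i.e. the one with the minimum Neighbor Hit
 * Counter. */
static int best_neighbor(const unsigned nhc[NUM_NEIGHBORS]) {
    int best = 0;
    for (int i = 1; i < NUM_NEIGHBORS; i++)
        if (nhc[i] < nhc[best])
            best = i;
    return best;  /* index into {N, E, S, W} */
}
```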
" Simulation tool calculates messages/node induced by application access. " No timing is considered
MAY 20TH 2013 | PAGE 15 Boston, USA | MAY 20th 2013
Pour insérer une image : Menu « Insertion / Image »
ou Cliquer sur l’icône de la zone
image
VALIDATION PROCESS: ANALYTICAL SIMULATION
Figure 9: Validation Process Chain
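In the same spirit, a toy message-per-node counter (the 4x4 topology matches the evaluation below, but the X-Y routing rule and the access trace are illustrative assumptions, not the paper's exact model):

```c
#include <stdio.h>

#define W 4
#define H 4                    /* 4x4 tiled chip */
#define NODES (W * H)

static unsigned long msgs[NODES];

/* Count one message on each hop of a minimal X-Y route between tiles. */
static void account_transfer(int src, int dst) {
    int x = src % W, y = src / W;
    int dx = dst % W, dy = dst / W;
    while (x != dx) { x += (dx > x) ? 1 : -1; msgs[y * W + x]++; }
    while (y != dy) { y += (dy > y) ? 1 : -1; msgs[y * W + x]++; }
}

int main(void) {
    /* Hypothetical trace: node 5 repeatedly fetches a block hosted
     * by its east neighbor (node 6); no timing is modeled. */
    for (int i = 0; i < 1000; i++)
        account_transfer(6, 5);
    for (int n = 0; n < NODES; n++)
        printf("node %2d: %lu messages\n", n, msgs[n]);
    return 0;
}
```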
GLOBAL TRAFFIC EVALUATION
Figure 11: Stressed neighborhood: number of messages on a 4x4 tiled chip for the baseline protocol, the Elastic Cooperative Caching protocol, and Cooperative Caching with Data Sliding.
PRIORITY-BASED REPLACEMENT EVALUATION
Number of messages: cyclic replacement vs. priority-based replacement.
• Traffic is reduced by half
• Lower off-chip eviction rate
CONCLUSION
Data Sliding Mechanism: a new cooperative caching approach that allows data migration between neighbors in highly stressed contexts.
Local/neighbor counters (a local view built from local and remote access frequencies) drive:
• The data replacement policy
• The best neighbor selector
Resulting improvements:
• On-chip communications reduced by half
• Improved cache miss rate (fewer data evictions)
PERSPECTIVES
• Extension of data sliding from 1-hop to N-hop (Figure 12: N-Hop Data Sliding)
• Allowing access to hosted blocks (Figure 13: Shared blocks access)