NoC for Cache Coherence NoC Seminar Technion Vainbaum Yuri Mentor I.Keidar


Cache coherence problem in NUCA

• The cache coherence problem appears when tasks running on different processors in the SoC share data stored in system memory.
• When a task T1 running on processor P1 modifies data shared with a task T2 running on processor P2, the copy of that data in P2’s cache must be either updated or invalidated before it is accessed again.

[Figure: processors P1 to P4, each with a private L2 cache (L2$); tasks T1 and T2; shared blocks A and B; label "Update other L2$".]
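The stale-copy problem can be reproduced with a toy model. This Python sketch is illustrative only: the `System` class, the write-through update of memory and the write-invalidate policy (one of the two options named above) are assumptions, not part of the slide. Processors 0 and 1 stand for P1 and P2.

```python
class System:
    """Toy shared-memory system: one memory, one private cache per processor."""

    def __init__(self, num_procs: int, invalidate: bool):
        self.memory: dict = {}
        self.caches: list = [{} for _ in range(num_procs)]
        self.invalidate = invalidate  # True: write-invalidate; False: no coherence at all

    def read(self, proc: int, addr: str) -> int:
        cache = self.caches[proc]
        if addr not in cache:  # miss: fetch the value from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, proc: int, addr: str, value: int) -> None:
        self.caches[proc][addr] = value
        self.memory[addr] = value  # assumption: write-through, to keep the sketch short
        if self.invalidate:
            # Drop every other cached copy so the next access misses and refetches.
            for other, cache in enumerate(self.caches):
                if other != proc:
                    cache.pop(addr, None)


def run(invalidate: bool) -> int:
    s = System(num_procs=2, invalidate=invalidate)
    s.write(0, "A", 1)     # T1 on P1 initialises shared data A
    s.read(1, "A")         # T2 on P2 caches A
    s.write(0, "A", 2)     # T1 modifies A
    return s.read(1, "A")  # value T2 observes on its next access
```

Without coherence P2 keeps returning its stale copy (`run(False)` returns 1); with write-invalidate P2 misses and refetches the new value (`run(True)` returns 2).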


MESI: maintaining coherence in cached systems

Invalid: the line holds no valid data. Either the requested data is not in the cache, or the local copy is stale because another processor has updated the corresponding memory location.

Shared: the line is unmodified and other caches may also hold it; all copies are up to date.

Exclusive: the line is unmodified and this cache holds the only cached copy; its contents match main memory.

Modified: an exclusive, modified (dirty) state. This cache holds the only up-to-date copy in the whole system; the copy in main memory is stale.
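The slide lists the four states but not the transitions between them. As a hedged sketch, the functions below encode the transitions of the common textbook snooping-bus MESI variant for one line in one cache; the function names and the `other_copies` flag (standing for the "shared" signal other caches raise on a read miss) are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def local_read(state: State, other_copies: bool) -> State:
    """This core reads the line."""
    if state is State.I:
        # Read miss: Exclusive if no other cache holds the line, otherwise Shared.
        return State.S if other_copies else State.E
    return state  # a read hit in M, E or S leaves the state unchanged


def local_write(state: State) -> State:
    """This core writes the line: every state ends in Modified.

    E -> M is silent; S -> M and I -> M must first invalidate the other copies."""
    return State.M


def remote_read(state: State) -> State:
    """Another core reads the line (snooped read)."""
    if state in (State.M, State.E):
        # A Modified line supplies its data and is written back; both drop to Shared.
        return State.S
    return state


def remote_write(state: State) -> State:
    """Another core writes the line (snooped invalidation): this copy becomes Invalid."""
    return State.I
```

A line first read by a single core enters E, a later write by that core moves it silently to M, and a read by another core then demotes it to S.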


In-Network Cache Coherence

• Proposal: implement the coherence protocol and the directories within the network, at each router node.
• This opens up the possibility of optimizing the protocol with in-transit actions.

In-Network Cache Coherence, Noel Eisley,

The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)


In-Network Read optimization

[Figure: directory-based MSI read; nodes H, A and B; messages: read request, request forwarded to the sharer, data.]

• Three end-to-end messages


In-Network Read optimization

[Figure: in-network MSI read; nodes H, A and B; messages: read request, data.]

• Node B’s read request “bumps” into node A while in transit to the home node H.
• B obtains the data directly from A.

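To make the saving of the two read flows above concrete, this sketch counts router-to-router hops on a 2D mesh with XY (dimension-ordered) routing. It is a simplification under assumptions: the request is intercepted only when it passes A's own router, each message takes a minimal route, and the coordinates and function names are hypothetical.

```python
def manhattan(p, q):
    """Hop count of a minimal route between routers p and q on a 2D mesh."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def xy_path(src, dst):
    """Routers visited by XY (dimension-ordered) routing, endpoints included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def directory_read_hops(b, h, a):
    """Baseline: request B->H, forward H->A, data A->B (three end-to-end messages)."""
    return manhattan(b, h) + manhattan(h, a) + manhattan(a, b)


def in_network_read_hops(b, h, a):
    """In-network: if B's request passes A's router on its way to H, A answers directly."""
    if a in xy_path(b, h):
        return manhattan(b, a) + manhattan(a, b)  # request B->A, data A->B
    return directory_read_hops(b, h, a)  # no bump: fall back to the directory path
```

With B at (0, 0), A at (2, 0) and H at (4, 0), the baseline needs 8 hops and the in-network read needs 4.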


In-Network Write optimization

[Figure: directory-based MSI write; nodes H, A, B and C; messages: write request, Inv, Ack, data.]


In-Network Write optimization

[Figure: in-network MSI write; nodes H, A, B and C; messages: write request, Inv+Ack, Ack+Inv, data.]

• This in-transit optimization can reduce write communication from two round trips to a single round trip, from C to H and back.

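A back-of-the-envelope latency model of the two write flows, as a sketch under stated assumptions: a 2D mesh with XY routing, one time unit per hop, no contention, sharers invalidated in parallel, and, in the in-network case, sharers on C's route to H invalidated as the write request passes their routers, with their acknowledgments carried back with the reply. The coordinates and function names are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def xy_path(src, dst):
    """Routers visited by XY routing from src to dst, endpoints included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def directory_write_latency(c, h, sharers):
    """Two round trips: C<->H, plus H invalidating all sharers and waiting for every ack."""
    invalidation_rt = 2 * max((manhattan(h, s) for s in sharers), default=0)
    return 2 * manhattan(c, h) + invalidation_rt


def in_network_write_latency(c, h, sharers):
    """On-route sharers are handled in transit (assumption); only off-route
    sharers still need a separate invalidation round trip from H."""
    route = set(xy_path(c, h))
    off_route = [s for s in sharers if s not in route]
    extra_rt = 2 * max((manhattan(h, s) for s in off_route), default=0)
    return 2 * manhattan(c, h) + extra_rt
```

With C at (0, 0), H at (4, 0) and sharers A and B at (1, 0) and (3, 0), the model gives 14 time units for the directory flow and 8, a single round trip, for the in-network flow.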


In-Network cache coherence protocol

• Idea: move the coherence directories from the nodes into the network fabric.
• Virtual trees, one per cache line, are maintained within the network in place of coherence directories to keep track of sharers.
• A line's virtual tree consists of one root node R (the node that first loads the line from off-chip memory), all nodes sharing the line, and the intermediate nodes between the root and the sharers.
• Nodes of the tree are connected by virtual links.
• Virtual trees are stored in virtual tree caches at each router within the network.
• Reads and writes are routed towards the home node; if they encounter a virtual tree in transit, the virtual tree takes over as the routing function and steers read requests and write invalidations towards the sharers instead.

[Figure: virtual tree with root R and home node H.]

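The slide states only that virtual trees are kept in virtual tree caches at each router. The sketch below assumes one possible entry layout (a set of virtual links named by output port, plus a root flag) and a helper that installs links along a path of adjacent routers; all names and the port convention are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical port names for mesh neighbours by (dx, dy); "L" is the local cache port.
PORTS = {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}


@dataclass
class TreeEntry:
    links: set[str] = field(default_factory=set)  # virtual links leaving this router
    is_root: bool = False  # True at the node that first loaded the line off-chip


class VirtualTreeCache:
    """Per-router table: cache-line address -> virtual-tree entry."""

    def __init__(self) -> None:
        self.entries: dict[int, TreeEntry] = {}

    def lookup(self, line: int) -> TreeEntry | None:
        return self.entries.get(line)

    def add_link(self, line: int, port: str) -> None:
        self.entries.setdefault(line, TreeEntry()).links.add(port)


def port_towards(p: tuple[int, int], q: tuple[int, int]) -> str:
    """Output port at router p that leads to the adjacent router q."""
    return PORTS[(q[0] - p[0], q[1] - p[1])]


def join_tree(caches: dict, line: int, path: list[tuple[int, int]]) -> None:
    """Install bidirectional virtual links along a path of adjacent routers;
    the last router is the new sharer, so its local port joins the tree too."""
    for p, q in zip(path, path[1:]):
        caches[p].add_link(line, port_towards(p, q))
        caches[q].add_link(line, port_towards(q, p))
    caches[path[-1]].add_link(line, "L")


def steer(entry: TreeEntry, arrived_on: str | None) -> set[str]:
    """Candidate output ports for a request that hit the tree: all links but the incoming one."""
    return entry.links - {arrived_on}


# Example: root at (2, 0) on a 3x1 mesh, new sharer at (0, 0).
caches = {(x, 0): VirtualTreeCache() for x in range(3)}
caches[(2, 0)].add_link(0x40, "L")
caches[(2, 0)].entries[0x40].is_root = True
join_tree(caches, 0x40, [(2, 0), (1, 0), (0, 0)])
```

After the example, the intermediate router (1, 0) holds links {"E", "W"} for line 0x40, so a request arriving on "E" is steered out on "W" towards the sharer.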


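In-Network read access example

[Figure: two panels, each with home node H. First panel: a new read request (Read1) from R1. Second panel: a second read request to the same line (Read2) from R2.]

1. Read1 travels towards the home node H.
2. H loads the line from off-chip memory.
3. A virtual tree is constructed.
4. Read2 hits the virtual tree on its way to the home node.
5. Read2 is steered to the nearest copy.
6. The data is returned and new virtual tree links are constructed.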
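The six steps can be replayed in a toy simulation. It is a sketch under assumptions not stated on the slide: routers on a 2D mesh with XY routing, the tree tracked as a set of routers rather than per-port links, the tree built along the requester-to-home path after an off-chip load, and the "nearest copy" approximated as the sharer closest (by Manhattan distance) to the router where the request hit the tree. Off-chip access time is ignored, and `read` is a hypothetical name.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def xy_path(src, dst):
    """Routers visited by XY routing from src to dst, endpoints included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def read(tree, sharers, requester, home):
    """Serve one read; returns (node that supplied the data, total hops).

    `tree` (set of routers on the line's virtual tree) and `sharers` are updated in place."""
    path = xy_path(requester, home)
    for i, router in enumerate(path):
        if router in tree:
            # Steps 4-5: the request hits the tree in transit and is steered to
            # the nearest copy (nearest by Manhattan distance, a simplification).
            source = min(sharers, key=lambda s: manhattan(router, s))
            hops = i + manhattan(router, source) + manhattan(source, requester)
            tree.update(path[: i + 1])  # step 6: new links from requester to the tree
            break
    else:
        # Steps 1-3: no tree on the way, so the home node loads the line from
        # off-chip memory, replies, and the tree is built along the same path.
        source = home
        hops = 2 * (len(path) - 1)
        tree.update(path)
    sharers.add(requester)
    return source, hops


tree, sharers, home = set(), set(), (0, 3)
print(read(tree, sharers, (0, 0), home))  # Read1 from R1: ((0, 3), 6)
print(read(tree, sharers, (2, 1), home))  # Read2 from R2: ((0, 0), 6)
```

Read2 hits the tree at (0, 1) and is served by R1 in 6 hops; going to H first and then forwarding to R1 would take 10.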


In-Network Router micro-architecture

• The virtual tree cache steers head flits towards the appropriate output ports.
• It points them towards the caches holding the most up-to-date copy of the requested data.
• The memory address in each packet’s header is first parsed into <tag, index, offset>; if the tag matches, there is a hit in the tree cache, and the direction it prescribes is used as the desired output port.

[Figure: in-network router micro-architecture; label "Flit".]
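The <tag, index, offset> lookup works like an ordinary cache lookup. This sketch assumes 64-byte lines, a 256-entry direct-mapped tree cache and string port names; none of these parameters come from the slide, and `TreeCache` is a hypothetical name.

```python
from __future__ import annotations

OFFSET_BITS = 6  # assumption: 64-byte cache lines
INDEX_BITS = 8   # assumption: 256-entry direct-mapped tree cache


def parse_address(addr: int) -> tuple[int, int, int]:
    """Split a physical address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TreeCache:
    """Direct-mapped virtual tree cache: one (tag, output port) pair per index."""

    def __init__(self) -> None:
        size = 1 << INDEX_BITS
        self.tags: list[int | None] = [None] * size
        self.ports: list[str | None] = [None] * size

    def fill(self, addr: int, port: str) -> None:
        tag, index, _ = parse_address(addr)
        self.tags[index] = tag
        self.ports[index] = port

    def route(self, addr: int, default_port: str) -> str:
        """Output port for a head flit carrying `addr`: the tree direction on a hit,
        otherwise the default route towards the home node."""
        tag, index, _ = parse_address(addr)
        if self.tags[index] == tag:
            return self.ports[index]
        return default_port
```

`parse_address(0x12345)` returns `(4, 141, 5)`; after `fill(0x12345, "E")`, any head flit for the same line is routed to port `"E"`, while an address with the same index but a different tag falls back to the default port.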


In-Network Results & Summary

• The paper proposes a cache coherence approach for chip multiprocessors in which the coherence protocol and directories are all embedded within the network routers.
• The approach has low hardware overhead, and its hardware savings over the standard directory protocol grow quickly as the number of cores per chip increases.


DCOS: Directory Cache On a Switch

• To reduce cache-to-cache data transfer time, the proposed architecture implements a directory cache inside each switch.
• Platform: 4x2 2D mesh topology, MIPS R10000 core model, directory-based MSI cache coherence protocol.

DCOS: Cache Embedded Switch Architecture for Distributed Shared Memory Multiprocessor SoCs, Daewook Kim, 2006 IEEE


DCOS: Directory Cache On a Switch

• The state entry assigned to each memory block holds the block’s current state: empty, shared, or modified/invalid.

• Empty, marked as “E”: the block’s data has not been copied to any cache or memory.


DCOS: Directory Cache On a Switch

• Shared: the data is shared with other caches and memories.

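A directory entry of this kind can be sketched as a block state plus a sharer bit vector. The sketch assumes 8 nodes (matching the 4x2 mesh platform) and simple MSI-style actions on read and write requests, with "modified/invalid" treated as one state (one dirty owner, all other copies invalid); it is not DCOS's published logic, and the class and method names are hypothetical.

```python
from enum import Enum


class BlockState(Enum):
    EMPTY = "E"     # no copies in any cache
    SHARED = "S"    # one or more clean copies
    MODIFIED = "M"  # one dirty copy; all other copies invalid


class DirectoryEntry:
    def __init__(self, num_nodes: int = 8):  # assumption: 8 nodes of a 4x2 mesh
        self.num_nodes = num_nodes
        self.state = BlockState.EMPTY
        self.sharers = 0  # bit i set => node i holds a copy

    def holders(self) -> list:
        return [i for i in range(self.num_nodes) if self.sharers >> i & 1]

    def read(self, node: int):
        """Record a read; returns the node that must supply dirty data, or None."""
        supplier = self.holders()[0] if self.state is BlockState.MODIFIED else None
        self.sharers |= 1 << node
        self.state = BlockState.SHARED
        return supplier

    def write(self, node: int) -> list:
        """Record a write; returns the nodes whose copies must be invalidated."""
        to_invalidate = [i for i in self.holders() if i != node]
        self.sharers = 1 << node
        self.state = BlockState.MODIFIED
        return to_invalidate
```

Two reads by nodes 1 and 5 leave the block shared by both; a write by node 2 then returns `[1, 5]` as the copies to invalidate, and a later read by node 3 is supplied by node 2.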


DCOS: switch architecture

• All directory caches are embedded within the crossbar switch.


DCOS: results

DCOS: Cache Embedded Switch Architecture forDistributed Shared Memory Multiprocessor SoCs , Daewook Kim 2006 IEEE


Cache Coherency Communication Cost

• How costly is cache coherency in terms of interconnect traffic?
• This paper sheds light on this question.

• A directory-based mechanism maintains coherence among all caches in the system.

Cache Coherency Communication Cost in a NoC-based MPSoC Platform, Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil.


Cache Coherency Communication Cost

• For almost all cache sizes, the amount of NoC traffic generated by regular operations is much larger than the traffic for cache coherence maintenance.
• Increasing the cache size decreases the traffic for regular operations, so the share of cache coherence traffic becomes more significant.


Cache Coherency Communication Cost

• The graph shows that, for small cache sizes, page replacement requests account for most of the cache coherence load injected into the network. This is because the number of replacements increases as the cache size decreases.


[Figure: cache coherence injected load for different cache sizes; configuration: 8 CPUs, 1 directory.]


BENoC: Bus-Enhanced Network on Chip

• A low-latency, low-bandwidth specialized bus, optimized for system-wide distribution of control signals (ack, invl)
• A high-performance distributed network that handles high-throughput data communication between pairs of modules

BENoC: A Bus-Enhanced Network on-Chip for a Power Efficient CMP, Isask'har Walter, Israel Cidon, and Avinoam Kolodny


BENoC: Bus-Enhanced Network on Chip


β reflects the network-to-bus broadcast latency ratio; n is the number of modules in the system.

• When broadcast operations are compared, the bus is considerably more energy efficient than the network.


Network topology awareness

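[Figure: processors P1, P2 and P3 with their L2 caches (L2$), receiving invalidation (Invl.) messages.]

• The directory must wait for the acknowledgment from the furthest sharer, so it should send the invalidation to P3 first.
• The cache coherence protocol should be aware of the network topology.
• Send invalidation messages according to their distances from the directory.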


Network topology awareness

• For each transaction, determine the furthest sharing node and send its invalidation first.
• The total delay is then the invalidation/acknowledgment round-trip time to the furthest node.
• Send the messages with long round-trip delays first, so that they mask the short-delay messages.
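The benefit of sending the furthest invalidation first can be checked with a small timing model. It assumes the directory injects one invalidation per time unit, each hop takes one time unit, and there is no contention; the coordinates of the directory and of P1, P2 and P3, and the function names, are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def topology_aware_order(directory, sharers):
    """Furthest sharer first, as the slide recommends."""
    return sorted(sharers, key=lambda s: manhattan(directory, s), reverse=True)


def last_ack_time(directory, order, inject_gap=1):
    """Arrival time of the last acknowledgment when the directory injects one
    invalidation every `inject_gap` time units and each hop takes one unit."""
    return max(slot * inject_gap + 2 * manhattan(directory, s)
               for slot, s in enumerate(order))
```

With the directory at (0, 0) and P1, P2, P3 at (1, 0), (2, 0), (3, 3), sending in the order P1, P2, P3 finishes at time 14, while furthest-first (P3, P2, P1) finishes at 12, exactly the round trip to P3. Sorting by decreasing distance is the natural choice here because the longest round trip then overlaps with the injection of all the shorter ones.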


References

In-Network Cache Coherence, Noel Eisley, The 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)

DCOS: Cache Embedded Switch Architecture for Distributed Shared Memory Multiprocessor SoCs, Daewook Kim, 2006 IEEE

BENoC: A Bus-Enhanced Network on-Chip for a Power Efficient CMP, Isask'har Walter, Israel Cidon, and Avinoam Kolodny

Cache Coherency Communication Cost in a NoC-based MPSoC Platform, Gustavo Girão, SBCCI’07, September 3–6, 2007, Rio de Janeiro, Brazil

Teaching the Cache Memory Coherence with the MESI Protocol Simulator, F. J. Jiménez, J. Gómez, A. Mesones, E. Herruzo, J. I. Benavides and F. J. Sánchez, Dpto. Electrotecnia y Electrónica, Escuela Politécnica Superior, Universidad de Córdoba, Córdoba, Spain

On cache coherency and memory consistency issues in NoC based shared memory multiprocessor SoC architectures, Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)

Exploration of distributed shared memory architectures for NoC-based multiprocessors, Matteo Monchiero, Gianluca Palermo, Cristina Silvano, Oreste Villa