121
Towards Thousand-Core RISC-V Shared Memory Systems Quan Nguyen ([email protected]) Massachusetts Institute of Technology 30 November 2016

Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen ([email protected]) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Towards Thousand-Core RISC-V Shared Memory Systems

Quan Nguyen ([email protected]) Massachusetts Institute of Technology

30 November 2016

Page 2: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Outline

• The Tardis cache coherence protocol – Example – Scalability advantages

• Thousand-core prototype • RISC-V and Tardis

2

Page 3: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

3

Page 4: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol

3

Page 5: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol– N-core system: O(log N) storage

3

Page 6: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol– N-core system: O(log N) storage

• Enforces consistency through timestamps

3

Page 7: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol– N-core system: O(log N) storage

• Enforces consistency through timestamps• Key idea: logical leases

3

Page 8: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol– N-core system: O(log N) storage

• Enforces consistency through timestamps• Key idea: logical leases– Can read if have valid lease

3

Page 9: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis

• Scalable cache coherence protocol– N-core system: O(log N) storage

• Enforces consistency through timestamps• Key idea: logical leases– Can read if have valid lease– Can write if lease expires

3Xiangyao Yu and Srinivas Devadas, “Tardis: Time Traveling Coherence Algorithm for Distributed Shared Memory”, PACT 2015.

Page 10: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagram

4

Page 11: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramCore

D$Core

D$

4

Page 12: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramCore

D$Core

D$

Last-level cache

Network-on-chip

4

Page 13: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramCore

D$Core

D$

Last-level cache

Main memory

Network-on-chip

4

Page 14: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

New state

5

Page 15: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

New state

• Per cache line:

5

Page 16: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

New state

• Per cache line:– Client:

tag state wts rts

5

Page 17: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

New state

• Per cache line:– Client:

– Manager:tag state owner wts rts

tag state wts rts

5

Page 18: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

New state

• Per cache line:– Client:

– Manager:

• Per core:

tag state owner wts rts

tag state wts rts

5

pts

Page 19: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Page 20: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Core 0: Core 1: store A load B store B load A load B

Page 21: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Page 22: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Page 23: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Page 24: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Page 25: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Page 26: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 27: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A– Sets owner, wts, rts

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 28: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A– Sets owner, wts, rts

• Core 0:

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 29: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A– Sets owner, wts, rts

• Core 0:– Gets A with

[wts, rts] = [0, 0]

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

A state: M wts: 0 rts: 0

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 30: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A– Sets owner, wts, rts

• Core 0:– Gets A with

[wts, rts] = [0, 0]– Writes A’, creates new

version at [1, 1]

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

A state: M wts: 0 rts: 0A’ state: M wts: 1 rts: 1

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 31: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Example

6

• Core 0 stores A• Manager leases A– Sets owner, wts, rts

• Core 0:– Gets A with

[wts, rts] = [0, 0]– Writes A’, creates new

version at [1, 1]– Updates its pts to 1

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

pts: 1

A state: M wts: 0 rts: 0A’ state: M wts: 1 rts: 1

Manager

A state: I owner: wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

A state: M owner: 0 wts: 0 rts: 0

Page 32: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Example

Page 33: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Example

Page 34: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B– Sends pts to manager

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0

Example

Page 35: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B– Sends pts to manager

• Manager leases B

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0B state: S owner: wts: 1 rts: 11

Example

Page 36: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B– Sends pts to manager

• Manager leases B– Sets lease based on pts

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0B state: S owner: wts: 1 rts: 11

Example

Page 37: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B– Sends pts to manager

• Manager leases B– Sets lease based on pts– [wts, rts] = [1, 11]

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0B state: S owner: wts: 1 rts: 11

Example

Page 38: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

7

• Core 0 loads B– Sends pts to manager

• Manager leases B– Sets lease based on pts– [wts, rts] = [1, 11]

• Core 0 reads B at pts 1

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: I wts: rts:

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: I owner: wts: 0 rts: 0B state: S owner: wts: 1 rts: 11

B state: S wts: 1 rts: 11

Example

Page 39: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11

Example

Page 40: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

• Core 1 stores B

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11

Example

Page 41: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

• Core 1 stores B– Sends pts to manager

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11B state: M owner: 1 wts: 1 rts: 11

Example

Page 42: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

• Core 1 stores B– Sends pts to manager

• Instantly grant Core 1 exclusive ownership

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11B state: M owner: 1 wts: 1 rts: 11

B’ state: M wts: 12 rts: 12

Example

Page 43: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

• Core 1 stores B– Sends pts to manager

• Instantly grant Core 1 exclusive ownership

• Core 1 writes B’ at pts 12

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11B state: M owner: 1 wts: 1 rts: 11

B’ state: M wts: 12 rts: 12

pts: 12

Example

Page 44: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

8

• Core 1 stores B– Sends pts to manager

• Instantly grant Core 1 exclusive ownership

• Core 1 writes B’ at pts 12

• Different versions of B coexist!

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 0

A state: I wts: rts:

B state: I wts: rts:

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: S owner: wts: 1 rts: 11B state: M owner: 1 wts: 1 rts: 11

B’ state: M wts: 12 rts: 12

pts: 12

Example

Page 45: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

Example

Page 46: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

Example

Page 47: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

Example

Page 48: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request• Core 0 downgrades

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

A’ state: S wts: 1 rts: 1

Example

Page 49: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request• Core 0 downgrades

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

A’ state: S wts: 1 rts: 1

A’ state: S owner: wts: 1 rts: 1

Example

Page 50: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request• Core 0 downgrades• Core 1 receives new

lease based on its pts

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

A’ state: S wts: 1 rts: 1

A’ state: S owner: wts: 1 rts: 1A’ state: S owner: wts: 12 rts: 22

Example

Page 51: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request• Core 0 downgrades• Core 1 receives new

lease based on its pts– [wts, rts] = [12, 22]

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

A’ state: S wts: 1 rts: 1

A’ state: S owner: wts: 1 rts: 1A’ state: S owner: wts: 12 rts: 22

Example

Page 52: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

9

• Core 1 loads A• Manager sends Core 0

writeback request• Core 0 downgrades• Core 1 receives new

lease based on its pts– [wts, rts] = [12, 22]

• Core 1 reads A’ at pts 12

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: M wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A state: I wts: rts:

B’ state: M wts: 12 rts: 12

Manager

A state: M owner: 0 wts: 0 rts: 0

B state: M owner: 1 wts: 12 rts: 12

A’ state: S wts: 1 rts: 1

A’ state: S owner: wts: 1 rts: 1

A’ state: S wts: 12 rts: 22

A’ state: S owner: wts: 12 rts: 22

Example

Page 53: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 54: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

• Core 0 loads B

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 55: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

• Core 0 loads B• Cache hit; simply loads

B from data cache

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 56: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

• Core 0 loads B• Cache hit; simply loads

B from data cache• Sequential order ( ) ≠ physical order ( )

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 57: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

• Core 0 loads B• Cache hit; simply loads

B from data cache• Sequential order ( ) ≠ physical order ( )

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 58: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

10

• Core 0 loads B• Cache hit; simply loads

B from data cache• Sequential order ( ) ≠ physical order ( )

Core 0: Core 1: store A load B store B load A load B

Core 0 pts: 1

A’ state: S wts: 1 rts: 1

B state: S wts: 1 rts: 11

Core 1 pts: 12

A’ state: S wts: 12 rts: 22

B’ state: M wts: 12 rts: 12

Manager

A’ state: S owner: wts: 12 rts: 22

B state: M owner: 1 wts: 12 rts: 12

Example

Page 59: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

11

Page 60: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

• Track only one node: O(log N) storage

11

Page 61: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

• Track only one node: O(log N) storage• No broadcast invalidations

11

Page 62: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

• Track only one node: O(log N) storage• No broadcast invalidations• Timestamps not tied to core count

11

Page 63: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

• Track only one node: O(log N) storage• No broadcast invalidations• Timestamps not tied to core count– Can be compressed

11

Page 64: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

A case for scalability

• Track only one node: O(log N) storage• No broadcast invalidations• Timestamps not tied to core count– Can be compressed– No need for synchronized real-time clocks

11

Page 65: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Outline

• The Tardis cache coherence protocol • Thousand-core prototype • RISC-V and Tardis

12

Page 66: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

13

Page 67: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706

13

Page 68: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh

13

Page 69: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh– Aurora links, six connectors per board

13

Page 70: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh– Aurora links, six connectors per board

• Demonstrate shared memory at scale

13

Page 71: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh– Aurora links, six connectors per board

• Demonstrate shared memory at scale• Name:

13

Page 72: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh– Aurora links, six connectors per board

• Demonstrate shared memory at scale• Name:

13

T-1000

Page 73: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thousand-core shared memory systems

• Fit as many cores will fit on a ZC706• Connect in a 3D mesh– Aurora links, six connectors per board

• Demonstrate shared memory at scale• Name:

13Terminator 2: Judgment Day (1991) Carolco Pictures, Lightstorm Entertainment, Le Studio Canal+, and TriStar Pictures

T-1000

Page 74: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Outline

• The Tardis cache coherence protocol • Thousand-core prototype • RISC-V and Tardis

14

Page 75: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis and RISC-V

15Quan Nguyen, “Synchronization in Timestamp-Based Cache Coherence Protocols”, S.M. thesis, MIT, 2016.

Page 76: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis and RISC-V

• RISC-V: clean, extensible, orthogonal, free

15Quan Nguyen, “Synchronization in Timestamp-Based Cache Coherence Protocols”, S.M. thesis, MIT, 2016.

Page 77: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis and RISC-V

• RISC-V: clean, extensible, orthogonal, free• Chisel simplifies extending hardware

15Quan Nguyen, “Synchronization in Timestamp-Based Cache Coherence Protocols”, S.M. thesis, MIT, 2016.

Page 78: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Tardis and RISC-V

• RISC-V: clean, extensible, orthogonal, free• Chisel simplifies extending hardware• Things to consider: – Release consistency – Atomic instructions – Synchronization (see S.M.)

15Quan Nguyen, “Synchronization in Timestamp-Based Cache Coherence Protocols”, S.M. thesis, MIT, 2016.

Page 79: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 80: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 81: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 82: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

RC: ordinary memory ops

respect dependencies

respect dependencies

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 83: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

RC: ordinary memory ops

respect dependencies

respect dependencies

RC: acquires acq <p X ⟹ acq <s X acq <p X ⟹ acq <ts X

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 84: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

RC: ordinary memory ops

respect dependencies

respect dependencies

RC: acquires acq <p X ⟹ acq <s X acq <p X ⟹ acq <ts X

RC: releases X <p rel ⟹ X <s rel X <p rel ⟹ X <ts rel

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 85: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Type Ordering rule Tardis rule

SC X <p Y ⟹ X <s Y X <p Y ⟹ X <ts Y

RC: ordinary memory ops

respect dependencies

respect dependencies

RC: acquires acq <p X ⟹ acq <s X acq <p X ⟹ acq <ts X

RC: releases X <p rel ⟹ X <s rel X <p rel ⟹ X <ts rel

RC: sync S ∈ {acq, rel}

SX <p SY ⟹ SX <s SY SX <p SY ⟹ SX <ts SY

Consistency model comparison

16<p : program order <s : global memory order <ts : timestamp order

Page 86: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

17

Page 87: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

17

Page 88: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

• tsmax: maximal timestamp of preceding ops (in timestamp order)

17

Page 89: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

• tsmax: maximal timestamp of preceding ops (in timestamp order)

• Fences: tsmin ← tsmax

17

Page 90: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

• tsmax: maximal timestamp of preceding ops (in timestamp order)

• Fences: tsmin ← tsmax

• Track acquires/releases with tsrel

17

Page 91: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

• tsmax: maximal timestamp of preceding ops (in timestamp order)

• Fences: tsmin ← tsmax

• Track acquires/releases with tsrel

– Release: tsrel ← tsmax

17

Page 92: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Release consistency and Tardis

• tsmin: minimum timestamp for future ops

• tsmax: maximal timestamp of preceding ops (in timestamp order)

• Fences: tsmin ← tsmax

• Track acquires/releases with tsrel

– Release: tsrel ← tsmax

– Acquire: tsmin ← tsrel

17

Page 93: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Load-reserved and store-conditional

18

Page 94: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Load-reserved and store-conditional

• Tardis gives neat solution to LR/SC livelock

18

Page 95: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Load-reserved and store-conditional

• Tardis gives neat solution to LR/SC livelock• wts tracks cache line version

18

Page 96: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Load-reserved and store-conditional

• Tardis gives neat solution to LR/SC livelock• wts tracks cache line version• SC success condition: wtslr == wtsbefore sc

18

Page 97: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

Page 98: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

Page 99: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

Page 100: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M owner: 0 wts: 0 rts: 0

Page 101: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 102: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 103: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 104: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 105: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 106: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 107: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 108: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 109: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 110: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

Page 111: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 112: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded– succeeds; wtslr == wtsC

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 113: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded– succeeds; wtslr == wtsC

– writes C’ at pts 1

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

Page 114: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded– succeeds; wtslr == wtsC

– writes C’ at pts 1

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

C’ state: M wts: 1 rts: 1

Page 115: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

19

• Core 0 performs lr on C– exclusive ownership– wtslr = 0

• Core 1 performs lr on C– core 0 downgraded

• Core 0 performs sc– core 1 downgraded– succeeds; wtslr == wtsC

– writes C’ at pts 1

loop: lr.d x1, 0(C) <do stuff to x1> sc.d x2, x1, 0(C) bnez x2, loop

Core 0 pts: 0

C state: I wts: 0 rts: 0

Core 1 pts: 0

C state: I wts: 0 rts: 0

Manager

C state: I owner: wts: 0 rts: 0

LR/SC example

C state: M wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0C state: M owner: 1 wts: 0 rts: 0

C state: M wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: S wts: 0 rts: 0

C state: M owner: 0 wts: 0 rts: 0

C’ state: M wts: 1 rts: 1

pts: 1

Page 116: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramRocket Core

HellaCache D$Rocket Core

HellaCache D$

Last-level cache

Main memory

TileLink NoC

20

Page 117: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramRocket Core

HellaCache D$Rocket Core

HellaCache D$

Last-level cache

Main memory

TileLink NoC

20

timestamps

Page 118: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramRocket Core

HellaCache D$Rocket Core

HellaCache D$

Last-level cache

Main memory

TileLink NoC

20

timestamps

metadata, hit/miss logic

Page 119: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramRocket Core

HellaCache D$Rocket Core

HellaCache D$

Last-level cache

Main memory

TileLink NoC

20

timestamps

metadata, hit/miss logic

message timestamps

Page 120: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Block diagramRocket Core

HellaCache D$Rocket Core

HellaCache D$

Last-level cache

Main memory

TileLink NoC

20

timestamps

metadata, hit/miss logic

message timestamps

new coherence logic

Page 121: Towards Thousand-Core RISC-V Shared Memory Systems€¦ · Quan Nguyen (qmn@mit.edu) Massachusetts Institute of Technology 30 November 2016. Outline • The Tardis cache coherence

Thanks!

• Special thanks to Xiangyao Yu and Srini Devadas for their extensive advice and input

21