Synchronized Progress in Interconnection Networks (SPIN): A New Theory for Deadlock Freedom

Aniruddh Ramrakhyani, Georgia Tech (aniruddh@gatech.edu)

Tushar Krishna, Georgia Tech (tushar@ece.gatech.edu)

Paul V. Gratz, Texas A&M University (pgratz@tamu.edu)

ISCA 2018 Session 8B: Interconnection Networks

Network Routing

[Figure: animation of a network with elements labeled A through K; the final frame is marked "Deadlock".]

Routing Deadlocks

[Figure: packets A to F blocked in a loop, marked "Deadlock".]

A routing deadlock is a cyclic buffer dependency chain that renders forward progress impossible.

Deadlocks are a fundamental problem in both off-chip and on-chip interconnection networks.

They cause system breakdown and kill chips.

Deadlocks are hard to detect.

They manifest after a long use time.

They show up due to system wear-out faults and power-gating of network elements, which are hard to simulate.

Need a solution for functional correctness!
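A minimal sketch of the definition above, assuming each blocked packet waits on exactly one other packet: a deadlock shows up as a cycle when following the waits-on links. The function name `find_dependency_cycle` and the example packets are illustrative, not taken from the talk.

```python
from typing import Dict, List, Optional


def find_dependency_cycle(waits_on: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle of packets that wait on each other, or None.

    waits_on[p] = q means packet p needs the buffer currently held by packet q.
    A packet missing from the mapping is free to move.
    """
    for start in waits_on:
        seen: List[str] = []
        current = start
        while current in waits_on and current not in seen:
            seen.append(current)
            current = waits_on[current]
        if current in seen:
            # Drop any lead-in tail so only the cycle itself is returned.
            return seen[seen.index(current):]
    return None


# Hypothetical example: four packets forming a ring of dependencies.
ring = {"A": "B", "B": "C", "C": "D", "D": "A"}
# Hypothetical counter-example: C can drain, so the chain is not a deadlock.
chain = {"A": "B", "B": "C"}
print(find_dependency_cycle(ring))
print(find_dependency_cycle(chain))
```

The ring yields a cycle containing all four packets, while the chain yields `None` because its last packet can still make progress.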

Solution I: Dally’s Theory

Defines a strict order in the acquisition of links and/or buffers, which ensures that a cyclic dependency is never created.

[Figure: packets A to F with the numbers 1 to 6 and the note "Higher to Lower not allowed".]

Implementations: Turn model [5], XY routing, Up-Down routing [20].

Limitations:

Routing restrictions: increased latency, throughput loss, energy overhead.

Requires a large number of VCs for fully adaptive routing.
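The ordering rule can be checked mechanically. The sketch below assumes each link or buffer has been assigned a number and that a route is given as the list of numbers it acquires, in order; `respects_channel_order` is a hypothetical helper, not part of any cited implementation.

```python
from typing import List


def respects_channel_order(route: List[int]) -> bool:
    """True if every hop acquires a strictly higher-numbered resource."""
    return all(prev < nxt for prev, nxt in zip(route, route[1:]))


# Hypothetical routes over resources numbered 1 to 6.
print(respects_channel_order([1, 2, 5]))  # strictly increasing
print(respects_channel_order([4, 6, 3]))  # 6 -> 3 goes from higher to lower
```

Because every permitted route only climbs in number, no set of routes can close a loop of dependencies. The cost is the routing restriction listed under the limitations.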

Solution II: Duato’s Theory

Adds buffers to create a deadlock-free escape path that can be used to avoid or recover from deadlocks.

Implementation: turn restrictions in escape-VC.

[Figure: packets A to F, with buffers labeled VC0 and Escape-VC.]

Limitations:

Energy and area overhead of escape VCs.

Additional routing tables/logic for routing within escape-VC.
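A rough illustration of the escape-path idea in a 2D mesh. Several assumptions here are not from the slides: the escape VC uses dimension-order (X then Y) routing as its turn restriction, congestion is modeled as a set of neighbor coordinates with no free adaptive buffer, and the names `minimal_next_hops` and `select_hop` are made up for the example.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]


def minimal_next_hops(cur: Coord, dst: Coord) -> List[Coord]:
    """All neighbors that bring a packet one hop closer, X direction first."""
    (x, y), (dx, dy) = cur, dst
    hops: List[Coord] = []
    if x != dx:
        hops.append((x + (1 if dx > x else -1), y))
    if y != dy:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def select_hop(cur: Coord, dst: Coord, congested: Set[Coord]) -> Tuple[Coord, str]:
    """Prefer an uncongested minimal hop on the adaptive VC, else use the escape VC."""
    candidates = minimal_next_hops(cur, dst)
    if not candidates:
        raise ValueError("packet is already at its destination")
    for hop in candidates:
        if hop not in congested:
            return hop, "adaptive"
    # Escape VC (assumed XY routing): the X hop if any X distance remains, else the Y hop.
    return candidates[0], "escape"


print(select_hop((0, 0), (2, 2), set()))
print(select_hop((0, 0), (2, 2), {(1, 0), (0, 1)}))
```

The second call falls back to the escape VC because both adaptive choices are congested, which is the situation the escape path exists for.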

Other Solutions

Solution III: Flow Control

Restrict injection when the number of empty buffers falls below a threshold.

Implementation: Bubble Flow Control [9]

Limitation: implementation complexity.

Solution IV: Deflection Routing

Assign every flit to some output port, even if it gets misrouted.

Implementation: BLESS [10], CHIPPER [35]

Limitation: livelocks.
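Both ideas can be sketched in a few lines. This is a simplified illustration, not the logic of Bubble Flow Control, BLESS, or CHIPPER: `may_inject` and `assign_ports` are hypothetical names, the port labels are arbitrary, and arbitration is by dictionary order, purely as an assumption.

```python
from typing import Dict, List


def may_inject(empty_buffers: int, threshold: int) -> bool:
    """Flow control: block injection once empty buffers fall below the threshold."""
    return empty_buffers >= threshold


def assign_ports(preferred: Dict[str, str], ports: List[str]) -> Dict[str, str]:
    """Deflection: every flit gets some port; losers of a conflict are misrouted.

    Assumes there are at least as many ports as flits.
    """
    free = list(ports)
    assignment: Dict[str, str] = {}
    deflected: List[str] = []
    for flit, port in preferred.items():
        if port in free:
            free.remove(port)
            assignment[flit] = port
        else:
            deflected.append(flit)
    for flit in deflected:
        assignment[flit] = free.pop(0)  # any leftover port, possibly the wrong way
    return assignment


print(may_inject(empty_buffers=1, threshold=2))
print(assign_ports({"f1": "N", "f2": "N", "f3": "E"}, ["N", "E", "S", "W"]))
```

In the second call, two flits want the same port, so one of them is sent out of a port it did not ask for. Repeated deflections of this kind are the source of the livelock limitation.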

Comparison of Deadlock Freedom Theories

Metrics: acyclic CDG not required; no packet injection restrictions; livelock free; VC cost for mesh routing (minimal, adaptive); topology independent.

VC cost for mesh routing (minimal, adaptive), by theory:

Dally: 1, 6

Duato: 1, 2

Flow Control: 2, 2

Deflection Routing: 1

Can we do better?

SPIN: 1, 1

Outline

Routing Deadlocks

State of the Art

Dally’s Theory

Duato’s Theory

Flow Control Routing

Deflection Routing

SPIN: Synchronized Progress in Interconnection Networks

Evaluations

Conclusion

SPIN: Key Idea

[Figure: packets A to F deadlocked in a loop; all of them then move one hop forward at the same time.]

What if we coordinate the movement of every packet to the next hop at a given time?

The simultaneous, synchronized movement of all deadlocked packets in the loop is called a spin.

Each spin moves all deadlocked packets one hop forward.

One spin may not resolve the deadlock. If so, the spin can be repeated.

The deadlock is guaranteed to be resolved in a finite number of spins (proof in paper, Sec. III).

[Figure: walkthrough with packets A to F. The first spin completes, then the second spin completes; packets E and B exit the loop and the deadlock is resolved.]
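A toy model of the idea, not the paper's proof or hardware: the loop is a list in which position i holds the packet buffered at router i, and each packet is tagged with the loop position at which it can leave the loop (a simplifying assumption). One spin rotates every packet forward by one position at the same time. The names `spin_once` and `spins_to_resolve` are illustrative.

```python
from typing import List


def spin_once(ring: List[int]) -> List[int]:
    """Move every packet in the loop one hop forward simultaneously.

    ring[i] is the exit position of the packet currently buffered at position i.
    """
    return [ring[-1]] + ring[:-1]


def spins_to_resolve(ring: List[int]) -> int:
    """Count spins until at least one packet sits at its exit position.

    Assumes a non-empty ring whose exit positions are valid indices, so the
    loop ends after fewer spins than there are positions.
    """
    spins = 0
    while all(exit_at != pos for pos, exit_at in enumerate(ring)):
        ring = spin_once(ring)
        spins += 1
    return spins


# Hypothetical loop of four routers; every packet is two hops from its exit.
print(spins_to_resolve([2, 3, 0, 1]))
```

In this model a packet that reaches its exit position leaves the loop and frees a buffer, which breaks the cyclic dependency. Since every packet is a bounded number of hops from its exit, the number of spins is bounded too.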

Outline

Routing Deadlocks

State of the Art

SPIN: Synchronized Progress in Interconnection Networks

Key Idea

Implementation Example

Micro-architecture

FAvORS

Evaluations

Conclusion

SPIN: Implementation Example

SPIN is a generic deadlock freedom theory that can have multiple implementations.

SPIN implementation:

1. Detect the deadlock.

2. Coordinate a time for the spin.

3. Execute the spin.

We choose a recovery approach, as deadlocks are rare (see Sec. II-F).

Implementation Example: Detect Deadlocks

Use counters.

Placed at every node at design time. Optimize by exploiting topology symmetry (see Static Bubble [6]).

If a packet does not leave within a configurable threshold time, it indicates a potential deadlock.

Counter expired? Send a probe to verify the deadlock.
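The counter logic can be sketched as follows. This is an assumption-laden illustration rather than the router's actual design: the class name `StallCounter`, the per-cycle `tick` interface, and the choice to keep signaling after expiry are all made up for the example.

```python
class StallCounter:
    """Counts consecutive cycles in which a buffered packet failed to leave."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # configurable, in cycles
        self.stalled_cycles = 0

    def tick(self, packet_left: bool) -> bool:
        """Advance one cycle; return True when a probe should be sent."""
        if packet_left:
            # Forward progress: this is not a deadlock, so start over.
            self.stalled_cycles = 0
            return False
        self.stalled_cycles += 1
        return self.stalled_cycles >= self.threshold


counter = StallCounter(threshold=3)
print([counter.tick(packet_left=False) for _ in range(3)])
```

An expired counter only indicates a potential deadlock, since heavy congestion can also stall a packet past the threshold. That is why the next step is a probe rather than an immediate spin.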

Implementation Example: Probe Msg.

[Figure: packets A to F in a loop. The counter expires at Node 5, which sends a probe; the probe travels around the loop and returns, confirming the deadlock.]

The probe is a special message that tracks the buffer dependency.

Probe returns to sender: cyclic buffer dependence, hence deadlock.

Next, send a move msg. to convey the spin time. Upon receiving the move msg., a router sets its counter to count to the spin cycle.
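A minimal sketch of the probe's behavior, assuming the buffer dependency is available as a map from each router to the router its stalled packet is waiting on. The function `send_probe`, the `max_hops` bound, and all node numbers other than the initiating Node 5 are hypothetical.

```python
from typing import Dict, List, Optional


def send_probe(origin: int, blocked_next: Dict[int, int], max_hops: int) -> Optional[List[int]]:
    """Follow the dependency chain; return the loop if the probe comes back.

    blocked_next[r] is the router whose buffer the stalled packet at r needs.
    A router missing from the map can make progress, so the probe is dropped.
    """
    path = [origin]
    node = origin
    for _ in range(max_hops):
        if node not in blocked_next:
            return None  # dependency chain ends: no deadlock through origin
        node = blocked_next[node]
        if node == origin:
            return path  # probe returned to sender: deadlock confirmed
        path.append(node)
    return None  # hop budget exhausted without returning


deps = {5: 4, 4: 1, 1: 2, 2: 5}
print(send_probe(5, deps, max_hops=8))
```

The returned path lists the routers in the deadlocked loop, which is what a following move message would need to traverse.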

Implementation Example: Move Msg.

[Figure: a move message is sent around the loop of packets A to F; each router that receives it sets its counter to count to the spin cycle, and the move message returns to the sender.]

Implementation Example: spin

[Figure: the counters expire together in the spin cycle, and packets A to F each move one hop forward.]
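One way the counters could end up expiring together is sketched below. The rule used to pick the spin cycle (the cycle at which the move message gets back to the initiator) is an assumption for illustration; the slides only state that the move message conveys the spin time. `schedule_spin` and the node numbers are hypothetical.

```python
from typing import Dict, List


def schedule_spin(loop: List[int], send_cycle: int, hop_latency: int = 1) -> Dict[int, int]:
    """Return the countdown each router loads when the move message reaches it.

    loop lists the routers in the order the move message visits them,
    starting with the initiator.
    """
    # Assumed rule: spin when the move message has gone all the way around.
    spin_cycle = send_cycle + hop_latency * len(loop)
    countdowns: Dict[int, int] = {}
    for hops, router in enumerate(loop):
        arrival = send_cycle + hops * hop_latency
        # Routers reached later load a smaller count, so all expire together.
        countdowns[router] = spin_cycle - arrival
    return countdowns


print(schedule_spin([5, 4, 1, 2], send_cycle=100))
```

Each router's arrival cycle plus its countdown equals the same spin cycle, so all packets in the loop move in the same cycle.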

Multiple SPIN Optimization

Resolving a deadlock may require multiple spins.

After a spin, the router can resume normal operation.

If the counter expires again, the process is repeated.

Optimization: send probe_move after the spin is complete.

probe_move checks whether the deadlock still exists and, if so, sets the time for the next spin.

Details in paper (Sec. IV-B).

Implementation Micro-architecture

No additional links: special messages use the same links as regular flits.

Special messages have higher priority in link usage than regular flits.

Links are idle during deadlocks anyway.

Bufferless forwarding: special messages are not buffered anywhere (they are either forwarded or dropped).

Distributed design: any router can initiate the recovery.

4% area overhead compared to a traditional mesh router in 15nm Nangate [42].

FAvORS Routing Algorithm

SPIN is the first scheme that enables true one-VC fully adaptive deadlock-free routing for any topology.

FAvORS: Fully Adaptive One-vc Routing with SPIN.

The algorithm has two flavors: minimal adaptive and non-minimal adaptive.

Route selection metrics: credit turn-around time and hop count.

More details in paper (Sec. V).
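A sketch of route selection using the two metrics named above. How FAvORS actually combines them is described in the paper, not here; the lexicographic ordering (credit turn-around time first, hop count as tie-breaker), the `Candidate` type, and `pick_route` are assumptions made for the example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    port: str
    credit_turnaround: int  # cycles between sending a flit and getting its credit back
    hop_count: int  # hops to the destination through this port


def pick_route(candidates: List[Candidate]) -> str:
    """Pick the port with the lowest credit turn-around, then the fewest hops."""
    best = min(candidates, key=lambda c: (c.credit_turnaround, c.hop_count))
    return best.port


options = [
    Candidate("east", credit_turnaround=4, hop_count=3),
    Candidate("north", credit_turnaround=2, hop_count=5),
    Candidate("south", credit_turnaround=2, hop_count=4),
]
print(pick_route(options))
```

A low credit turn-around time suggests the downstream buffer is draining quickly, so this ordering steers packets away from congestion before it considers path length.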

Evaluations

Network configuration:

Simulator: gem5 + Garnet 2.0 network simulator

Topologies: 8x8 Mesh; 1024-node off-chip Dragon-fly

Link latency: 1-cycle (mesh); inter-group 1-cycle, intra-group 3-cycle (Dragon-fly)

Traffic: Synthetic + Multi-threaded (PARSEC) for the mesh; Synthetic for the Dragon-fly

Evaluations: Baselines

8x8 Mesh:

Design               Routing Adaptivity   Minimal   Theory         Deadlock Freedom Type
West-first Routing   Partial              Yes       Dally          Avoidance
Escape-VC            Full                 Yes       Duato          Avoidance
Static-Bubble [6]    Full                 Yes       Flow-Control   Recovery

1024-node off-chip Dragon-fly:

Design      Routing Adaptivity   Minimal   Theory   Deadlock Freedom Type
UGAL [37]   Full                 No        Dally    Avoidance

Saturation Throughput: 1024-node Off-chip Dragon-fly

[Figure: latency (cycles) versus injection rate (flits/node/cycle) for Bit-complement and Neighbor traffic. Curves: UGAL_3VC (SPIN), UGAL_3VC (Dally), FAvORS_NMin_1VC (SPIN), Minimal_1VC (SPIN).]

50% higher throughput compared to UGAL_Dally.

25% higher throughput compared to UGAL_Dally.

62% higher throughput compared to Minimal Routing 1-VC.

Saturation Throughput: 8x8 On-chip Mesh

[Figure: latency (cycles) versus injection rate (flits/node/cycle) for Transpose traffic with 3 VCs and with 1 VC. 3-VC curves: West-First_3VC (Dally), Static_Bubble_3VC (Flow-Control), EscapeVC_3VC (Duato), FAvORS_Min_3VC (SPIN). 1-VC curves: FAvORS_Min_1VC (SPIN), West-First_1VC (Dally).]

68% higher throughput compared to West-first 3-VC.

10% higher throughput compared to Static-Bubble 3-VC.

8% higher throughput compared to Escape-VC 3-VC.

80% higher throughput compared to West-First 1-VC.

Conclusion

Deadlocks are a fundamental problem in interconnection networks.

SPIN is a generic deadlock freedom theory.

Scalable: distributed deadlock resolution.

Plug-n-Play: topology agnostic.

Enables true one-VC fully adaptive routing for any topology.

Performance (saturation throughput):

On-chip mesh: up to 68% higher.

Off-chip Dragon-fly: up to 62% higher.

Conclusion

Practical Applications:

On-Chip: Mesh (Intel SCC, Tilera Tile64)

Super-computers: Dragon-fly (Cray XC Networks)

Datacenters: JellyFish (HP), Fat Tree (Google)

Irregular Topologies: Faults (Static Bubble [6]), Power-gating (Router Parking [29])

NoC Generators: FlexNoc (ARTERIS), Sonics GN

Domain specific Accelerators

Corelink Interconnect (ARM)

Thank you!

Back-up

SPIN : Applications

SPIN is a generic deadlock freedom theory

Scalable: distributed deadlock resolution

Plug-n-Play: doesn’t require knowledge of topology

SPIN can thus be used in:

On-chip networks: Mesh (Intel SCC, Tilera Tile64)

Supercomputers: Dragon-fly (Cray XC Networks)

Datacenters: Jellyfish (HP), Fat Tree (Google)

Static and dynamically changing irregular topologies due to faults (Static Bubble [6]) and power-gating (Router Parking [29])

NoC generators (OpenSMART [13]) and domain-specific accelerators (Eyeriss [15])

Implementation Example : Probe Msg.

[Figure: the deadlock example with elements labeled A to F; a Probe message is shown step by step until it returns to its sender.]

Counter expires at Node 5

Send Probe

Probe Returns: Deadlock Confirmed
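The probe step can be illustrated with a small sketch. This is not the hardware design from the talk: it assumes a simplified model in which each node has at most one blocked packet, and the hypothetical map `waits_on` records which node that packet is waiting on (`None` if it can move). The function name `send_probe`, the node numbers, and the hop limit are all illustrative assumptions.

```python
from typing import Dict, List, Optional


def send_probe(waits_on: Dict[int, Optional[int]], origin: int) -> Optional[List[int]]:
    """Follow the dependency chain from `origin`.

    Returns the list of nodes on the loop if the probe comes back to
    `origin` (deadlock confirmed), otherwise None (probe dropped).
    """
    max_hops = len(waits_on)  # assumed bound: a loop cannot be longer than this
    path = [origin]
    current = waits_on.get(origin)
    for _ in range(max_hops):
        if current is None:
            return None       # reached a packet that is not blocked
        if current == origin:
            return path       # probe returned to its sender
        path.append(current)
        current = waits_on.get(current)
    return None               # hop limit hit: the loop does not include origin


# Hypothetical four-node dependency ring; the counter expired at node 5.
ring = {5: 6, 6: 7, 7: 8, 8: 5}
loop = send_probe(ring, origin=5)

# A chain that ends at a free buffer is not a deadlock.
chain = {1: 2, 2: 3, 3: None}
no_loop = send_probe(chain, origin=1)
```

In this model a returned probe yields the loop `[5, 6, 7, 8]`, which is the information a subsequent move message would need.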

Implementation Example : Move Msg.

[Figure: the deadlock example with elements labeled A to F; a Move message is shown step by step until it returns to its sender.]

1. Counter Expires
2. Send Probe
3. Send Move
4. Counter expires in spin cycle
5. Spin

Send Move

Set counter to count to spin cycle

Move returns
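Steps 3 to 5 above can be sketched as follows. This is a toy model under stated assumptions, not the router implementation: the move message is assumed to take one cycle per hop, the loop is given as a list of node ids, and `schedule_spin`, `spin`, and the cycle numbers are hypothetical names and values chosen for illustration.

```python
from typing import Dict, List


def schedule_spin(loop: List[int], now: int, spin_cycle: int) -> Dict[int, int]:
    """Counter value each node loads when the move message reaches it.

    Assumption: the move takes one cycle per hop, so the node at hop h
    sees it at cycle now + h and counts down the remaining cycles.
    """
    if spin_cycle < now + len(loop):
        raise ValueError("spin cycle must not precede the return of the move")
    return {node: spin_cycle - (now + hop) for hop, node in enumerate(loop)}


def spin(buffers: Dict[int, str], loop: List[int]) -> Dict[int, str]:
    """All packets on the loop advance one hop in the same cycle."""
    after = dict(buffers)
    for i, node in enumerate(loop):
        after[loop[(i + 1) % len(loop)]] = buffers[node]
    return after


loop = [5, 6, 7, 8]                       # hypothetical loop found by a probe
counters = schedule_spin(loop, now=100, spin_cycle=110)
buffers = spin({5: "A", 6: "B", 7: "C", 8: "D"}, loop)
```

Every counter here expires at the same cycle (arrival time plus counter value is 110 for each node), which is what lets all packets in the loop move forward together without needing a free buffer.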
