NC STATE UNIVERSITY, ASPLOS-XII
Understanding Prediction-Based Partial Redundant Threading for Low-Overhead, High-Coverage Fault Tolerance
Vimal Reddy, Sailashri Parthasarathy†, Eric Rotenberg
Dept. of Electrical & Computer Engineering, North Carolina State University, Raleigh, NC
† Architecture Modeling Infrastructure Group, Intel Corporation, Hudson, MA
© 2006 Vimal Reddy
Transient Fault Tolerance
• Transient faults
  – Temporary hardware faults
  – Worsening with shrinking technology
    • Soft errors
    • Noise
• Prominent solution: Redundant Multithreading
  – Full program duplication
  – Complete fault tolerance
  – High overheads
Full Redundant Execution
• Full duplication
• 100% fault coverage
[Diagram: the Main Thread runs A(f1), B(f1); the Redundant Thread runs A(f2), B(f2). A fault in A or in B shows up as a mismatch (!=) between the two copies and is detected.]
Partial Redundant Threading (PRT)
• Only partially duplicate a program
• The shorter the redundant thread, the lower the overhead
• The approach taken to create the partial thread affects fault tolerance
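Conventional PRT
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread copies state from the main thread and runs only B(p). A fault in the duplicated portion causes a mismatch (!=) between B(f) and B(p) and is detected.]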
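Conventional PRT
• Arbitrary duplication
• Non-duplicated portions lose fault coverage
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread copies state and runs only B(p). A fault in the non-duplicated portion is carried over by the copied state, so B(f) and B(p) match (==) and the fault goes undetected.]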
Prediction-based PRT
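Confident prediction of A (PA)
Freq. case: correct prediction (e.g., 99.9% of the time)
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread copies state, uses the prediction PA in place of A, and runs B(p). A fault produces a mismatch (!=) and is detected.]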
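Prediction-based PRT
Confident prediction of A (PA)
Freq. case: correct prediction (e.g., 99.9% of the time)
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread uses the prediction PA and runs B(p). The fault is drawn at a different position than on the previous slide; it again produces a mismatch (!=) and is detected.]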
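Prediction-based PRT
Confident prediction of A (PA)
Rare case: incorrect prediction (e.g., 0.1% of the time)
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread uses the prediction PA and runs B(p). With an incorrect prediction, a fault can leave the compared values equal (==) and go undetected.]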
Prediction-based PRT
• Confident predictions are good proxies for redundant execution
• Predictions break thread inter-dependence
• Near-100% fault coverage
[Diagram: the Main Thread runs A(f), B(f). The Partial Redundant Thread uses the prediction PA and runs B(p).]
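The idea on the preceding slides can be sketched as a toy model. This is only an illustration under stated assumptions: the functions `A`, `B`, `flip_bit`, and `run_prt` are hypothetical names, a fault is modeled as a single bit flip, and the check simply compares PA with A(f) and B(p) with B(f). A mismatch here means "fault or misprediction"; the sketch does not try to tell them apart.

```python
def A(x):
    """First computation in the main thread (hypothetical stand-in)."""
    return x + 1


def B(a):
    """Second computation, which depends on A's result (hypothetical stand-in)."""
    return a * 2


def flip_bit(value, bit=0):
    """Model a transient fault as a single bit flip (an assumption of this sketch)."""
    return value ^ (1 << bit)


def run_prt(x, predicted_a, fault_in=None):
    """Run the main thread and a prediction-based partial thread.

    fault_in may be None, "A", or "B" and injects a fault into the main thread.
    Returns True if any check reports a mismatch.
    """
    # Main thread: A(f) then B(f), possibly hit by a fault.
    a_main = A(x)
    if fault_in == "A":
        a_main = flip_bit(a_main)
    b_main = B(a_main)
    if fault_in == "B":
        b_main = flip_bit(b_main)

    # Partial thread: A is not executed; the confident prediction PA stands in.
    b_partial = B(predicted_a)

    # Checks: PA against A(f), and B(p) against B(f).
    return predicted_a != a_main or b_partial != b_main
```

With a correct prediction, a fault in either A or B is flagged. If the prediction is wrong and a fault happens to make A(f) equal to it, nothing is flagged, which is one way the rare undetected case can arise in this toy model.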
Relaxing checking constraints
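[Diagram: the Main Thread runs S(f), T(f), A(f), B(f). The Partial Redundant Thread runs S(p), T(p), uses the prediction PA, and runs B(p).]
• S(p) and T(p) are removable
• Such removal can shorten the partial thread significantly
• But there are no predictions to replace them
How to check S(f) and T(f)?
• Checking PA against A(f) checks both S(f) and T(f): a fault in either produces a mismatch (!=) and is detected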
Partial Redundant Threading (PRT) Spectrum
• Partial duplication. EX: Opportunistic (M. Gomaa, T. N. Vijaykumar, ISCA 2005)
• Partial duplication & confident predictions. EX: Slipstream (Z. Purser, K. Sundaramoorthy, E. Rotenberg, MICRO 2000 and ASPLOS 2000)
• Confident predictions. EX: ReStore (N. Wang, S. J. Patel, DSN 2004)
• Case study: PRT on Slipstream
Slipstream Overview
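[Diagram: an SMT processor runs two threads. The A-stream works on unverified architectural state and passes branch and value outcomes through a delay buffer to the R-stream. The R-stream verifies these outcomes and holds the verified architectural state. On an outcome mismatch, state is copied from the R-stream and the A-stream restarts.]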
Slipstream Overview
[Diagram: an SMT processor with the A-stream (unverified architectural state) and the R-stream (verified architectural state). Branch and value outcomes flow through the delay buffer to the R-stream, which verifies them.]
Slipstream components (labeled "monitor" and "remove" in the diagram):
• Detect predictable instructions
• Predict based on repeated detection (confidence)
Instructions that are removed:
• Confident branches
• Confident dead writes
• Confident silent writes
• Slices whose leaves are any of the above
Predictions act as proxies
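"Predict based on repeated detection (confidence)" can be illustrated with a small per-instruction counter. This is a sketch only: the class name `ConfidenceTable`, the threshold of 4, and the reset-to-zero policy are assumptions made for illustration, not details taken from the slides.

```python
class ConfidenceTable:
    """Per-instruction confidence counters, indexed by program counter (PC)."""

    def __init__(self, threshold=4):
        # Number of consecutive detections required before predicting
        # (hypothetical value).
        self.threshold = threshold
        self.counters = {}

    def observe(self, pc, predictable):
        """Record whether the instruction at pc was detected as predictable."""
        if predictable:
            self.counters[pc] = self.counters.get(pc, 0) + 1
        else:
            # Assumed policy: any counter-example resets confidence.
            self.counters[pc] = 0

    def is_confident(self, pc):
        """True once the instruction has been detected often enough in a row."""
        return self.counters.get(pc, 0) >= self.threshold
```

An instruction at PC `0x40` that is detected as predictable four times in a row becomes a removal candidate; a single contrary observation makes it non-confident again.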
Confident Branch
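[Diagram: Main (R-stream) runs H(f) then Branch; the Partial (A-stream) column shows H(p) and Branch. Case shown: correct branch prediction (Taken). A fault makes the R-stream branch resolve Not Taken, which mismatches (!=) the prediction, so the fault is detected.]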
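Confident Branch
[Diagram: Main (R-stream) runs H(f) then Branch, next to the Partial (A-stream). Case shown: incorrect branch prediction (Not Taken). A fault makes the R-stream branch also resolve Not Taken, so the outcome matches (==) the prediction and the fault goes undetected.]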
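Confident Dead Write
[Diagram: Main (R-stream) runs H(f) then the dead write; the Partial (A-stream) column shows H(p) and the dead write. Case shown: correct prediction (Dead). A fault here is not detected, but this is safe because the written value is never used.]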
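Confident Dead Write
[Diagram: Main (R-stream) runs H(f), the write, and a later instruction J(f); the Partial (A-stream) runs J(p). Case shown: incorrect prediction (Not Dead). The outcome of comparing J(f) with J(p) is marked "?"; when they match (==), the fault goes undetected.]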
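For readers unfamiliar with the term, a dead write is a write whose value is overwritten before anything reads it. The sketch below finds such writes in a simple trace. The function name `find_dead_writes` and the trace format (a list of `("w" | "r", location)` pairs) are assumptions for illustration; writes that are never overwritten by the end of the trace are left unclassified.

```python
def find_dead_writes(trace):
    """Return the trace indices of writes that are overwritten before being read.

    trace is a list of (op, location) pairs, where op is "w" or "r".
    """
    pending = {}  # location -> index of the most recent write not yet read
    dead = []
    for i, (op, loc) in enumerate(trace):
        if op == "w":
            if loc in pending:
                # The earlier write was never read before this overwrite.
                dead.append(pending[loc])
            pending[loc] = i
        else:
            # A read consumes the pending write, so that write is live.
            pending.pop(loc, None)
    return dead
```

In the trace `w r1, w r1, r r1, w r2`, only the first write to `r1` is dead: it is overwritten by the second write before any read.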
Confident Silent Write/Store
[Diagram, shown twice with example values 72 and 3: Main (R-stream) runs H(f), the silent write, and J(f); the Partial (A-stream) column shows H(p), the silent write, and J(p). In each example the same value appears in both threads, at the write and at J(f) and J(p).]
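Confident Silent Write/Store
[Diagram: Main (R-stream) runs H(f), the silent write, and J(f); the Partial (A-stream) column shows H(p), the silent write, and J(p). Case shown: correct prediction (Silent), with existing value V. A fault makes the main thread produce X instead of V, so J(f) sees X while J(p) sees V; the mismatch (!=) is detected.]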
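Confident Silent Write/Store
[Diagram: Main (R-stream) runs H(f), the write, and J(f); the Partial (A-stream) runs J(p) on a stale value. Case shown: incorrect prediction (Not Silent). A fault yields X in the main thread, the comparison matches (==), and the fault goes undetected.]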
Slipstream Fault Detection Coverage
• We showed confident predictions detect faults
  – Additional coverage = correctly predicted confident instr. + their backward slices
  – Mispredictions are vulnerable, but rare in Slipstream
  – Mispredictions + backward slices = only 0.1% of instr.
• Hence, non-duplicated instr. are well covered
• Slipstream has high fault detection coverage (99.9%)
• Prior research: only duplicated instr. covered
Fault Coverage: Detection vs. Recovery
Detection Coverage
• Represents ability to detect faults
• Slipstream: 99.9% of instr.
Recovery Coverage
• Ability to roll back to ‘golden’ state on fault detection
• Slipstream’s recovery coverage?
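Slipstream Fault Recovery
[Diagram: the Main Thread runs S(f), T(f), A(f), B(f), C(f); the Partial Redundant Thread uses the prediction PA and runs B(p), C(p). A fault is detected when PA != A(f). Two rollback points are marked: the flawed rollback (conventional Slipstream) and the correct rollback.]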
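Base Slipstream Recovery
[Diagram: a Reorder Buffer (ROB) holds S(f), T(f), A(f), B(f), C(f) between head and tail; instructions at the head commit to architectural state. A fault is detected when PA != A(f), and the response is flush and restart. The fault has already been committed, so the architectural state is corrupted.]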
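Enhancement: ROB-head Recovery
[Diagram: a Reorder Buffer (ROB) holds S(f), T(f), A(f), B(f), C(f) between head and tail; instructions at the head commit to architectural state. A fault is detected when PA != A(f). The response is to flush to the ROB head and restart.]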
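Enhancement: ROB-head Recovery
[Diagram: a Reorder Buffer (ROB) holds S(f), T(f), A(f), B(f), C(f) between head and tail; instructions at the head commit to architectural state. A fault is detected when PA != A(f) and the ROB is flushed to its head, but the architectural state is already corrupted.]
• Limited rollback distance: the R-stream retires quickly, accelerated by the leading A-stream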
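Enhancement: ROB Occupancy Management
[Diagram: a Reorder Buffer (ROB) holds S(f), T(f), A(f), B(f), C(f), plus R(f) and P(f), between head and tail, with an occupancy threshold marked; instructions at the head commit to architectural state. A fault is detected when PA != A(f).]
• Increased rollback distance
• Delay in retirement, hence a performance hit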
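A rough way to picture ROB occupancy management is a buffer that refuses to retire its head unless more than a threshold number of instructions are still waiting behind it. The sketch below is a simplification under stated assumptions: the class `ROB`, its methods, and the threshold semantics are hypothetical, and the model ignores whether instructions have finished executing.

```python
from collections import deque


class ROB:
    """Toy reorder buffer that keeps at least `threshold` entries uncommitted."""

    def __init__(self, threshold):
        self.threshold = threshold  # minimum occupancy to maintain (assumed semantics)
        self.entries = deque()      # head is on the left, tail on the right
        self.committed = []         # instructions already written to arch. state

    def dispatch(self, instr):
        """Insert a new instruction at the tail."""
        self.entries.append(instr)

    def retire(self):
        """Commit from the head only while occupancy exceeds the threshold."""
        while len(self.entries) > self.threshold:
            self.committed.append(self.entries.popleft())

    def flush(self):
        """On fault detection, squash everything that has not committed."""
        squashed = list(self.entries)
        self.entries.clear()
        return squashed
```

With a threshold of 0 every instruction commits as soon as possible, so a late-detected fault may already be in architectural state. A larger threshold keeps more instructions flushable, at the cost of delayed retirement.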
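Enhancement: History Buffer
[Diagram: a Reorder Buffer (ROB) holds S(f), T(f), A(f), B(f), C(f) between head and tail; instructions at the head commit to architectural state. A separate History Buffer has its own head and tail. A fault is detected when PA != A(f), and the history buffer is used to undo changes.]
• Performance friendly: R-stream mispredictions are rare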
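The "undo changes" role of a history buffer can be sketched as a bounded log of overwritten values. The class `HistoryBuffer`, the dictionary used as architectural state, and the oldest-entry eviction policy are assumptions for illustration; the sketch also assumes every register already has a value before it is overwritten.

```python
class HistoryBuffer:
    """Bounded undo log of (register, old value) pairs, oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def commit(self, state, reg, new_value):
        """Update architectural state, logging the value being overwritten."""
        if len(self.entries) == self.capacity:
            # The oldest update falls out of the log and can no longer be undone.
            self.entries.pop(0)
        self.entries.append((reg, state[reg]))
        state[reg] = new_value

    def rollback(self, state):
        """Undo all logged updates, newest first."""
        while self.entries:
            reg, old_value = self.entries.pop()
            state[reg] = old_value
```

Undoing newest-first matters when the same register is written more than once: the last value restored is the oldest logged one. The capacity bounds how far back a rollback can reach.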
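Indirect Check of Silent Write/Store
[Diagram: Main (R-stream) runs H(f), the silent write, and J(f); the Partial (A-stream) runs J(p). Case shown: correct prediction (Silent), with existing value V. A fault yields X in the main thread and is detected indirectly, through the mismatch (!=) between J(f) and J(p). Label: Base Slipstream Recovery.]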
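Direct Check of Silent Write/Store
[Diagram: Main (R-stream) runs H(f), the silent write, and J(f); the Partial (A-stream) runs J(p). Case shown: correct prediction (Silent), with existing value V. A fault makes the new value X. The direct check compares the new value with the existing value at the write itself (X != V) and detects the fault. Labels: Base Slipstream Recovery; Need Enhanced Recovery.]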
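The direct check described on this slide amounts to one comparison at the write itself. A minimal sketch, assuming a dictionary stands in for memory and that the function name `direct_check` is hypothetical:

```python
def direct_check(memory, addr, new_value):
    """Check a store that was predicted silent.

    A silent store writes the value the location already holds, so the
    R-stream's new value must equal the existing value. Returns True if the
    check passes, False on a mismatch (a fault or a misprediction).
    """
    existing_value = memory[addr]
    return new_value == existing_value
```

For example, with `memory = {0x10: 7}`, a store of 7 to `0x10` passes, while a store of a corrupted value such as 6 fails the check immediately instead of waiting for a later consumer to expose the difference.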
Novel Framework to Analyze Coverage
• Each instr. considered “candidate faulty”
• Coverage = # of instr. checked before committal to arch. state
• Mispredicted instr. and backward slices marked unchecked
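The three rules above can be sketched as a small coverage calculation. Everything in this example is an assumption made for illustration: the `Instr` record, the `producers` field standing in for backward-slice membership, and the function name `coverage`. It is not the measurement tool itself.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    checked: bool                 # checked before committing to arch. state
    mispredicted: bool = False    # covered only by an incorrect prediction
    producers: list = field(default_factory=list)  # names in its backward slice


def coverage(instrs):
    """Fraction of instructions counted as checked.

    Every instruction is treated as a candidate faulty instruction.
    Mispredicted instructions and their backward slices are marked unchecked.
    """
    by_name = {i.name: i for i in instrs}
    unchecked = {i.name for i in instrs if not i.checked}

    # Walk backward from each mispredicted instruction through its producers.
    stack = [i.name for i in instrs if i.mispredicted]
    visited = set()
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        stack.extend(by_name[name].producers)

    unchecked |= visited
    return (len(instrs) - len(unchecked)) / len(instrs)
```

With four instructions, one of which is mispredicted and depends on one other, two of the four end up unchecked and the coverage is 0.5.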
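Coverage Analysis: Prior Work vs. New Analysis
[Diagram: a sequence of R-stream (R) instructions. Some are duplicated in the A-stream (A, Dup), some are non-duplicated (NonDup), and some are replaced by confident predictions (Conf. Dead, Conf. Br., Conf. SW, marked "A pred."). Each is classified as a checker or a non-checker under prior work and under the new analysis.]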
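Coverage Analysis
[Diagram: the same sequence of R-stream (R) instructions: duplicated (A, Dup), non-duplicated (NonDup), and confidently predicted (Conf. Dead, Conf. Br., Conf. SW, marked "A pred."). A rollback point is marked, and instructions are annotated "Logically masked" or "Masked?".]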
Clarification
• Analysis framework is a coverage measurement tool
• Not in the actual hardware
Results: Microarchitecture models
L1 I & D caches: 64KB, 4-way, 64B line, LRU; L1 hit = 1 cycle, L1 miss/L2 hit = 10 cycles
L2 unified cache: 1MB, 8-way, 64B line, LRU; L1 miss/L2 miss = 100 cycles
Superscalar core (second model's values in parentheses):
  dispatch/issue/retire bandwidth: 8 (4)
  reorder buffer (ROB): 256 (128)
  load/store queue: 64 (32)
  issue queue: 64 (32)
  cache ports (read/write): 4 (2)
Breakdown of Instructions
SPEC2K Integer Benchmarks
[Chart: % of total instructions (0 to 100), split into Dup and Non-Dup, for bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, and AVG.]
• 65% duplicated
• 35% non-duplicated
• > 50% non-duplicated on some benchmarks
Breakdown of Instructions
[Chart: % of total instructions (0 to 100), split into Dup, Conf_Pred, and bs_Conf_Pred, for bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, and AVG.]
• 23% confident predictions
• 12% backward slices of confident predictions
• See paper for more detailed breakdown
Fault Coverage
[Chart: % fault coverage (0 to 100) for Prior Work, Slip, Slip + Direct, and Slip + Direct + Recovery Enhancements (RH0, RH32, RH48, HB16, HB32). Legend: Dup, B, D, SS, SW, bs_B, bs_SS, bs_SW, other.]
Configuration    Coverage
Prior Work       65%
Slip             78%
Slip + Direct    88%
RH0              95%
RH32             97%
RH48             98%
HB16             98%
HB32             99%
• Prior work (65%): only duplicated instr. covered
• New result (78%): correct confident predictions covered
• Direct silent write/store checks improve coverage (88%)
• ROB-head recovery improves with occupancy threshold: 95% to 98%
• History buffer schemes provide high coverage (98% to 99%)
Slipstream Performance (SMT 8-wide)
• Average slowdown:
  – Full redundant execution: 14%
  – Slipstream: 1.3%
[Chart: IPC (0 to 4) for Single, Full, and Slip on bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, and Avg.]
Performance Impact of Enhanced Recovery
• ROB-occupancy management delays retirement and causes slowdown (gradual decrease from RH0 to RH48)
• History buffer approach is performance friendly (negligible slowdown)
[Chart: IPC (0 to 4) for Slip, Slip:RH0, Slip:RH32, Slip:RH48, Slip:HB16, and Slip:HB32 on bzip, gap, gcc, gzip, mcf, parser, perl, twolf, vortex, vpr, and Avg.]