CADRE: Cycle-Accurate Deterministic Replay for Hardware Debugging
Smruti R. Sarangi
Brian L. Greskamp
Josep Torrellas
University of Illinois at Urbana-Champaign
http://iacoma.cs.uiuc.edu
Motivation
50-70% of effort spent on verification
1-2 year verification times
More features on-chip
Verification speed not keeping up with complexity
Some bugs inevitably slip through
  Pentium fdiv bug
  Prefetching bug in Pentium 4 Xeon
  IBM G3 frequency bug
Error Rate vs Time
[Figure: % of bugs detected vs. time, with an "8 weeks" marker]
Reduced time to debug: vital ingredient of profitability
Outline
Problems in Debugging
Sources of Non-Determinism
Handling Non-Determinism in Buses
CADRE Architecture
Evaluation
  Space Overhead
  Performance Overhead
Design Bugs
An example of a design bug in IBM G3
  Power manager shuts down L1
  AND L1 is waiting for data
  AND L2 is being invalidated
  All the L2 lines might not be invalidated
Two Features
  Infrequent conditions
  Large detection latency: data corruption bugs are detected only after an observable event (program crash, HW hang, wrong output, etc.)
Problems in Debugging Hardware
Infrequent conditions
  RTL simulators are very slow, roughly 30 cyc/s
  Need to test as many paths as possible
  At breakpoints, transfer state to RTL simulator
  Probably want to deploy it in the field and send bug reports back
Large detection latency
  Large debugging window & modest storage
Some bugs are NOT reproducible
Existing Debugging Frameworks
Pentium-M debugging framework (Golan)
  Log all the signals at the pins
  Replay them
Disadvantages
  Expensive pin snooping electronics
  For very high speed buses
    Hard to snoop anymore
    Log signals inside the processor
    Extra pins required to send data to stable storage
CADRE
[Figure: system diagram. Labels: CMP chip, DIMMs, Memory Controller (MCH), IO Controller (ICH), To IO devices, Log all updates, Checkpoint, Golan, CADRE]
Agent: CMP, MCH, … Each agent has its own clock
An input/output is deterministic if it is always observed at the same clock cycle w.r.t. the agent.
An agent is deterministic if deterministic inputs imply deterministic outputs.
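The definition above can be made concrete with a small check. This is only an illustrative sketch: it assumes each run of an agent is recorded as a list of (cycle, value) pairs counted on the agent's own clock. The trace format and the name `is_deterministic` are hypothetical and not part of CADRE.

```python
from typing import List, Tuple

# One observed event: (cycle on the agent's own clock, value seen).
# This trace format is an assumption made for illustration.
Trace = List[Tuple[int, int]]


def is_deterministic(runs: List[Trace]) -> bool:
    """Return True if every run observed the same values at the same cycles."""
    return all(run == runs[0] for run in runs[1:])


# Two runs that match cycle for cycle, and one run in which the
# second output is observed one cycle late.
run_a = [(10, 0xAB), (17, 0xCD)]
run_b = [(10, 0xAB), (17, 0xCD)]
run_c = [(10, 0xAB), (18, 0xCD)]

print(is_deterministic([run_a, run_b]))  # True
print(is_deterministic([run_a, run_c]))  # False
```

Note that the same value arriving one cycle later already counts as non-deterministic, which is what makes the replay cycle-accurate.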
Ideal Hardware Debugger
High speed execution till the “buggy point”
Minimal storage & large debugging window
Executions are completely reproducible

                   RTL Simulator   Golan    CADRE
Speed              Very Low        High     High
Storage            Very High       High     Low
Debugging Window   Low             Medium   High
Reproducibility    High            High     High
Outline
Problems in Debugging
Sources of Non-Determinism
Handling Non-Determinism in Buses
CADRE Architecture
Evaluation
  Space Overhead
  Performance Overhead
Sources of Non-Determinism
[Figure: system diagram (CMP chip, DIMMs, Memory Controller (MCH), IO Controller (ICH), IO devices) annotated with the sources below]
Non-deterministic message delay
IO and Interrupts
Power/Thermal Events
Voltage/Frequency Scaling
Soft Errors
Refresh/Scrubbing
Non-Determinism in Buses
[Figure: source synchronous bus between a transmitter and a receiver. Labels: Data, Clock, PLL, FIFO Queue, Bus Interface]
Temperature variation
Power supply noise
Inter-symbol interference
Process variation
Crosstalk
For future buses, the probability of non-determinism is very high.
Enforcing Determinism
CPU
  Design deterministically
  Properly initialize all elements
    DETRST instruction (to set a deterministic state)
  Log all exceptions, power/thermal events
Memory
  Start every checkpoint interval with a refresh
  Checkpoint scrub register
IO
  Log all the data along with cycle counts
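The IO rule above (log the data together with cycle counts) can be sketched as a record and replay pair. This is a minimal illustration under stated assumptions, not CADRE's log format: the function names `record_io` and `replay_io` are hypothetical, and at most one IO input per cycle is assumed.

```python
from typing import Dict, List, Tuple

# One log entry: (cycle at which the input was observed, data).
IoLog = List[Tuple[int, bytes]]


def record_io(events: List[Tuple[int, bytes]]) -> IoLog:
    """Log every IO input together with the cycle at which it was observed."""
    return sorted(events, key=lambda entry: entry[0])


def replay_io(log: IoLog, num_cycles: int) -> Dict[int, bytes]:
    """Re-inject each logged input at exactly its recorded cycle."""
    pending = {cycle: data for cycle, data in log}
    delivered = {}
    for cycle in range(num_cycles):
        if cycle in pending:
            # During replay the input appears at the same cycle as
            # in the original run, so the agent sees identical inputs.
            delivered[cycle] = pending[cycle]
    return delivered


log = record_io([(12, b"irq"), (5, b"dma")])
print(replay_io(log, 20))  # {5: b'dma', 12: b'irq'}
```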
Enforcing Determinism in Buses
[Figure: timing diagram between transmitter and receiver. Labels: xT (transmitter), yR and zR (receiver), θ1, θ2, Non-deterministic, Deterministic Processing, log2(W), Optimal]
Uncertainty interval: W = θ2 - θ1 + 1
zR = xT + θ2
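The quantities on this slide can be worked through numerically. The sketch below assumes that the log2(W) label refers to the number of bits needed to distinguish the W positions inside the uncertainty interval; that reading, the function names, and the example values θ1 = 3 and θ2 = 6 are our assumptions.

```python
def uncertainty_interval(theta1: int, theta2: int) -> int:
    """W = theta2 - theta1 + 1: number of cycles at which a message may arrive."""
    return theta2 - theta1 + 1


def offset_bits(theta1: int, theta2: int) -> int:
    """Bits needed to name one of W positions, i.e. ceil(log2(W))."""
    w = uncertainty_interval(theta1, theta2)
    return (w - 1).bit_length()


def processing_cycle(x_t: int, theta2: int) -> int:
    """zR = xT + theta2: process only after the latest possible arrival."""
    return x_t + theta2


# Hypothetical bus whose delay varies between 3 and 6 cycles.
print(uncertainty_interval(3, 6))  # 4
print(offset_bits(3, 6))           # 2
print(processing_cycle(100, 6))    # 106
```

Waiting until xT + θ2 trades a few cycles of latency for a processing time that does not depend on the actual delay of the message.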
Schemes to Enforce Determinism
Assume fTransmitter = fReceiver
Receiver needs to compute: zR = xT + θ2
Trivial Solution: Send xT along with every message.
Better Solution: After the first message is delivered deterministically
  Transmitter sends a message every cycle
  Receiver then processes the messages at the rate of transmission
Disadvantages
  This scheme requires a line between all pairs of nodes
  A receiver has to be aware of all the transmitters
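The "better solution" above can be sketched as follows, assuming the first message is sent at a known cycle and processed at zR = xT + θ2, and that the transmitter then sends one message per cycle. The function name, the random delay model, and the example values are hypothetical.

```python
import random


def streaming_schedule(x_first: int, theta1: int, theta2: int,
                       count: int, seed: int) -> list:
    """Processing cycle of each message when one message is sent per cycle."""
    rng = random.Random(seed)
    first_z = x_first + theta2  # first message delivered deterministically
    schedule = []
    for k in range(count):
        x_t = x_first + k                        # k-th message is sent at x_first + k
        y_r = x_t + rng.randint(theta1, theta2)  # non-deterministic arrival
        z_r = first_z + k                        # processed at the transmission rate
        assert y_r <= z_r                        # message has always arrived by then
        schedule.append(z_r)
    return schedule


# Different seeds give different arrival times but the same schedule.
print(streaming_schedule(100, 3, 6, 5, seed=1))  # [106, 107, 108, 109, 110]
print(streaming_schedule(100, 3, 6, 5, seed=2))  # [106, 107, 108, 109, 110]
```

No per-message timestamp is needed because the position of a message in the stream already identifies its send cycle, which is also why every transmitter and receiver pair needs its own line.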
Offset Scheme
[Figure: transmitter and receiver timelines divided into windows of width W. Labels: min(xT), max(xT), θ1, θ2, θ2 - θ1, W, xT, ρ, yR, zR, Case 1, Case 2, Disjoint]
W = θ2 - θ1 + 1
(yR - θ1 - ρ) is in the same window as xT
xT = ⌊(yR - θ1 - ρ)/W⌋*W + ρ
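The recovery formula above can be checked exhaustively for a small case. The sketch reads ρ as the offset of xT within its window, i.e. xT mod W, and assumes the transmitter and receiver count cycles on a common time base; both are our interpretation of the slide, and the function name is hypothetical.

```python
def recover_send_cycle(y_r: int, rho: int, theta1: int, theta2: int) -> int:
    """xT = floor((yR - theta1 - rho) / W) * W + rho."""
    w = theta2 - theta1 + 1
    # Python's // is floor division, matching the floor in the formula.
    return ((y_r - theta1 - rho) // w) * w + rho


theta1, theta2 = 3, 6          # hypothetical delay bounds, so W = 4
w = theta2 - theta1 + 1

# Every send cycle is recovered for every delay inside [theta1, theta2].
for x_t in range(200):
    rho = x_t % w              # offset sent along with the message
    for delay in range(theta1, theta2 + 1):
        y_r = x_t + delay      # cycle at which the receiver sees the message
        assert recover_send_cycle(y_r, rho, theta1, theta2) == x_t

print(recover_send_cycle(15, 2, theta1, theta2))  # 10
```

The check works because yR - θ1 lies in [xT, xT + W - 1], so subtracting ρ lands in the window that starts at xT - ρ, which is a multiple of W.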
Implementation of Offset Scheme
[Figure: receiver bus interface. Labels: Mod-W Counter, Domain Counter, Circular Queue, Data, Core, Receiver, Bus Interface; signals: yR, -θ1, -ρ, yR - θ1 - ρ, +ρ, +θ2, zR, comparator (=)]
zR = ⌊(yR - θ1 - ρ)/W⌋*W + ρ + θ2
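A software model of this receiver shows the effect of the zR formula. This is a sketch, not the hardware design: a dictionary keyed by zR stands in for the circular queue, the loop variable plays the role of the domain counter, ρ is read as xT mod W, and send cycles are assumed to be distinct. All names are hypothetical.

```python
import random
from collections import defaultdict


def process_cycle(y_r: int, rho: int, theta1: int, theta2: int) -> int:
    """zR = floor((yR - theta1 - rho) / W) * W + rho + theta2."""
    w = theta2 - theta1 + 1
    return ((y_r - theta1 - rho) // w) * w + rho + theta2


def simulate_receiver(send_cycles, theta1, theta2, seed):
    """Return (cycle, payload) pairs in the order the core sees them."""
    rng = random.Random(seed)
    w = theta2 - theta1 + 1
    arrivals = defaultdict(list)          # yR -> [(rho, payload)]
    for x_t in send_cycles:
        y_r = x_t + rng.randint(theta1, theta2)   # non-deterministic delay
        arrivals[y_r].append((x_t % w, f"msg@{x_t}"))
    waiting = defaultdict(list)           # zR -> payloads held until then
    processed = []
    last_cycle = max(send_cycles) + theta2
    for cycle in range(last_cycle + 1):   # cycle acts as the domain counter
        for rho, payload in arrivals.get(cycle, []):
            waiting[process_cycle(cycle, rho, theta1, theta2)].append(payload)
        for payload in waiting.pop(cycle, []):    # counter equals zR
            processed.append((cycle, payload))
    return processed


# Arrival times differ between seeds, processing times do not.
print(simulate_receiver([5, 6, 9, 20], 3, 6, seed=1))
print(simulate_receiver([5, 6, 9, 20], 3, 6, seed=2))
```

Both calls print the same list, with each message handed to the core at its send cycle plus θ2.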
Architecture
[Figure: CADRE architecture block diagram. Blocks: CMP, CPU, Reg. Ckpt, CPU Log, Synchronizer, CADRE Controller, Memory Controller, Memory, Memory Log, IO Controller Hub, IO Log, PCI devices. Buses: Synchronous Bus, Source Sync. Bus, Asynchronous Bus. Legend: Baseline, HW for determinism, HW for checkpointing]
Outline
Problems in Debugging
Sources of Non-Determinism
Handling Non-Determinism in Buses
CADRE Architecture
Evaluation
  Space Overhead
  Performance Overhead
Evaluation: CADRE
Configuration
  2.8 GHz dual-processor Pentium-4 Xeon server with hyper-threading
  Intel E7525 chipset with 800 MHz FSB
  We estimate the overheads of a 4-processor CMP
  Benchmarks: SPEC (Int, FP, JBB, OMP, and Web)
To estimate overheads
  Reconfigure MCH and ICH
  Use memory-mapped IO to access PCIX registers
  See our paper in WARP ’06 (workshop along with ISCA ’06)
Space Overhead
Space overhead of mem. checkpointing
  SafetyNet: 50 MB/s/proc
  ReVive: 38-120 MB/s/proc
  Average IO bandwidth: 100 kB/s
Design Point
  1 sec checkpoint interval
  50 MB * 4 = 200 MB for memory ckpt.
  4 MB log for IO traffic
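The design-point arithmetic can be reproduced directly from the figures on this slide. The sketch below uses the 50 MB/s/proc rate, the 1 sec interval, and 4 processors; the comparison of the 4 MB IO log against the average IO rate assumes decimal units (1 MB = 1000 kB), and the function name is hypothetical.

```python
def memory_checkpoint_mb(rate_mb_per_s_per_proc: float,
                         interval_s: float, num_procs: int) -> float:
    """Storage for one checkpoint interval of memory logging."""
    return rate_mb_per_s_per_proc * interval_s * num_procs


interval_s = 1      # 1 sec checkpoint interval
num_procs = 4       # 4-processor CMP

mem_mb = memory_checkpoint_mb(50, interval_s, num_procs)
print(mem_mb)       # 200

# Average IO traffic generated in one interval, in kB.
io_kb = 100 * interval_s
print(io_kb)        # 100

# How many times the average per-interval IO traffic fits in the 4 MB log.
headroom = (4 * 1000) / io_kb
print(headroom)     # 40.0
```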
Performance Overhead
Periodic cache flushing overhead for proc. checkpointing
  Negligible: once per second
For every message, assume the worst-case delay and increase bus latency
  Assume 1-4 (MCH) cycles of non-determinism
  Add an extra 1-4 cycles of bus latency
  Increase programmable read pointer delay, clock guard band, RAS-CAS delay
Performance - II
With a 1 sec. ckpt. interval, CADRE has a 1% slowdown and requires 200 MB of storage
Compared with 64 ms for Golan at the same overheads
Using CADRE
Hardware debugging
  Record and replay executions
  Transfer state to an RTL simulator
  Use scan chains to observe certain latches
Use CADRE in deployed systems in the field
  Send a hardware core dump back to the vendor
Lock-stepped execution in TMR systems
Software debugging: CADRE hardware guarantees determinism