Modeling Soft-Error Propagation in...

Modeling Soft-Error Propagation in Programs

Guanpeng (Justin) Li, Karthik Pattabiraman

Siva Hari, Michael Sullivan, Timothy Tsai

Motivation: Soft Errors

[Figure: a single bit flip changes 0001 into 0101]

Soft errors are becoming more common in processors

[1] http://aviral.lab.asu.edu/soft-error-resilience/

Silent Data Corruption (SDC)

[Figure: normal execution vs. error propagation; outcomes are benign (correct output), exceptions or no output, and incorrect output]

Amazon S3 Incident

Software Solutions

[Figure: protection at the Device/Circuit Level, Architectural Level, Operating System Level, and Application Level, labeled with Impactful Errors, Protection Overhead, and Soft Error Increasing]

Software protection techniques are more flexible and cost-effective!

Selective Instruction Duplication

[Figure: “The Golden Curve”, SDC Coverage vs. Protection Overhead]

Application Specific!

*Measured in Libquantum, SPEC

Instruction Sequence → Instruction Duplication

Each instruction: SDC Rate = X%, Overhead = Y%

Selecting instructions for a given target SDC coverage is a knapsack problem
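The knapsack framing can be sketched directly: each candidate instruction's duplication overhead is its weight, the SDC coverage it adds is its value, and the overhead budget is the capacity. The Python sketch below is a minimal 0/1 knapsack dynamic program under assumptions the slides do not state: overheads are expressed in integer units (for example, extra dynamic instructions), per-instruction coverage contributions simply add up, and the instruction names and numbers are hypothetical. It is not presented as the authors' solver.

```python
from typing import List, Tuple


def select_instructions(
    items: List[Tuple[str, float, int]], budget: int
) -> Tuple[float, List[str]]:
    """0/1 knapsack: choose instructions to duplicate within an overhead budget.

    items: (name, sdc_coverage_contribution, overhead_units); overhead is a
    non-negative integer in a hypothetical unit (e.g. extra dynamic insns).
    Returns (total coverage obtained, names of the selected instructions).
    """
    n = len(items)
    # best[i][w]: max coverage using the first i items with overhead <= w
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, value, cost) in enumerate(items, start=1):
        for w in range(budget + 1):
            best[i][w] = best[i - 1][w]  # option 1: do not duplicate item i
            if cost <= w and best[i - 1][w - cost] + value > best[i][w]:
                best[i][w] = best[i - 1][w - cost] + value  # option 2: duplicate it
    # Walk back through the table to recover which items were chosen
    chosen, w = [], budget
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            name, _, cost = items[i - 1]
            chosen.append(name)
            w -= cost
    return best[n][budget], list(reversed(chosen))


if __name__ == "__main__":
    # Hypothetical candidates: (name, SDC coverage contribution, overhead units)
    candidates = [("load_a", 0.30, 4), ("add_b", 0.10, 1),
                  ("cmp_c", 0.25, 2), ("mul_d", 0.05, 3)]
    coverage, picked = select_instructions(candidates, budget=5)
    print(f"coverage={coverage:.2f}, duplicate={picked}")
```

Running it prints `coverage=0.40, duplicate=['load_a', 'add_b']`. To hit a target coverage rather than a fixed budget, one option is to raise the budget until the returned coverage meets the target.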

Developing Fault-Tolerant Applications

[Figure: Development of Application → Evaluate Program SDC Rate → Acceptable? If so, New Release; if not, Measure Instruction SDC Rates → Selective Protection]

1. Thousands of fault injections need to be performed
2. They must be repeated every time the code is modified
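Why thousands of runs are needed can be made concrete with the textbook sample-size formula for estimating a proportion. The Python sketch below assumes independent injections, a normal approximation to the binomial, and the worst-case rate p = 0.5; the function names and the per-instruction campaign are hypothetical and not part of any tool described here.

```python
import math


def injections_needed(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Runs needed so the ~95% confidence half-width of an estimated SDC rate
    is at most `margin` (absolute, e.g. 0.03 = 3 percentage points).

    Uses the normal approximation to the binomial; p = 0.5 is the worst case
    (largest variance) when the true SDC rate is unknown.
    """
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be in (0, 1)")
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


def campaign_size(num_targets: int, margin: float) -> int:
    """Total runs if each target (e.g. every static instruction, in a
    hypothetical per-instruction campaign) gets its own estimate."""
    return num_targets * injections_needed(margin)
```

For example, `injections_needed(0.03)` is 1,068 runs for a single SDC-rate estimate. Estimating per instruction multiplies that by the number of targets, and the whole campaign has to be rerun after every code change.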

Estimating SDC Rate

[Figure: existing techniques and Our Goal, compared by Accuracy]

AVF/PVF/ePVF [MICRO’03, HPCA’10, DSN’16]

SymPLFIED/Relyzer/GangES [DSN’08, ASPLOS’12, ISCA’14]

No existing technique models error propagation in a way that is both fast and accurate!

Our goal: fast prediction of SDC without fault injection!

Challenges

• Tracking SDC propagation is hard:
  • Over billions of executed instructions
  • Every instruction may propagate errors with different probabilities
  • Dynamic nature of program execution
  • Control-flow divergence

[Figure: an error corrupting subsequent states]

Trident: Key Insight

• Error propagation can be decomposed into modules, which can be abstracted into probabilistic events
• Decomposition
• Abstraction

Trident: Workflow

[Figure: Trident workflow. Inputs: Source Code, Program Input, Output Insn., and Insn. for Prediction. Stages: Profiling, then Prediction. Outputs: Insn. SDC Rates and Overall SDC Rate.]
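The workflow produces both per-instruction SDC rates and an overall SDC rate. One common way to relate the two, assuming each run injects one fault into a dynamic instruction chosen uniformly at random, is an execution-count-weighted average. The sketch below illustrates that assumption only; the dictionaries stand in for hypothetical profiling output and are not Trident's data format.

```python
from typing import Dict


def overall_sdc_rate(insn_sdc: Dict[str, float], dyn_count: Dict[str, int]) -> float:
    """Weighted average of per-instruction SDC rates.

    insn_sdc:  static instruction id -> predicted SDC probability (0..1)
    dyn_count: static instruction id -> times it executed (from profiling)
    Assumes the fault hits one dynamic instruction chosen uniformly at random,
    so each static instruction is weighted by its dynamic execution count.
    """
    total = sum(dyn_count[i] for i in insn_sdc)
    if total == 0:
        raise ValueError("no dynamic instructions profiled")
    return sum(insn_sdc[i] * dyn_count[i] for i in insn_sdc) / total
```

With rates `{"a": 0.5, "b": 0.0, "c": 1.0}` and counts `{"a": 2, "b": 1, "c": 1}`, the overall rate is 0.5: hot instructions dominate the program-level number.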

Trident: Our Approach

• Three-level modeling
  • Register communication
  • Control flow
  • Memory dependency

    $2 = LOAD 0x04
    $3 = ADD $2, 4
    CMP $4, $3, 4
    BR $4, BB5, BB10
    $5 = MUL $6, 16
    … …
    ... = LOAD 0x08

    BB11  STORE …, 0x08

Trident: Register Communication

Propagation probability within BB4?

$f_s = 100\% \times 100\% \times 25\% \times 100\% = 25\%$
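The slide's answer multiplies per-instruction propagation probabilities along the data-dependence chain in BB4. The Python sketch below shows that product, plus one hypothetical way a comparison's factor could be estimated from a profiled operand value: count how many single-bit flips of the operand change the comparison outcome. The unsigned `width`-bit interpretation, the predicate, and the example values are assumptions for illustration; this is not claimed to be how Trident derives the slide's 25%.

```python
import math
import operator
from typing import Callable, Iterable


def chain_probability(probs: Iterable[float]) -> float:
    """Propagation probability along a register data-dependence chain,
    taken as the product of per-instruction propagation probabilities."""
    return math.prod(probs)


def cmp_flip_probability(value: int, const: int,
                         pred: Callable[[int, int], bool] = operator.lt,
                         width: int = 32) -> float:
    """Fraction of single-bit flips in `value` that change pred(value, const).

    `value` stands for a profiled runtime value of the compared register
    (hypothetical input); operands are treated as unsigned `width`-bit ints.
    """
    original = pred(value, const)
    changed = sum(
        1 for bit in range(width)
        if pred(value ^ (1 << bit), const) != original
    )
    return changed / width
```

`chain_probability([1.0, 1.0, 0.25, 1.0])` returns 0.25, matching the slide's $f_s$. `cmp_flip_probability(3, 4)` returns 0.9375: flipping bit 2, or any bit from 3 upward, makes the operand no longer less than 4, so 30 of the 32 flips change the outcome.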

Trident: Control-Flow

Corruption probability of STORE?

[Figure: the example program's control-flow graph with a Corrupted branch; branch probabilities 80%/20% and 30%/70%]

STORE exec. prob.: $F_1 \times T_2$

BR dom. prob.: $F_1$

*For non-loop-terminating branches
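The slide gives two quantities for the STORE: its execution probability, $F_1 \times T_2$, and the dominance probability of the branch, $F_1$. A natural way to combine them, and our reading of the slide rather than a confirmed formula, is their ratio: the probability that the STORE executes once control is on the branch's dominating side. The Python sketch below implements that assumed ratio. Which of the slide's 80%/20% and 30%/70% edges correspond to $F_1$ and $T_2$ cannot be recovered from the text, so the values are illustrative.

```python
def store_corruption_probability(exec_prob: float, dom_prob: float) -> float:
    """Assumed f_c estimate for a STORE under a corrupted, non-loop-terminating
    branch: the STORE's execution probability divided by the probability of
    the branch edge that dominates it."""
    if not 0.0 < dom_prob <= 1.0:
        raise ValueError("dominance probability must be in (0, 1]")
    if not 0.0 <= exec_prob <= dom_prob:
        raise ValueError("a dominated STORE cannot execute more often than its dominating edge")
    return exec_prob / dom_prob


# Illustrative edge probabilities; the mapping to the slide's edges is assumed
F1, T2 = 0.20, 0.70
exec_prob = F1 * T2   # STORE exec. prob. = F1 x T2
dom_prob = F1         # BR dom. prob. = F1
p_store = store_corruption_probability(exec_prob, dom_prob)
print(f"exec={exec_prob:.2f}, dom={dom_prob:.2f}, corrupted={p_store:.2f}")
```

This prints `exec=0.14, dom=0.20, corrupted=0.70`. The result is simply $T_2$, because conditioning on the dominating edge cancels $F_1$.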

Trident: Memory-Dependency

Dependent LOAD & STORE

[Figure: the example program with branch probabilities 80%/20% and 30%/70%]

$P(I_n) = f_s(I_n) \times f_c(I_{n_2}) \times f_s(I_{n_3}) \times f_c(I_{n_4}) \cdots$

*n corresponds to the index of dynamic instructions
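Memory dependencies connect a STORE to the later LOADs that read its value, and the slide's formula multiplies sub-model probabilities along the resulting dynamic path. The Python sketch below illustrates both steps under stated assumptions: a hypothetical trace of (dynamic index, opcode, address) tuples, the rule that a LOAD depends on the most recent STORE to the same address, and made-up probabilities for the path. It is not Trident's profiler.

```python
import math
from typing import Dict, List, Tuple


def store_to_load_deps(trace: List[Tuple[int, str, int]]) -> Dict[int, List[int]]:
    """Map each dynamic STORE index to the dynamic LOADs that read its value.

    trace: (dynamic index, opcode, address) in execution order (hypothetical
    format). A LOAD depends on the most recent STORE to the same address;
    LOADs with no earlier STORE to that address are ignored.
    """
    last_store: Dict[int, int] = {}
    deps: Dict[int, List[int]] = {}
    for idx, op, addr in trace:
        if op == "STORE":
            last_store[addr] = idx
            deps.setdefault(idx, [])
        elif op == "LOAD" and addr in last_store:
            deps[last_store[addr]].append(idx)
    return deps


def path_probability(steps: List[Tuple[str, float]]) -> float:
    """P(I_n) as the product of sub-model probabilities along one dynamic
    propagation path, e.g. [("f_s", ...), ("f_c", ...), ...]."""
    return math.prod(p for _, p in steps)
```

For a trace that stores to 0x08 twice, each LOAD from 0x08 is attributed to the STORE that precedes it most recently; a path such as `[("f_s", 1.0), ("f_c", 0.7), ("f_s", 0.25), ("f_c", 1.0)]` then yields 0.175.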

Experimental Setup

[Table: Benchmark Application Domains]

• Fault model
  • Single bit-flip injections – accurate [DSN’17]
  • One random instruction injected per program execution
• Benchmarks
  • 11 open-source benchmarks from various domains
• Comparison with fault injection
  • Accuracy
  • Speed (wall-clock time)
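The fault model can be illustrated with a toy injector: per run, one dynamic instruction is chosen uniformly at random and one bit of its destination value is flipped. The Python sketch below operates on a hypothetical list of recorded destination values rather than on a running program, so it shows only the sampling logic, not how an injector such as LLFI instruments code.

```python
import random
from typing import List, Optional, Tuple


def inject_single_bit_flip(
    dest_values: List[int], width: int = 32, rng: Optional[random.Random] = None
) -> Tuple[int, int, List[int]]:
    """One fault per run: pick a dynamic instruction uniformly at random and
    flip one uniformly chosen bit of its destination value.

    dest_values: destination-register values of each dynamic instruction, in
    execution order (hypothetical trace). The input list is not modified.
    Returns (dynamic index, flipped bit, corrupted copy of the values).
    """
    if not dest_values:
        raise ValueError("no dynamic instructions to inject into")
    if rng is None:
        rng = random.Random()
    idx = rng.randrange(len(dest_values))
    bit = rng.randrange(width)
    corrupted = list(dest_values)
    # XOR flips the chosen bit; the mask keeps the value within `width` bits
    corrupted[idx] = (corrupted[idx] ^ (1 << bit)) & ((1 << width) - 1)
    return idx, bit, corrupted
```

Passing a seeded `random.Random` makes a run reproducible, which helps when re-running a specific injection to inspect its outcome.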

Experimental Methodology

$f_s$ and $f_s + f_c$: Two Simpler Models for Comparison

The goal is to predict the SDC rate as measured by fault injection

Reminder:

• Baseline: fault injection performed with LLFI [1]
• The closer the predicted SDC rate is to fault injection, the better the prediction
• Created two simpler models
  • To measure the accuracy of each sub-model
  • As a proxy for prior work

[1] LLVM Fault Injector [DSN’14]

Evaluation: Accuracy

• Mean absolute error
  • Trident: 4.75%
  • Simpler models: 15.13% and 19.13%
• t-test on individual instructions
  • Trident: 8 out of 11 benchmarks are statistically indistinguishable from fault injection
  • Simpler models ($f_s$ and $f_s + f_c$): only 2 and 4

[Figure: Program SDC Rate; 3,000 sampled instructions; error bars: ±0.07% to ±1.76% at the 95% confidence interval]

Trident is close to fault injection results, and significantly better than the simpler models!

3,000 randomly sampled instructions were used for both fault injection and the models
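The accuracy numbers rest on two standard statistics: the mean absolute error between predicted and fault-injection SDC rates, and the confidence interval of a rate estimated from a fixed number of injections. The Python sketch below gives the textbook forms. The deck does not state exactly how its error bars were computed, so the normal-approximation interval here is an assumption, and the example values are made up.

```python
import math
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute difference between predicted and fault-injection SDC rates
    (same unit for both, e.g. percent), one pair per benchmark."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


def ci95_half_width(sdc_rate: float, samples: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for an SDC rate (0..1)
    estimated from `samples` independent injections."""
    return z * math.sqrt(sdc_rate * (1.0 - sdc_rate) / samples)
```

For instance, `ci95_half_width(0.5, 3000)` is about 0.0179, i.e. roughly ±1.8 percentage points in the worst case at 3,000 samples; lower SDC rates give narrower intervals.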

Evaluation: Speed

• Program's overall SDC rate: 6.7x faster at 3,000 samples
• Per-instruction SDC rate: on average, 380x faster at 100 samples per instruction
• Across benchmarks: fault injection (FI) takes nearly 100 hours, whereas Trident takes under 20 minutes

Trident is faster than fault injection by 2 orders of magnitude!

[Figure: Wall-Clock Time of Estimating Program SDC Rate]
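A quick check of the benchmark-level figures, treating "nearly 100 hours" as 100 hours and "under 20 minutes" as 20 minutes:

```latex
\frac{T_{\text{FI}}}{T_{\text{Trident}}}
  \approx \frac{100\ \text{h}}{20\ \text{min}}
  = \frac{6000\ \text{min}}{20\ \text{min}}
  = 300,
\qquad \log_{10} 300 \approx 2.48
```

The ratio is on the order of 300x, a little over two orders of magnitude.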

Use Case: Selective Instruction Duplication

[Figure: “The Golden Curve”, SDC Coverage vs. Protection Overhead, obtained by fault injections, by Trident, by $f_s + f_c$, and by $f_s$]

*Measured in Libquantum, SPEC

Selective Instruction Duplication

Recap:

Extension

• Understand how error propagation is affected by multiple inputs

• Extension for bounding the SDC rate with multiple inputs

Session 6: Modeling and Verification, Wednesday, June 27th

“Modeling Input-Dependent Error Propagation in Programs”

Summary

• Fault injections are too slow to integrate into the software development cycle

• Trident is both accurate and fast in predicting SDC rates

• Trident can guide selective protection of instructions in programs, with accuracy comparable to fault injection at a fraction of the cost

• Open source: https://github.com/DependableSystemsLab/Trident

Guanpeng (Justin) Li, University of British Columbia (UBC)

gpli@ece.ubc.ca
