A Memory Consistency Model For RISC-V Formally Evaluated with TriCheck
Caroline Trippel
Princeton University
November 29, 2016

Caroline Trippel, Yatin Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. "TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA". In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17).

Role of the Instruction Set Architecture (ISA)

• Introduced in 1964 by IBM
• 1 set of software
• >1 hardware implementations
• Definitive spec. of hardware as seen by software:
  • Specification of what hardware must implement
  • Target for compiler translation

Diagram (two stacks, each showing Software/HLL, ISA, Hardware):
• Weak PPO (e.g., ARM, POWER): more ordering primitives (e.g., fences/barriers) inserted by compiler
• Strong PPO (e.g., SC, TSO): fewer ordering primitives (e.g., fences/barriers) inserted by compiler

Our Work: Memory Consistency Model Verification

Diagram labels: Software/HLL Memory Model, ISA Memory Model, Hardware Memory Model, Compilation, Operating System, Microarchitectural Implementation

• PipeCheck [Lustig et al. MICRO-47]
• CCICheck [Manerkar et al. MICRO-48]
• COATCheck [Lustig et al. ASPLOS '16]
• ArMOR [Lustig et al. ISCA '15]
• TriCheck [Trippel et al. ASPLOS '17]

Memory Models Bugs Observed in Practice

ARM Read-after-Read Hazard [Alglave et al. TOPLAS '14]
• Ambiguous ISA spec. regarding same-address Ld→Ld ordering
• ARM compilers did not insert synchronization primitives (e.g., fences/barriers)
• Some ARM implementations relaxed same-address Ld→Ld ordering (e.g., Cortex-A9, Snapdragon 805)
• C/C++ atomics require same-address Ld→Ld ordering
• ARM issued errata¹: Rewrite compilers to insert fences (with performance penalties)

We've identified and characterized flaws in the current RISC-V memory model (i.e., the memory model defined in the current manual) [Trippel et al. ASPLOS '17]

Note that the modifications to fix these issues will be mostly compatible with current implementations.

¹ ARM. Cortex-A9 MPCore, programmer advice notice, read-after-read hazards. ARM Reference 761319, 2011. http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf

Outline

• Role of Memory Models in ISAs
• What Should We Require From the Hardware?
• What Fences/Barriers Do We Need to Support C/C++?
• TriCheck Framework for Full-Stack Memory Model Verification
• On-Going Work & Conclusions

Sequential Consistency

• Memory models specify the allowed behavior of a multithreaded program executing with shared memory
• First defined by [Lamport 1979], execution is the same as if:
  (R1) Memory ops of each processor appear in program order
  (R2) Memory ops of all processors were executed in some global sequential order

Program:

Thread 0    Thread 1
x=1         r1=y
y=1         r2=x

Legal Executions:

1. x=1, y=1, r1=y, r2=x
2. x=1, r1=y, y=1, r2=x
3. x=1, r1=y, r2=x, y=1
4. r1=y, r2=x, x=1, y=1
5. r1=y, x=1, r2=x, y=1
6. r1=y, x=1, y=1, r2=x
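The legal executions of this two-thread program can be enumerated mechanically. The sketch below is a minimal illustration, not part of TriCheck: it interleaves the two threads while keeping each thread's program order (R1), runs each global order against a single shared memory (R2), and collects the reachable (r1, r2) pairs. The initial values x=0, y=0 and all function names are assumptions made for the example.

```python
from itertools import combinations

# Each thread is a list of operations in program order.
# ("st", var, value) writes memory; ("ld", reg, var) reads memory into a register.
THREAD0 = [("st", "x", 1), ("st", "y", 1)]
THREAD1 = [("ld", "r1", "y"), ("ld", "r2", "x")]


def sc_executions(t0, t1):
    """Yield every global order that keeps each thread's program order."""
    total = len(t0) + len(t1)
    # Choose which slots of the global order belong to thread 0.
    for slots in combinations(range(total), len(t0)):
        it0, it1 = iter(t0), iter(t1)
        yield [next(it0) if i in slots else next(it1) for i in range(total)]


def run(order):
    """Execute one global order against a single shared memory."""
    mem = {"x": 0, "y": 0}  # assumed initial values
    regs = {}
    for op in order:
        if op[0] == "st":
            mem[op[1]] = op[2]
        else:
            regs[op[1]] = mem[op[2]]
    return regs["r1"], regs["r2"]


executions = list(sc_executions(THREAD0, THREAD1))
sc_outcomes = {run(order) for order in executions}
print(len(executions))      # number of legal SC executions
print(sorted(sc_outcomes))  # reachable (r1, r2) pairs
```

Under these assumptions the pair r1=1, r2=0 never appears: seeing y=1 means the earlier store to x has already taken effect in the global order.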

Two Categories of Memory Model Relaxation

Preserved Program Order: Defines program orderings that hardware must preserve by default

Store Atomicity: Defines order in which stores become visible to cores
• Multiple-copy atomic (e.g., monolithic memory):
  • All cores see store simultaneously
• Read-Own-Write-Early multiple-copy atomic (e.g., private store buffer):
  • Storing core can read its own store before other cores
  • Stores made visible to all remote cores simultaneously
• Non-multiple-copy atomic (e.g., shared store buffer):
  • Storing core can read its own store before other cores
  • Store is made visible to some remote cores before others

RISC-V Proposed Preserved Program Order and Store Atomicity

Preserved Program Order:

Store Atomicity: Non-multiple-copy atomic:
• Storing core can read its own store before other cores
• Store is made visible to some remote cores before others

Effects of Non-Multiple-Copy Atomic Stores

Initial conditions: x=0, y=0

T0            T1            T2             T3
st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                            F R, R         F R, R
                            ld y → [r1]    ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

This outcome corresponds to the case in which the stores on threads T0 and T1 arrive at threads T2 and T3 in different orders.
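A toy simulation can make the "different arrival orders" explanation concrete. The sketch below is an illustrative model only, not the µSpec models used in TriCheck. It assumes each observer (T2, T3) keeps its own view of memory, that a store "arrives" at an observer as a separate event, and that the fences simply keep each observer's two loads in order. With multiple-copy atomic stores, a single event updates every view at once. All names are hypothetical.

```python
def interleavings(seqs):
    """Yield every merge of the sequences that keeps each one's internal order."""
    if not any(seqs):
        yield []
        return
    for i, seq in enumerate(seqs):
        if seq:
            rest = seqs[:i] + [seq[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [seq[0]] + tail


OBSERVERS = ("T2", "T3")
# The fences keep each observer's two loads in program order.
T2 = [("ld", "T2", "x", "r0"), ("ld", "T2", "y", "r1")]
T3 = [("ld", "T3", "y", "r2"), ("ld", "T3", "x", "r3")]


def iriw_outcomes(multi_copy_atomic):
    """Return all reachable (r0, r1, r2, r3) tuples in this toy model."""
    if multi_copy_atomic:
        # One event makes a store visible to every observer at once.
        arrivals = [[("arrive", v, OBSERVERS)] for v in ("x", "y")]
    else:
        # A store reaches each observer in a separate, independent event.
        arrivals = [[("arrive", v, (o,))] for v in ("x", "y") for o in OBSERVERS]
    seen = set()
    for trace in interleavings(arrivals + [T2, T3]):
        view = {o: {"x": 0, "y": 0} for o in OBSERVERS}
        regs = {}
        for ev in trace:
            if ev[0] == "arrive":
                for o in ev[2]:
                    view[o][ev[1]] = 1
            else:
                _, core, var, reg = ev
                regs[reg] = view[core][var]
        seen.add((regs["r0"], regs["r1"], regs["r2"], regs["r3"]))
    return seen


NON_SC = (1, 0, 1, 0)
print(NON_SC in iriw_outcomes(multi_copy_atomic=True))
print(NON_SC in iriw_outcomes(multi_copy_atomic=False))
```

In this model the non-SC outcome is reachable only when stores may arrive at T2 and T3 independently, which matches the explanation on the slide.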

Why Allow Non-Multiple-Copy Atomic Stores?

• Commercial ISAs allow non-multiple-copy atomic stores (e.g., ARM, POWER)
• RISC-V is intended to be integrated with other vendor ISAs
  • Potential deployment in non-multiple-copy atomic memory systems
  • If sharing memory system, awareness that stores may be observed in orders that differ from other cores

Fences to Restore Multiple-Copy Atomicity

Initial conditions: x=0, y=0

T0            T1            T2             T3
st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                            pscF RW, RW    pscF RW, RW
                            ld y → [r1]    ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

Predecessor-/Successor-Cumulative Fence: Necessary to Restore SC for Non-Multiple-Copy Atomic Memory Systems
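The effect of cumulativity can be sketched in a toy model. This is an assumption-laden illustration, not the RISC-V specification or a µSpec model: each observer has its own view of memory, stores arrive at observers independently (non-multiple-copy atomic), and a cumulative fence is modeled as "every store the fencing core has already observed becomes visible to all cores before the fence completes". A non-cumulative fence only keeps the two loads in order. All names are hypothetical.

```python
def interleavings(seqs):
    """Yield every merge of the sequences that keeps each one's internal order."""
    if not any(seqs):
        yield []
        return
    for i, seq in enumerate(seqs):
        if seq:
            rest = seqs[:i] + [seq[1:]] + seqs[i + 1:]
            for tail in interleavings(rest):
                yield [seq[0]] + tail


OBSERVERS = ("T2", "T3")
VARS = ("x", "y")


def iriw_with_fences(cumulative):
    """Return reachable (r0, r1, r2, r3) tuples for the fenced test (toy model)."""
    # Non-multiple-copy atomic: each store reaches each observer separately.
    arrivals = [[("arrive", v, o)] for v in VARS for o in OBSERVERS]
    t2 = [("ld", "T2", "x", "r0"), ("fence", "T2"), ("ld", "T2", "y", "r1")]
    t3 = [("ld", "T3", "y", "r2"), ("fence", "T3"), ("ld", "T3", "x", "r3")]
    seen = set()
    for trace in interleavings(arrivals + [t2, t3]):
        view = {o: {v: 0 for v in VARS} for o in OBSERVERS}
        regs = {}
        for ev in trace:
            if ev[0] == "arrive":
                view[ev[2]][ev[1]] = 1
            elif ev[0] == "fence":
                if cumulative:
                    # Assumed cumulativity: push every store this core has
                    # observed to all other cores before continuing.
                    for v in VARS:
                        if view[ev[1]][v]:
                            for o in OBSERVERS:
                                view[o][v] = 1
            else:
                _, core, var, reg = ev
                regs[reg] = view[core][var]
        seen.add((regs["r0"], regs["r1"], regs["r2"], regs["r3"]))
    return seen


NON_SC = (1, 0, 1, 0)
print(NON_SC in iriw_with_fences(cumulative=False))
print(NON_SC in iriw_with_fences(cumulative=True))
```

Under these assumptions, ordering the loads alone does not rule out the non-SC outcome; the outcome disappears only once the fence also propagates the stores the fencing core has observed.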

Other Fences/Barriers/Ordering Primitives

• Baseline Memory Model
  • PPO requires same-address R-R order to be maintained
  • PPO requires order to be maintained between most dependent instructions
  • Predecessor-/Successor-Cumulative F RW, RW; F IO, IO; F IORW, IORW
• Baseline + Atomics Extension
  • Predecessor-Cumulative F RW, W

TriCheck Full-Stack Verification Framework

• Suite of C/C++ Litmus Tests (Suite of Small C/C++ Programs)
• Compiler Mappings from C/C++ to RISC-V
• C/C++ Herd Model
• RISC-V Check Model
• Check: C/C++ Outcome Forbidden implies ISA Level Outcome Forbidden

TriCheck compares HLL outcomes to ISA-level outcomes for a spectrum of legal ISA microarchitectures.

ISA DOES NOT ALLOW outcomes prohibited by C/C++

ISA ALLOWS outcomes prohibited by C/C++
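The comparison step can be sketched as a small classifier. This is a minimal illustration of the idea, not TriCheck's implementation: it takes, for one test outcome, whether the C/C++ model allows it and whether it is allowed at the ISA level, and labels the pair with the Bugs / Overly Strict / Equivalent categories used in the deck's result charts. The function name and the sample verdicts are hypothetical.

```python
from collections import Counter


def classify(hll_allows, isa_allows):
    """Compare one outcome's C/C++ verdict with its ISA-level verdict."""
    if isa_allows and not hll_allows:
        return "Bug"            # forbidden by C/C++, yet allowed at the ISA level
    if hll_allows and not isa_allows:
        return "Overly Strict"  # allowed by C/C++, yet forbidden at the ISA level
    return "Equivalent"         # both levels agree


# Hypothetical verdicts (test name, C/C++ allows?, ISA level allows?); not measured data.
verdicts = [
    ("variant-1", False, True),
    ("variant-2", True, False),
    ("variant-3", False, False),
    ("variant-4", True, True),
]
summary = Counter(classify(h, i) for _, h, i in verdicts)
print(dict(summary))
```

Only the "Bug" case violates the implication "C/C++ outcome forbidden implies ISA-level outcome forbidden"; "Overly Strict" is correct but may cost performance.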

RISC-V Base: Lack of Cumulative Fences

[Chart: RISC-V Baseline (Base), litmus test wrc. Y-axis: Test Variations (0 to 250), split into Bugs, Overly Strict, and Equivalent. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISAs riscv-curr and riscv-ours.]

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write
• Base RISC-V ISA lacks cumulative fences
  • Minimally, the ISA requires a Predecessor-/Successor-Cumulative F RW, RW
  • Cannot fix bugs by modifying compiler currently

Our current RISC-V proposal requires only a P-/S-Cumulative F RW, RW in the RISC-V Base ISA, and includes a weaker P-Cumulative F RW, W Fence in the Base+Atomics extension.

On-Going Work & Conclusions

• We have formulated an English language diff. of the current spec. with our proposed changes
• Currently we are constructing a formal model in Herd [Alglave et al., TOPLAS '14] of our proposed memory model modifications
• Memory model design choices are complicated and involve reasoning about the subtle interplay between many diverse features
• Defining an ISA specification in light of the evaluation of a single microarchitecture is not sufficient
• TriCheck is generalizable to any ISA and uncovered/quantified flaws in the RISC-V memory model.

[email protected]
http://check.cs.princeton.edu/

RISC-V Base+A: Lack of Transitive Releases

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write

[Chart: RISC-V Baseline+Atomics (Base+A), litmus test wrc. Y-axis: Test Variations (0 to 250), split into Bugs, Overly Strict, and Equivalent. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISAs riscv-curr and riscv-ours.]

• Base+A RISC-V ISA lacks transitive releases
  • i.e., RISC-V acquires do not synchronize with RISC-V releases as required by C/C++
  • AMO.rl and stronger AMO.aq.rl are both insufficient
  • Cannot fix bugs by modifying compiler
• Our solution: redefine release operations in the Base+A RISC-V ISA to be transitive

RISC-V Base+A: No Roach-Motel Movement for SC Atomics

[Chart: RISC-V Baseline+Atomics (Base+A), litmus tests mp and sb. Y-axis: Test Variations (0 to 90), split into Bugs, Overly Strict, and Equivalent. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISAs riscv-curr and riscv-ours under each test.]

• Roach-motel movement = expansion of acquire-release critical section
  • C++ SC loads have C++ Acquire semantics
  • C++ SC stores have C++ Release semantics
• RISC-V SC loads and stores require both aq and rl bits set on AMOs
  • Operation has acquire and release semantics
  • Prohibits roach-motel movement
• Our solution: add an sc bit for implementing AMO.aq.sc and AMO.rl.sc instructions which are capable of implementing C/C++ SC loads and stores

RISC-V Base: Same Address Ld→Ld Re-Ordering

• C/C++ forbids same-address Ld→Ld reordering
  • Bugs always arise when C/C++ loads are mapped to regular RISC-V loads

[Chart: RISC-V Baseline (Base), litmus test corr. Y-axis: Test Variations (0 to 90), split into Bugs, Overly Strict, and Equivalent. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9, shown for ISAs riscv-curr and riscv-ours.]

Initial conditions: x=0, y=0

T0                T1
a: sw x1, (x5)    c: lw x3, (x5)
b: sw x2, (x5)    d: lw x4, (x5)

Forbidden HLL Outcome: x1=1, x2=2, x3=2, x4=1

• Base RISC-V ISA includes F R, R
  • Possible to fix bugs by modifying compiler with potential performance penalty
  • 20.3% preliminary estimate of fence insertion performance penalty for ARM
• Our solution: modify Base RISC-V memory model to require same-address Ld→Ld ordering

Our current RISC-V proposal eliminates F R, R from the RISC-V Base ISA, and requires hardware to enforce same-address Ld→Ld order by default.
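The corr test can be checked by brute force. The sketch below is a simplified illustration with assumed semantics, not a RISC-V model: one shared memory location, T0's two stores performed in program order, and T1's two loads either kept in program order or allowed to swap. It reports whether the forbidden outcome x3=2, x4=1 is reachable. All names are hypothetical.

```python
from itertools import combinations

STORES = [("sw", 1), ("sw", 2)]        # a, b on T0: store x1=1, then x2=2
LOADS = [("lw", "x3"), ("lw", "x4")]   # c, d on T1: same address as the stores


def merges(first, second):
    """Yield every interleaving that keeps the order within each list."""
    total = len(first) + len(second)
    for slots in combinations(range(total), len(first)):
        a, b = iter(first), iter(second)
        yield [next(a) if i in slots else next(b) for i in range(total)]


def corr_outcomes(allow_load_reorder):
    """Return reachable (x3, x4) pairs; loads may swap only if allowed."""
    load_orders = [LOADS, LOADS[::-1]] if allow_load_reorder else [LOADS]
    seen = set()
    for loads in load_orders:
        for trace in merges(STORES, loads):
            mem, regs = 0, {}  # location starts at 0
            for op, arg in trace:
                if op == "sw":
                    mem = arg
                else:
                    regs[arg] = mem
            seen.add((regs["x3"], regs["x4"]))
    return seen


FORBIDDEN = (2, 1)
print(FORBIDDEN in corr_outcomes(allow_load_reorder=False))
print(FORBIDDEN in corr_outcomes(allow_load_reorder=True))
```

In this model the forbidden outcome appears only when the later load d is allowed to execute before the earlier load c, which is the reordering the proposal rules out by default.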

Re-ordering Dependent Operations

• RISC-V does not require ordering for dependent instructions
• Many commercial ISAs (x86, ARM, Power) respect dependencies
  • Can also be used as lightweight synchronization
• Explicit synchronization/fences needed when dependency ordering is required but not enforced by default, e.g., Linux
  • Macro read_barrier_depends() optionally inserts a barrier
    • Inserts a fence for Alpha, which does not respect dependencies¹
    • Inserts nothing for RISC-V, which does not respect dependencies²
• Our solution: modify Base RISC-V memory model to require the preservation of dependency orderings.

¹ Linus Torvalds et al. Linux kernel, 2016. https://github.com/torvalds/linux/blob/master/arch/alpha/include/asm/barrier.h
² RISC-V Foundation. RISC-V port of Linux kernel, 2016. https://github.com/riscv/riscv-linux/blob/master/rch/riscv/include/asm/barrier.h
