
A Memory Consistency Model for RISC-V, Formally Evaluated with TriCheck

Caroline Trippel, Princeton University, November 29, 2016

Caroline Trippel, Yatin Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. “TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA”. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17).

Role of the Instruction Set Architecture (ISA)

• Introduced in 1964 by IBM
• 1 set of software
• >1 hardware implementations
• Definitive spec. of hardware as seen by software:
  • Specification of what hardware must implement
  • Target for compiler translation

[Diagram: Software/HLL stacked on the ISA, stacked on Hardware]

• Weak PPO (e.g., ARM, POWER): more ordering primitives (e.g., fences/barriers) inserted by compiler
• Strong PPO (e.g., SC, TSO): fewer ordering primitives (e.g., fences/barriers) inserted by compiler

Our Work: Memory Consistency Model Verification

• Layers: Software/HLL Memory Model, ISA Memory Model, Hardware Memory Model
• Compilation
• Microarchitectural Implementation: PipeCheck [Lustig et al. MICRO-47], CCICheck [Manerkar et al. MICRO-48]
• Operating System: COATCheck [Lustig et al. ASPLOS '16]
• ArMOR [Lustig et al. ISCA '15]
• TriCheck [Trippel et al. ASPLOS '17]

Memory Model Bugs Observed in Practice

ARM Read-after-Read Hazard [Alglave et al. TOPLAS '14]
• Ambiguous ISA spec. regarding same-address Ld→Ld ordering
• ARM compilers did not insert synchronization primitives (e.g., fences/barriers)
• Some ARM implementations relaxed same-address Ld→Ld ordering (e.g., Cortex-A9, Snapdragon 805)
• C/C++ atomics require same-address Ld→Ld ordering
• ARM issued errata [1]: rewrite compilers to insert fences (with performance penalties)

We've identified and characterized flaws in the current RISC-V memory model (i.e., the memory model defined in the current manual) [Trippel et al. ASPLOS '17].

[1] ARM. Cortex-A9 MPCore, programmer advice notice, read-after-read hazards. ARM Reference 761319, 2011. http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf

Note that the modifications to fix these issues will be mostly compatible with current implementations.

Outline

• Role of Memory Models in ISAs
• What Should We Require From the Hardware?
• What Fences/Barriers Do We Need to Support C/C++?
• TriCheck Framework for Full-Stack Memory Model Verification
• On-Going Work & Conclusions

Sequential Consistency

• Memory models specify the allowed behavior of a multithreaded program executing with shared memory
• First defined by [Lamport 1979], execution is the same as if:
  (R1) Memory ops of each processor appear in program order
  (R2) Memory ops of all processors were executed in some global sequential order

Program:

  Thread 0    Thread 1
  x=1         r1=y
  y=1         r2=x

Legal Executions:

  1. x=1; y=1; r1=y; r2=x
  2. x=1; r1=y; y=1; r2=x
  3. x=1; r1=y; r2=x; y=1
  4. r1=y; r2=x; x=1; y=1
  5. r1=y; x=1; r2=x; y=1
  6. r1=y; x=1; y=1; r2=x
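The six legal executions can be reproduced mechanically. The sketch below is only an illustration, not part of TriCheck: it merges the two threads in every way that keeps program order (R1) and runs each merge against a single shared memory (R2). The initial values x=0, y=0 and all function names are assumptions made for this example.

```python
from itertools import combinations

# Each op is (kind, variable, value-or-register). Names are illustrative.
THREAD0 = [("st", "x", 1), ("st", "y", 1)]
THREAD1 = [("ld", "y", "r1"), ("ld", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one global order against a single memory (assumed x=0, y=0)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"]

schedules = list(interleavings(THREAD0, THREAD1))
outcomes = {run(s) for s in schedules}
print(len(schedules))    # number of SC executions
print(sorted(outcomes))  # (r1, r2) pairs reachable under SC
```

Under these assumptions the outcome r1=1, r2=0 never appears, which is the result SC forbids for this program.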

Two Categories of Memory Model Relaxation

Preserved Program Order: defines program orderings that hardware must preserve by default

Store Atomicity: defines the order in which stores become visible to cores
• Multiple-copy atomic (e.g., monolithic memory):
  • All cores see a store simultaneously
• Read-own-write-early multiple-copy atomic (e.g., private store buffer):
  • Storing core can read its own store before other cores
  • Stores made visible to all remote cores simultaneously
• Non-multiple-copy atomic (e.g., shared store buffer):
  • Storing core can read its own store before other cores
  • Store is made visible to some remote cores before others

RISC-V Proposed Preserved Program Order and Store Atomicity

Preserved Program Order:

Store Atomicity: Non-multiple-copy atomic:
• Storing core can read its own store before other cores
• Store is made visible to some remote cores before others

Effects of Non-Multiple-Copy Atomic Stores

Initial conditions: x=0, y=0

  T0            T1            T2             T3
  st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                              F R, R         F R, R
                              ld y → [r1]    ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

This outcome corresponds to the case in which the stores on threads T0 and T1 arrive at threads T2 and T3 in different orders.
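A toy enumeration makes the difference concrete for the four-thread test above. This is a sketch under simplifying assumptions, not the RISC-V or μSpec model: a store "arrives" at each observing thread through a delivery event, the F R,R fences are modeled by never reordering a thread's two loads, and all names are invented for the example. With multiple-copy atomicity one delivery event updates both observers; without it each observer gets its own delivery event.

```python
from itertools import permutations

# Loads as (tag, thread, address, register); program order is T2: r0 then r1,
# T3: r2 then r3. The fences are modeled by keeping that order fixed.
LD_T2_X = ("ld", "T2", "x", "r0")
LD_T2_Y = ("ld", "T2", "y", "r1")
LD_T3_Y = ("ld", "T3", "y", "r2")
LD_T3_X = ("ld", "T3", "x", "r3")
LOADS = [LD_T2_X, LD_T2_Y, LD_T3_Y, LD_T3_X]

def iriw_outcomes(multi_copy_atomic):
    if multi_copy_atomic:
        # One delivery per store reaches both observers at the same time.
        deliveries = [("dl", "x", ("T2", "T3")), ("dl", "y", ("T2", "T3"))]
    else:
        # Each store reaches each observer in its own delivery event.
        deliveries = [("dl", "x", ("T2",)), ("dl", "x", ("T3",)),
                      ("dl", "y", ("T2",)), ("dl", "y", ("T3",))]
    seen = set()
    for order in permutations(LOADS + deliveries):
        pos = {e: i for i, e in enumerate(order)}
        if pos[LD_T2_X] > pos[LD_T2_Y] or pos[LD_T3_Y] > pos[LD_T3_X]:
            continue  # violates per-thread load order
        view = {"T2": {"x": 0, "y": 0}, "T3": {"x": 0, "y": 0}}
        regs = {}
        for e in order:
            if e[0] == "dl":
                for thread in e[2]:
                    view[thread][e[1]] = 1
            else:
                regs[e[3]] = view[e[1]][e[2]]
        seen.add((regs["r0"], regs["r1"], regs["r2"], regs["r3"]))
    return seen

NON_SC = (1, 0, 1, 0)
print(NON_SC in iriw_outcomes(multi_copy_atomic=False))
print(NON_SC in iriw_outcomes(multi_copy_atomic=True))
```

In this toy model the non-SC outcome is reachable only when deliveries are per-observer, matching the slide's explanation that the two stores arrive at T2 and T3 in different orders.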

Why Allow Non-Multiple-Copy Atomic Stores?

• Commercial ISAs allow non-multiple-copy atomic stores (e.g., ARM, POWER)
• RISC-V is intended to be integrated with other vendor ISAs
  • Potential deployment in non-multiple-copy atomic memory systems
  • If sharing memory system, awareness that stores may be observed in orders that differ from other cores


Fences to Restore Multiple-Copy Atomicity

Initial conditions: x=0, y=0

  T0            T1            T2             T3
  st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                              pscF RW, RW    pscF RW, RW
                              ld y → [r1]    ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

Predecessor-/Successor-Cumulative Fence: Necessary to Restore SC for Non-Multiple-Copy Atomic Memory Systems

Other Fences/Barriers/Ordering Primitives

• Baseline Memory Model
  • PPO requires same-address R-R order to be maintained
  • PPO requires order to be maintained between most dependent instructions
  • Predecessor-/Successor-Cumulative F RW,RW; F IO,IO; F IORW,IORW
• Baseline + Atomics Extension
  • Predecessor-Cumulative F RW,W


TriCheck Full-Stack Verification Framework

• Suite of C/C++ Litmus Tests
• Suite of Small C/C++ Programs
• Compiler Mappings from C/C++ to RISC-V
• C/C++ Herd Model
• RISC-V Check Model

C/C++ Outcome Forbidden implies ISA-Level Outcome Forbidden

TriCheck compares HLL outcomes to ISA-level outcomes for a spectrum of legal ISA microarchitectures.

For each litmus test, the comparison has two possible results with respect to this implication:

• No bug: the ISA DOES NOT ALLOW outcomes prohibited by C/C++
• Bug: the ISA ALLOWS outcomes prohibited by C/C++
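The per-test comparison can be sketched as a small classifier. The three labels come from the legend of the result charts in this talk (Bugs, Overly Strict, Equivalent); the function name, its boolean inputs, and the mapping of cases to labels are assumptions for illustration and do not reflect TriCheck's actual interface.

```python
def classify(hll_allows: bool, isa_allows: bool) -> str:
    """Compare one litmus-test outcome at the C/C++ level and the ISA level.

    hll_allows: the C/C++ model permits the outcome (hypothetical input).
    isa_allows: the outcome is observable on the modeled implementation.
    """
    if isa_allows and not hll_allows:
        return "Bug"            # forbidden by C/C++ yet observable
    if hll_allows and not isa_allows:
        return "Overly Strict"  # permitted by C/C++ yet never observable
    return "Equivalent"         # both levels agree

for hll in (False, True):
    for isa in (False, True):
        print(hll, isa, classify(hll, isa))
```

Only the first case breaks the implication "C/C++ outcome forbidden implies ISA-level outcome forbidden"; the second costs performance rather than correctness.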

RISC-V Base: Lack of Cumulative Fences

[Figure: bar chart for RISC-V Baseline (Base), litmus test wrc. Y-axis: test variations, 0 to 250. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISA versions riscv-curr and riscv-ours. Legend: Bugs, Overly Strict, Equivalent.]

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write
• Base RISC-V ISA lacks cumulative fences
  • Minimally, the ISA requires a Predecessor-/Successor-Cumulative F RW,RW
  • Cannot fix bugs by modifying compiler currently

Our current RISC-V proposal requires only a P-/S-Cumulative F RW,RW in the RISC-V Base ISA, and includes a weaker P-Cumulative F RW,W fence in the Base+Atomics extension.
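The role of cumulativity can be illustrated on the wrc (write-to-read causality) shape, assuming its usual form: T0 stores x=1; T1 loads x into r0, fences, then stores y=1; T2 loads y into r1, fences, then loads x into r2. The outcome r0=1, r1=1, r2=0 is the one transitivity rules out. The model below is a deliberately small sketch with invented event names, where a store reaches each observer through its own delivery event and "cumulative" is reduced to a single rule.

```python
from itertools import permutations

# "dl" = a store becomes visible to one observer (toy non-MCA memory system).
DL_X_T1 = ("dl", "x", "T1")
DL_X_T2 = ("dl", "x", "T2")
LD_X_T1 = ("ld", "x", "T1", "r0")
ST_Y_T1 = ("st", "y", "T1")   # T1 issues y=1 after its fence
DL_Y_T2 = ("dl", "y", "T2")
LD_Y_T2 = ("ld", "y", "T2", "r1")
LD_X_T2 = ("ld", "x", "T2", "r2")
EVENTS = [DL_X_T1, DL_X_T2, LD_X_T1, ST_Y_T1, DL_Y_T2, LD_Y_T2, LD_X_T2]

def wrc_outcomes(cumulative_fence):
    seen = set()
    for order in permutations(EVENTS):
        pos = {e: i for i, e in enumerate(order)}
        # Fences keep each thread's accesses in program order, and a store
        # cannot be delivered before it is issued.
        if not (pos[LD_X_T1] < pos[ST_Y_T1] < pos[DL_Y_T2]):
            continue
        if not (pos[LD_Y_T2] < pos[LD_X_T2]):
            continue
        r0 = int(pos[DL_X_T1] < pos[LD_X_T1])
        r1 = int(pos[DL_Y_T2] < pos[LD_Y_T2])
        r2 = int(pos[DL_X_T2] < pos[LD_X_T2])
        # Toy cumulativity rule: a store T1 observed before its fence must
        # reach T2 before T1's post-fence store does.
        if cumulative_fence and r0 == 1 and not (pos[DL_X_T2] < pos[DL_Y_T2]):
            continue
        seen.add((r0, r1, r2))
    return seen

FORBIDDEN = (1, 1, 0)
print(FORBIDDEN in wrc_outcomes(cumulative_fence=False))
print(FORBIDDEN in wrc_outcomes(cumulative_fence=True))
```

In this sketch the forbidden outcome is reachable with a non-cumulative fence and disappears once the fence is cumulative, which is why no compiler mapping can repair the problem when the ISA offers no cumulative fence.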


On-Going Work & Conclusions

• We have formulated an English-language diff. of the current spec. with our proposed changes
• Currently we are constructing a formal model in Herd [Alglave et al., TOPLAS '14] of our proposed memory model modifications
• Memory model design choices are complicated and involve reasoning about the subtle interplay between many diverse features
• Defining an ISA specification in light of the evaluation of a single microarchitecture is not sufficient
• TriCheck is generalizable to any ISA and uncovered/quantified flaws in the RISC-V memory model.

ctrippel@princeton.edu
http://check.cs.princeton.edu/


RISC-V Base+A: Lack of Transitive Releases

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write

[Figure: bar chart for RISC-V Baseline+Atomics (Base+A), litmus test wrc. Y-axis: test variations, 0 to 250. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISA versions riscv-curr and riscv-ours. Legend: Bugs, Overly Strict, Equivalent.]

• Base+A RISC-V ISA lacks transitive releases
  • i.e., RISC-V acquires do not synchronize with RISC-V releases as required by C/C++
  • AMO.rl and the stronger AMO.aq.rl are both insufficient
  • Cannot fix bugs by modifying compiler
• Our solution: redefine release operations in the Base+A RISC-V ISA to be transitive

[Figure: bar charts for RISC-V Baseline+Atomics (Base+A), litmus tests mp and sb. Y-axis: test variations, 0 to 90. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, shown for ISA versions riscv-curr and riscv-ours in each chart. Legend: Bugs, Overly Strict, Equivalent.]

RISC-V Base+A: No Roach-Motel Movement for SC Atomics

• Roach-motel movement = expansion of acquire-release critical section
  • C++ SC loads have C++ acquire semantics
  • C++ SC stores have C++ release semantics
• RISC-V SC loads and stores require both aq and rl bits set on AMOs
  • Operation has acquire and release semantics
  • Prohibits roach-motel movement
• Our solution: add an sc bit for implementing AMO.aq.sc and AMO.rl.sc instructions, which are capable of implementing C/C++ SC loads and stores

RISC-V Base: Same-Address Ld→Ld Re-Ordering

• C/C++ forbids same-address Ld→Ld reordering
• Bugs always arise when C/C++ loads are mapped to regular RISC-V loads

[Figure: bar chart for RISC-V Baseline (Base), litmus test corr. Y-axis: test variations, 0 to 90. X-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9, shown for ISA versions riscv-curr and riscv-ours. Legend: Bugs, Overly Strict, Equivalent.]

Initial conditions: x=0, y=0

  T0                  T1
  a: sw x1, (x5)      c: lw x3, (x5)
  b: sw x2, (x5)      d: lw x4, (x5)

Forbidden HLL Outcome: x1=1, x2=2, x3=2, x4=1

• Base RISC-V ISA includes F R,R
  • Possible to fix bugs by modifying compiler with potential performance penalty
  • 20.3% preliminary estimate of fence insertion performance penalty for ARM
• Our solution: modify Base RISC-V memory model to require same-address Ld→Ld ordering

Our current RISC-V proposal eliminates F R,R from the RISC-V Base ISA, and requires hardware to enforce same-address Ld→Ld order by default.
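The corr test can be checked with a short enumeration. The sketch assumes a single memory location that takes the values 0, 1, 2 in that order (stores a and b stay in program order) and compares hardware that keeps the two loads in program order against hardware that may swap them. Function and variable names are made up for the example.

```python
from itertools import combinations

def corr_outcomes(allow_load_reorder):
    """Return the reachable (x3, x4) pairs for the two-store, two-load test."""
    load_orders = [["x3", "x4"]]            # c then d: program order
    if allow_load_reorder:
        load_orders.append(["x4", "x3"])    # d performed before c
    seen = set()
    for loads in load_orders:
        # Choose which 2 of the 4 global slots hold the stores a and b.
        for store_slots in combinations(range(4), 2):
            mem, regs = 0, {}
            stores, pending = iter([1, 2]), iter(loads)
            for slot in range(4):
                if slot in store_slots:
                    mem = next(stores)           # a writes 1, then b writes 2
                else:
                    regs[next(pending)] = mem    # load reads current value
            seen.add((regs["x3"], regs["x4"]))
    return seen

FORBIDDEN = (2, 1)  # x3 sees the newer store, x4 the older one
print(FORBIDDEN in corr_outcomes(allow_load_reorder=False))
print(FORBIDDEN in corr_outcomes(allow_load_reorder=True))
```

With loads kept in order the location's value can only grow between c and d, so x3=2, x4=1 is unreachable; allowing the swap makes it reachable, which is the bug the slide reports for plain RISC-V loads.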

Re-ordering Dependent Operations

• RISC-V does not require ordering for dependent instructions
• Many commercial ISAs (x86, ARM, Power) respect dependencies
  • Can also be used as lightweight synchronization
• Explicit synchronization/fences needed when dependency ordering is required but not enforced by default, e.g., Linux
  • Macro read_barrier_depends() optionally inserts a barrier
  • Inserts a fence for Alpha, which does not respect dependencies [1]
  • Inserts nothing for RISC-V, which does not respect dependencies [2]
• Our solution: modify Base RISC-V memory model to require the preservation of dependency orderings.

[1] Linus Torvalds et al. Linux kernel, 2016. https://github.com/torvalds/linux/blob/master/arch/alpha/include/asm/barrier.h
[2] RISC-V Foundation. RISC-V port of Linux kernel, 2016. https://github.com/riscv/riscv-linux/blob/master/rch/riscv/include/asm/barrier.h
