Mehmet Senvar - Cache Coherence Protocols

Page 1: Mehmet Senvar - Cache Coherence Protocols

Cache Coherence Protocols 1

Cache Coherence Protocols in Shared Memory Multiprocessors

Mehmet Şenvar

Page 2: Mehmet Senvar - Cache Coherence Protocols

Outline

• Introduction
• Background Information
  • The cache coherence problem
  • Cache enforcement strategies
  • Consistency models
• Simple Solutions
• Hardware Protocols
  • Snooping protocols
  • Directory-based protocols
• Compiler and Software Protocols
• Future work and conclusions

Page 3: Mehmet Senvar - Cache Coherence Protocols

The Cache Coherence Problem

• Caches allow greater performance by storing frequently used data in faster memory.
• Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time.
• If one processor updates the data item without informing the other processors, inconsistencies may result and cause incorrect execution.
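The problem described above can be reproduced with a toy model. The sketch below is only an illustration: the `Memory` and `PrivateCache` classes are hypothetical names, and the caches are assumed to be write-back with no coherence mechanism at all.

```python
class Memory:
    """Shared main memory (hypothetical helper for this illustration)."""
    def __init__(self):
        self.data = {}


class PrivateCache:
    """Private write-back cache with NO coherence mechanism (assumption)."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:                 # miss: fetch from memory
            self.lines[addr] = self.memory.data.get(addr, 0)
        return self.lines[addr]                    # hit: use the local copy

    def write(self, addr, value):
        self.lines[addr] = value                   # only the local copy changes


memory = Memory()
memory.data["x"] = 1
p0, p1 = PrivateCache(memory), PrivateCache(memory)

p0.read("x")            # both processors now cache x = 1
p1.read("x")
p0.write("x", 2)        # P0 updates x without informing P1
stale = p1.read("x")    # P1 hits in its own cache and sees the old value
print("P0 sees", p0.read("x"), "but P1 sees", stale)
```

P1 keeps reading the value it cached before P0's write, which is exactly the inconsistency a coherence protocol has to prevent.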

Page 4: Mehmet Senvar - Cache Coherence Protocols

Cache Coherence Problem

Page 5: Mehmet Senvar - Cache Coherence Protocols

Cache Coherence (cont.)

• For correct execution, coherence must be enforced between the caches.
• Two major factors are:
  • performance
  • implementation cost
• Four primary design issues are:
  • coherence detection strategy
  • coherence enforcement strategy
  • precision of block-sharing information
  • cache block size

Page 6: Mehmet Senvar - Cache Coherence Protocols

Cache Enforcement Strategies

• A cache enforcement strategy is the mechanism that makes caches consistent:
  • write-update (WU)
  • write-invalidate (WI)
  • hybrid protocols, such as competitive-update (CU)
• The performance of WU and WI varies depending on the application and the number of writes.
• Hybrid protocols switch between WU and WI based on the number of writes to a block.
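To see why neither WU nor WI wins for every access pattern, the following sketch counts messages for two small traces. The cost model is an assumption made for illustration (one message per miss, one message per remote copy updated or invalidated), and `count_messages` is a hypothetical helper, not part of any real protocol specification.

```python
def count_messages(trace, policy):
    """Count coherence messages for a trace of (processor, op, block) events.

    Assumed cost model: a miss costs one message, and a write costs one
    message per other cache that holds a copy (an update under "WU",
    an invalidation under "WI").
    """
    sharers = {}                      # block -> processors holding a valid copy
    messages = 0
    for proc, op, block in trace:
        holders = sharers.setdefault(block, set())
        if proc not in holders:
            messages += 1             # miss: fetch the block
            holders.add(proc)
        if op == "w":
            others = holders - {proc}
            messages += len(others)   # notify every other copy
            if policy == "WI":
                holders.intersection_update({proc})   # other copies are dropped
    return messages


# Trace A: P1 reads once, then P0 writes the block four times in a row.
trace_a = [(1, "r", "b")] + [(0, "w", "b")] * 4
# Trace B: producer/consumer, P0 writes and P1 reads, three rounds.
trace_b = [(0, "w", "b"), (1, "r", "b")] * 3

for name, trace in (("A", trace_a), ("B", trace_b)):
    print(name, "WI:", count_messages(trace, "WI"),
          "WU:", count_messages(trace, "WU"))
```

Under this model, WI needs fewer messages for the long write run in trace A, while WU needs fewer for the alternating pattern in trace B. A hybrid protocol tries to pick the cheaper behaviour per block.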

Page 7: Mehmet Senvar - Cache Coherence Protocols

Consistency Models

• A consistency model defines how the consistency of data values is maintained.
• Some consistency models are:
  • sequential consistency
  • weak consistency
  • release consistency
• Weak consistency models are more efficient to implement and require fewer coherence messages.
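As an illustration of what sequential consistency guarantees, the sketch below enumerates every interleaving of a classic two-thread test (thread 0 stores to x and then loads y; thread 1 stores to y and then loads x). The test program and the function name `sc_outcomes` are chosen here for illustration and do not come from the slides.

```python
from itertools import combinations


def sc_outcomes():
    """Return all (r0, r1) results reachable under sequential consistency."""
    t0 = [("store", "x"), ("load", "y", "r0")]
    t1 = [("store", "y"), ("load", "x", "r1")]
    n = len(t0) + len(t1)
    outcomes = set()
    # Pick which positions of the global order belong to thread 0;
    # program order inside each thread is preserved.
    for slots in combinations(range(n), len(t0)):
        order, i0, i1 = [], 0, 0
        for pos in range(n):
            if pos in slots:
                order.append(t0[i0])
                i0 += 1
            else:
                order.append(t1[i1])
                i1 += 1
        mem, regs = {"x": 0, "y": 0}, {}
        for op in order:
            if op[0] == "store":
                mem[op[1]] = 1
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


print(sorted(sc_outcomes()))
```

The result (0, 0) never appears: under sequential consistency at least one thread must observe the other's store. Weaker models can permit that outcome unless synchronization is added, which is part of what makes them cheaper to implement.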

Page 8: Mehmet Senvar - Cache Coherence Protocols

Shared Caches (1)

Processors share a single cache, essentially punting the problem.
• Useful for very small machines.
• E.g., DPC in the Encore, Alliant FX/8.
• Problems are limited cache bandwidth and cache interference.
• Benefits are fine-grain sharing and prefetch effects.

Page 9: Mehmet Senvar - Cache Coherence Protocols

Non-cacheable Items (2)

• Make shared data non-cacheable.
• One of the simplest software solutions.
• Can also be done in hardware, by making cache locations unreachable.

Page 10: Mehmet Senvar - Cache Coherence Protocols

Broadcast Writes (3)

• Every cache write request is sent to all other caches.
• First, each cache must determine whether it holds the data.
• Other copies are then either updated or invalidated.
• Significant additional memory transactions occur.

Page 11: Mehmet Senvar - Cache Coherence Protocols

Hardware Protocols

• Snoop Bus Mechanism
• Directory-Based Methods
  • Full Directory
  • Limited Directory
  • Chained Directory

Page 12: Mehmet Senvar - Cache Coherence Protocols

Snoop Bus Protocol

• Snooping protocols rely on a shared bus between the processors for coherence.
• On a processor write, the write is passed through the cache to main memory on the bus.
• Any processor caching the address may update or invalidate its cache entry as appropriate.
• Snooping protocols do not scale well beyond 32 processors because of the shared bus.
• The choice between WU, WI, and CU is especially important to reduce communication.
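The write path described above can be sketched as a write-through cache on a shared bus, with every other cache snooping the write and invalidating its copy. This is a minimal model under stated assumptions: the `Bus` and `SnoopingCache` classes are hypothetical, the write-invalidate choice is one of the options the slide mentions, and timing and arbitration are ignored.

```python
class Bus:
    """Shared bus in front of main memory (hypothetical model)."""
    def __init__(self):
        self.memory = {}
        self.caches = []
        self.transactions = 0

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, addr):
        self.transactions += 1
        return self.memory.get(addr, 0)

    def write(self, source, addr, value):
        self.transactions += 1
        self.memory[addr] = value            # write goes through to memory
        for cache in self.caches:            # every other cache snoops it
            if cache is not source:
                cache.snoop_write(addr)


class SnoopingCache:
    """Write-through cache that invalidates on a snooped write (assumption)."""
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)           # drop the stale copy, if any


bus = Bus()
c0, c1 = SnoopingCache(bus), SnoopingCache(bus)
c0.read("x")
c1.read("x")
c0.write("x", 5)
invalidated = "x" not in c1.lines            # c1 lost its copy by snooping
fresh = c1.read("x")                         # so it re-fetches the new value
print(invalidated, fresh, bus.transactions)
```

Every write and every miss occupies the single bus, which is why the transaction count grows with the number of processors and limits scalability.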

Page 13: Mehmet Senvar - Cache Coherence Protocols

MESI (4-state) Invalidation Protocol

• Each line in the cache can be in one of 4 states:
  • Modified (exclusive): only in 1 cache, modified
  • Exclusive (unmodified): only in 1 cache, unmodified
  • Shared (unmodified)
  • Invalid
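The four states can be exercised with a small simulator that tracks a single cache line across several caches. This is a simplified sketch of the commonly described MESI transitions, not a reconstruction of the deck's diagram; the function names and the decision to model only states (no data, no bus timing) are assumptions.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def read(states, p):
    """Processor p reads the line; states[i] is cache i's state for it."""
    if states[p] is not State.I:
        return                          # hit in M, E or S: no change
    others_have_copy = False
    for q, s in enumerate(states):
        if q != p and s is not State.I:
            others_have_copy = True
            states[q] = State.S         # an M or E holder drops to Shared
    states[p] = State.S if others_have_copy else State.E


def write(states, p):
    """Processor p writes the line: all other copies are invalidated."""
    for q in range(len(states)):
        if q != p:
            states[q] = State.I
    states[p] = State.M


def show(states):
    return "".join(s.name for s in states)


states = [State.I, State.I, State.I]
trace = []
for op, p in ((read, 0), (read, 1), (write, 1), (read, 2), (write, 0)):
    op(states, p)
    trace.append(show(states))
print(trace)
```

Each string in `trace` lists the state of the line in caches 0, 1 and 2 after one operation, so the sequence shows a line moving through Exclusive, Shared, Modified and Invalid as different processors read and write it.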

Page 14: Mehmet Senvar - Cache Coherence Protocols

MESI State Transition Diagram

Page 15: Mehmet Senvar - Cache Coherence Protocols

MESI Example

Page 16: Mehmet Senvar - Cache Coherence Protocols

Directory-Based Protocols

• Directory-based protocols do not rely on a shared bus to exchange coherence information (they use point-to-point connections):
  • more scalable (can have hundreds of processors)
  • each processor can have its own memory
  • implement weak consistency for efficiency

Page 17: Mehmet Senvar - Cache Coherence Protocols

Directory-Based Protocols (cont.)

• Each node maintains a directory storing cache information and memory information.
• A processor communicates with the directory to access memory:
  • If a processor requests a non-local memory page, the directory uses its information to find the page.
  • It then uses messages to retrieve the page and ensure that all other processors have consistent information.
  • Since the directory tracks which processors are caching the page, it only needs to send messages to those processors.
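The key point above, that messages go only to the processors recorded as sharers, can be sketched as follows. The `Directory` class, its message tuples, and the invalidate-on-write policy are assumptions made for illustration; real protocols add acknowledgements, transient states and data forwarding.

```python
class Directory:
    """Per-block sharer tracking with point-to-point messages (sketch)."""
    def __init__(self):
        self.sharers = {}      # block -> set of node ids caching it
        self.messages = []     # log of (kind, destination node, block)

    def read(self, node, block):
        self.sharers.setdefault(block, set()).add(node)
        self.messages.append(("data", node, block))

    def write(self, node, block):
        holders = self.sharers.setdefault(block, set())
        # Only nodes known to cache the block are contacted.
        for other in sorted(holders - {node}):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {node}
        self.messages.append(("grant", node, block))


# An 8-node machine in which only nodes 1 and 5 cache block "B".
directory = Directory()
directory.read(1, "B")
directory.read(5, "B")
directory.write(2, "B")

invalidated = [dst for kind, dst, _ in directory.messages if kind == "invalidate"]
print(invalidated)
```

Node 2's write triggers two invalidations instead of a broadcast to the seven other nodes, which is the property that lets directory schemes scale past a shared bus.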

Page 18: Mehmet Senvar - Cache Coherence Protocols

Directory-Based Protocols (cont.)

• Designing a directory requires defining:
  • cache block granularity
  • cache controller design
  • directory structure
• Cache block granularity is the size of the cache and the size of a cache line:
  • CC-NUMA machines have a separate, smaller cache from main memory
  • COMA machines use a node's entire memory as a cache for remote pages
  • Block size affects performance (false sharing)
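False sharing arises when two processors touch different addresses that happen to fall in the same cache block. The arithmetic can be sketched as below; the addresses and block sizes are made-up example values and the helper names are hypothetical.

```python
def block_of(addr, block_size):
    """Index of the cache block that contains a byte address."""
    return addr // block_size


def falsely_shared(addr_a, addr_b, block_size):
    """True if two distinct addresses land in the same cache block."""
    return addr_a != addr_b and block_of(addr_a, block_size) == block_of(addr_b, block_size)


# P0 updates a counter at 0x100, P1 updates another one 32 bytes later.
a, b = 0x100, 0x120
for size in (32, 64):
    print(size, "byte blocks:", falsely_shared(a, b, size))
```

With 64-byte blocks both counters share a block, so each write by one processor invalidates or updates the other's copy even though no data is actually shared. With 32-byte blocks they fall in different blocks and the coherence traffic disappears.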

Page 19: Mehmet Senvar - Cache Coherence Protocols

Directory-Based Protocols (cont.)

• The cache controller is the hardware that maintains the directory and processes memory requests:
  • custom hardware
  • programmable protocol processor
• The directory structure is how the cache and memory information is organized:
  • (p+1)-bit full directory
  • linked-list directories
  • tagged directories

Page 20: Mehmet Senvar - Cache Coherence Protocols

Directory Models

• Full Directory
  • Links to all caches for all shared locations
• Limited Directory
  • Links to some of the caches holding shared data, n < N
• Chained (linked) Directory
  • Link to one cache, and from this cache to the others (singly or doubly linked)
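The trade-off between the three models is mostly one of directory storage per memory block. The sketch below estimates it under explicit assumptions: the full directory uses one presence bit per cache plus one state bit (the p+1-bit layout), each pointer is wide enough to name any of the N caches, and the limited and chained entries are assumed to carry one state bit as well. The chained figure counts only the head pointer kept at memory, since the rest of the chain lives in the caches. The function names are hypothetical.

```python
def pointer_bits(n_caches):
    """Bits needed to name one of n_caches caches."""
    return max(1, (n_caches - 1).bit_length())


def full_directory_bits(n_caches):
    return n_caches + 1                                # presence bits + state bit


def limited_directory_bits(n_caches, n_pointers):
    return n_pointers * pointer_bits(n_caches) + 1     # n pointers + state bit


def chained_directory_bits(n_caches):
    return pointer_bits(n_caches) + 1                  # head pointer + state bit


for n in (64, 1024):
    print(n, full_directory_bits(n),
          limited_directory_bits(n, 4),
          chained_directory_bits(n))
```

The full directory grows linearly with the number of caches, while the limited and chained entries grow only logarithmically. The price is that a limited directory must handle overflow when more than n caches share a block, and a chained directory must walk the list to reach every sharer.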

Page 21: Mehmet Senvar - Cache Coherence Protocols

Directory Sample (full)

Page 22: Mehmet Senvar - Cache Coherence Protocols

Lock-Based Protocols

• New work that promises to be more scalable than directory protocols.
• Implements scope consistency, which is similar to lazy release consistency.
• Coherence information is exchanged by reading and writing notices from the lock that protects the shared memory.
• Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized.

Page 23: Mehmet Senvar - Cache Coherence Protocols

Software Protocols

• Software protocols enforce consistency with limited hardware support by relying either on the compiler or on specialized software handlers.
• Similar to distributed shared memory (DSM) systems but at a lower level:
  • sharing usually in blocks, not pages
  • needs to be more efficient for better performance
  • architecture support for sharing

Page 24: Mehmet Senvar - Cache Coherence Protocols

Classification of Software Protocols

Several criteria distinguish software protocols:
• dynamism - compile-time or run-time analysis
• selectivity - level of coherence actions
• restrictiveness - conservative or as-needed consistency enforcement
• adaptivity - whether the protocol can adapt to access patterns
• granularity - size and structure of coherence data
• blocking - program block on which coherence is enforced
• positioning - position of coherence instructions
• updating - how memory is updated after a write
• checking - how incoherence is detected

Page 25: Mehmet Senvar - Cache Coherence Protocols

Software Coherence with Limited Hardware Support

• The compiler must generate consistent code, as no hardware coherence is provided.
• Hardware maintains time tags, which are updated on every write.
• On a read, the compiler generates coherence reads, which check time tags to ensure the data is consistent.
• Relies on the compiler to detect reads that may be inconsistent, and the hardware must maintain these time tags.
• Using tags, it is also possible to perform dynamic self-invalidation of blocks.
• Many techniques are based on using these time tags.
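The division of labour described above, where hardware bumps a tag on every write and compiler-inserted coherence reads compare tags, can be sketched as follows. This is a deliberately simplified model and not any specific published scheme: the classes, the per-address tag and the write-through assumption are all hypothetical choices made for illustration.

```python
class TaggedMemory:
    """Shared memory with a time tag per address (assumed layout)."""
    def __init__(self):
        self.values = {}
        self.tags = {}

    def write(self, addr, value):
        self.values[addr] = value
        self.tags[addr] = self.tags.get(addr, 0) + 1   # tag bumped on every write


class TaggedCache:
    """Cache whose lines remember the tag seen when they were fetched."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                                # addr -> (value, tag)

    def _fetch(self, addr):
        line = (self.memory.values.get(addr, 0), self.memory.tags.get(addr, 0))
        self.lines[addr] = line
        return line[0]

    def plain_read(self, addr):
        """Ordinary read: trusts whatever is cached."""
        if addr in self.lines:
            return self.lines[addr][0]
        return self._fetch(addr)

    def coherence_read(self, addr):
        """Read the compiler emits where the data may be stale."""
        cached = self.lines.get(addr)
        if cached is not None and cached[1] == self.memory.tags.get(addr, 0):
            return cached[0]                           # tag matches: still valid
        return self._fetch(addr)                       # stale: self-invalidate and refetch


memory = TaggedMemory()
memory.write("x", 1)
cache = TaggedCache(memory)
first = cache.plain_read("x")
memory.write("x", 2)                 # another processor writes x
stale = cache.plain_read("x")        # ordinary read returns the old copy
fresh = cache.coherence_read("x")    # tag mismatch forces a refetch
print(first, stale, fresh)
```

The compiler's job in such a scheme is to decide which reads need the more expensive `coherence_read`; reads it can prove safe stay as plain cache hits.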

Page 26: Mehmet Senvar - Cache Coherence Protocols

Software Coherence with Limited Hardware Support (cont.)

• For hardware with no time tags, Petersen and Li developed an algorithm that uses only page translation hardware and page status tables.
• Sharing information is maintained by a software handler at the page level.
• On a page access or fault, the software handler checks the sharing information, updates page tables, and performs coherence actions.
• Slower than hardware, as software handlers involve the OS and are on the critical memory access path.

Page 27: Mehmet Senvar - Cache Coherence Protocols

Enforcing Coherence by Restricting Parallelism

• Compilers can also guarantee coherence by structuring the language to limit parallelism:
  • easier to enforce coherence
  • limits the programmer and potential parallelism
  • simplifies compiler design
  • good performance can be achieved with no hardware support
• Parallel language restrictions include:
  • doall parallel loops
  • master/slave processes

Page 28: Mehmet Senvar - Cache Coherence Protocols

Optimizing Compilers

• Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer:
  • rely on detecting data dependencies
  • may use synchronization variables (locks, barriers)
  • can provide the hardware with hints
  • can detect when coherence is not needed
  • may have problems with dynamic sharing
  • offer good performance, but are hard to design

Page 29: Mehmet Senvar - Cache Coherence Protocols

Future Work

• Hardware protocols are well defined, and the directory structure is near optimal.
• Cost improvements can be obtained by mass-producing cache controller chips.
• Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...).
• Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment.

Page 30: Mehmet Senvar - Cache Coherence Protocols

Conclusions

• Hardware protocols offer the best performance but require high hardware costs.
• Software protocols can be used when there is no hardware support, with a slight performance penalty.
• Optimizing compilers can enforce coherence or provide hints to the hardware.
• A combination of hardware and compiler optimizations is the best.