Distributed Shared Memory: A Survey of Issues and Algorithms
B. Nitzberg and V. Lo, University of Oregon

INTRODUCTION

• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space

Why bother with DSM?

• Key idea is to build fast parallel computers that are
  – Cheaper than shared memory multiprocessor architectures
  – As convenient to use

Conventional parallel architecture

[Figure: four CPUs, each with its own cache, sharing a single shared memory]

Today’s architecture

• Clusters of workstations are much more cost effective
  – No need to develop complex bus and cache structures
  – Can use off-the-shelf networking hardware
    • Gigabit Ethernet
    • Myrinet (1.5 Gb/s)
  – Can quickly integrate newest microprocessors

Limitations of cluster approach

• Communication within a cluster of workstations is through message passing
  – Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared memory architectures
  – Converting them to a message passing architecture is a nightmare

Distributed shared memory

[Figure: DSM = one shared global address space spanning the main memories of the workstations]

Distributed shared memory

• DSM makes a cluster of workstations look like a shared memory parallel computer
  – Easier to write new programs
  – Easier to port existing programs
• Key problem is that DSM only provides the illusion of having a shared memory architecture
  – Data must still move back and forth among the workstations

Basic approaches

• Hardware implementations:
  – Use extensions of traditional hardware caching architecture
• Operating system/library implementations:
  – Use virtual memory mechanisms
• Compiler implementations:
  – Compiler handles all shared accesses

Design Issues (I)

1. Structure and granularity
  – Big units are more efficient
    • Virtual memory pages
  – Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors

False Sharing

[Figure: one workstation accesses x while another accesses y; x and y reside on the same page]

Page containing x and y will move back and forth between main memories of workstations
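
The ping-pong effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a single copy of each page migrates to whichever node touches it, and the 4096-byte page size, the addresses, and the function names are all hypothetical.

```python
PAGE_SIZE = 4096  # bytes per page (assumed value)


def page_of(address):
    """Return the number of the page that contains the given address."""
    return address // PAGE_SIZE


def count_page_transfers(accesses, initial_owner=0):
    """Count page moves for a list of (node, address) accesses.

    Assumes a single-copy migration scheme: a page moves to a node
    whenever that node touches it while another node holds it.
    """
    owner = {}       # page number -> node currently holding the page
    transfers = 0
    for node, address in accesses:
        page = page_of(address)
        if owner.get(page, initial_owner) != node:
            transfers += 1
            owner[page] = node
    return transfers


X_ADDR, Y_ADDR = 0, 8        # x and y on the same page
Y_PADDED = PAGE_SIZE         # y moved to a page of its own

# Node 0 only touches x, node 1 only touches y, 100 times each.
same_page = [(0, X_ADDR), (1, Y_ADDR)] * 100
separate_pages = [(0, X_ADDR), (1, Y_PADDED)] * 100

same_page_transfers = count_page_transfers(same_page)
separate_page_transfers = count_page_transfers(separate_pages)
print(same_page_transfers, separate_page_transfers)
```

Although the two nodes never touch the same variable, the shared page moves on almost every access; placing y on its own page leaves a single initial transfer.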

Design Issues (II)

1. Structure and granularity (cont'd)
  – Shared objects can also be
    • Objects from a distributed object-oriented system
    • Data types from an extant language

Design Issues (III)

2. Coherence semantics
  – Strict consistency is not possible
  – Various authors have proposed weaker consistency models
    • Cheaper to implement
    • Harder to use in a correct fashion

Design Issues (IV)

3. Scalability
  – Possibly very high but limited by
    • Central bottlenecks
    • Global knowledge operations and storage

Design Issues (V)

4. Heterogeneity
  – Possible but complex to implement

Portability Issues

• Portability of programs
  – Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks)
  – More efficient DSMs require more changes
• Portability of DSM
  – Some DSMs require specific OS features

(Not in paper)

Implementation Issues (I)

1. Data Location and Access:
  • Keep data in a single centralized location
  • Let data migrate (better) but must have a way to locate them
    • Centralized server (bottleneck)
    • Have a "home" node associated with each piece of data
      • Will keep track of its location
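
The "home" node idea can be sketched as follows. This is an illustration only: the modulo mapping from page number to home node and all names (`HomeDirectory`, `locate`, `migrate`) are assumptions, not taken from the paper.

```python
class HomeDirectory:
    """Each node records the current location of the pages it is home for."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One table per node, so no single node tracks every page.
        self.location = [dict() for _ in range(num_nodes)]

    def home_of(self, page):
        # Assumed static mapping: page number modulo number of nodes.
        return page % self.num_nodes

    def locate(self, page):
        """Ask the page's home node where the page currently lives."""
        home = self.home_of(page)
        # A page that has never migrated still resides at its home node.
        return self.location[home].get(page, home)

    def migrate(self, page, new_owner):
        """Record at the home node that the page has moved."""
        self.location[self.home_of(page)][page] = new_owner


directory = HomeDirectory(num_nodes=4)
directory.migrate(page=6, new_owner=3)
print(directory.home_of(6), directory.locate(6), directory.locate(5))
```

Because the location tables are spread over all nodes, lookups for different pages go to different nodes, which avoids the bottleneck of a single centralized server.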

Implementation Issues (II)

1. Data Location and Access (cont'd):
  • Can either
    • Maintain a single copy of each piece of data
    • Replicate it on demand
  • With replication, must either
    • Propagate updates to all replicas
    • Use an invalidation protocol

Invalidation protocol

• Before update:
    X = 0    X = 0    X = 0

• At update time:
    X = 5    X = 0 (INVALID)    X = 0 (INVALID)
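
A write-invalidate scheme along these lines might look like the sketch below. It is a single-process simulation with hypothetical names; a real DSM would exchange network messages and track copy sets rather than loop over every replica.

```python
class Replica:
    def __init__(self, value):
        self.value = value
        self.valid = True


class InvalidatingMemory:
    """One shared variable replicated on every node (simulation only)."""

    def __init__(self, num_nodes, initial=0):
        self.replicas = [Replica(initial) for _ in range(num_nodes)]
        self.messages = 0   # invalidations plus refetches

    def write(self, node, value):
        # Invalidate every other copy that is still valid.
        for other, replica in enumerate(self.replicas):
            if other != node and replica.valid:
                replica.valid = False
                self.messages += 1
        self.replicas[node].value = value
        self.replicas[node].valid = True

    def read(self, node):
        replica = self.replicas[node]
        if not replica.valid:
            # Refetch the value from any node that holds a valid copy.
            source = next(r for r in self.replicas if r.valid)
            replica.value = source.value
            replica.valid = True
            self.messages += 1
        return replica.value


mem = InvalidatingMemory(num_nodes=3)
mem.write(0, 5)                  # invalidates the two other copies
mem.write(0, 6)                  # no further messages needed
mem.write(0, 7)
messages_after_writes = mem.messages
value_seen_by_node_1 = mem.read(1)
print(messages_after_writes, value_seen_by_node_1, mem.messages)
```

Three successive writes by the same node cost only the two initial invalidations; an update protocol would have sent two messages per write.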

Main advantage

• Locality of updates:
  – A page that is being modified has a high likelihood of being modified again
• Invalidation mechanism minimizes consistency overhead
  – One single invalidation replaces many updates

A realization: Munin

• Developed at Rice University
• Based on software objects (variables)
• Used the processor virtual memory to detect access to the shared objects
• Included several techniques for reducing consistency-related communication
• Only ran on top of the V kernel

Munin main strengths

• Excellent performance
• Portability of programs
  – Allowed programs written for a multiprocessor architecture to run on a cluster of workstations with a minimum number of changes (dusty decks)

Munin main weakness

• Very poor portability of Munin itself
  – Depended on some features of the V kernel
• Not maintained since the late 80's

Consistency model

• Munin uses software release consistency
  – Only requires the memory to be consistent at specific synchronization points

SW release consistency (I)

• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
  – P(&mutex) and V(&mutex)
  – lock(&csect) and unlock(&csect)
  – acquire( ) and release( )

• Unprotected accesses can produce unpredictable results

SW release consistency (II)

• SW release consistency will only guarantee correctness of operations performed within an acquire/release pair

• No need to export the new values of shared variables until the release

• Must guarantee that a workstation has received the most recent values of all shared variables when it completes an acquire

SW release consistency (III)

Workstation 1:
  shared int x;
  acquire( );
  x = 1;
  release( ); // export x=1

Workstation 2:
  shared int x;
  acquire( ); // wait for new value of x
  x++;
  release( ); // export x=2
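
The two-workstation exchange above can be mimicked with a toy single-process model. It assumes one global lock and propagation of buffered writes to every other node at release time; the class and method names are hypothetical and do not correspond to any real DSM API.

```python
class Node:
    def __init__(self):
        self.memory = {}    # this node's local copy of the shared variables
        self.pending = {}   # writes made since acquire, not yet exported


class ReleaseConsistentDSM:
    """Toy model: one global lock, updates pushed to all nodes at release."""

    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]
        self.lock_holder = None

    def acquire(self, node_id):
        if self.lock_holder is not None:
            raise RuntimeError("lock is busy")
        self.lock_holder = node_id

    def write(self, node_id, name, value):
        node = self.nodes[node_id]
        node.memory[name] = value
        node.pending[name] = value   # buffered until the release

    def read(self, node_id, name):
        return self.nodes[node_id].memory.get(name)

    def release(self, node_id):
        if self.lock_holder != node_id:
            raise RuntimeError("node does not hold the lock")
        node = self.nodes[node_id]
        for other in self.nodes:
            if other is not node:
                other.memory.update(node.pending)   # export new values
        node.pending.clear()
        self.lock_holder = None


dsm = ReleaseConsistentDSM(num_nodes=2)

dsm.acquire(0)
dsm.write(0, "x", 1)
seen_before_release = dsm.read(1, "x")   # node 1 has not received x yet
dsm.release(0)                           # export x=1

dsm.acquire(1)
dsm.write(1, "x", dsm.read(1, "x") + 1)  # x++
dsm.release(1)                           # export x=2

print(seen_before_release, dsm.read(0, "x"))
```

Node 1 sees nothing until node 0 releases, which is acceptable because a correct program would not read x outside an acquire/release pair.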

SW release consistency (IV)

• Must still decide how to release updated values
  – Munin uses eager release:
    • New values of shared variables are propagated at release time

SW release consistency (V)

Eager release

Each release forwards the update to the two other processors.

Multiple write protocol

• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• First attempt to modify the contents of the page will result in the creation of a copy of the page being modified (the twin).

Creating a twin (Not in paper)

[Figure: on the first write access, the page (x = 1, y = 2) is copied into a twin (x = 1, y = 2)]

Example (Not in paper)

Before: x = 1, y = 2
After: x = 3, y = 2
Compare with twin: new value of x is 3
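
The twin mechanism and the comparison step can be sketched as below. As a simplifying assumption, a page is modelled as a dictionary of named variables rather than raw memory words, and the names (`WriteSharedPage`, `diff`) are hypothetical.

```python
class WriteSharedPage:
    """A local copy of a write-shared page, with lazy twin creation."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.twin = None   # created on the first write access only

    def write(self, name, value):
        if self.twin is None:
            # First write access: save an unmodified copy (the twin).
            self.twin = dict(self.contents)
        self.contents[name] = value

    def diff(self):
        """Return only the variables that differ from the twin."""
        if self.twin is None:
            return {}
        return {
            name: value
            for name, value in self.contents.items()
            if name not in self.twin or self.twin[name] != value
        }


original = {"x": 1, "y": 2}
page_a = WriteSharedPage(original)   # copy held by one workstation
page_b = WriteSharedPage(original)   # copy held by another workstation

page_a.write("x", 3)
page_b.write("y", 7)

diff_a = page_a.diff()
diff_b = page_b.diff()

# Merging both diffs into the original keeps both writers' changes.
merged = dict(original)
merged.update(diff_a)
merged.update(diff_b)
print(diff_a, diff_b, merged)
```

Since each workstation exports only what it changed, two writers touching different variables on the same page no longer overwrite each other, which is how the protocol counters false sharing.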

Other DSM Implementations (I)

• Software release consistency with lazy release (TreadMarks)
  – Faster and designed to be portable
• Sequentially-Consistent Software DSM (IVY):
  – Sends messages to other copies at each write
  – Much slower

Other DSM Implementations (II)

• Entry consistency (Midway):
  – Requires each variable to be associated with a synchronization object (typically a lock)
  – Acquire/release operations on a given synchronization object only involve the variables associated with that object
  – Requires less data traffic
  – Does not handle dusty decks well
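
A rough sketch of the idea follows. As a simplification, the lock object itself carries the latest released values of the variables it guards; this is not a description of how Midway works internally, and all names are hypothetical.

```python
class GuardedLock:
    """A synchronization object bound to a fixed set of shared variables."""

    def __init__(self, guarded_names):
        self.guarded = set(guarded_names)
        self.latest = {}   # most recently released values of those variables


class EntryConsistentNode:
    def __init__(self):
        self.memory = {}

    def acquire(self, lock):
        # Only the variables bound to this lock are brought up to date.
        self.memory.update(lock.latest)

    def release(self, lock):
        # Only the variables bound to this lock are exported.
        for name in lock.guarded:
            if name in self.memory:
                lock.latest[name] = self.memory[name]


lock_a = GuardedLock(["a"])
lock_b = GuardedLock(["b"])
node0, node1 = EntryConsistentNode(), EntryConsistentNode()

node0.acquire(lock_a)
node0.memory["a"] = 1
node0.release(lock_a)

node0.acquire(lock_b)
node0.memory["b"] = 2
node0.release(lock_b)

node1.acquire(lock_a)   # transfers a only; b is not sent
print(node1.memory)
```

Acquiring `lock_a` moves nothing related to `b`, which is where the traffic savings come from; the price is that every shared variable must be explicitly bound to a lock, which unmodified legacy programs do not do.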

Other DSM Implementations (III)

• Structured DSM Systems (Linda):
  – Offer to the programmer a shared tuple space accessed using specific synchronized methods
  – Require a very different programming style
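
To give a flavor of that programming style, here is a minimal in-process tuple space. The operation names follow Linda's `out`, `in`, and `rd`, but this sketch departs from Linda in ways that are assumptions made for brevity: `in_` and `rd` return `None` instead of blocking when nothing matches, `None` in a pattern acts as a wildcard, and nothing is distributed or synchronized.

```python
class TupleSpace:
    """Toy tuple space; real Linda operations block and are distributed."""

    def __init__(self):
        self.tuples = []

    def out(self, *fields):
        """Deposit a tuple into the space."""
        self.tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        # None in the pattern is treated as a wildcard (assumption).
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Return a matching tuple without removing it, or None."""
        for candidate in self.tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def in_(self, *pattern):
        """Remove and return a matching tuple, or None ('in' is a keyword)."""
        found = self.rd(*pattern)
        if found is not None:
            self.tuples.remove(found)
        return found


space = TupleSpace()
space.out("task", 1)
space.out("task", 2)

taken = space.in_("task", None)    # withdraws the first matching tuple
peeked = space.rd("task", None)    # reads the next one, leaves it in place
missing = space.rd("result", None)
print(taken, peeked, missing, len(space.tuples))
```

Processes communicate by depositing and withdrawing tuples rather than by reading and writing memory locations, which is why existing shared memory programs cannot be ported without being restructured.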

TODAY'S IMPACT

• Very low:
  – According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM"
  – DSM competed with the OpenMP model and the OpenMP model won