Distributed Shared Memory: A Survey of Issues and Algorithms
B. Nitzberg and V. Lo, University of Oregon
INTRODUCTION
• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
Why bother with DSM?
• Key idea is to build fast parallel computers that are
– Cheaper than shared memory multiprocessor architectures
– As convenient to use
(Figure: a conventional parallel architecture, with several CPUs, each behind its own cache, connected to a single shared memory.)
Today’s architecture
• Clusters of workstations are much more cost effective
– No need to develop complex bus and cache structures
– Can use off-the-shelf networking hardware
• Gigabit Ethernet
• Myrinet (1.5 Gb/s)
– Can quickly integrate newest microprocessors
Limitations of cluster approach
• Communication within a cluster of workstations is through message passing
– Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared memory architectures
– Converting them to a message passing architecture is a nightmare
Distributed shared memory
DSM = one shared global address space mapped onto the main memories of the individual workstations
Distributed shared memory
• DSM makes a cluster of workstations look like a shared memory parallel computer– Easier to write new programs– Easier to port existing programs
• Key problem is that DSM only provides the illusion of having a shared memory architecture
– Data must still move back and forth among the workstations
Basic approaches
• Hardware implementations:
– Use extensions of traditional hardware caching architectures
• Operating system/library implementations:
– Use virtual memory mechanisms
• Compiler implementations:
– Compiler handles all shared accesses
Design Issues (I)
1. Structure and granularity
– Big units are more efficient
• Virtual memory pages
– Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors
False Sharing
(Figure: one processor accesses x while another accesses y; both variables sit on the same page.)
The page containing x and y will move back and forth between the main memories of the workstations
Design Issues (II)
1. Structure and granularity (cont'd)
– Shared objects can also be
• Objects from a distributed object-oriented system
• Data types from an existing language
Design Issues (III)
2. Coherence semantics
– Strict consistency is not possible
– Various authors have proposed weaker consistency models
• Cheaper to implement
• Harder to use in a correct fashion
Design Issues (IV)
3. Scalability
– Possibly very high but limited by
• Central bottlenecks
• Global knowledge operations and storage
Design Issues (V)
4. Heterogeneity
– Possible but complex to implement
Portability Issues
• Portability of programs
– Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks)
– More efficient DSMs require more changes
• Portability of DSM
– Some DSMs require specific OS features
(Not in paper)
Implementation Issues (I)
1. Data location and access:
• Keep data in a single centralized location
• Let data migrate (better) but must have a way to locate them
• Centralized server (bottleneck)
• Have a "home" node associated with each piece of data
• Will keep track of its location
Implementation Issues (II)
1. Data location and access (cont'd):
• Can either
• Maintain a single copy of each piece of data
• Replicate it on demand
• Must then either
• Propagate updates to all replicas
• Use an invalidation protocol
Invalidation protocol
• Before update: the three workstations all hold X = 0
• At update time: the writer's copy becomes X = 5; the two other copies are marked INVALID
Main advantage
• Locality of updates:
– A page that is being modified has a high likelihood of being modified again
• Invalidation mechanism minimizes consistency overhead
– One single invalidation replaces many updates
A realization: Munin
• Developed at Rice University
• Based on software objects (variables)
• Used the processor's virtual memory to detect accesses to the shared objects
• Included several techniques for reducing consistency-related communication
• Only ran on top of the V kernel
Munin main strengths
• Excellent performance
• Portability of programs
– Allowed programs written for a multiprocessor architecture to run on a cluster of workstations with a minimum number of changes (dusty decks)
Munin main weakness
• Very poor portability of Munin itself
– Depended on some features of the V kernel
• Not maintained since the late 80's
Consistency model
• Munin uses software release consistency
– Only requires the memory to be consistent at specific synchronization points
SW release consistency (I)
• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
– P(&mutex) and V(&mutex)
– lock(&csect) and unlock(&csect)
– acquire( ) and release( )
• Unprotected accesses can produce unpredictable results
SW release consistency (II)
• SW release consistency only guarantees the correctness of operations performed within an acquire/release pair
• No need to export the new values of shared variables until the release
• Must guarantee that the workstation has received the most recent values of all shared variables when it completes an acquire
SW release consistency (III)
Processor P1:
  shared int x;
  acquire( );
  x = 1;
  release( );   // export x = 1

Processor P2:
  shared int x;
  acquire( );   // wait for new value of x
  x++;
  release( );   // export x = 2
SW release consistency (IV)
• Must still decide how to release updated values
– Munin uses eager release:
• New values of shared variables are propagated at release time
SW release consistency (V)
Eager release: each release forwards the update to the two other processors.
Multiple write protocol
• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• The first attempt to modify the contents of the page results in the creation of a copy of the original page (the twin)
Creating a twin (not in paper)
• Before: the page contains x = 1 and y = 2
• The first write access creates the twin, an identical copy (x = 1, y = 2)
• After: the page contains x = 3 and y = 2 while the twin still contains x = 1 and y = 2
• Comparing the page with its twin reveals that the new value of x is 3
Other DSM Implementations (I)
• Software release consistency with lazy release (TreadMarks)
– Faster and designed to be portable
• Sequentially-consistent software DSM (IVY):
– Sends messages to other copies at each write
– Much slower
Other DSM Implementations (II)
• Entry consistency (Midway):
– Requires each variable to be associated with a synchronization object (typically a lock)
– Acquire/release operations on a given synchronization object only involve the variables associated with that object
– Requires less data traffic
– Does not handle dusty decks well
Other DSM Implementations (III)
• Structured DSM systems (Linda):
– Offer the programmer a shared tuple space accessed using specific synchronized methods
– Require a very different programming style
TODAY'S IMPACT
• Very low:
– According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM"
– DSM competed with the OpenMP model and OpenMP won