CRL (C Region Library)
Chao Huang, James Brodman, Hassan Jafri
CS498LVK
Introduction
• CRL is an all-software distributed shared memory (DSM) system
  – Provides shared address space
  – Built on PVM
• “Region”: an arbitrarily sized, contiguous area of memory
  – Consistent cached copy at local nodes
Functions
• Environment
  – crl_init
  – crl_num_nodes, crl_self_addr
• Basic region operations
  – rid_t rgn_create(unsigned size)
  – void rgn_destroy(rid_t rgn_id)
  – rid_t rgn_rid(void *rgn)
  – unsigned rgn_size(void *rgn)
  – void rgn_flush(void *rgn)
Functions
• Region mapping
  – void *rgn_map(rid_t rgn_id)
  – void rgn_unmap(void *rgn)
• Region read and write
  – void rgn_start_read(void *rgn)
  – void rgn_end_read(void *rgn)
  – void rgn_start_write(void *rgn)
  – void rgn_end_write(void *rgn)
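The write side follows the same bracket pattern as reads: map, start the operation, touch the data, end the operation, unmap. Below is a minimal sketch of that pattern. The rgn_* bodies are single-node stand-ins written only so the sketch compiles and runs on its own; they are assumptions, not the CRL implementation, and the `unsigned` representation of rid_t and the helper names make_vector and sum_vector are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Single-node stand-ins (assumptions, NOT the real CRL library) ---- */
typedef unsigned rid_t;                 /* assumed representation of a region id */

#define MAX_REGIONS 16
static void *region_data[MAX_REGIONS];  /* backing store, indexed by region id */
static unsigned region_count = 0;

rid_t rgn_create(unsigned size)
{
    assert(region_count < MAX_REGIONS);
    region_data[region_count] = calloc(1, size);
    assert(region_data[region_count] != NULL);
    return region_count++;
}

void *rgn_map(rid_t rgn_id)
{
    assert(rgn_id < region_count);
    return region_data[rgn_id];
}

/* With one node there is nothing to fetch or invalidate, so these are no-ops. */
void rgn_unmap(void *rgn)       { (void)rgn; }
void rgn_start_read(void *rgn)  { (void)rgn; }
void rgn_end_read(void *rgn)    { (void)rgn; }
void rgn_start_write(void *rgn) { (void)rgn; }
void rgn_end_write(void *rgn)   { (void)rgn; }

/* ---- Usage pattern ---- */

/* Create a region holding n doubles and set every element to v. */
rid_t make_vector(int n, double v)
{
    rid_t id = rgn_create((unsigned)(n * sizeof(double)));
    double *p = (double *) rgn_map(id);
    int i;

    rgn_start_write(p);          /* exclusive access for the update */
    for (i = 0; i < n; i++)
        p[i] = v;
    rgn_end_write(p);            /* update becomes visible to later readers */
    rgn_unmap(p);
    return id;
}

/* Sum the n doubles stored in region id, inside a read operation. */
double sum_vector(rid_t id, int n)
{
    double *p = (double *) rgn_map(id);
    double s = 0;
    int i;

    rgn_start_read(p);
    for (i = 0; i < n; i++)
        s += p[i];
    rgn_end_read(p);
    rgn_unmap(p);
    return s;
}
```

In a real multi-node run, the region id returned by rgn_create would have to reach other nodes first (for example through rgn_bcast_send / rgn_bcast_recv) before they could map it.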
Functions
• Global synchronization
  – void rgn_barrier(void)
  – void rgn_bcast_send(int len, void *buf)
  – void rgn_bcast_recv(int len, void *buf)
  – double rgn_reduce_dadd(double arg)
  – double rgn_reduce_dmin(double arg)
  – double rgn_reduce_dmax(double arg)
Example
    /* Compute the dot product of
     * two n-element vectors, each
     * of which is represented by
     * an appropriately-sized region
     * x: region identifier for 1st vector
     * y: address at which 2nd vector is already mapped
     */
    double dotprod(rid_t x, double *y, int n)
    {
        int i;
        double *z;
        double rslt;

        /* map 1st vector and initiate read operation */
        z = (double *) rgn_map(x);
        rgn_start_read(z);

        /* initiate read operation on 2nd vector */
        rgn_start_read(y);

        /* compute dot product */
        rslt = 0;
        for (i = 0; i < n; i++)
            rslt += z[i] * y[i];

        /* terminate read operations and unmap 1st vector */
        rgn_end_read(y);
        rgn_end_read(z);
        rgn_unmap(z);
        return rslt;
    }
Discussions
• All-software: communication latency may be higher than in hardware-based systems
• Region size can be chosen to correspond to user data structures (programmer’s responsibility)
• Fixed-home, directory-based invalidate protocol
• Ordered message delivery: a 32-bit version number tags each region
• Unmapped region cache (URC): a region’s unique mapping can stay cached after the region is unmapped
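To make "fixed-home, directory-based invalidate" concrete, here is a minimal sketch of the bookkeeping a region's home node might do. It is an illustration under stated assumptions, not CRL's actual data structures: the dir_entry layout, the 32-node limit, and the dir_* function names are all hypothetical. Each function returns the number of invalidations the home would have to send, which is the "N inv" quantity that latency measurements of read and write misses are usually broken down by.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32

/* Hypothetical directory entry kept at a region's (fixed) home node. */
typedef struct {
    uint32_t sharers;   /* bit i set => node i holds a read copy */
    int      writer;    /* node holding the exclusive copy, or -1 */
    uint32_t version;   /* 32-bit version number, bumped on each write grant */
} dir_entry;

void dir_init(dir_entry *d)
{
    d->sharers = 0;
    d->writer = -1;
    d->version = 0;
}

/* Count set bits (number of nodes in a sharer mask). */
static int popcount32(uint32_t x)
{
    int c = 0;
    while (x) {
        x &= x - 1;     /* clear lowest set bit */
        c++;
    }
    return c;
}

/* Node asks to read. Returns how many invalidations the home sends. */
int dir_start_read(dir_entry *d, int node)
{
    int inv = 0;

    assert(node >= 0 && node < MAX_NODES);
    if (d->writer == node)
        return 0;                       /* already holds the region exclusively */
    if (d->writer >= 0) {
        d->writer = -1;                 /* recall the exclusive copy */
        inv = 1;
    }
    d->sharers |= UINT32_C(1) << node;
    return inv;
}

/* Node asks to write. Returns how many invalidations the home sends. */
int dir_start_write(dir_entry *d, int node)
{
    uint32_t others;
    int inv;

    assert(node >= 0 && node < MAX_NODES);
    if (d->writer == node)
        return 0;                       /* already exclusive: a hit */
    others = d->sharers & ~(UINT32_C(1) << node);
    inv = popcount32(others);           /* one invalidate per other reader */
    if (d->writer >= 0)
        inv++;                          /* plus the previous writer, if any */
    d->sharers = 0;
    d->writer = node;
    d->version++;                       /* messages carrying an older version are stale */
    return inv;
}
```

With six remote readers registered, a write request from a seventh node costs six invalidations in this model, so the cost of a write miss grows with the number of sharers.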
URC
• Enables Lazy Release Consistency for CRL
• rgn_start_op (rgn_start_read or rgn_start_write) can be satisfied locally if the region is not invalidated before the next time it is mapped
• Even if the data/region is invalidated, later accesses can be satisfied more quickly
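The idea can be sketched as a small table consulted on map and filled on unmap. This is a toy model, not CRL's implementation: the direct-mapped layout, the slot count, the `unsigned` rid_t, and the urc_* names are assumptions made for illustration. An invalidation clears only the data-valid flag; the mapping itself survives, which is why a later access can still be cheaper than a cold one.

```c
#include <assert.h>
#include <stddef.h>

#define URC_SLOTS 8

typedef unsigned rid_t;     /* assumed representation of a region id */

/* One cached mapping for a region that is currently unmapped. */
typedef struct {
    int    used;
    rid_t  id;
    void  *addr;            /* local address the region was mapped at */
    int    valid;           /* 1 if the cached data has not been invalidated */
} urc_slot;

typedef struct {
    urc_slot slot[URC_SLOTS];
} urc;

void urc_init(urc *c)
{
    int i;
    for (i = 0; i < URC_SLOTS; i++)
        c->slot[i].used = 0;
}

/* Called on unmap: remember the mapping (evicts any previous occupant). */
void urc_put(urc *c, rid_t id, void *addr, int valid)
{
    urc_slot *s = &c->slot[id % URC_SLOTS];
    s->used = 1;
    s->id = id;
    s->addr = addr;
    s->valid = valid;
}

/* Called when an invalidation arrives for an unmapped region:
 * keep the mapping, but mark the data as stale. */
void urc_invalidate(urc *c, rid_t id)
{
    urc_slot *s = &c->slot[id % URC_SLOTS];
    if (s->used && s->id == id)
        s->valid = 0;
}

/* Called on map: on a hit, return the cached address, report whether the
 * data is still valid, and remove the entry (the region is mapped again).
 * Returns NULL on a miss. */
void *urc_take(urc *c, rid_t id, int *valid_out)
{
    urc_slot *s = &c->slot[id % URC_SLOTS];
    if (!s->used || s->id != id)
        return NULL;
    s->used = 0;
    *valid_out = s->valid;
    return s->addr;
}
```

A hit with the valid flag set corresponds to the case where the next start operation needs no communication; a hit with the flag cleared still skips re-establishing the mapping.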
Software
• Prototype implementation available
• Platforms
  – CM-5, Thinking Machines (message-passing multicomputer)
  – Alewife (distributed-memory multiprocessor); provides native shared memory support
  – TCP/Unix implementation for SunOS
• Expect a Linux port soon
Machine Characteristics
              CM-5         Alewife
Latency       34 us        14 us
Throughput    8 MB/sec     18 MB/sec
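A rough way to use these figures is a linear cost model: time to move a block is roughly the fixed latency plus size divided by throughput. The sketch below reads the microsecond figures as latency and the MB/sec figures as throughput. The model itself, the function name transfer_time_us, and the choice of 1 MB = 10^6 bytes are assumptions for illustration, not something the slides state.

```c
#include <assert.h>

/* Estimated time, in microseconds, to move `bytes` bytes in one transfer.
 * Assumes a linear model and 1 MB = 10^6 bytes, so that
 * bytes / (MB/sec) comes out directly in microseconds. */
double transfer_time_us(double latency_us,
                        double throughput_mb_per_sec,
                        double bytes)
{
    assert(throughput_mb_per_sec > 0.0);
    return latency_us + bytes / throughput_mb_per_sec;
}
```

For a 1024-byte region this gives about 162 us on the CM-5 (34 + 1024/8) and about 71 us on Alewife (14 + 1024/18). Under this model, small regions are dominated by the fixed latency term and large regions by throughput.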
Basic Ops Latencies
                           CM-5 (us)    Alewife (us)    Alewife native (us)
Start read hit             2.5          2.3
End read hit               3            2.5
Start read miss, 0 inv     55.1         29              1.9
Start write miss, 1 inv    108.1        48.9            3.3
Start write miss, 6 inv    129.9        96.7            35.4
Applications
• 32-way completion time of apps with CRL on Alewife is comparable to that with Alewife native shared memory
  – How? Up to 5 remote headers supported by LimitLESS (Alewife’s software-based cache-coherence subsystem)