CVM (Coherent Virtual Machine)
CVM
• CVM is a user-level library
• Enables programs to exploit shared-memory semantics over message-passing hardware.
• Page-based DSM
• Written in C++
• Built on top of UDP or MPI
CVM
• CVM was created by Pete Keleher in 1995.
• CVM was created specifically as a platform for protocol experimentation.
• These slides are based on the material in the CVM manual, which can be found on the project website (http://www.cs.umd.edu/projects/cvm)
CVM Routines
Initialization / Termination
• Initialization
  – cvm_startup(int, char**)
    • Called after the program has processed its own arguments.
    • program <opt> -- <CVM opt> -- <protocol opt>
• Termination
  – cvm_finish()
    • Called by the master process; it waits until all processes have completed.
  – cvm_exit(char*, …)
    • A quick exit on error
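The three-part command line above can be illustrated with a small sketch. It only shows how an argument vector could be divided at the "--" separators. The names arg_groups and split_args are hypothetical and are not part of CVM, whose own parsing may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper type, not part of CVM: where each of the three
 * argument groups starts in argv and how many entries it has. */
typedef struct {
    int start[3];  /* index in argv of the first entry of each group */
    int count[3];  /* number of entries in each group */
} arg_groups;

/* Splits argv[1..argc-1] at "--" separators into program options,
 * CVM options, and protocol options. Missing groups get count 0. */
arg_groups split_args(int argc, char **argv) {
    assert(argv != NULL && argc >= 1);
    arg_groups g = { {1, argc, argc}, {0, 0, 0} };
    int group = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0 && group < 2) {
            group++;                 /* move on to the next group */
            g.start[group] = i + 1;  /* it begins right after the separator */
        } else {
            g.count[group]++;
        }
    }
    return g;
}
```

With the arguments "-v -- -n4 -d -- -x", the program group holds one entry, the CVM group two, and the protocol group one.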
Example
• Most programs have the following form:
int main(int argc, char *argv[]) {
    …
    cvm_startup(argc, argv);
    …
    …
    cvm_finish();
}
Process Creation
• cvm_create_procs(func_ptr worker)
  – Creates the execution entries on all slave machines.
  – The function should have the form
    • void (*worker)()
  – Some predefined macros and variables are available.
    • cvm_num_procs, cvm_proc_id, PID, TID
Shared memory allocation
• cvm_alloc(int sz)
  – In general, all shared data in CVM programs must be dynamically allocated.
  – All calls to cvm_alloc() must be completed before cvm_create_procs().
  – The usage is the same as malloc().
    • int *buf = (int*)cvm_alloc( sizeof(int) * N )
Synchronization
• cvm_lock(int id), cvm_unlock(int id)
  – Acquire and release the global lock specified by id.
  – The current maximum number of locks is 4110.
    • Can be modified in cvm.h
• cvm_barrier(int id)
  – Performs a global barrier.
  – The id parameter is currently ignored.
Access shared data
• Processes should lock the same id whenever they access the same shared data.
  – As with any shared memory, mutual exclusion must be ensured.
[Figure: Lazy Release Consistency. One process performs lock(), a memory operation, and unlock(); another process then calls lock(). Without this lock(), the second process's view of memory is not updated.]
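The point of the figure, that updates become visible only when the next process acquires the lock, can be modeled in a few lines. This is a deliberately crude sketch under stated assumptions: node_copy, lock_msg, model_unlock, and model_lock are hypothetical names, a single int stands in for a page, and the update travels with the lock. It is not how CVM implements its protocol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types, not CVM data structures. */
typedef struct { int x; } node_copy;                 /* a node's local copy of shared word x */
typedef struct { int x; int has_update; } lock_msg;  /* update information carried with the lock */

/* Models unlock(): the releaser's latest value is attached to the lock. */
void model_unlock(lock_msg *l, const node_copy *releaser) {
    assert(l != NULL && releaser != NULL);
    l->x = releaser->x;
    l->has_update = 1;
}

/* Models lock(): only now does the acquirer's local copy get refreshed. */
void model_lock(const lock_msg *l, node_copy *acquirer) {
    assert(l != NULL && acquirer != NULL);
    if (l->has_update) {
        acquirer->x = l->x;
    }
}
```

In this model a node that never calls model_lock() keeps its stale copy indefinitely, which mirrors the slide's warning.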
Cont.
• Use a barrier to exchange all updates among the machines.
[Figure: four processes write P[0:9]=1, P[10:19]=2, P[20:29]=3, and P[30:39]=4 respectively, and each then calls Barrier(). After the barrier, all shared data are synchronized.]
Synchronization
• Wait & signal
  – cvm_signal_pause(), cvm_signal(int pid)
    • A signal can be buffered (only one).
  – The order doesn't matter.
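[Figure: signal()/signal_pause() timelines. A signal() that arrives before signal_pause() is buffered, so each signal() paired with one signal_pause() works fine. Because only one signal is buffered, a process that pauses twice blocks at the second pause.]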
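The single-slot buffering described above can be sketched as a tiny state machine. The names signal_slot, model_signal, and model_pause_would_block are hypothetical, and the sketch assumes that a signal sent while another is already buffered is lost, which is one reading of "only one". It models the behavior only and does not block or communicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process's signal state; not CVM's implementation. */
typedef struct {
    bool pending;  /* true while one signal is buffered */
} signal_slot;

/* Models cvm_signal(): buffers the signal. Assumption: with a single
 * slot, a signal sent while one is already pending is lost. */
void model_signal(signal_slot *s) {
    assert(s != NULL);
    s->pending = true;
}

/* Models cvm_signal_pause(): returns false if a buffered signal was
 * consumed, true if the caller would have to block and wait. */
bool model_pause_would_block(signal_slot *s) {
    assert(s != NULL);
    if (s->pending) {
        s->pending = false;
        return false;
    }
    return true;
}
```

Alternating signal and pause never blocks in this model, while one signal followed by two pauses blocks on the second pause.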
CVM arguments
• The command line
  – $ ./cvmprog <opt> -- <CVM args> -- <prot args>
• -d : turn on debugging output
• -n<num> : specify the number of processes
• -P<num> : specify the page size <8192>
• -t<num> : use per-node multithreading
  – Hides communication latency.
• -X<num> : specify the protocol
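As an illustration of the flag syntax listed above, here is a minimal sketch of a parser for a single argument. The type cvm_opts and the functions default_opts and parse_cvm_opt are hypothetical and not CVM's own code. The defaults are assumptions: 8192 is taken from the "<8192>" on the slide, protocol 0 from the default named on the consistency-protocol slide, and the remaining values are placeholders.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the flags listed above. */
typedef struct {
    int debug;      /* -d */
    int num_procs;  /* -n<num> */
    int page_size;  /* -P<num> */
    int threads;    /* -t<num> */
    int protocol;   /* -X<num> */
} cvm_opts;

/* Assumed defaults; only 8192 and protocol 0 are suggested by the slides. */
cvm_opts default_opts(void) {
    cvm_opts o = { 0, 1, 8192, 1, 0 };
    return o;
}

/* Parses one argument such as "-n4". Returns 0 on success and -1 if the
 * argument is not one of the five flags. */
int parse_cvm_opt(const char *arg, cvm_opts *o) {
    assert(arg != NULL && o != NULL);
    if (arg[0] != '-' || arg[1] == '\0') return -1;
    switch (arg[1]) {
    case 'd': o->debug = 1;                 return 0;
    case 'n': o->num_procs = atoi(arg + 2); return 0;
    case 'P': o->page_size = atoi(arg + 2); return 0;
    case 't': o->threads = atoi(arg + 2);   return 0;
    case 'X': o->protocol = atoi(arg + 2);  return 0;
    default:  return -1;
    }
}
```

A caller would start from default_opts() and feed each argument of the CVM group to parse_cvm_opt() in turn.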
Consistency protocol
• Default is lazy multi-writer (0)
  – Allows multiple writers to access the same page simultaneously without communication.
    • Uses diffs
• Lazy single-writer (1)
  – Only a single writer can access a page at a time (false sharing).
• Sequentially consistent single-writer (2)
  – Every write invokes an invalidation (lots of communication).
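The diffs used by the multi-writer protocol can be sketched as follows. In page-based DSM systems a diff is commonly produced by comparing a page with a "twin", a copy saved before the first write. The code below is a minimal sketch of that general technique, not CVM's implementation: PAGE_WORDS, diff_entry, make_twin, make_diff, and apply_diff are hypothetical names, and the page is shrunk to eight ints for readability.

```c
#include <assert.h>
#include <string.h>

#define PAGE_WORDS 8  /* tiny page for illustration; real pages are far larger */

/* One modified word: its position in the page and its new value. */
typedef struct { int offset; int value; } diff_entry;

/* Saves a copy of the page before it is first written. */
void make_twin(int *twin, const int *page) {
    assert(twin != NULL && page != NULL);
    memcpy(twin, page, sizeof(int) * PAGE_WORDS);
}

/* Compares the page with its twin and records every changed word.
 * Returns the number of entries written to out. */
int make_diff(const int *twin, const int *page, diff_entry *out) {
    assert(twin != NULL && page != NULL && out != NULL);
    int n = 0;
    for (int i = 0; i < PAGE_WORDS; i++) {
        if (twin[i] != page[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Applies a diff to another copy of the same page. */
void apply_diff(int *page, const diff_entry *d, int n) {
    assert(page != NULL && d != NULL);
    for (int i = 0; i < n; i++) {
        page[d[i].offset] = d[i].value;
    }
}
```

Because each diff carries only the words its writer changed, two writers that touch different words of one page can both have their updates merged into a third copy without overwriting each other.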
Home-based RC
• Home-based multi-writer (3)
• Sometimes LRC still needs to send lots of diffs.
[Figure: three processes perform Lock()/unlock() one after another. The first handoff carries diffs; the next carries two sets of diffs.]
Cont.
• Every page has its own home (node), which takes care of it.
  – All diffs are sent to the home.
[Figure: processes performing Lock()/unlock() send their diffs to the home node; a process that needs the page receives diffs or the whole page from the home node.]
Example code
#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000
int *data, *psum, *gidx;

void worker() {
    int lidx;
    psum[cvm_proc_id] = 0;
    do {
        cvm_lock(0);
        lidx = (*gidx)++;   // take the next index and advance the shared counter
        cvm_unlock(0);
        if (lidx >= DATA_SZ) break;   // valid indices are 0 .. DATA_SZ-1
        psum[cvm_proc_id] += data[lidx];
    } while (1);
    cvm_barrier(0);   // psum needs to be synchronized
}

int main(int argc, char *argv[]) {
    int sum, i;

    cvm_startup(argc, argv);
    // allocation of shared data
    gidx = (int*)cvm_alloc(sizeof(int));
    data = (int*)cvm_alloc(sizeof(int) * DATA_SZ);
    psum = (int*)cvm_alloc(sizeof(int) * cvm_num_procs);

    // data initialization
    *gidx = 0;
    for (i = 0; i < DATA_SZ; i++) data[i] = i + 1;
    cvm_create_procs(worker);
    worker();

    for (sum = 0, i = 0; i < cvm_num_procs; i++) sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);

    cvm_finish();
}
Without contention
#include "cvm.h"
#include <stdio.h>
#define DATA_SZ 1000

int *psum, *data;

void worker() {
    int i;
    psum[PID] = 0;   // PID is the same as cvm_proc_id
    for (i = PID; i < DATA_SZ; i += cvm_num_procs)
        psum[PID] += data[i];
    cvm_barrier(0);   // still needed for psum
}

int main(int argc, char *argv[]) {
    int sum, i;

    cvm_startup(argc, argv);
    // allocation of shared data
    psum = (int*)cvm_alloc(sizeof(int) * cvm_num_procs);
    data = (int*)cvm_alloc(sizeof(int) * DATA_SZ);
    // data initialization
    for (i = 0; i < DATA_SZ; i++) data[i] = i + 1;

    cvm_create_procs(worker);
    worker();

    for (sum = 0, i = 0; i < cvm_num_procs; i++) sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);
    cvm_finish();
}
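The contention-free version works because each process takes a fixed, disjoint set of indices (pid, pid + nprocs, pid + 2*nprocs, ...), so no lock is needed to hand out work. The sketch below isolates that partitioning in plain C without CVM. The function partial_sum is a hypothetical helper, with pid and nprocs standing in for PID and cvm_num_procs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the elements that process `pid` handles when the indices are
 * dealt out cyclically: pid, pid + nprocs, pid + 2*nprocs, ... */
int partial_sum(const int *data, int n, int pid, int nprocs) {
    assert(data != NULL && nprocs > 0 && pid >= 0 && pid < nprocs);
    int s = 0;
    for (int i = pid; i < n; i += nprocs) {
        s += data[i];
    }
    return s;
}
```

Adding the partial sums of all processes gives the same total as a sequential loop, since every index belongs to exactly one process.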
cvm_reduce
• cvm_reduce(void *global, void *local, int rtype, int dtype, int num)
  – Similar to MPI_Reduce
  – Four operations are provided.
    • min, max, sum, product
• E.g. cvm_reduce(sum, psum, REDUCE_sum, REDUCE_int, 1);
  – Requires #include "reduce.h"
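To make the four operations concrete, here is a minimal single-process sketch of what a reduction computes: one value contributed per process is combined into one result. The enum reduce_op and the function reduce_ints are hypothetical. CVM's actual constants are declared in reduce.h, and its implementation, which involves communication between nodes, is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes for this sketch only. */
typedef enum { OP_MIN, OP_MAX, OP_SUM, OP_PRODUCT } reduce_op;

/* Combines one int contributed by each of n processes into one value. */
int reduce_ints(reduce_op op, const int *local, int n) {
    assert(local != NULL && n > 0);
    int acc = local[0];
    for (int i = 1; i < n; i++) {
        switch (op) {
        case OP_MIN:     if (local[i] < acc) acc = local[i]; break;
        case OP_MAX:     if (local[i] > acc) acc = local[i]; break;
        case OP_SUM:     acc += local[i];                    break;
        case OP_PRODUCT: acc *= local[i];                    break;
        }
    }
    return acc;
}
```

In the summation examples, a sum reduction over the per-process partial sums would replace the master's final loop over psum.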