240-322 Cli/Serv.: Dist. Prog./2 1
Client/Server Distributed Systems
Objectives
– explain the general meaning of distributed programming beyond client/server
– look at the history of distributed programming
240-322, Semester 1, 2005-2006
2. Distributed Programming Concepts
Overview
1. Definition
2. From Parallel to Distributed
3. Forms of Communication
4. Data Distribution
5. Algorithmic Distribution
6. Granularity
7. Load Balancing
8. Brief History of Distributed Programming
1. Definition
Distributed programming is the spreading of a computational task across several programs, processes, or processors.
Includes parallel (concurrent) and networked programming.
Definition is a bit vague.
2. From Parallel to Distributed
Most parallel languages talk about processes:
– these can be on different processors or on different computers
The implementor may choose to add language features to explicitly say where a process should run.
May also choose to address network issues (bandwidth, failure, etc.) at the language level.
Often resources required by programs are distributed, which means that the programs must be distributed.
Network Transparency
Most users want networks to be as transparent (invisible) as possible:
– users do not want to care which machine is used to store their files
– they do not want to know where a process is running
3. Forms of Communication
– 1-to-1 communication
– 1-to-many communication
– many-to-1 communication
– many-to-many communication
These can be supported on top of shared memory or distributed memory platforms.
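A minimal sketch of 1-to-1 and 1-to-many communication. It assumes threads stand in for processes and that each receiver owns a private `queue.Queue` inbox; the names `receiver`, `send_one_to_one`, `send_one_to_many` and `demo` are hypothetical, chosen only for this illustration.

```python
import queue
import threading

def receiver(inbox, received):
    """Collect messages from a private inbox until the None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        received.append(msg)

def send_one_to_one(inbox, msg):
    """1-to-1: one sender puts a message into one receiver's inbox."""
    inbox.put(msg)

def send_one_to_many(inboxes, msg):
    """1-to-many: the sender copies the same message into every inbox."""
    for inbox in inboxes:
        inbox.put(msg)

def demo(num_receivers=3):
    inboxes = [queue.Queue() for _ in range(num_receivers)]
    logs = [[] for _ in range(num_receivers)]
    threads = [threading.Thread(target=receiver, args=(q, log))
               for q, log in zip(inboxes, logs)]
    for t in threads:
        t.start()
    send_one_to_one(inboxes[0], "private")    # only receiver 0 sees this
    send_one_to_many(inboxes, "broadcast")    # every receiver sees this
    send_one_to_many(inboxes, None)           # sentinel: tell everyone to stop
    for t in threads:
        t.join()
    return logs
```

Many-to-1 follows from the same idea: several senders call `put` on one shared inbox.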
4. Data Distribution
Divide input data between identical separate processes.
Examples:
– database search
– edge detection in an image
– builders making a room with bricks
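The database search example could be sketched as follows. This is an illustration under stated assumptions, not a definitive design: the "database" is a plain list of strings, a thread pool stands in for separate processes, and `split`, `search_chunk` and `distributed_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def split(records, parts):
    """Divide records into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def search_chunk(chunk, key):
    """Identical code run by every worker, each on its own part of the data."""
    return [record for record in chunk if key in record]

def distributed_search(records, key, workers=4):
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the partial answers in the same order as the chunks
        partial = pool.map(search_chunk, chunks, [key] * len(chunks))
        return [hit for hits in partial for hit in hits]
```

The same `search_chunk` code runs in every worker; only the data differs.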
Boss-Workers
[Figure: boss and workers (all database search engines); the boss sends part of the database to each worker, and each worker sends back an answer]
Workers often need to talk to one another.
[Figure: boss and workers (all builders); labels: send bricks, done, talking]
Boss - Eager Workers
[Figure: boss and workers (all builders); labels: ask for bricks, send bricks, talking]
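A rough sketch of the eager-worker idea: workers ask the boss for bricks whenever they are free, instead of waiting to be sent work. Threads stand in for processes here, and `eager_worker`, `boss` and `build` are hypothetical names. Worker-to-worker talking is left out.

```python
import queue
import threading

def eager_worker(name, requests, results):
    """Keep asking the boss for bricks until told there are none left."""
    while True:
        reply = queue.Queue(maxsize=1)
        requests.put((name, reply))        # "ask for bricks"
        bricks = reply.get()               # wait for the boss to "send bricks"
        if bricks is None:
            break
        results.put((name, len(bricks)))   # pretend to lay the bricks

def boss(batches, requests, num_workers):
    """Answer each request with the next batch, or None when nothing is left."""
    pending = list(batches)
    finished = 0
    while finished < num_workers:
        _name, reply = requests.get()
        if pending:
            reply.put(pending.pop(0))
        else:
            reply.put(None)
            finished += 1

def build(batches, num_workers=3):
    requests, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=eager_worker,
                                args=(f"w{i}", requests, results))
               for i in range(num_workers)]
    for w in workers:
        w.start()
    boss(batches, requests, num_workers)
    for w in workers:
        w.join()
    laid = {}
    while not results.empty():
        name, count = results.get()
        laid[name] = laid.get(name, 0) + count
    return laid   # bricks laid per worker
```

Because fast workers ask more often, they naturally end up with more of the work.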
Things to Note
The code is duplicated in every process.
The maximum no. of processes depends on the size of the task and difficulty of dividing data.
Talking can be very hard to code.
Talking is usually called communication, synchronisation or cooperation.
Communication is almost always implemented using message passing.
How are processes assigned to processors?
5. Algorithmic Distribution
Divide algorithm into parallel parts / processes
– e.g. UNIX pipes
[Figure: dish-washing pipeline: dirty plates on table -> collector -> dirty plates -> washer -> clean wet plates -> drier -> wipe dry plates -> stacker -> plates in cupboard]
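The plate-washing pipeline might be sketched like this, with one thread per stage and a queue between neighbouring stages (threads standing in for processes). The `stage` and `run_pipeline` names, the `DONE` sentinel and the string tags are all assumptions made for the illustration.

```python
import queue
import threading

DONE = object()   # sentinel marking the end of the stream of plates

def stage(action, inbox, outbox):
    """Apply `action` to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)   # tell the next stage to finish too
            break
        outbox.put(action(item))

def run_pipeline(plates):
    actions = [
        lambda p: p + ":collected",
        lambda p: p + ":washed",
        lambda p: p + ":dried",
        lambda p: p + ":stacked",
    ]
    # one queue before the first stage, one after each stage
    queues = [queue.Queue() for _ in range(len(actions) + 1)]
    threads = [threading.Thread(target=stage,
                                args=(action, queues[i], queues[i + 1]))
               for i, action in enumerate(actions)]
    for t in threads:
        t.start()
    for plate in plates:
        queues[0].put(plate)
    queues[0].put(DONE)
    cupboard = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        cupboard.append(item)
    for t in threads:
        t.join()
    return cupboard
```

Each stage only talks to the next one, much like `a | b | c | d` in a UNIX shell.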
Things to Note
Talking is simple: pass data to next process which ‘wakes up’ that process.
Talking becomes harder to code if there are loops.
How to assign processes to processors?
Several Workers per Sub-task
Use both algorithmic and data distribution.
Problems:
– how to divide data?
– how to combine data?
[Figure: dish-washing pipeline with two collectors, one washer, three driers and one stacker]
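One possible answer to the two problems, sketched for the drying stage only: divide the data by letting the driers share one queue, and combine the results by tagging each plate with its position. This is an assumption-laden illustration (threads in place of processes; `drier` and `dry_with_many` are hypothetical names).

```python
import queue
import threading

def drier(wet, dry):
    """Take the next wet plate from the shared queue, dry it, pass it on."""
    while True:
        item = wet.get()
        if item is None:          # sentinel: no more plates for this drier
            break
        index, plate = item
        dry.put((index, plate + ":dried"))

def dry_with_many(plates, num_driers=3):
    wet, dry = queue.Queue(), queue.Queue()
    driers = [threading.Thread(target=drier, args=(wet, dry))
              for _ in range(num_driers)]
    for d in driers:
        d.start()
    for index, plate in enumerate(plates):
        wet.put((index, plate))   # divide: whichever drier is free takes it
    for _ in driers:
        wet.put(None)             # one sentinel per drier
    for d in driers:
        d.join()
    results = [dry.get() for _ in range(len(plates))]
    # combine: the index restores the original order of the plates
    return [plate for _, plate in sorted(results)]
```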
Parallelise Separate Sub-tasks
Build a house:
[Figure: sub-tasks bricklaying (b), plumbing (pl), electrical wiring (e) and paint (pt)]
b | (pl & e) | pt
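A small sketch of `b | (pl & e) | pt`, reading `|` as "then" and `&` as "at the same time"; that reading is an assumption about the notation. The `build_house` function and its log are hypothetical, and threads stand in for separate workers.

```python
from concurrent.futures import ThreadPoolExecutor

def build_house():
    log = []

    def task(name):
        log.append(name)   # list.append is thread-safe in CPython
        return name

    task("bricklaying")                                  # b
    with ThreadPoolExecutor(max_workers=2) as pool:      # pl & e
        futures = [pool.submit(task, "plumbing"),
                   pool.submit(task, "electrical wiring")]
        for future in futures:
            future.result()                              # wait for both
    task("paint")                                        # pt
    return log
```

Plumbing and wiring may finish in either order, but both always come after bricklaying and before paint.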
6. Granularity
Amount of data handled by a process:
Coarse grained: lots of data per process
– e.g. UNIX processes
Fine grained: small amounts of data per process
– e.g. UNIX threads, Java threads
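To make the trade-off concrete, here is a tiny sketch in which the grain size decides how many pieces (and so how many processes or threads) the data is split into. The `partition` helper is hypothetical.

```python
def partition(data, grain):
    """Split data into pieces of at most `grain` items, one piece per process."""
    return [data[i:i + grain] for i in range(0, len(data), grain)]

data = list(range(12))
coarse = partition(data, 6)   # 2 pieces, lots of data in each
fine = partition(data, 1)     # 12 pieces, very little data in each
```

A coarse grain means few, heavy pieces; a fine grain means many, light pieces and more communication between them.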
7. Load Balancing
How to assign processes to processors?
Want to ‘even out’ work so that each processor does about the same amount of work.
But:
– different processors have different capabilities
– must consider cost of moving a process to a processor (e.g. network speed, load)
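One simple way to even out work across processors of different speeds is a greedy rule: take the largest task first and place it on the processor that would finish it soonest. The sketch below is only an illustration of that rule; `assign`, the cost figures and the speed figures are assumptions, and the cost of moving a process is ignored.

```python
def assign(costs, speeds):
    """Greedy load balancing.

    costs:  {task name: amount of work}
    speeds: work units each processor completes per unit time
    Returns ({task: processor index}, finish time per processor).
    """
    finish = [0.0] * len(speeds)
    placement = {}
    # largest tasks first, so the small ones can fill the gaps later
    for task, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        best = min(range(len(speeds)),
                   key=lambda p: finish[p] + cost / speeds[p])
        finish[best] += cost / speeds[best]
        placement[task] = best
    return placement, finish

placement, finish = assign({"a": 8, "b": 4, "c": 4}, speeds=[2.0, 1.0])
```

Here the faster processor (index 0) ends up with tasks `a` and `c`, and the slower one with `b`.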
8. Brief History of (UNIX) Distributed Programming
1970’s: UNIX was a multi-user, time-sharing OS
– &, pipes
– interprocess communication (IPC) on a single processor
mid 1980’s: System V UNIX
– added extra IPC mechanisms: shared memory, message queues, etc.
late 1970’s to mid 1980’s: ARPA
– US Advanced Research Projects Agency
– funded research that produced TCP/IP, sockets
– added to BSD Unix 4.2
mid-late 1980’s: utilities developed
– telnet, ftp
– r* utilities: rlogin, rcp, rsh
– client-server model based on sockets
1986: System V UNIX
– released TLI (Transport Layer Interface), a set of socket-based libraries that support OSI
– not widely used
late 1980’s: Sun Microsystems
– NFS (Network File System)
– RPC (Remote Procedure Call)
– NIS (Network Information Service)
early 1990’s
– POSIX threads (light-weight processes)
– Web client-server model based on TCP/IP
mid 1990’s: Java
– Java threads
– Java Remote Method Invocation (RMI)
– CORBA
late 1990’s / early 2000’s
– J2EE, .NET
– peer-to-peer (P2P)
   Napster, Gnutella, etc.
– JXTA