CIS669 Distributed and Parallel Processing Spring 2002 Professor Yuan Shi




Page 1:

CIS669: Distributed and Parallel Processing

Spring 2002

Professor Yuan Shi

Page 2:

Distributed Processing

Transaction-oriented

Geographically dispersed locations

I/O-intensive

Database-centric

Page 3:

Parallel Processing

Non-transactional, single-goal computing

Compute-intensive and/or data-intensive

May or may not involve databases

Page 4:

Is There a Real Difference?

Not in terms of functionality or resource-use intensity. For transactional systems, there are OLAP (Online Analytical Processing) and data mining tools that are compute-intensive and single-goal-oriented. For parallel processing, many scientific/engineering applications need to interact with databases to make more accurate calculations.

Page 5:

Parallelism and Programming Difficulties

For distributed processing, parallelism is given and usually cannot easily be changed. Programming is relatively easy.

For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming is in general more difficult than programming transaction applications.
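As a minimal illustration of what "partitioning" can mean (an added sketch, not from the slides): the simplest partition splits a serial loop over n items into p near-equal blocks, one per worker. In C:

#include <stdio.h>

/* Block-partition n loop iterations among p workers: worker `rank`
 * gets the half-open range [start, end). */
static void block_range(int rank, int p, int n, int *start, int *end)
{
    *start = rank * n / p;
    *end   = (rank + 1) * n / p;
}

int main(void)
{
    int n = 10, p = 3;
    for (int rank = 0; rank < p; rank++) {
        int s, e;
        block_range(rank, p, n, &s, &e);
        printf("worker %d: iterations %d..%d\n", rank, s, e - 1);
    }
    return 0;
}

Deciding on such a partition is easy for an independent loop; the difficulty referred to above comes from dependencies, synchronization, and data movement between the partitions.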

Page 6:

This picture is changing…

Industrial-strength distributed applications are evolving to become more parallel-like.

Lab-based parallel applications are blending into industrial-strength applications by incorporating transactions.

Page 7:

Why Clusters (the textbook's focus)?

We have tried all others: vector, dataflow, NUMA, hypercube, 3D-torus, etc.

Parallel programming does not get easier with any configuration.

Clusters promise the greatest cost/performance potential. Check this out:

Page 8:

Page 9:

Types of Parallelism (Flynn, 1972*)

1. SIMD (Single Instruction, Multiple Data)

2. MIMD (Multiple Instruction, Multiple Data)

3. MISD (Multiple Instruction, Single Data; pipeline)

* Flynn, M., "Some Computer Organizations and Their Effectiveness," IEEE Trans. Comput., Vol. C-21, pp. 948-960, 1972.

** Other taxonomies exist for categorizing parallel machines (see http://csep1.phy.ornl.gov/ca/node11.html).

Page 10:

SIMD

[Diagram: a single instruction stream I applied simultaneously to four data streams D1-D4.]

Tseq = 4

Tpar = 1

Sp = Tseq/Tpar = 4 = P

Page 11:

MIMD

[Diagram: four independent instruction streams I1-I4, each applied to its own data stream D1-D4.]

Tseq = 4

Tpar = 1

Sp = Tseq/Tpar = 4 = P

Page 12:

Pipeline (MISD)

[Diagram: a 4-stage pipeline; instructions I1-I4 are applied in sequence to data items D1-D4.]

Tseq = 4 × 4 = 16

Tpar = 4 + 3 = 7 (4 steps to fill the pipeline, then one result per step for the remaining 3 items)

Sp = Tseq/Tpar = 16/7 ≈ 2.3
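As an added check of these numbers (my note, not from the slides): for a pipeline of P stages processing N data items, Tseq = P × N and Tpar = P + (N − 1), so Sp = P·N / (P + N − 1). A small C program confirms the 16/7 ≈ 2.3 above and shows that Sp approaches P as the data supply N grows:

#include <stdio.h>

/* Pipeline speedup: P stages, N data items.
 * Sequential time: P * N (each item passes through all P stages).
 * Pipelined time:  P + N - 1 (P steps to fill the pipe, then one
 * result per step for the remaining N - 1 items). */
double pipeline_speedup(int p, int n)
{
    return (double)(p * n) / (p + n - 1);
}

int main(void)
{
    printf("P=4, N=4:    Sp = %.2f\n", pipeline_speedup(4, 4));    /* ~2.29 */
    printf("P=4, N=1000: Sp = %.2f\n", pipeline_speedup(4, 1000)); /* ~4    */
    return 0;
}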

Page 13:

Machines that can work in parallel

Cray: X-MP, Y-MP, T3D.

TMC (Thinking Machines): CM-1 through CM-5.

Kendall Square Research: KSR-1

SGI: Power Challenge, Origin

IBM: 3090, SP2…

PCs

Page 14:

History

Single CPU: Smaller size -> faster speed (Cray, remember Moore’s Law?)

Multi-CPU: shared memory or no shared memory?

The war between big iron and many irons: Cray versus TMC.

Result: all lost. Clusters won by survival.

Page 15:

State of the Art

Symmetric multiprocessing (SMP) is still the only practical approach in industry. Vendors include HP, Sun, SGI, IBM, Compaq/Tandem, and Stratus.

Special-purpose, small-scale multiprocessors: CISCO routers, SSL processors, MPEG decoders, etc.

Special-purpose massively parallel processors are designed for particular types of applications, such as human genome classification, nuclear accelerator simulation, and fluid-dynamics simulations.

Page 16:

Hardware Technology Advances*

Computing Laws

Transistor density doubles every 18 months: a 60% increase per year
– chip density (transistors/die)
– microprocessor speeds

Exponential growth:
– the past does not matter
– 10x here, 10x there … means REAL change

PC costs decline faster than those of any other platform
– volume and learning curves
– PCs are the building bricks of all future systems

Moore’s First Law

[Chart: one-chip memory size by year, 1970-2000, growing from 1 Kbit through 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M to 256 Mbit (8 KB, 128 KB, 1 MB, 8 MB, 128 MB, … toward 1 GB); one-chip memory size roughly 2 MB to 32 MB at the time.]

* Credit: Gordon Bell

Page 17:

[Diagram: a hierarchy of networks — Body, Home/buildings, Campus, Region/Intranet, Continent, World — with cars and physical nets attached.]

Everything cyberizable will be in Cyberspace and covered by a hierarchy of computers!

* Credit: Gordon Bell

Page 18:

Distributed Programming Tools

• C/C++ with TCP/IP (see the sketch after this list)

• Perl with TCP/IP

• Java

• CORBA

• ASP

• .NET
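Since the list leads with C/C++ over TCP/IP, here is a minimal illustrative sketch of the client side of a TCP connection in C (my example, not course material; the 127.0.0.1 address and port 9000 are hypothetical):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Minimal TCP client: connect, send one request, print one reply. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(9000);                     /* hypothetical port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr); /* hypothetical host */

    if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect"); return 1;
    }

    write(fd, "hello\n", 6);

    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n > 0) { buf[n] = '\0'; printf("reply: %s", buf); }

    close(fd);
    return 0;
}

Every tool on this list ultimately layers conveniences (naming, marshalling, remote invocation) over this kind of raw connection.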

Page 19:

Parallel Programming Tools

PVM

MPI (see the sketch after this list)

Synergy

Others (proprietary hardware)
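For flavor, here is the canonical MPI "hello world" in C (a standard textbook example, not from the slides): each process learns its rank and the total process count at startup.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id     */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* number of processes   */

    printf("hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

Note that rank is exactly the kind of global process ID that a later slide (Page 23) cites when arguing that PVM/MPI programs are not stateless.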

Page 20:

Semester Outline

Parallel programming

Architecture and performance evaluation

Distributed programming

Architecture and performance evaluation

Project selection

Project implementation

Presentation

Page 21:

Parallel Programming Difficulties

Program partitioning and allocation

Data partitioning and allocation

Program (process) synchronization

Data access mutual exclusion (see the sketch after this list)

Dependencies

Process(or) failures

Scalability…
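To make the mutual-exclusion item concrete, here is a small illustrative C/pthreads sketch (my example, not course material): without the mutex, concurrent read-modify-write updates of the shared counter can be lost.

#include <stdio.h>
#include <pthread.h>

/* Shared state and the mutex that guards it. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* enter critical section        */
        counter++;                   /* read-modify-write, now atomic */
        pthread_mutex_unlock(&lock); /* leave critical section        */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expect 400000)\n", counter);
    return 0;
}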

Page 22:

Meeting the Challenge

Use the Stateless Parallel Processing principle (U.S. Patent #5,517,656, May 1996).

Advantages:

High performance – automatic formation of SIMD, MIMD and MISD clusters at runtime.

Runtime addition/subtraction of processors allows for ultimate scalability.

It is the ONLY multiprocessor architecture designed with fault tolerance in mind.

Ease of programming – no mutual exclusion problems; automatic tools possible.

Page 23:

Stateless Parallel Processing

A stateless program is any program whose execution neither hard-wires nor incurs side effects on ANY global information.

Non-stateless program example: all PVM/MPI programs, since they create processes with IDs (global information).

Page 24:

Why Stateless Programs?

A stateless program can execute on any processor. This allows dynamic formation of SIMD, MIMD and MISD clusters at runtime.

Only stateless programs can promise ultimate scalability (adding a processor on the fly) and fault tolerance (losing a processor on the fly).

Page 25:

Stateless Parallel Processor

[Diagram: six processors, each connected to a high-speed switch (data) and chained on a unidirectional ring (control).]

Page 26:

Operations of a Stateless Parallel Processor

The shared disk stores ALL stateless programs.

The unidirectional ring carries control tuples of two types: read and exclusive-read. Read tuples drop off the ring after one rotation; exclusive-read tuples drop off the ring after being consumed (see the sketch below).

Each processor can execute ANY stateless program from the shared disk.

Control tuples carry data locations, allowing direct data access via the high-speed switch.
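The two tuple lifetimes can be illustrated with a small self-contained C simulation (a sketch of the idea under my own assumptions, not the patented design): a read tuple survives exactly one rotation of the ring, while an exclusive-read tuple is dropped as soon as some processor consumes it.

#include <stdio.h>

/* Hypothetical control-tuple types on a simulated unidirectional ring. */
enum tuple_type { READ, EXCLUSIVE_READ };

struct tuple {
    enum tuple_type type;
    int data_location; /* where the data lives, reachable via the switch */
    int alive;
};

#define NPROC 4

/* Each processor inspects the passing tuple; return 1 to consume it.
 * Here processor 2 is arbitrarily the one that wants this tuple. */
static int wants(int proc, const struct tuple *t)
{
    printf("processor %d sees tuple for data at %d\n", proc, t->data_location);
    return proc == 2;
}

static void rotate(struct tuple *t)
{
    for (int proc = 0; proc < NPROC && t->alive; proc++) {
        if (wants(proc, t) && t->type == EXCLUSIVE_READ) {
            t->alive = 0; /* consumed: exclusive-read drops immediately */
            printf("  exclusive-read tuple consumed by %d, dropped\n", proc);
        }
    }
    if (t->alive && t->type == READ) {
        t->alive = 0;     /* read tuple drops after one full rotation */
        printf("  read tuple completed one rotation, dropped\n");
    }
}

int main(void)
{
    struct tuple r  = { READ, 100, 1 };
    struct tuple er = { EXCLUSIVE_READ, 200, 1 };
    rotate(&r);  /* all four processors see it once */
    rotate(&er); /* only processors 0..2 see it     */
    return 0;
}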

Page 27:

How does a stateless system start?

An initialization program sends the initial exclusive-read (ER) tuple(s) onto the ring.

It fires up all dependent programs on multiple processors (MIMD).

Newly generated tuples fire up more programs.

A SIMD cluster forms when a stateless program can accept multiple tuple values (multiple data).

MISD (pipeline) forms when multiple processors form a chain of dependency with sufficient data supply.

Page 28:

How do you get your hands on an SPP?

Synergy. Synergy is a close approximation of SPP. It uses a tuple space in place of the unidirectional ring (same function, but slower). Multiple tuple spaces are used to simulate the high-speed switch.

Note: the absence of the high-speed switch costs a great deal in performance.

Page 29:

Next: Parallel Program Performance Analysis

Next week: no lecture.

Homework 1 (due 2/4/02; submit a .doc file to [email protected] with subject: 669 HW1)

Reading: textbook chapters 1-4.

Problems:

1. What is the most likely performance bottleneck of an SPP machine? Explain.

2. Why the unidirectional ring? Explain.

3. Is it possible to build an SPP system using a cluster of PCs? How? What would you propose to make Synergy a true SPP system? Justify.

4. Compare SMP (symmetric multiprocessor) with SPP. Explain pros and cons. Are they compatible?

5. Compare SPP with massively parallel processors. Explain pros and cons. Restrict the discussion to the architecture level.

6. Design a stateless matrix multiplication system. How many programs do you need? Explain. How many forms of parallelism can you find?