The Grid as a Parallel Computer
Francis C.M. Lau, Department of Computer Science, The University of Hong Kong, www.cs.hku.hk/~fcmlau

Page 1: The Grid as a Parallel Computer

The Grid as a Parallel Computer

Francis C.M. Lau

Department of Computer Science
The University of Hong Kong

www.cs.hku.hk/~fcmlau

Page 2: The Grid as a Parallel Computer

Greetings from Hong Kong!

Systems research @ HKU

Page 3: The Grid as a Parallel Computer

www.srg.cs.hku.hk

Page 4: The Grid as a Parallel Computer
Page 5: The Grid as a Parallel Computer

hkgrid.org

Page 6: The Grid as a Parallel Computer

HKGrid – the initial setup (2004)

Page 7: The Grid as a Parallel Computer
Page 8: The Grid as a Parallel Computer

www.cngrid.org

Page 9: The Grid as a Parallel Computer
Page 10: The Grid as a Parallel Computer
Page 11: The Grid as a Parallel Computer
Page 12: The Grid as a Parallel Computer

The 500th machine at 11/2005 has a peak of 2.9 Tflops

The 1st (DOE/BlueGene) has 0.37 Pflops

Page 13: The Grid as a Parallel Computer

(11/2002)

Page 14: The Grid as a Parallel Computer
Page 15: The Grid as a Parallel Computer
Page 16: The Grid as a Parallel Computer

Agenda

Parallel computing state of affairs

Parallel computing many faces

Grid as a parallel computer

Our first attempt – G-JavaMPI

Some thoughts for the future

Page 17: The Grid as a Parallel Computer

The State of High Performance Computing

Page 18: The Grid as a Parallel Computer

“Oxen vs. chickens”

• “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” - Seymour Cray (’25–’96)

• Your choice?

Page 19: The Grid as a Parallel Computer
Page 20: The Grid as a Parallel Computer

Will time tell?

• “At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers … Most cluster [experts] know now that users are fortunate to get more than 8% of the peak performance in sustained performance.” - Dr. Paul Terry, CTO, Cray Canada, 2004

Page 21: The Grid as a Parallel Computer

Never to predict the future?

• “No one will need more than 640 kb of memory for a personal computer.” (Bill Gates, 1981, wrongly attributed?)

Page 22: The Grid as a Parallel Computer

You need cpu, cpu, …

Subramanian, 1999

Software complexity

Page 23: The Grid as a Parallel Computer
Page 24: The Grid as a Parallel Computer
Page 25: The Grid as a Parallel Computer

Is Grid New?

Page 26: The Grid as a Parallel Computer

Many faces of “parallel” computing

• Distributed computing (DC)
  – Multiple computers remote from each other, each having a role in a computation problem
  – Loose parallelism

• Cluster computing (CC)
  – DC on a LAN, with homogeneous processing nodes (typically PCs), to form what appears to be a single, highly-available system

• Grid computing (GC)
  – A potentially very large DC operating as an anarchy
  – As large as the Internet/WWW
  – Parallelism at stake?

Page 27: The Grid as a Parallel Computer

• Cluster: chicken farm

• Grid: animal zoo
  – Enterprise grid: a private zoo in the backyard

• Distributed system: a “static” zoo where the animals are tame

Page 28: The Grid as a Parallel Computer

From cluster to grid

• One of the main ideas of cluster computing is that, to the outside world, the cluster appears to be a single system, which is also the reason for clustering’s extreme successes

• A cluster can be programmed like a single computer, almost

• Can a grid? Should a grid?

Page 29: The Grid as a Parallel Computer

Grid vs. service oriented computing

• To many, the two are almost synonymous
  – Just as the Web and the Internet are almost synonymous

• SOC refers to binding to Web services at runtime
  – Grid is about the provisioning of resources
  – The current grid’s use of Web services was out of convenience (my opinion)
  – But the service paradigm should not be the only possible form of computing with the grid

Page 30: The Grid as a Parallel Computer

• You want a hamburger: you can either go to McDonald’s or do it yourself

• SOC applied to the Web (as a grid) is probably best for commercial applications (McDonald’s)

• For scientific or grand challenge problems, we need to program the grid (DIY)

Page 31: The Grid as a Parallel Computer

The Grid as a Computer?

Page 32: The Grid as a Parallel Computer

• Cluster: more nodes than microprocessors in each node (MPI)

• Constellation: each node has more microprocessors than there are nodes (OpenMP)

• Tightly integrated MPP

• Grid?

Page 33: The Grid as a Parallel Computer

Grid vs. clustering

• Grid: heterogeneous resources (computation, storage, networking, OS, etc.)

• Grid: dynamic (resources come and go)

• Grid: distributed over a local or wide area

• Grid: increased scalability (no latency/proximity limits)

• Grid: multiple ownerships

• Grid and cluster are complementary

Page 34: The Grid as a Parallel Computer

Issues

• Heterogeneity

• Availability

• Latencies

• Security and trustworthiness

• Load balancing!

• Towards single system image (SSI)

Page 35: The Grid as a Parallel Computer

Load Balancing is Key

Page 36: The Grid as a Parallel Computer

Parallel applications

• Multiple processes, multiple threads

• Application types
  – SIMD (Single Instruction, Multiple Data)
    • SPMD (Single Program, Multiple Data)
  – MIMD (Multiple Instruction, Multiple Data)

Page 37: The Grid as a Parallel Computer

MIMD

Page 38: The Grid as a Parallel Computer

Need for process/thread migration

• SIMD: Remapping (re-partitioning) of data works

• For MIMD, “processes” might grow or shrink, or come and go
  – Remapping of processes = process migration
  – Processes with large footprints (i.e., many threads) might benefit from spreading their threads across machines
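The data remapping mentioned for the SIMD case can be sketched as a proportional re-partitioning of work items. This is only an illustration under stated assumptions: the function name `repartition` and the idea of a per-node relative speed are hypothetical and do not come from the talk.

```python
def repartition(num_items, speeds):
    """Split num_items work items among nodes in proportion to their speeds.

    Returns one half-open (start, end) index range per node.
    `speeds` is a hypothetical per-node relative speed (positive numbers).
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total = sum(speeds)
    bounds = [0]
    cumulative = 0
    for s in speeds:
        cumulative += s
        # Round the cumulative share so the ranges tile [0, num_items) exactly.
        bounds.append(round(num_items * cumulative / total))
    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]
```

When a node slows down or leaves, calling the function again with the updated speeds gives the new ranges. Remapping processes (the MIMD case) is harder because execution state has to move, not just array indices.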

Page 39: The Grid as a Parallel Computer

• Process migration
  – Initially (load distribution)
  – Dynamic
  – State capture and resume

• Thread migration
  – Threads are often tightly coupled and share much data
  – Beneficial?
  – A big challenge

Page 40: The Grid as a Parallel Computer

Sidetrack: Thread Migration

Page 41: The Grid as a Parallel Computer

Thread migration works!

• Probably not suitable for grid, fine for cluster where latencies are upper-bounded

• Our experience: the JESSICA2 system
  – A distributed JVM
  – Dynamic Java thread migration
  – JIT compilation
  – Global object space
  – I/O redirection

JESSICA: Java-Enabled Single-System-Image Computing Architecture

Page 42: The Grid as a Parallel Computer
Page 43: The Grid as a Parallel Computer

JESSICA2 Architecture

[Figure: a multithreaded Java program running on six JESSICA2 JVMs (one master, five workers), linked by thread migration and a global object space. Other labels: JIT compiler mode, portable Java frame.]

Page 44: The Grid as a Parallel Computer
Page 45: The Grid as a Parallel Computer
Page 46: The Grid as a Parallel Computer
Page 47: The Grid as a Parallel Computer

G-JavaMPI

Towards “grid as a parallel computer”

Page 48: The Grid as a Parallel Computer

• M-JavaMPI
  – “M” stands for migration
  – For cluster

• G-JavaMPI
  – An outgrowth of M-JavaMPI
  – “G” for grid

Page 49: The Grid as a Parallel Computer

G-JavaMPI

A Grid Middleware for Transparent MPI Task Migration and Runtime Scheduling

[Figure labels: Organization, Organization, Identity mapping, Policy space, Warranted, Task migration (Grid traveler)]

Page 50: The Grid as a Parallel Computer

• Grid-enabled implementation of the Java language bindings of the MPI v1.1 standard

• On top of Globus Toolkit (e.g., job startup, security) and MPICH-G2 (MPI communication)

• Combining the high-level message passing interface with the Java language to support portable message-passing programming in a grid

• It allows you to run MPI applications written in Java across multiple machines with different architectures belonging to multiple organizations

• Classes of problems implemented in C-MPI (for example, MPICH) can be easily ported to G-JavaMPI, with the additional support of process migration

• A better choice for those who prefer an object-oriented programming style

Page 51: The Grid as a Parallel Computer

Special features

• Transparent dynamic process migration
  – Load balancing
  – Fault tolerance
  – Resource co-allocation

• Fine-grain access control through delegation
  – Multi-hop delegation
  – Cross-organization resource sharing

Page 52: The Grid as a Parallel Computer

G-PASS

• Globus operates at the level of users, G-PASS at the level of processes

• A process can be migrated multiple times across multiple grid nodes

• The process (a “traveler”) obtains its privileges via a security instance (the “passport”) instead of from the hosts

• Permission to access a resource in the destination host is granted by simply checking the signature in the security instance
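The passport idea in the bullets above can be sketched as a signed record that travels with the process and is checked by each destination host. This is a minimal sketch, not the G-PASS implementation: the record layout and the names `issue_passport` and `host_allows` are hypothetical, and an HMAC with a shared key stands in for whatever signature scheme G-PASS actually uses.

```python
import hashlib
import hmac
import json


def issue_passport(issuer_key, process_id, privileges):
    """Create a signed security instance (hypothetical layout) for a process."""
    body = {"process": process_id, "privileges": sorted(privileges)}
    payload = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def host_allows(issuer_key, passport, privilege):
    """Destination host check: verify the signature, then look up the privilege."""
    payload = json.dumps(passport["body"], sort_keys=True).encode()
    expected = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, passport["signature"]):
        return False  # forged or tampered passport
    return privilege in passport["body"]["privileges"]
```

Because the privileges live in the passport rather than in per-host account tables, the same check works unchanged after every hop of a multi-hop migration.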

Page 53: The Grid as a Parallel Computer

Instance-oriented delegation

Page 54: The Grid as a Parallel Computer

GSI = Grid Security Infrastructure

Page 55: The Grid as a Parallel Computer

Main components of G-JavaMPI

Page 56: The Grid as a Parallel Computer

Runtime analysis

• Based on JVMTI (JVM Tool Interface): dynamically adds instrumentation code to Java bytecode

• Identify the execution hotspots in the process

• Analyze process synchronization relationships to estimate per-process computational requirements and communication workload
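As a rough analogue of instrumentation-based hotspot detection, the sketch below wraps functions at runtime and counts how often each one runs. It is not JVMTI and does not touch bytecode; the helper names `instrument` and `hotspots` are hypothetical, and call counts are only one possible hotspot measure.

```python
import functools
from collections import Counter


def instrument(func, calls):
    """Return a wrapper that counts invocations of func in the `calls` Counter."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


def hotspots(calls, top=1):
    """Names of the most frequently executed instrumented functions."""
    return [name for name, _ in calls.most_common(top)]
```

A scheduler could use such counts, together with message statistics, to judge which processes are compute-bound and therefore worth migrating.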

Page 57: The Grid as a Parallel Computer
Page 58: The Grid as a Parallel Computer

Dynamic instrumentation

[Figure: instrumented code with two regions labeled “hotspot”]

Page 59: The Grid as a Parallel Computer

Communication performance

Page 60: The Grid as a Parallel Computer

JMPI-BLAST cost breakdown

Page 61: The Grid as a Parallel Computer
Page 62: The Grid as a Parallel Computer
Page 63: The Grid as a Parallel Computer

Ray tracing experiment

Page 64: The Grid as a Parallel Computer

Message passing daemons

• Manage messages in queues

• Send/receive messages on behalf of processes

• Support multiple simultaneous applications

• Profiling of communication behaviors
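The daemon responsibilities listed above can be sketched as a small in-memory class. This is an assumption-laden illustration, not the G-JavaMPI daemon: the class name `Daemon`, its methods, and the use of an application id to separate simultaneous applications are all hypothetical.

```python
from collections import defaultdict, deque


class Daemon:
    """One per grid node; queues messages on behalf of local processes."""

    def __init__(self):
        # One FIFO queue per (application id, destination rank).
        self.queues = defaultdict(deque)
        # Profiling: messages seen per (application id, source, destination).
        self.traffic = defaultdict(int)

    def deliver(self, app, src, dst, payload):
        """Enqueue an incoming message for a local process."""
        self.queues[(app, dst)].append((src, payload))
        self.traffic[(app, src, dst)] += 1

    def receive(self, app, rank):
        """Called by a local process; returns (src, payload) or None if empty."""
        queue = self.queues[(app, rank)]
        return queue.popleft() if queue else None
```

Keying the queues by application id keeps two applications that both use rank 1 from seeing each other's messages, and the `traffic` counters give the communication profile per pair of ranks.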

[Figure: four grid nodes, each with a daemon, on top of MPICH-G2. Label: Messaging]

Page 65: The Grid as a Parallel Computer

Migration

• Capture process status through JVMDI (JVMTI in latest Sun J2SE 1.5)

• Recognize branching code in Java bytecode, find appropriate location to stop execution

• Recognize file operations

• Instrumentation of migration exception handlers

[Figure: process migration. Labels: Process, JVMDI, File, Status dump, Status restoration]
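The capture-and-restore cycle can be illustrated with a toy task that pauses only at a safe point, dumps its state, and resumes elsewhere. This sketch keeps the state explicit and serializes it as JSON; the real system captures JVM frames through JVMDI/JVMTI, which is not modeled here. The class `SumTask` and its methods are hypothetical.

```python
import json


class SumTask:
    """Toy computation whose whole state is (limit, next index, partial sum)."""

    def __init__(self, limit, i=0, total=0):
        self.limit = limit
        self.i = i
        self.total = total

    def run(self, should_stop=lambda task: False):
        """Run to completion, or pause at a safe point if should_stop says so.

        Returns True when finished, False when stopped for migration.
        """
        while self.i < self.limit:
            # Safe point: the state is consistent between loop iterations.
            if should_stop(self):
                return False
            self.total += self.i
            self.i += 1
        return True

    def dump(self):
        """Status dump on the source node."""
        return json.dumps({"limit": self.limit, "i": self.i, "total": self.total})

    @classmethod
    def restore(cls, blob):
        """Status restoration on the destination node."""
        return cls(**json.loads(blob))
```

Stopping only between iterations mirrors the idea of finding an appropriate location in the bytecode: the dump never catches the task halfway through an update.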

Page 66: The Grid as a Parallel Computer

Frames and Runtime States Restoration

Page 67: The Grid as a Parallel Computer

Migration in action

Page 68: The Grid as a Parallel Computer

Migration-transparent message passing

• Mapping virtual process ranks to physical locations in location tables

• Processes are not allowed to migrate while they are passing messages

• Sequencing the messages, collecting legacy messages at the previous node, and re-sending them to the new node
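The location table and the re-sending of legacy messages can be sketched as follows. This is a single-process model under stated assumptions: the classes `Node` and `Router` and their methods are hypothetical, and real daemons would exchange these messages over the network.

```python
class Node:
    """A grid node holding undelivered messages for the ranks it hosts."""

    def __init__(self, name):
        self.name = name
        self.inbox = {}  # rank -> list of (seq, payload)


class Router:
    """Maps virtual ranks to nodes and forwards messages stranded by migration."""

    def __init__(self):
        self.location = {}  # location table: virtual rank -> Node
        self.next_seq = {}  # per-destination sequence counter

    def place(self, rank, node):
        self.location[rank] = node
        node.inbox.setdefault(rank, [])
        self.next_seq.setdefault(rank, 0)

    def send(self, dst_rank, payload):
        # Senders address the virtual rank; the table supplies the node.
        seq = self.next_seq[dst_rank]
        self.next_seq[dst_rank] = seq + 1
        self.location[dst_rank].inbox[dst_rank].append((seq, payload))

    def migrate(self, rank, new_node):
        old_node = self.location[rank]
        # Collect the legacy messages left at the previous node...
        stranded = old_node.inbox.pop(rank, [])
        self.location[rank] = new_node
        # ...and re-send them to the new node in sequence order.
        stranded.sort(key=lambda message: message[0])
        new_node.inbox.setdefault(rank, []).extend(stranded)
```

Senders never learn that the destination moved: they keep addressing rank 0, and the sequence numbers let the receiver see messages in the order they were sent.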

Page 69: The Grid as a Parallel Computer

• N-body simulation (body shape: loop, 10000 bodies, 16 processes)

• Periodic random process migrations, for demo purposes

Page 70: The Grid as a Parallel Computer

• Ray tracing application on CNGrid

• The scheduler periodically checks the workload in grid nodes and moves some processes to idle nodes

• The Java applet displays the result and migration information
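One pass of such a scheduler can be sketched as below. The load model is an assumption made for illustration (a node's load is simply the number of processes it hosts), and the function name `rebalance_step` is hypothetical; the talk does not describe the actual policy.

```python
def rebalance_step(placement):
    """Move one process from the most loaded node to an idle node, if any.

    `placement` maps node name -> list of process ids and is updated in place.
    Returns the (process, source, destination) move, or None if nothing moved.
    """
    idle = [node for node, procs in placement.items() if not procs]
    if not idle:
        return None
    busiest = max(placement, key=lambda node: len(placement[node]))
    if len(placement[busiest]) < 2:
        return None  # moving a lone process would only shift the load
    target = idle[0]
    proc = placement[busiest].pop()
    placement[target].append(proc)
    return proc, busiest, target
```

Calling this function on a timer gives the periodic behavior described above; each returned move would then trigger one process migration.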

Page 71: The Grid as a Parallel Computer

To find out more

• L. Chen, T.C. Ma, C.L. Wang, F.C.M. Lau, and S.P. Li, “G-JavaMPI: A Grid Middleware for Transparent MPI Task Migration”, in Engineering the Grid: Status and Perspective, American Scientific Publishers, 2006, to appear.

• T.C. Ma, C.L. Wang, L. Chen, and F.C.M. Lau, “G-PASS: An Instance-oriented Security Infrastructure for Grid Travelers”, Concurrency and Computation: Practice and Experience, to appear.

http://www.cs.hku.hk/~lchen2/G-JavaMPI/

Page 72: The Grid as a Parallel Computer

The Future of Grid

Page 73: The Grid as a Parallel Computer
Page 74: The Grid as a Parallel Computer

What next?

• Grid computing today is like a “pot luck” supper
  – Everyone brings and contributes a dish
  – And … surprise!

• There really is “no free lunch”
  – Everyone shares some of the costs
  – Is it worth it?

[Image: potluck dinner]

Page 75: The Grid as a Parallel Computer

• To minimize the “surprises” (quality of service)
  – Let the pros - the chefs - do it
  – You sit back, relax, and enjoy, and pay for and only for what you consume

• Grid now is a private club

• But eventually it should be like …
  – Ubiquitous
  – Invisible (the machinery behind)
  – It’s my “cup of coffee”

Page 76: The Grid as a Parallel Computer

The “pervasive grid” – everyone’s club

The grid (invisible computing)

Thin clients

“To use a computer is fun, but not to manage it”

Page 77: The Grid as a Parallel Computer

Edge computing

• Person … device … middleware (proxies) ... Internet

• The abstract cloud moves with the client – personalized “cuddleware”, nomadic computing

[Figure labels: Internet, proxies united, client, metropolis]

Page 78: The Grid as a Parallel Computer

Problems worth pursuing

• Edge computing → “seamless”
  – New protocols for the edge

• The continuum → the network is the computer
  – Collaborative models and mechanisms, esp. at the edge

• The global grid → invisible, “PC” disappearing

• The device
  – Adaptation
    • On-demand code composition
  – The SOC approach?

• Content
  – HTML
    • UI description languages
  – New paradigms for user interaction in small devices (input and output)

Page 79: The Grid as a Parallel Computer