IPPS 98 1

What’s So Different about Cluster Architectures?

David E. Culler

Computer Science Division

U.C. Berkeley

http://now.cs.berkeley.edu



IPPS 98 2

High Performance Clusters “happen”

• Many groups have built them.

• Many more are using them.

• Industry is running with it

– Virtual Interface Architecture

– System Area Networks

• A powerful, flexible new design technique


IPPS 98 3

Outline

• Quick “guided tour” of Clusters at Berkeley

• Three Important Advances

=> Virtual Networks (Alan Mainwaring)

=> Implicit Co-scheduling (Andrea Arpaci-Dusseau)

=> Scalable I/O (Remzi Arpaci-Dusseau)

• What it means


IPPS 98 4

Stop 1: HP/fddi Prototype

• FDDI on the HP/735 graphics bus.

• First fast message layer on an unreliable network


IPPS 98 5

Stop 2: SparcStation NOW

• ATM was going to take over the world.

The original INKTOMI


IPPS 98 6

Stop 3: Large Ultra/Myrinet NOW


IPPS 98 7

Stop 4: Massive Cheap Storage

• Basic unit: 2 PCs double-ending four SCSI chains

Currently serving Fine Art at http://www.thinker.org/imagebase/


IPPS 98 8

Stop 5: Cluster of SMPs (CLUMPS)

• Four Sun E5000s

– 8 processors

– 3 Myricom NICs

• Multiprocessor, Multi-NIC, Multi-Protocol

– see S. Lumetta IPPS98


IPPS 98 9

Stop 6: Information Servers

• Basic Storage Unit:

– Ultra 2, 300 GB RAID, 800 GB tape stacker, ATM

– scalable backup/restore

• Dedicated Info Servers

– web,

– security,

– mail, …

• VLANs project into dept.


IPPS 98 10

Stop 7: Millennium PC Clumps

• Inexpensive, easy-to-manage cluster

• Replicated in many departments

• Prototype for very large PC cluster


IPPS 98 11

So What’s So Different?

• Commodity parts?

• Communications Packaging?

• Incremental Scalability?

• Independent Failure?

• Intelligent Network Interfaces?

• Complete System on every node

– virtual memory

– scheduler

– files

– ...


IPPS 98 12

Three important system design aspects

• Virtual Networks

• Implicit co-scheduling

• Scalable File Transfer


IPPS 98 13

Communication Performance: Direct Network Access

• LogP: Latency, Overhead, and Bandwidth

• Active Messages: lean layer supporting programming models

[Figure: LogP parameters g, L, Or, Os in µs (0–16 µs scale), with latency and 1/BW marked]
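Under the LogP model, a short one-way message costs the send overhead, the network latency, and the receive overhead; a request-reply round trip doubles that. A minimal sketch, assuming the class and method names below (hypothetical) and caller-supplied illustrative values rather than the measured numbers on the slide:

```python
from dataclasses import dataclass

@dataclass
class LogP:
    """LogP parameters in microseconds (illustrative values supplied by the caller)."""
    L: float    # network latency
    o_s: float  # send overhead on the sending processor
    o_r: float  # receive overhead on the receiving processor
    g: float    # gap: minimum interval between consecutive sends (1/bandwidth)

    def one_way(self) -> float:
        """One short message: send overhead + latency + receive overhead."""
        return self.o_s + self.L + self.o_r

    def round_trip(self) -> float:
        """Request plus reply; equals 2L + 4o when o_s == o_r == o."""
        return 2 * self.one_way()

    def burst_time(self, n: int) -> float:
        """Time until the last of n back-to-back messages is received (assumes g >= o_s)."""
        return self.o_s + (n - 1) * self.g + self.L + self.o_r
```

For example, `LogP(L=5.0, o_s=2.0, o_r=2.0, g=6.0).round_trip()` gives 18.0, which is 2L + 4o for these made-up values.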


IPPS 98 14

General purpose requirements

• Many timeshared processes

– each with direct, protected access

• User and system

• Client/Server, Parallel clients, parallel servers

– they grow, shrink, handle node failures

• Multiple packages in a process

– each may have its own internal communication layer

• Use communication as easily as memory


IPPS 98 15

Virtual Networks

• Endpoint abstracts the notion of “attached to the network”

• Virtual network is a collection of endpoints that can name each other.

• Many processes on a node can each have many endpoints, each with own protection domain.
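The definitions above can be read as a small data structure: an endpoint is a process's attachment point, and a virtual network is a table of endpoints that name each other by index. This is a hypothetical illustration; the class names, the integer naming scheme, and the per-endpoint protection key are assumptions, not the NOW virtual-network API.

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    """A process's attachment to the network (hypothetical model)."""
    owner_pid: int
    key: int                                   # protection tag checked when joining
    inbox: list = field(default_factory=list)  # received (sender name, payload) pairs

class VirtualNetwork:
    """A collection of endpoints that name each other by small integer indices."""

    def __init__(self, key: int):
        self.key = key
        self.members: list[Endpoint] = []

    def join(self, ep: Endpoint) -> int:
        """Admit an endpoint whose key matches; return its name within this network."""
        if ep.key != self.key:
            raise PermissionError("endpoint key does not match this virtual network")
        self.members.append(ep)
        return len(self.members) - 1

    def send(self, src: int, dst: int, payload: bytes) -> None:
        """Deliver payload from member src to member dst; names are local to this network."""
        if not (0 <= src < len(self.members) and 0 <= dst < len(self.members)):
            raise IndexError("unknown endpoint name in this virtual network")
        self.members[dst].inbox.append((src, payload))
```

Because names are indices into one virtual network, a process that holds several endpoints gets a separate name space and protection check for each.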


IPPS 98 16

How are they managed?

• How do you get direct hardware access for performance with a large space of logical resources?

• Just like virtual memory

– active portion of large logical space is bound to physical resources

[Figure: processes 1 to n with endpoints in host memory; processor, NIC memory, and network interface]


IPPS 98 17

Endpoint Transition Diagram

[Figure: endpoint states COLD (paged host memory), WARM (read-only, paged host memory), and HOT (read/write, NIC memory), with transitions labeled Read, Write / Msg Arrival, Evict, and Swap]
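One plausible reading of the transition diagram, written as a state machine: a read maps a cold endpoint read-only into host memory (WARM); a write or a message arrival needs the endpoint resident on the NIC (HOT); eviction from the NIC returns it to WARM; swapping pages it out (COLD). The exact edges are an interpretation of the figure labels, and the names are hypothetical.

```python
from enum import Enum

class State(Enum):
    COLD = "paged host memory"
    WARM = "read-only, paged host memory"
    HOT = "read/write, NIC memory"

# (current state, event) -> next state. In this simplified sketch, any
# (state, event) pair not listed leaves the state unchanged.
TRANSITIONS = {
    (State.COLD, "read"): State.WARM,
    (State.WARM, "write"): State.HOT,
    (State.WARM, "msg_arrival"): State.HOT,
    (State.HOT, "evict"): State.WARM,
    (State.WARM, "swap"): State.COLD,
}

def step(state: State, event: str) -> State:
    """Apply one event to an endpoint's residency state."""
    return TRANSITIONS.get((state, event), state)

def run(events, start: State = State.COLD) -> list:
    """Replay a sequence of events and return every state visited."""
    states = [start]
    for e in events:
        states.append(step(states[-1], e))
    return states
```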


IPPS 98 18

Network Interface Support

• NIC has endpoint frames

• Services active endpoints

• Signals misses to driver

– using a system endpoint

[Figure: NIC endpoint frames 0 to 7 with transmit and receive paths and an endpoint-miss signal]
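The NIC side can be sketched as a small cache: a fixed number of endpoint frames (eight in the figure) hold the active endpoints, and touching a non-resident endpoint is reported as a miss so the driver can load it, evicting another endpoint when all frames are full. The least-recently-used policy and all names here are assumptions; the slide does not say how victims are chosen.

```python
from collections import OrderedDict

class EndpointFrames:
    """Hypothetical model of NIC endpoint frames with driver-handled misses."""

    def __init__(self, nframes: int = 8):
        self.nframes = nframes
        self.resident = OrderedDict()  # endpoint id -> frame number, oldest use first
        self.misses = []               # endpoints whose misses were signalled to the driver

    def access(self, ep: int) -> int:
        """Return the frame holding endpoint ep, loading it on a miss."""
        if ep in self.resident:
            self.resident.move_to_end(ep)  # mark as most recently used
            return self.resident[ep]
        self.misses.append(ep)             # NIC signals the miss to the driver
        if len(self.resident) < self.nframes:
            frame = len(self.resident)     # frames 0..k-1 are in use, take frame k
        else:
            _victim, frame = self.resident.popitem(last=False)  # evict LRU endpoint
        self.resident[ep] = frame
        return frame
```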


IPPS 98 19

Solaris System Abstractions

Segment Driver

• manages portions of an address space

Device Driver

• manages I/O device

Virtual Network Driver


IPPS 98 20

LogP Performance

• Competitive latency

• Increased NIC processing

• Difference mostly

– ack processing

– protection check

– data structures

– code quality

• Virtualization cheap

[Figure: LogP parameters g, L, Or, Os (µs, 0–16 scale) compared for GAM and AM]


IPPS 98 21

Bursty Communication among many

[Figure: several clients send message bursts to several servers, alternating burst and work. Left: Msgs/Sec vs. number of clients (0–16). Right: burst bandwidth (Msg/Sec) vs. number of clients (0–20)]


IPPS 98 22

Multiple VN’s, Single-thread Server

[Figure: aggregate msgs/s vs. number of virtual networks (1–28) for continuous traffic and bursts of 1024, 2048, 4096, 8192, and 16384 msgs]


IPPS 98 23

Multiple VNs, Multithreaded Server

[Figure: aggregate msgs/s vs. number of virtual networks (1–28) for continuous traffic and bursts of 1024, 2048, 4096, 8192, and 16384 msgs]


IPPS 98 24

Perspective on Virtual Networks

• Networking abstractions are vertical stacks

– new function => new layer

– poke through for performance

• Virtual Networks provide a horizontal abstraction

– basis for building new, fast services


IPPS 98 25

Beyond the Personal Supercomputer

• Able to timeshare parallel programs

– with fast, protected communication

• Mix with sequential and interactive jobs

• Use fast communication in OS subsystems

– parallel file system, network virtual memory, …

• Nodes have powerful, local OS scheduler

• Problem: local schedulers do not know to run parallel jobs in parallel


IPPS 98 26

Local Scheduling

• Schedulers act independently w/o global control

• Program waits while trying to communicate with peers that are not running

• 10 - 100x slowdowns for fine-grain programs!

=> need coordinated scheduling

[Figure: timeline of jobs A, B, and C on processors P1–P4 under independent local schedulers]
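A toy model suggests why uncoordinated local scheduling hurts fine-grain programs. Assume, purely for illustration, that each of p nodes independently runs one of k equally weighted jobs in every time slice, and that a parallel job progresses through a communication step only when all p of its processes run together. This model is an assumption and does not reproduce the slide's measured 10 to 100x slowdowns.

```python
import random

def p_all_scheduled(k: int, p: int) -> float:
    """Probability that a given job is running on all p nodes in the same slice."""
    return (1.0 / k) ** p

def simulate(k: int, p: int, slices: int, seed: int = 1) -> float:
    """Fraction of slices in which job 0 runs everywhere under independent schedulers."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(slices):
        # each node's local scheduler picks one of k jobs on its own
        if all(rng.randrange(k) == 0 for _ in range(p)):
            hits += 1
    return hits / slices
```

With three jobs on four nodes, `p_all_scheduled(3, 4)` is 1/81, so in this crude model a job that needs every partner at each step finds them in only about 1.2% of slices.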


IPPS 98 27

Explicit Coscheduling

• Global context switch according to precomputed schedule

• How do you build it? Does it work?

[Figure: master-driven schedule on P1–P4 over time; all processes of A run together, then B and C]
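A precomputed coscheduling schedule can be pictured as an Ousterhout-style matrix: rows are time slices, columns are processors, and the master switches every processor from one row to the next at once, so all processes of a job run in the same slice. The greedy packing below is a hypothetical sketch of building such a matrix, not the schedule used by any NOW system.

```python
from typing import Dict, List, Optional

def build_matrix(jobs: Dict[str, int], nprocs: int) -> List[List[Optional[str]]]:
    """Place every process of each job in the same row (time slice).

    jobs maps a job name to its number of processes; columns are processors.
    Wider jobs are placed first, each into the first row with enough free processors.
    """
    rows: List[List[Optional[str]]] = []
    for name, width in sorted(jobs.items(), key=lambda kv: -kv[1]):
        if width > nprocs:
            raise ValueError(f"job {name} needs {width} processors, only {nprocs} exist")
        for row in rows:
            free = [i for i, slot in enumerate(row) if slot is None]
            if len(free) >= width:
                for i in free[:width]:
                    row[i] = name
                break
        else:  # no existing row had room: open a new time slice
            row = [None] * nprocs
            row[:width] = [name] * width
            rows.append(row)
    return rows

def slice_for(rows: List[List[Optional[str]]], proc: int, t: int) -> Optional[str]:
    """Job that processor proc runs in global slice t (round-robin over rows)."""
    return rows[t % len(rows)][proc]
```

For jobs `{"A": 4, "B": 2, "C": 2}` on four processors this yields one row of A and one row shared by B and C, matching the alternation in the figure.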


IPPS 98 28

Typical Cluster Subsystem Structures

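[Figure: typical cluster subsystem structures. Master-Slave: applications (A) and local services (LS) on each node, coordinated by a master over the communication layer. Peer-to-Peer: each node's local service (LS) paired with a global service (GS) component that communicates with its peers]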


IPPS 98 29

Ideal Cluster Subsystem Structure

• Obtain coordination without explicit subsystem interaction, only the events in the program

– very easy to build

– potentially very robust to component failures

– inherently “service on-demand”

– scalable

• Local service component can evolve.

[Figure: peer-to-peer structure of applications (A), local services (LS), and global service components (GS)]


IPPS 98 30

Three approaches examined in NOW

• GLUNIX explicit master-slave (user level)

– matrix algorithm to pick PP

– uses stops & signals to try to force desired PP to run

• Explicit peer-peer scheduling assist with VNs

– co-scheduling daemons decide on PP and kick the Solaris scheduler

• Implicit

– modify the parallel run-time library to allow it to get itself co-scheduled with the standard scheduler

[Figure: subsystem structure for each approach, built from applications (A), local services (LS), a master (M), and global services (GS)]


IPPS 98 31

Problems with explicit coscheduling

• Implementation complexity

• Need to identify parallel programs in advance

• Interacts poorly with interactive use and load imbalance

• Introduces new potential faults

• Scalability


IPPS 98 32

Why implicit coscheduling might work

• Active message request-reply model

• Infer non-local state from local observations; react to maintain coordination

observation        implication             action
fast response      partner scheduled       spin
delayed response   partner not scheduled   block

[Figure: workstations WS 1 to WS 4 running jobs A and B, showing request/response pairs with spin and sleep periods]
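The observation/action rule amounts to a two-phase (spin-then-block) wait on a reply: spin for a bounded time in case the partner is scheduled and answers quickly, then block so the local scheduler can run something else. A minimal sketch with Python threads; the function name and the use of `threading.Event` to stand in for reply arrival are assumptions.

```python
import threading
import time

def wait_for_reply(reply: threading.Event, spin_seconds: float) -> str:
    """Spin up to spin_seconds for the reply, then block until it arrives.

    Returns "spin" if the reply arrived while spinning (partner likely scheduled),
    or "block" if the waiter gave up the processor (partner likely not scheduled).
    """
    deadline = time.perf_counter() + spin_seconds
    while time.perf_counter() < deadline:
        if reply.is_set():
            return "spin"
    reply.wait()  # block; the local scheduler may now run another job
    return "block"
```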


IPPS 98 33

Obvious Questions

• Does it work?

• How long do you spin?

• What are the requirements on the local scheduler?


IPPS 98 34

How Long to Spin?

• Answer: round trip time + 5 x wake-up time

– round-trip to stay scheduled together

– plus wake-up to get scheduled together

– plus wake-up to be competitive with blocking cost

– plus 3 x wake-up to meet “pairwise” cost

[Figure: WS 1 and WS 2 running jobs A, B, and C, showing spin-wait, sleep, and wake-up; spin thresholds marked at 2L+4o and 2L+4o+W]
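Written out, the spin threshold adds the listed components to the LogP round-trip time, assuming equal send and receive overhead o and writing W for the wake-up time:

```latex
\begin{aligned}
T_{\text{spin}} &= \underbrace{(2L + 4o)}_{\text{round trip}}
  + \underbrace{W}_{\text{get scheduled together}}
  + \underbrace{W}_{\text{competitive with blocking}}
  + \underbrace{3W}_{\text{pairwise cost}} \\
 &= 2L + 4o + 5W
\end{aligned}
```

For instance, with hypothetical values L = 5 µs, o = 2 µs, and W = 50 µs, T_spin = 10 + 8 + 250 = 268 µs, so the wake-up cost dominates.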


IPPS 98 35

Does it work?


IPPS 98 36

Synthetic Bulk-synchronous Apps

• Range of granularity and load imbalance

– spin wait 10x slowdown


IPPS 98 37

With mixture of reads

• Block-immediate 4x slowdown


IPPS 98 38

Timesharing Split-C Programs


IPPS 98 39

Many Questions

• What about

– mix of jobs?

– sequential jobs?

– unbalanced placement?

– Fairness?

– Scalability?

• How broadly can implicit coordination be applied in the design of cluster subsystems?


IPPS 98 40

A look at Serious File I/O

• Traditional I/O system

• NOW I/O system

• Benchmark problem: sort a large number of 100-byte records with 10-byte keys

– start on disk, end on disk

– accessible as files (use the file system)

– Datamation sort: 1 million records

– Minute sort: amount sorted in one minute

[Figure: traditional I/O system (Proc-Mem) vs. NOW I/O system (multiple P-M nodes)]


IPPS 98 41

NOW-Sort Algorithm: 1 pass

• Read

– N/P records from disk -> memory

• Distribute

– send keys to processors holding result buckets

• Sort

– partial radix sort on each bucket

• Write

– gather and write records to disk
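The four phases can be sketched as an in-memory simulation of p nodes. This is an illustration under stated assumptions, not NOW-Sort itself: there are no disks or network, whole records are moved rather than keys alone, key ranges are assigned to nodes by the first key byte (assuming uniformly distributed keys), and the "partial radix sort" buckets on the first two key bytes and finishes each bucket with a comparison sort.

```python
import random

RECORD_SIZE = 100  # bytes per record (benchmark definition)
KEY_SIZE = 10      # the first 10 bytes of each record are the key

def make_records(n, seed=0):
    """Generate n random records; a stand-in for the Read phase's disk input."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(RECORD_SIZE)) for _ in range(n)]

def owner(key, p):
    """Node holding the result bucket for key: split the first key byte into p ranges."""
    return key[0] * p // 256

def partial_radix_sort(records):
    """Radix-bucket on the first two key bytes, then finish each bucket by full key."""
    buckets = [[] for _ in range(1 << 16)]
    for r in records:
        buckets[(r[0] << 8) | r[1]].append(r)
    out = []
    for b in buckets:
        if len(b) > 1:
            b.sort(key=lambda rec: rec[:KEY_SIZE])
        out.extend(b)
    return out

def now_sort(records, p):
    """Simulate the one-pass algorithm on p nodes; return each node's sorted output."""
    n = len(records)
    # Read: node i starts with its N/P share of the records
    local = [records[i * n // p:(i + 1) * n // p] for i in range(p)]
    # Distribute: every node sends each record to the node owning its key range
    inbox = [[] for _ in range(p)]
    for node_records in local:
        for r in node_records:
            inbox[owner(r[:KEY_SIZE], p)].append(r)
    # Sort, then Write: node i's sorted run is the i-th piece of the global order
    return [partial_radix_sort(b) for b in inbox]
```

Because `owner` never decreases as keys increase, concatenating the nodes' outputs in node order gives the globally sorted file.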


IPPS 98 42

Key Implementation Techniques

• Performance Isolation: highly tuned local disk-to-disk sort

– manage local memory

– manage disk striping

– memory-mapped I/O with madvise, buffering

– manage overlap with threads

• Efficient Communication

– completely hidden under disk I/O

– competes for I/O bus bandwidth

• Self-tuning Software

– probe available memory, disk bandwidth, trade-offs
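Memory-mapped reading with an access-pattern hint can be sketched with Python's standard `mmap` module, whose `mmap.madvise` method (Python 3.8+, platform-dependent) passes a hint like the C-level `madvise` call. The file layout, function names, and demo helper below are assumptions for illustration, not the NOW-Sort code.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 100  # bytes per record
KEY_SIZE = 10      # leading key bytes

def write_records(records):
    """Demo helper: write records to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".dat")
    with os.fdopen(fd, "wb") as f:
        f.write(b"".join(records))
    return path

def read_keys(path):
    """Memory-map a file of fixed-size records and return every record's key."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return []  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Advise a sequential scan where the platform supports it
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            usable = size - size % RECORD_SIZE  # ignore a trailing partial record
            return [mm[i:i + KEY_SIZE] for i in range(0, usable, RECORD_SIZE)]
```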


IPPS 98 43

World-Record Disk-to-Disk Sort

• Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth

[Figure: Minute Sort, gigabytes sorted vs. number of processors (0–100), with SGI Power Challenge and SGI Origin results for comparison]


IPPS 98 44

Towards a Cluster File System

• Remote disk system built on a virtual network

[Figure: a client using RDlib communicates with an RD server via Active Messages; read and write rate (MB/s) for local vs. remote disk; client and server CPU utilization (0–40%) for reads and writes]


IPPS 98 45

Streaming Transfer Experiment

[Figure: data layouts for processes P0–P3 under four access methods: Local, P3FS Local, P3FS Reverse, and P3FS Remote]


IPPS 98 46

Results

• Data distribution affects resource utilization, not delivered bandwidth

[Figure: rate (MB/s) and client/server CPU utilization (0–40%) by access method: Local, P3FS Local, P3FS Reverse, P3FS Remote]


IPPS 98 47

I/O Bus crossings

[Figure: I/O bus crossings between memory (M), processor (P), and network interface (NI) for parallel scan and parallel sort, each with (a) local disk and (b) remote disk]


IPPS 98 48

Conclusions

• Complete system on every node makes clusters a very powerful architecture.

• Extend the system globally

– virtual memory systems,

– schedulers,

– file systems, ...

• Efficient communication enables new solutions to classic systems challenges.

• Opens a rich set of issues for parallel processing beyond the personal supercomputer.