Evolution of High Performance Cluster Architectures
David E. Culler ([email protected], http://millennium.berkeley.edu/)
NPACI 2001 All Hands Meeting


Page 1: Evolution of High Performance Cluster Architectures

Evolution of High Performance Cluster Architectures

David E. Culler

[email protected]

http://millennium.berkeley.edu/

NPACI 2001 All Hands Meeting

Page 2: Evolution of High Performance Cluster Architectures

Much has changed since “NOW”

• NOW0: HP + Medusa FDDI

• NOW1: SS + ATM/Myrinet

• NOW: 110 UltraSPARC + Myrinet

inktomi.berkeley.edu

Page 3: Evolution of High Performance Cluster Architectures

Millennium Cluster Editions

Page 4: Evolution of High Performance Cluster Architectures

The Basic Argument

• performance cost of engineering lag
  – miss the 2x per 18 months
  – => rapid assembly of leading-edge HW and SW building blocks
  – => availability through fault masking, not inherent reliability

• emergence of the “killer switch”

• opportunities for innovation
  – move data between machines as fast as within one
  – protected user-level communication
  – large-scale management
  – fault isolation
  – novel applications
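The engineering-lag point is just compound interest in reverse: on a 2x-per-18-months curve, shipping m months late forfeits a factor of 2^(m/18). A quick sketch of that arithmetic (the doubling period is the slide's own figure; the function is ours):

```python
# Performance forfeited by shipping m months behind the technology curve,
# assuming performance doubles every 18 months (the slide's figure).
def lag_cost(months_behind, doubling_period=18.0):
    return 2 ** (months_behind / doubling_period)

for m in (6, 9, 18, 36):
    print(f"{m:2d} months behind -> {lag_cost(m):.2f}x performance forfeited")
```

Even a 9-month lag costs ~1.4x, which no amount of careful tuning claws back.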

Page 5: Evolution of High Performance Cluster Architectures

Clusters Took Off

• scalable internet services
  – only way to match growth rate

• changing supercomputer market

• web hosting

Page 6: Evolution of High Performance Cluster Architectures

Engineering the Building Block

• argument came full circle in ~98

• wide array of 3U, 2U, 1U rack-mounted servers

– thermals and mechanicals

– processing per square foot

– 110V AC routing a mixed blessing

– component OS & drivers

• became the early entry to the market

Page 7: Evolution of High Performance Cluster Architectures

Emergence of the Killer Switch

• ATM, Fibre Channel, FDDI “died”

• ServerNet bumps along

• IBM, SGI do the proprietary thing

• little Myrinet just keeps going
  – quite nice at this stage

• SAN standards shootout
  – NGIO + FutureIO => Infiniband
  – specs entire stack from phy to API
    » nod to IPv6
  – big, complex, deeply integrated, DBC

• Gigabit Ethernet steamroller...
  – limited by TCP/IP stack, NIC, and cost

Page 8: Evolution of High Performance Cluster Architectures

Opportunities for Innovation

Page 9: Evolution of High Performance Cluster Architectures

Unexpected Breakthrough: Layer-7 Switches

• fell out of modern switch design
  – process packets in chunks

• vast # of simultaneous connections

• many line-speed packet filters per port

• can be made redundant

• => multi-gigabit cluster “front end”
  – virtualize the IP address of services
  – move a service within the cluster
  – replicate it, distribute it
  – high-level transforms: fail-over, load management

[Diagram: redundant Layer-7 Switches fronting the cluster's Network Switch]
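A sketch of what that front end does, in miniature: one virtual address for a service, request inspection choosing a replica, and fail-over by dropping a dead replica from the table. The service table, addresses, and function are all hypothetical:

```python
# Sketch of the layer-7 front-end idea: one virtual IP, with the switch
# inspecting the request (here, just the URL path) and dispatching to a
# backend replica. Services and addresses below are invented.
import hashlib

SERVICES = {
    "/search": ["10.0.0.11", "10.0.0.12", "10.0.0.13"],  # search replicas
    "/mail":   ["10.0.0.21", "10.0.0.22"],               # mail replicas
}
DOWN = set()  # replicas the switch has failed over away from

def dispatch(path, client_ip):
    """Pick a live replica for the service owning `path`."""
    prefix = "/" + path.split("/")[1]
    replicas = [r for r in SERVICES[prefix] if r not in DOWN]
    # hash the client so its connections keep hitting the same replica
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return replicas[h % len(replicas)]

backend = dispatch("/search/query?q=now", "192.168.1.5")
DOWN.add(backend)                       # that replica dies...
assert dispatch("/search/query?q=now", "192.168.1.5") != backend  # ...traffic moves
```

Clients only ever see the virtual address; the service can move, replicate, or fail over behind it without anyone renumbering.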

Page 10: Evolution of High Performance Cluster Architectures

e-Science

any useful app should be a service

Page 11: Evolution of High Performance Cluster Architectures

Protected User-level messaging

• Virtual Interface Architecture (VIA) emerged
  – primitive & complex relative to academic prototypes
  – industrial compromise
  – went dormant

• incorporated in Infiniband
  – big one to watch

• potential breakthrough
  – user-level TCP, UDP with IP NIC
  – storage over IP
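The descriptor-queue model behind VIA (and Infiniband after it) can be caricatured in a few lines: the application posts buffers and send requests to per-process queues, and the NIC, not the kernel, moves the data. This is a toy model under that assumption; none of the names below are the real VIA API:

```python
# Toy model of VIA-style protected user-level messaging: post descriptors
# to per-process send/receive queues, poll a completion queue, and keep the
# kernel off the data path. Class and method names are illustrative only.
from collections import deque

class VirtualInterface:
    def __init__(self):
        self.send_q, self.recv_q, self.completions = deque(), deque(), deque()

    def post_recv(self, buf):            # pre-post a buffer, as VIA requires
        self.recv_q.append(buf)

    def post_send(self, peer, data):     # "ring the doorbell"
        self.send_q.append((peer, data))

    def nic_poll(self):                  # stand-in for the NIC's DMA engine
        while self.send_q:
            peer, data = self.send_q.popleft()
            buf = peer.recv_q.popleft()  # would fault if no buffer was posted
            buf[:len(data)] = data       # DMA straight into user memory
            peer.completions.append(len(data))
            self.completions.append(len(data))

a, b = VirtualInterface(), VirtualInterface()
b.post_recv(bytearray(64))
a.post_send(b, b"hello")
a.nic_poll()
print(b.completions)  # receive completed with no "kernel" involvement
```

The protection comes from the NIC validating which queues and buffers each process may touch; the sketch elides that, keeping only the queue discipline.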

Page 12: Evolution of High Performance Cluster Architectures

Management

• workstation -> PC transition a step back

– boot image distribution, OS distribution

– network troubleshooting and servicing

• multicast proved a powerful tool

• emerging health monitoring and control

– HW level

– service level

– OS level still a problem
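Why multicast is so powerful here: each node announces its own state to a group address and every listener converges on the same cluster picture, with no central poller. A minimal sketch of a Ganglia-like heartbeat, with an invented packet format (the real gmond wire format differs):

```python
# Ganglia-style health monitoring: every node periodically multicasts a
# small self-describing heartbeat; any listener can rebuild cluster state.
# The JSON packet format and field names here are made up for illustration.
import json, time

def heartbeat(node, load, disk_free):
    """What a node would multicast every few seconds."""
    return json.dumps({"node": node, "t": time.time(),
                       "load": load, "disk_free": disk_free}).encode()

class Monitor:
    STALE = 30.0                      # seconds without a beat => suspect dead

    def __init__(self):
        self.state = {}

    def receive(self, packet):        # called for each multicast datagram
        m = json.loads(packet)
        self.state[m["node"]] = m

    def alive(self, now=None):
        now = now or time.time()
        return [n for n, m in self.state.items() if now - m["t"] < self.STALE]

mon = Monitor()
mon.receive(heartbeat("node-a", 0.3, 120e9))
mon.receive(heartbeat("node-b", 2.1, 5e9))
print(mon.alive())
```

Because the channel is multicast, adding a second monitor (or a hundred) costs the nodes nothing.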

Page 13: Evolution of High Performance Cluster Architectures

Rootstock

[Diagram: Local Rootstock Servers at each site pull over the Internet from the Rootstock Server at UC Berkeley]

Page 14: Evolution of High Performance Cluster Architectures

Ganglia and REXEC

[Diagram: the user runs `% rexec –n 2 –r 3 indexer`; the rexec client consults vexecd daemons (Policy A, Policy B; e.g. minimum $), which answer “Nodes A B”; rexecd daemons on Nodes A–D coordinate over the Cluster IP Multicast Channel]

Also: bWatch; BPROC: Beowulf Distributed Process Space; VA Linux Systems: VACM, VA Cluster Manager
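A vexecd policy is easy to caricature: nodes advertise a cost (the “minimum $” policy in the diagram), and the daemon picks the n cheapest for `rexec –n 2`. The prices and function name below are invented for illustration:

```python
# Sketch of a "minimum $" vexecd placement policy: nodes announce a price
# per CPU-minute (over the multicast channel in the real system) and the
# policy daemon returns the n cheapest. Bids here are invented.
def cheapest_nodes(bids, n):
    """bids: {node: $/cpu-minute}; return the n lowest-cost nodes."""
    return sorted(bids, key=bids.get)[:n]

bids = {"A": 0.03, "B": 0.01, "C": 0.05, "D": 0.02}
print(cheapest_nodes(bids, 2))  # → ['B', 'D']
```

Swapping the policy (lowest load, best locality) means changing only this function, which is the point of separating vexecd from rexecd.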

Page 15: Evolution of High Performance Cluster Architectures

Network Storage

• state-of-practice still NFS + local copies

• local disk replica management lacking

• NFS doesn’t scale
  – major source of naive user frustration

• limited structured parallel access

• SAN movement only changing the device interface

• Need cluster content distribution, caching, parallel access and network striping

see: GPFS, CFS, PVFS, HPSS, GFS, PPFS, CXFS, HAMFS, Petal, NASD...
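“Network striping” in the sense the slide asks for can be sketched as round-robin block placement across storage nodes, so a large sequential read draws bandwidth from all of them at once; the helper names and block size here are illustrative:

```python
# Round-robin network striping: a file is cut into fixed blocks dealt
# across storage servers, so a big read is served by all of them in
# parallel. Servers are modeled as plain dicts; block size is tiny on
# purpose so the example is visible.
BLOCK = 4  # bytes per stripe unit

def stripe_write(servers, name, data):
    for i in range(0, len(data), BLOCK):
        servers[(i // BLOCK) % len(servers)][(name, i // BLOCK)] = data[i:i+BLOCK]

def stripe_read(servers, name, length):
    blocks = (length + BLOCK - 1) // BLOCK
    return b"".join(servers[b % len(servers)][(name, b)] for b in range(blocks))

servers = [{}, {}, {}]                      # three storage nodes
stripe_write(servers, "genome.dat", b"ACGTACGTACGT")
print(stripe_read(servers, "genome.dat", 12))  # → b'ACGTACGTACGT'
print([len(s) for s in servers])               # → [1, 1, 1]: one block each
```

What the sketch omits is exactly what the slide says is missing: replica management, caching, and coordinated parallel access on top of the raw striping.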

Page 16: Evolution of High Performance Cluster Architectures

Distributed Persistent Data Structure Alternative

[Diagram: clustered services, each linking a DDS library that presents a distributed hash table API; the library reaches Storage “bricks” (single-node durable hash tables) over a redundant, low-latency, high-throughput System Area Network]
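The alternative in miniature: the client-side DDS library hashes each key to its home bricks, writes every replica, and masks a failed brick on reads. Brick count, replication factor, and method names below are illustrative, not the published system's:

```python
# Toy distributed hash table in the DDS style: the library (not the
# service) handles partitioning and replication across storage bricks.
import hashlib

class DDS:
    REPLICAS = 2                              # illustrative choice

    def __init__(self, n_bricks):
        self.bricks = [dict() for _ in range(n_bricks)]

    def _homes(self, key):
        """Home brick plus successor(s) for replication."""
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return [(h + i) % len(self.bricks) for i in range(self.REPLICAS)]

    def put(self, key, value):
        for b in self._homes(key):
            self.bricks[b][key] = value       # atomic commit in the real thing

    def get(self, key, failed=()):
        for b in self._homes(key):
            if b not in failed:               # mask a dead brick
                return self.bricks[b].get(key)

dds = DDS(8)
dds.put("user:42", "alice")
home = dds._homes("user:42")[0]
print(dds.get("user:42", failed={home}))  # readable with the home brick down
```

The service code above the library never names a brick, which is what lets the layer below re-replicate and rebalance freely.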

Page 17: Evolution of High Performance Cluster Architectures

Scalable Throughput

[Figure: maximum throughput (ops/s, log scale) vs. number of DDS bricks, for reads and writes; labeled points at 128 bricks: 61,432 and 13,582 ops/s]

Page 18: Evolution of High Performance Cluster Architectures

“Performance Available” Storage

[Figures: Static Parallel Aggregation (fixed D→A pairings) vs. Adaptive Parallel Aggregation (readers fed through a Distributed Queue); two plots of % of peak I/O rate, one vs. number of nodes and one vs. number of nodes perturbed, each comparing Adaptive Agr. and Static Agr.]
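The adaptive/static gap is easy to see with arithmetic: a static assignment finishes when the slowest node finishes, while a shared queue keeps every node busy until the work runs out. A small model under invented node speeds:

```python
# Why adaptive aggregation tolerates perturbation: static aggregation fixes
# each reader's share up front, so one slow node gates the whole transfer;
# a distributed work queue lets fast readers absorb the slow node's blocks.
# Node speeds (blocks per unit time) below are invented for the model.
def static_time(blocks, speeds):
    share = blocks / len(speeds)              # fixed assignment per node
    return max(share / s for s in speeds)     # done when the slowest is done

def adaptive_time(blocks, speeds):
    total = sum(speeds)                       # queue keeps every node busy
    return blocks / total

speeds = [1.0, 1.0, 1.0, 0.25]                # one perturbed node at 1/4 speed
print(static_time(1600, speeds))   # → 1600.0, gated by the slow node
print(adaptive_time(1600, speeds)) # → ~492.3, close to the unperturbed 400
```

One node at quarter speed quadruples the static transfer time but costs the adaptive scheme only ~23%, which is the shape of the right-hand plot.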

Page 19: Evolution of High Performance Cluster Architectures

Application Software

• very little movement toward harnessing the architectural potential

• application as service
  – process a stream of requests (not shell or batch)
  – grow & shrink on demand
  – replication for availability
    » data and functionality
  – tremendous internal bandwidth

• outer-level optimizations, not algorithmic

Page 20: Evolution of High Performance Cluster Architectures

Time is NOW

• finish the system area network

• tackle the cluster I/O problem

• come together around management tools

• get serious about application services