Technical computing: Observations on an ever-changing, occasionally repetitious, environment
Los Alamos National Laboratory
17 May 2002
A brief, simplified history of HPC

1. Sequential & data parallelism using shared memory: Cray's Fortran computers, 1960-2002 (US: 1990)
2. 1978: VAXen threaten general-purpose centers…
3. NSF response: form many centers, 1988-present
4. SCI: search for parallelism to exploit micros, 1985-95
5. Scalability: "bet the farm" on clusters. Users "adapt" to clusters, aka multicomputers, with the LCD (lowest-common-denominator) programming model, MPI. >1995
6. Beowulf clusters adopt standardized hardware and Linus's software to create a standard! >1995
7. "Do-it-yourself" Beowulfs impede new structures and threaten general-purpose centers. >2000
8. 1997-2002: Let's tell NEC they aren't "in step".
9. High-speed networking enables peer-to-peer computing and the Grid. Will this really work?
Outline

- Retracing scientific computing evolution: Cray, SCI & "killer micros", ASCI, & clusters kick in
- Current taxonomy: cluster flavors
- Deja vu rise of commodity computing: Beowulfs are a replay of VAXen c1978
- Centers: 2+1/2 at NSF; BRC on CyberInfrastructure urges $650M/year
- Role of Grid and peer-to-peer
- Will commodities drive out or enable new ideas?
DARPA SCI: c1985-1995; prelude to DOE's ASCI

- Motivated by the Japanese 5th Generation project… note the creation of MCC
- Realization that "killer micros" were coming
- Custom VLSI and its potential
- Lots of ideas to build various high-performance computers
- Threat and potential sale to the military
[Photo: Steve Squires & G. Bell at our "Cray" at the start of DARPA's SCI, c1984.]
What is the system architecture? (GB c1990)

[Taxonomy chart; an X in the original marks a dead branch:]

MIMD
- Multiprocessors: single address space, shared-memory computation
  - Central-memory multiprocessors (not scalable)
    - Cross-point or multi-stage: Cray, Fujitsu, Hitachi, IBM, NEC, Tera
    - Simple ring multi … bus-multi replacement
    - Bus multis: DEC, Encore, NCR, …, Sequent, SGI, Sun
  - Distributed-memory multiprocessors (scalable)
    - Dynamic binding of addresses to processors: KSR
    - Static binding, caching: Alliant, DASH
    - Static binding, ring multi: IEEE SCI proposal
    - Static run-time binding: research machines
- Multicomputers: multiple address spaces, message-passing computation
  - Distributed multicomputers (scalable)
    - Mesh connected: Intel
    - Butterfly/fat tree/cubes: CM5, NCUBE
    - Switch connected: IBM
  - Fast LANs for high-availability and high-capacity clusters: DEC, Tandem
  - LANs for distributed processing: workstations, PCs (the GRID)
SIMD (X: dead)
Processor architectures? Vectors or not?

CS view:
- MISC >> CISC >> language-directed >> RISC >> super-scalar >> extra-long instruction word
- Caches: mostly alleviate the need for memory B/W

SC designer's view:
- RISC >> VCISC (vectors) >> massively parallel (SIMD) (multiple pipelines)
- Memory B/W = perf. (see the worked example below)
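Why "memory B/W = perf." for scientific codes, as a hedged worked example (the numbers are illustrative, not from the talk): a DAXPY iteration y[i] = a*x[i] + y[i] does 2 flops while moving 24 bytes (load x[i], load y[i], store y[i], 8 bytes each). A processor that sustains 1 GB/s from memory therefore sustains at most

\[ \frac{1\,\mathrm{GB/s}}{24\,\mathrm{bytes}} \times 2\,\mathrm{flops} \approx 83\,\mathrm{Mflops}, \]

regardless of its peak arithmetic rate. Vector machines bought performance by building the memory system first; caches only help when the working set is small.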
The Bell-Hillis bet, c1991: massive (>1000-processor) parallelism in 1995

[Chart: TMC vs. world-wide supers, compared on applications, revenue, and petaflops/month.]
Results from DARPA's SCI c1983

- Many research and construction efforts… virtually all new hardware efforts failed, except Intel's and Cray's.
- DARPA-directed purchases screwed up the market, including the many VC-funded efforts.
- No software funding! Users responded to the massive power potential with LCD software: clusters, clusters, clusters using MPI (see the sketch below).
- It's not scalar vs. vector, it's memory bandwidth!
  - 6-10 scalar processors = 1 vector unit
  - 16-64 scalars = a 2-6 processor SMP
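To make "LCD software" concrete, here is a minimal sketch (mine, not from the talk) of the message-passing model the slides refer to: an MPI program in C that sums one value per node on the root. Any MPI-1 implementation of the era (e.g., MPICH, LAM) compiles and runs it unchanged.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    double local, total;

    MPI_Init(&argc, &argv);                 /* start the message-passing runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this node's id, 0..nprocs-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* how many nodes in the cluster */

    local = (double)rank;                   /* each node contributes its own value */

    /* combine the partial results on node 0 -- the whole programming model is
       explicit messages; no shared memory is assumed */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum across %d nodes = %g\n", nprocs, total);

    MPI_Finalize();
    return 0;
}

The same source runs on a 2-node Beowulf or a 10,000-processor machine; that lowest-common-denominator portability, not elegance, is why users standardized on it.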
Dead Supercomputer Society: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics
What a difference 25 years AND spending >10x makes!

- LLNL machine room, c1978: 150 Mflops
- ESRDC (Earth Simulator): 40 Tflops, 640 nodes (8 vector processors per node at 8 Gflops each)
Computer types

[Chart: machine classes arranged by connectivity (WAN/LAN, SAN, DSM, SM) and by processor type (micros vs. vectors). The "old world": NEC supers and Cray X…T (all multiprocessor-vector), NEC mP, SGI DSM clusters & SGI DSM, T3E, SP2 (mP), VPP uni, mainframes, multis, workstations, PCs. The new world of clusters, GRID & P2P: networked supers, Legion, Condor, Beowulf, NT clusters, NOW.]
Top500 taxonomy… everything is a cluster, aka multicomputer

- Clusters are the ONLY scalable structure.
  - Cluster: n inter-connected computer nodes operating as one system. Nodes: uni-processor or SMP. Processor types: scalar or vector.
- MPP = miscellaneous: not massive (>1000), SIMD, or something we couldn't name.
- Cluster types (message passing implied):
  - Constellations = clusters of >=16-processor SMPs
  - Commodity clusters of uni-processor or <=4-processor SMPs
  - DSM: NUMA (and COMA) SMPs and constellations
  - DMA clusters (direct memory access) vs. message passing
  - Uni-processor and SMP vector clusters: vector clusters and vector constellations
Linux - a web phenomenon

- Linus Torvalds writes a news reader for his PC
- Puts it on the Internet for others to play with
- Others add to it, contributing open-source software
- Beowulf adopts early Linux
- Beowulf adds Ethernet drivers for essentially all NICs
- Beowulf adds channel bonding to the kernel
- Red Hat distributes Linux with the Beowulf software
- Low-level Beowulf cluster-management tools added
The challenge leading to Beowulf

- NASA HPCC Program begun in 1992; comprised Computational Aero-Science and Earth and Space Science (ESS)
- Driven by the need for post-processing, data manipulation, and visualization of large data sets
- Conventional techniques imposed long user response times and shared-resource contention
- Cost low enough for a dedicated single-user platform
- Requirement: 1 Gflops peak, 10 GByte, < $50K
- Commercial systems: $1000/Mflops, i.e., $1M/Gflops (see the arithmetic below)
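Worked out from the slide's own numbers, the price/performance gap the requirement implies:

\[ \$1000/\mathrm{Mflops} \times 1000\,\mathrm{Mflops} = \$1\mathrm{M} \quad\text{vs.}\quad \frac{\$50\mathrm{K}}{1000\,\mathrm{Mflops}} = \$50/\mathrm{Mflops}, \]

i.e., the $50K target demanded roughly a 20x improvement over commercial systems, which commodity PC parts and free software supplied.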
The virtuous economic cycle that drives the PC industry… & Beowulf

[Diagram: a feedback loop. Competition → standards → utility/value → greater availability @ lower cost → creates apps, tools, training; attracts users → attracts suppliers → volume → competition. Innovation feeds the cycle; the DOJ acts on it.]
Lessons from Beowulf

- An experiment in parallel computing systems
- Established a vision: low-cost, high-end computing
- Demonstrated the effectiveness of PC clusters for some (not all) classes of applications
- Provided networking software
- Provided cluster-management tools
- Conveyed findings to the broad community (tutorials and the book)
- Provided a design standard to rally the community!
- Standards beget books, trained people, software… the virtuous cycle that allowed apps to form
- Industry begins to form beyond a research project

Courtesy, Thomas Sterling, Caltech.
Clusters: next steps

- Scalability… they can exist at all levels: personal, group, … centers
- Clusters challenge centers… given that smaller users get small clusters
Disk evolution

- Capacity: 100x in 10 years; 1 TB 3.5" drive in 2005; 20 TB? in 2012?!
- System on a chip
- High-speed SAN
- Disk replacing tape
- Disk is the supercomputer!

[Chart axis: kilo, mega, giga, tera, peta, exa, zetta, yotta.]
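A quick check of the stated growth rates (my arithmetic, not the slide's):

\[ 100^{1/10} \approx 1.58 \;(\approx 58\%/\mathrm{yr}), \qquad 1\,\mathrm{TB} \times 1.58^{7} \approx 25\,\mathrm{TB}, \]

so the "20 TB? in 2012?!" guess is simply the same ~58%-per-year capacity curve extrapolated seven years past the 2005 point.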
Intermediate step: shared logic

- Brick with 8-12 disk drives; 200 mips/arm (or more); 2 x Gbps Ethernet; general-purpose OS
- $10K/TB to $100K/TB
- Shared: sheet metal, power, support/config, security, network ports
- These bricks could run applications, e.g., SQL, mail…

Examples:

  Snap              ~1 TB     12 x 80 GB    NAS
  NetApp            ~0.5 TB   8 x 70 GB     NAS
  Maxtor            ~2 TB     12 x 160 GB   NAS
  IBM TotalStorage  ~360 GB   10 x 36 GB    NAS
SNAP architecture

RLX "cluster" in a cabinet: 366 servers per 44U cabinet
- Single processor per server
- 2-30 GB of disk per computer (24 TBytes per cabinet)
- 2 x 100 Mbps Ethernets
- ~10x the performance*, power, disk, and I/O per cabinet; ~3x the price/performance
- Network services… Linux based

*vs. a conventional cabinet: 42 two-processor servers, 84 Ethernets, 3 TBytes

Computing in small spaces @ LANL (RLX cluster in a building with NO A/C)
- 240 processors @ 2/3 Gflops each
- Fill the 4 racks: gives a teraflops (see the arithmetic below)
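The "teraflops from four racks" claim follows from the slide's own figures, assuming fully populated 366-server cabinets:

\[ 4 \times 366 \times \tfrac{2}{3}\,\mathrm{Gflops} \approx 976\,\mathrm{Gflops} \approx 1\,\mathrm{Tflops}. \]

(The installed 240-processor configuration delivers 240 x 2/3 = 160 Gflops.)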
Beowulf clusters: space

[Chart: performance/space ratio, Mflops/sq. ft., comparing a bladed Beowulf with ASCI White.]
Beowulf clusters: power

[Chart: performance/power ratio, Mflops/Watt, comparing a bladed Beowulf with a conventional Beowulf.]
“The network becomes the system.” - Bell, 2/10/82 Ethernet announcement with Noyce (Intel) and Liddle (Xerox)

“The network becomes the computer.” - Sun slogan, >1982

“The network becomes the system.” - GRID mantra, c1999
Computing SNAP built entirely from PCs

[Diagram: a space-, time- (bandwidth-), and generation-scalable environment. Wide and local area networks connect terminals, PCs, workstations & servers; a wide-area global network; legacy mainframe & minicomputer servers & terminals; centralized & departmental uni- & mP servers (UNIX & NT) built from PCs; scalable computers built from PCs; person servers (PCs); portables; mobile nets; TC = TV + PC at home… (CATV or ATM or satellite).]
The virtuous cycle of bandwidth supply and demand

[Diagram: standards plus new services (telnet & FTP, then WWW, audio, video, voice!) increase demand; increased demand drives increased capacity (circuits & bandwidth) and lower response time, which in turn create new services. Incompetence?]
Internet II concerns, given its $0.5B cost

- Very high cost: $(1 + 1)/GByte to send on the net; FedEx and 160-GByte disk shipments are cheaper (see the arithmetic below)
- DSL at home is $0.15-$0.30
- Disks cost $1/GByte to purchase!
- Low availability of fast links (the last-mile problem): labs & universities have DS3 links at most, and those are very expensive
- Traffic: instant messaging, music stealing
- Performance at the desktop is poor: 1-10 Mbps; very poor communication links
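The FedEx comparison, worked from the slide's own prices (the shipping charge is my illustrative assumption):

\[ 160\,\mathrm{GB} \times \$2/\mathrm{GB} = \$320 \text{ over the net}, \qquad \text{vs. } 160\,\mathrm{GB} \times \$1/\mathrm{GB} = \$160 \text{ for the disk, plus tens of dollars of overnight shipping.} \]

At those prices the shipped drive wins on cost, and usually on effective bandwidth as well.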
Scalable computing: the effects

- They come in all sizes; incremental growth: 10 or 100 to 10,000 (100x for most users); debug vs. run; problem growth
- Allows compatibility heretofore impossible: 1978, VAX chose Cray Fortran; 1987, the NSF centers went to UNIX
- Users chose the sensible environment: acquisition and operational costs & environments; cost to use as measured by the user's time
- The role of general-purpose centers (e.g., NSF, statex) is unclear. Necessity for support? Scientific data for a given community… community programs and data; managing GRID discipline
- Are clusters ≈ Gresham's Law? Will they drive out the alternatives?