high performance computing exposed

Page 1: high performance computing exposed

Office of Instructional and Research Technology

Very large computing and the real world

a very few thoughts

Eric Marshall
Associate Director for Research Technology

Rutgers University

Page 2: high performance computing exposed

Shock and awe

Bigger is better!

Page 3: high performance computing exposed

The shiny future

• Newer is Better!

Page 4: high performance computing exposed

The real world

• Bugs, warts, and the eternal problem of hindsight

Page 5: high performance computing exposed

The problem of architecture

• Build as you go vs. predicting the future

Page 6: high performance computing exposed

Where do you put it and for how long?

• The problem of a 2x footprint in the land of 24x7

Page 7: high performance computing exposed

Who is the expert?

• Is the architect, programmer, scientist, owner, vendor, or bottle washer the expert? Complex problems are hard.

Page 8: high performance computing exposed

“Anyone who understands the system isn’t doing science!”

• The problem of users

Page 9: high performance computing exposed

Supercomputers are disposable

• 3 to 5 year ‘shelf life’

Page 10: high performance computing exposed

“This system sucks, the last one was better!” (no matter how many systems)

• The problem of transition: porting, change and habits

Page 11: high performance computing exposed

Goldilocks' paradox

• The problem of useful use: efficient programming, useful scaling, overhead, keeping track of results, allocation, etc.

Page 12: high performance computing exposed

Goldilocks' paradox (cont’d)

• Someone will always say the solution is just around the corner!

Page 13: high performance computing exposed

Scaling is deadly

• Scaling problems: OS/SAN/code/people/etc.

GFDL HPCS, July 2005

Large Scale Cluster (LSC): SGI Origin 3800 + 3900, 600 MHz
• 2 nodes x 512 PE + 512 GB + 2.9 TB disk
• 5 nodes x 256 PE + 256 GB + 0.9 TB disk
• 1 node x 128 PE + 128 GB + 0.9 TB disk
• SAN bandwidth: 2 GB/s per LSC node
• Software: CXFS, PCP, Workshop Pro, GridEngine, S-Plus, TotalView, Matlab, NAG SMP, Mathematica

Analysis Cluster (ANC): SGI Origin 3900, 600 MHz
• 2 nodes x 96 PE + 96 GB + 4.2 TB disk
• SAN bandwidth: 2 GB/s per ANC node
• Software: GridEngine, CXFS, PCP, Workshop Pro

CCCI Cluster (IC): SGI Altix 3700, 1.5 GHz
• 2 nodes x 256 PE + 512 GB + 2 TB disk
• 1 node x 96 PE + 192 GB + 3 TB disk
• SAN bandwidth: 2 GigE/node, NFS mounted
• Software: PCP, Workshop Pro, GridEngine, TotalView, NAG

MetaData Server (MDS), HFS & HSMS server: SGI Origin 3800, 600 MHz
• 2 nodes x 64 PE + 64 GB
• Disk SAN: 4 GB/s per MDS node
• Tape SAN: 1 GB/s per MDS node
• 2.8 TB disk, Failsafe, DMF, CXFS

Disk SAN: 23.6 TB SAN disk
• TP9100B5+P+HS RAID5 w/ dual controllers, 2 Gbit/s Fibre

Tape SAN
• 4 x STK 9310 tape libraries
• 24 x 9940B drives (200 GB, 30 MB/s)
• 22 x 9840A drives (20 GB, 10 MB/s)
• 3.5 PB tape storage on-line, 1.5 PB off-line

SAN (FC) switch: Brocade 2800 & 3800

Redundant access: dual-ported Fiber Channel

LAN: Cisco Catalyst 6509
• 4 x 16 GbE
• 2 x 48 Fast Ethernet

Onyx 3 - Infinite Reality 3

Computational Capability & Capacity
• 89 coupled climate model years per computational day (1 deg. ocean model, 2 deg. atmospheric)
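To get a feel for the scale on this slide, the node counts can be added up. The following is a minimal sketch, not part of the original talk. It assumes that "N Nodes x P PE + M GB" means each of the N nodes has P processing elements and M GB of memory; the names `CLUSTERS`, `totals`, and `site_totals` are hypothetical.

```python
# Figures transcribed from the GFDL HPCS (July 2005) slide.
# Each tuple is (node count, PEs per node, GB of memory per node).
# Assumption: the PE and GB figures on the slide are per node.
CLUSTERS = {
    "LSC": [(2, 512, 512), (5, 256, 256), (1, 128, 128)],
    "ANC": [(2, 96, 96)],
    "MDS": [(2, 64, 64)],
    "IC": [(2, 256, 512), (1, 96, 192)],
}


def totals(groups):
    """Return (total PEs, total memory in GB) for one cluster."""
    pes = sum(nodes * pe for nodes, pe, _ in groups)
    mem = sum(nodes * gb for nodes, _, gb in groups)
    return pes, mem


def site_totals(clusters):
    """Return per-cluster totals plus site-wide PE and memory sums."""
    per_cluster = {name: totals(groups) for name, groups in clusters.items()}
    total_pes = sum(p for p, _ in per_cluster.values())
    total_mem = sum(m for _, m in per_cluster.values())
    return per_cluster, total_pes, total_mem


if __name__ == "__main__":
    per_cluster, pes, mem = site_totals(CLUSTERS)
    for name, (p, m) in per_cluster.items():
        print(f"{name}: {p} PE, {m} GB")
    print(f"Total: {pes} PE, {mem} GB")
```

Under the per-node reading, the four systems come to 3360 PE and 3968 GB of memory in total, which gives some sense of how many places a scaling problem can hide.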

Page 14: high performance computing exposed

Questions?

Eric Marshall
Office of Instructional and Research Technology
[email protected] 445-2262