The New Frontiers of HPC
Sergio Re, Sales & Marketing Manager
Silicon Graphics Italia
SGI today
• More than 1,700 employees
• More than $500M in revenue
• Present in more than 50 countries worldwide
• 6,000+ active customers
• 800+ customer-facing staff
• 300+ engineers in R&D
Our technical strengths
A company focused on innovation
• The world's most advanced Linux HPC systems
• A unique scalable architecture
• A global shared-memory system
• File systems and storage sharing
• Services and consulting
Revenue by market segment
• Sciences: 36%
• Engineering Analysis: 20%
• Enterprise Business Management/Media: 9%
• Defense & Intelligence: 35%
Geographic Contribution
• Americas: 59%
• Europe: 25%
• Rest of World: 16%
SGI Systems: Highly Integrated and Massively Scalable
• Storage
• Advanced Graphics
• High-Performance Computing
Joint Development With Partners
SGI and Intel collaborate on system design requirements for large-scale computing on Itanium and Xeon CPUs.
SGI maintains a close working relationship with Novell on Linux support for scalability, performance, and support for SGI servers and storage systems.
SGI works closely with Red Hat on Linux support, with a special emphasis on security and adherence to standards.
SGI and Oracle are jointly developing, selling, and marketing enterprise solutions for data-intensive problems.
SGI has contributed thousands of lines of code to the Linux community, including code that supports large-scale computing, reliability, and stability.
Computing needs
[Figure: workloads positioned by their demand on CPU, memory, and I/O: signal processing, general-purpose computing applications, media streaming, database, web server, weather simulation]
SGI Workflow Ready Solution: Workflow Continuum
SGI Advanced HPC Platform
• Large SMP workflow: SGI® Altix® 4700 (Intel® Itanium® 2 based)
• Midrange SMP/cluster workflow: SGI® Altix® 450 (Intel® Itanium® 2 based)
• Cluster workflow: SGI® Altix® XE, ICE 8200 (Intel® Xeon® based)
• Application appliance: SGI® f1200
Common layers across the platform:
• Systems Management & Monitoring
• Common Workload Management Tools
• SGI scalable shared file servers & storage solutions
• Common Linux OS & Development Tools
SGI: Complete HPC Solution on Linux®
SGI ProPack™ for Linux and Third Party Tools
• Compilers:
- Intel C++ and Fortran Compilers for Linux
- GNU Compiler for C and Fortran 77
• Libraries:
- SGI Message Passing Toolkit
- SGI Scientific Computing Software Library
- SGI Flexible File I/O
- Intel Math Kernel Library
- Intel Integrated Performance Primitives
- NAG C, Parallel, Fortran, Fortran SMP, F90
• Automated Parallelization Tools:
- Parallel Software Products ParaWise
• Open Source Development Tools:
- Linuxlinks.org, Freshmeat.net, SourceForge®.net
• SGI Data Management Software:
- CXFS™ cluster file system
- DMF hierarchical storage management
• Debuggers:
- Intel Debugger
- Etnus® TotalView®
- GNU GDB
- Allinea Software Distributed Debugging Tool
• Performance and Analysis Tools:
- Intel VTune™ Performance Analyzer
- Intel Trace Analyzer and Trace Collector
- SGI Performance Co-Pilot™
- SGI pfmon and profile.pl
- SGI Histx
• Other SGI ProPack Tools:
- REACT 4.2 (real-time support)
- XVM
- NUMA tools (cpuset, dlook, dplace)
- Embedded Support Partner (ESP)
- Graphics support
Standard Linux Distributions
• Novell SUSE™ Linux Enterprise Server 9 and 10
• Red Hat® Enterprise Linux 4
ICE 8200: Breakthrough Reliability
• Carlsbad IRU: 128 cores and no cables
• Redundant, hot-swap power and cooling
• Fully Buffered DIMMs to reduce transient errors
• Blade design provides rapid serviceability
• InfiniBand backplane for high signal reliability
ICE 8200: Breakthrough Performance Density
Front view of one IRU:
• (16) 2-socket nodes
• (2) 4x DDR IB switch blades, with (1) 24-port IB switch ASIC per blade
• 10U 24-inch EIA form factor (17.50 in H x 22.5 in W x 32 in D)
• (1) Chassis Management Controller (CMC)
• (7+1) 1625 W 12 VDC output front-end power supplies
Up to 512 Cores and 6 TFlops per Rack
[Figure: rack holding four IRUs of (16) Carlsbad blades each, with L1 displays]
• Each 42U rack (30” W x 40” D) has:
- (4) IRUs with (16) 2-socket Carlsbad nodes each
- (128) DP Intel® Xeon® sockets
- DDR IB ports on (4) backplanes for torus
  • (48) 4x DDR IB
• 19” standard rack also supported
• SGI offers optional chilled water-cooled units for use in large system configurations
• 39.5 kW (high-bin SKUs + (4) FB DIMMs/socket); 31.6 kW assuming 80% system-level derate
• Rack weight ~2050 lb (246 lb/ft² footprint)
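The rack figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not an SGI sizing tool: the four cores per socket is an assumption inferred from the 128-cores-per-IRU figure, and the function and parameter names are illustrative.

```python
# Back-of-the-envelope check of the ICE 8200 rack figures quoted on the slide.
IRUS_PER_RACK = 4
NODES_PER_IRU = 16
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4  # assumption: quad-core Xeon, inferred from 128 cores per IRU


def rack_summary(peak_kw=39.5, derate=0.80, rack_w_in=30, rack_d_in=40, weight_lb=2050):
    """Derive socket count, core count, derated power, and floor loading for one rack."""
    sockets = IRUS_PER_RACK * NODES_PER_IRU * SOCKETS_PER_NODE
    cores = sockets * CORES_PER_SOCKET
    footprint_ft2 = rack_w_in * rack_d_in / 144.0  # square inches to square feet
    return {
        "sockets": sockets,
        "cores": cores,
        "derated_kw": round(peak_kw * derate, 1),
        "floor_load_lb_per_ft2": round(weight_lb / footprint_ft2),
    }


summary = rack_summary()
print(summary)
```

Under these assumptions the derived values agree with the slide: 128 sockets, 512 cores, 31.6 kW after derating, and 246 lb/ft².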
SGI Scalable ccNUMA Architecture: Basic Node Interconnect
[Figure: two nodes, each with two CPUs and their caches attached through an interface chip to physical memory, joined by the NUMAlink interconnect]
Open Systems Scalable Infrastructure
[Figure: compute nodes (CPUs, caches, interface chip, physical memory) on the NUMAlink™ interconnect fabric, together with TIO-attached general-purpose I/O interfaces, scalable GPUs, and RASC™ FPGAs]
SGI® Altix® 450
• “Plug and Solve” blade form factor
• Half-rack or full-rack
• 5U ‘IRU’ chassis
- Chassis-only option
- 3rd-party rack option
• 2 to 38 sockets
- 4 to 76 cores
• 608 GB SSI memory
- Increasing to 912 GB in 1H CY07
[Figure: 5U ‘Individual Rack Unit’ chassis with single-slot blades, a double-slot (I/O) blade, NUMAlink ports, and power supplies]
Bringing it Together: Solution Components
SGI® Altix® and Altix® XE
• Scalable servers, clusters, and supercomputers
• Cost-efficient, reliable Altix XE clusters with leading density and power efficiency
• Advanced scalability to 512 processors per Altix server and 128 TB globally addressable memory per system
• Complete Linux solution for HPC
ISC Star-P™
• Interactive parallel computing platform
• Bridges MATLAB and Altix servers
• Works with familiar desktop tools, while leveraging an HPC system for computationally intensive tasks
• Automatic and transparent, no new programming
Images courtesy of Silicon Graphics, Inc.; Interactive Supercomputing Inc.
How Star-P works
• Star-P consists of desktop & server software
• Desktop software: Star-P Client
- Overloads or intercepts desktop tool functions
- Connects to and communicates securely with the server software
• Server software: Star-P Server on SGI® Altix® or SGI® Altix® XE
- Manages and directs resources: memory, CPUs, and I/O
- Contains world-class libraries for parallel execution
- User & session management
What’s the Value
• Desktop Users:
- No change in religion
- Interactivity
  • On parallel machines
  • For large data
- No reprogramming
  • No C, Fortran, MPI
- Reduced run times
  • Not hours or weeks
- Continued model optimization
• Organization:
- Collapse development cycles
- Reduce costs
- Broaden usage
- Shorten solution time
- Accelerate research
Parallel Development Takes Too Long
• Months or years are spent porting from desktops to parallel systems
• No interactivity on parallel machines from the desktop
• Little ability to iterate
• Long compute times for batch runs: hours to days
• Analyst’s ability to optimize the model is limited
Using Star-P: Serial operations
• Use MATLAB:
- File Editor
- Profiler
- Debugger
- Array Editor
- Desktop
- Visualization
- Small calculations
• Running Star-P does not affect the normal MATLAB environment
• Problems that can be solved on the desktop stay on the desktop
SGI® RASC™ Solution: Simplifies Development & Improves Programmer Efficiency
• GNU Debugger (GDB): FPGA aware; simultaneous debugging of both the CPU-based application and the FPGA-accelerated application
• RASC Abstraction Layer (RASCAL): SGI provided; enables serial or parallel FPGA scaling
• RASC API and Core Services Library: provides tools to develop reconfigurable computing elements in a multi-user, multiprocessing environment
• 3rd Party HLL Development Tools: Mitrionics Mitrion C, Impulse-C, and ROCCC; supported within the RASC environment
• Synplicity Synplify Pro and Xilinx Synthesis Technology: for advanced incremental and modular design methodologies
How Do FPGAs Differ from Traditional CPUs?
Directly map computationally intensive algorithms to hardware with RC100 technology
• Identify a RASC-appropriate algorithm: compare application run-time percentages
• Export the algorithm to RASC
[Figure: Application Run-Time Comparison; % of run time for App 1 to App 5, split into algorithm, memory calls, and branch instructions]
[Figure: job run time for the traditional method (CPU only) versus the RC100 method (key algorithm running on FPGA), showing the shorter algorithm execution time and the resulting time savings]
SGI® RASC™ RC100 Blade
[Figure: RC100 blade block diagram with two TIO chips, NL4 ports, PCI, loader, SSP and Selmap links, and two V4LX200 FPGAs, each with banks of SRAM]
SGI Workflow Ready Solution, Segment Example: Fluid Structure Interaction (FSI)
FSI Workload
• Capability: SGI Altix 450/4700, SMP & super head node. Minimize time-to-solution for the largest & most demanding problems
• Capacity: SGI Altix XE 1200/1300 (x86-64) clusters. Cost-effective solution & performance leader for most analyses
• InfiniBand or GigE fabric to SGI® Scalable NAS (and other shared file servers)
• Fibre Channel, InfiniBand, or GigE fabric to storage
Optimally meet the diverse needs of all workloads or procurement drivers
Any combination of Altix servers & XE sharing storage resources
SGI Solution:
1. Altix XE
• Modest memory addressability (~2-4 GB+/core)
2. Altix 450/4700
• Large memory addressability (~4-8 GB/core)
• Option for B/W blades
3. Storage
• High-speed SAS (~250 GB/core)
• 4 SAS disks per XE node
SGI® InfiniteStorage Hardware
Multi-purpose RAID systems
• 4500: max performance; 4 Gb FC or IB; enterprise S/W; FC RAID / SATA
• 4000: 4 Gb Fibre Channel; ultimate price/performance; FC RAID / SATA
Streaming real-time RAID
• 6700: multiple high-resolution streaming; isochronous
Ultra-dense RAID
• 10000: ultra-high density; tape complement or replacement; one rack, 240 TB
JBOD
• 120: easy to deploy, modular scalability
Low-cost SATA RAID
• 350: 4 Gb connectivity; 500 GB SATA drives
SAN / NAS
• Completely integrated; easy to deploy; grows with customers’ business
Data Management Software Stack
Storage Product Integration
• InfiniteStorage® Appliance Manager
• NFS, RDMA accelerated (NAS)
• CXFS
• DMF
• XFS
• SGI systems: Altix servers
• 3rd-party disks, Fibre Channel switches
SGI® InfiniteStorage: the solutions
SGI® InfiniteStorage Data Migration Facility (DMF) transparently migrates files from online storage to near-line storage according to the assigned time-based criteria
• Lowers TCO
• Increases ROI and productivity
• Is easier to manage
• Reduces the risk of data loss
• Protects the initial investment
• Combines data availability with data security
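To illustrate the idea of a time-based migration criterion, here is a minimal sketch that lists files not accessed for a given number of days. It is not DMF's policy engine or configuration syntax; the function name `migration_candidates` and the `max_idle_days` parameter are hypothetical, and a real hierarchical storage manager would also move the data and leave the file visible in place.

```python
import os
import time


def migration_candidates(root, max_idle_days, now=None):
    """Return paths under root whose last access is older than max_idle_days.

    Illustrative only: selects candidates, does not migrate anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_atime < cutoff:
                candidates.append(path)
    return sorted(candidates)
```

A call such as `migration_candidates("/data/projects", 90)` (the path is a placeholder) would return the files idle for more than 90 days, which a policy could then hand to the near-line tier.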
SGI® InfiniteStorage Shared File System CXFS™
• All files are shared
• Files are not copied
• No space is wasted
• Saves time
• Saves money
[Figure: files A to I on shared storage]
Dedicated to: Decision Support Centres, Surveillance/Homeland Security/Crisis, C4I battlefield command and control
Interested? It is called the Pixelfusion Environment
Media Fusion Process
[Figure: Input, Fusion, and Output stages, labelled with local streams, IP streams, network, native render, render to pipes, local display, stream to IP network, and record/retrieve]
HPC Technology Investment Strategy
• Packaging
- Consistency
- Density & Reliability
- Energy Efficiency
• Interconnect
- Reduce Cost
- Increase Value
• Data Management
Oracle® TimesTen: In-Memory DB Customer Benchmark Results
Government customer’s data & tests, May ’06: incumbent 96 GB system vs. Altix 960 GB
Improvement:
• Ingest order-records: 5x
• Ingest person-records: 12x
• Query: 1 per sec (est.) vs. 91K/sec
• Join data: 1 every min vs. 13K/sec
• Sub-query: 1 every 5 mins vs. 2.5K/sec
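The query, join, and sub-query results are quoted as rates rather than multipliers. As a rough sketch, the implied throughput ratios can be derived as follows, reading "K" as 1,000 and taking the incumbent rates at face value (both are assumptions; the names below are illustrative).

```python
# Throughput ratios implied by the benchmark figures.
# Each entry: (seconds per operation on the incumbent, operations per second on Altix).
RESULTS = {
    "query": (1, 91_000),       # 1 per second (estimated) vs. 91K/sec
    "join": (60, 13_000),       # 1 every minute vs. 13K/sec
    "sub_query": (300, 2_500),  # 1 every 5 minutes vs. 2.5K/sec
}


def speedup(seconds_per_op_old, ops_per_sec_new):
    """Ratio of new throughput to old throughput (old rate is 1 / seconds_per_op_old)."""
    return ops_per_sec_new * seconds_per_op_old


ratios = {name: speedup(*pair) for name, pair in RESULTS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,}x")
```

Integer arithmetic is used on purpose so the ratios stay exact.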
SGI Altix Servers Support More Memory
• SGI Altix 4700 supports more memory
• Fewer cores are required to support the same level of memory
• Lower TCO:
- Spend less on processors
- Spend less on software licenses
System               Maximum Memory   Memory/Core
SGI Altix 4700       128 TB           128 GB/core
IBM p595             2 TB             32 GB/core
HP Superdome         2 TB             16 GB/core
Sun Enterprise 25K   1 TB             8 GB/core
Source: Ideas International, Inc., February 2007
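The claim that fewer cores are needed for the same memory follows directly from the memory-per-core column. The sketch below works this out for a target memory size, assuming each system is populated at the listed maximum ratio and that 1 TB is 1,024 GB; the function names are illustrative.

```python
import math

# Memory per core in GB, taken from the comparison table.
GB_PER_CORE = {
    "SGI Altix 4700": 128,
    "IBM p595": 32,
    "HP Superdome": 16,
    "Sun Enterprise 25K": 8,
}


def cores_needed(memory_gb, gb_per_core):
    """Smallest whole number of cores that can host memory_gb at the given ratio."""
    return math.ceil(memory_gb / gb_per_core)


def cores_for_target(memory_gb):
    """Cores required on each listed system to reach memory_gb."""
    return {system: cores_needed(memory_gb, gb) for system, gb in GB_PER_CORE.items()}


print(cores_for_target(1024))  # target of 1 TB
```

Because per-core software licences scale with the core count, the same ratio carries over to the licence line of the TCO argument.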
SGI Altix 4700 Requires Less Floorspace
Dense System Packaging is one of SGI’s Core Competencies
System Footprint
System         Width   Depth   Area
Altix 4700     45”     26”     1170 in²
HP SuperDome   60”     48”     2880 in²
Sun E25K       65”     33”     2145 in²
IBM p595       52”     31”     1612 in²
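SGI Altix Innovative Power Architecture
Typical power architecture
• AC is converted to 48 VDC, then to 12 V, then to the 1.85 V, 3.3 V, and 1.2 V rails on the server board and additional boards
• Stage efficiencies of ~80%, ~80%, and ~70%
• 80% x 80% x 70% = 45% efficiency
SGI power architecture (SGI Altix server)
• AC is converted to 12 V, then to the 1.85 V, 3.3 V, and 1.2 V rails on the server board and additional boards
• Stage efficiencies of 90% and 85%
• 90% x 85% = 76% efficiency
• No 48 V conversion
• High-efficiency power converters
• Proprietary power design
SGI Servers are Twice as Efficient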
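End-to-end efficiency of a power chain is the product of its stage efficiencies. The sketch below reproduces the two totals from the stage figures shown on the slide (80%, 80%, 70% for the typical design; 90% and 85% for the SGI design); the function name is illustrative.

```python
from math import prod


def chain_efficiency(stages):
    """Overall efficiency of conversion stages connected in series."""
    return prod(stages)


typical = chain_efficiency([0.80, 0.80, 0.70])  # AC -> 48 VDC -> 12 V -> board rails
sgi = chain_efficiency([0.90, 0.85])            # AC -> 12 V -> board rails

print(f"typical: {typical:.1%}, SGI: {sgi:.1%}, ratio: {sgi / typical:.2f}")
```

With these inputs the totals come to about 45% and 76%, a ratio of roughly 1.7.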
Interconnect Strategy: Reduce Cost & Increase Capability
• NUMAlink4 (Today): $450 (5 m)
- Custom copper cable
- Custom signaling
- Custom protocol
• NUMAlink5: hw extension of IFB (’09): $150 (5 m)
- COTS InfiniBand 12x copper cable
- COTS serdes
- Custom protocol (higher capability)
Copper: Weight becoming significant
Picture credit: LRZ
SGI NUMAlink System Architecture
[Figure: CPUs, MPUs, IO, and APUs (FPGA, GPU) attached to a very large shared memory]
Very Large Shared Memory
• Globally Addressable
• Low Latency
• High Bandwidth
• Many Ports