The Gelato Federation
What is it exactly?
Sverre Jarp
March, 2003
Gelato is a collaboration
- Goal: promote Linux on Itanium-based systems
- Sponsor: Hewlett-Packard (others coming)
- Members: 13 right now, mainly from the high-performance/high-throughput community; expected to grow rapidly
Current members
North America: NCAR (National Center for Atmospheric Research), NCSA (National Center for Supercomputing Applications), PNNL (Pacific Northwest National Lab), PSC (Pittsburgh Supercomputing Center), University of Illinois at Urbana-Champaign, University of Waterloo
Europe: CERN, DIKU (Datalogisk Institut, University of Copenhagen), ESIEE (École Supérieure d'Ingénieurs near Paris), INRIA (Institut National de Recherche en Informatique et en Automatique)
Far East/Australia: Bioinformatics Institute (Singapore), Tsinghua University (Beijing), University of New South Wales (Sydney)
Center of gravity
Web portal (http://www.gelato.org) with rich content:
- (Pointers to) open-source IA-64 applications. Examples: ROOT (from CERN), OSCAR (cluster management software from NCSA), OpenIMPACT compiler (UIUC)
- News
- Information, advice, and hints related to IPF, the Linux kernel, etc.
- Member overview: who is who, etc.
Current development focus
Six "performance" areas:
- Single-system scalability: from 2-way to 16-way (HP, Fort Collins)
- Cluster scalability and performance management: up to 128 nodes (NCSA)
- Parallel file system: BII
- Compilers: UIUC
- Performance tools and management: HP Labs
CERN Requirement #1
Better C++ performance, through:
- Better compilers
- Faster systems
- Both!
(Gelato focus: Madison @ 1.5 GHz)
Further Gelato Research and Development
- Linux memory management: superpages, TLB sharing between processes, IA-64 preemption support
- Compilers/debuggers: OpenIMPACT C compiler (UIUC); Open Research Compiler enhancements (Tsinghua) covering Fortran, C, and C++; parallel debugger (Tsinghua)
The “opencluster” and the “openlab”
Sverre Jarp IT Division
CERN
Definitions
The “CERN openlab for DataGrid applications” is a framework for evaluating and integrating cutting-edge technologies or services in partnership with industry, focusing on potential solutions for the LCG.
The openlab invites members of industry to join, contribute systems, resources or services, and carry out with CERN large-scale, high-performance evaluations of their solutions in an advanced integrated environment.
The "opencluster" project: the openlab is constructing a pilot 'compute and storage farm' called the opencluster, based on HP's dual-processor servers, Intel's Itanium Processor Family (IPF) processors, Enterasys's 10-Gbps switches and, at a later stage, a high-capacity storage system.
Technology onslaught
Large amounts of new technology will become available between now and LHC start-up. A few hardware examples:
- Processors: SMT (Simultaneous Multi-Threading), CMP (Chip Multiprocessor), ubiquitous 64-bit computing (even in laptops)
- Memory: DDR II-400 (fast); servers with 1 TB (large)
- Interconnect: PCI-X, PCI-X 2.0, PCI Express (serial), InfiniBand
- Computer architecture: chipsets on steroids, modular computers (cf. the ISC2003 keynote "Building Efficient HPC Systems from Catalog Components", Justin Rattner, Intel Corp., Santa Clara, USA)
- Disks: Serial ATA
- Ethernet: 10 GbE (NICs and switches), 1-Terabit backplanes
Not all, but some of this will definitely be used by LHC.
Vision: A fully functional GRID cluster node
[Diagram: CPU servers connected to a multi-gigabit LAN and a storage system, with a gigabit long-haul link across the WAN to a remote fabric]
opencluster strategy
- Demonstrate promising technologies for LCG and the LHC on-line systems
- Deploy the technologies well beyond the opencluster itself: 10 GbE interconnect in the LHC Testbed; act as a 64-bit porting centre (CMS and ALICE already active; ATLAS is interested); CASTOR 64-bit reference platform; storage subsystem as a CERN-wide pilot
- Focal point for vendor collaborations: for instance, in the "10 GbE Challenge" everybody must collaborate in order to be successful
- Channel for providing information to vendors
- Thematic workshops
The opencluster today
- Three industrial partners: Enterasys, HP, and Intel
- A fourth partner joining soon: a data-storage subsystem, which would "fulfill the vision"
- Technology aimed at the LHC era: network switches at 10 gigabits; rack-mounted HP servers; 64-bit Itanium processors
- Cluster evolution: 2002: cluster of 32 systems (64 processors); 2003: 64 systems ("Madison" processors); 2004/05: possibly 128 systems ("Montecito" processors)
Activity overview
Over the last few months:
- Cluster installation, middleware
- Application porting, compiler installations, benchmarking
- Initialization of "challenges"
- Planning of the first thematic workshop
Future:
- Porting of Grid middleware
- Grid integration and benchmarking
- Storage partnership
- Cluster upgrades/expansion
- New-generation network switches
opencluster in detail
Integration of the cluster:
- Fully automated network installations
- 32 nodes + development nodes
- Red Hat Advanced Workstation 2.1
- OpenAFS, LSF
- GNU, Intel, and ORC compilers (64-bit); ORC is the Open Research Compiler, which used to belong to SGI
CERN middleware: CASTOR data management
CERN applications: porting, benchmarking, and performance improvements for CLHEP, GEANT4, ROOT, Sixtrack, CERNLIB, etc., plus database software (MySQL, Oracle?)
Many thanks to my colleagues in ADC, FIO and CS.
The compute nodes: HP rx2600
- Rack-mounted (2U) systems
- Two Itanium 2 processors at 900 or 1000 MHz, field-upgradeable to the next generation
- 2 or 4 GB memory (max 12 GB)
- 3 hot-pluggable SCSI discs (36 or 73 GB)
- On-board 100 Mbit and 1 Gbit Ethernet
- 4 PCI-X slots: full-size 133 MHz/64-bit slot(s)
- Built-in management processor, accessible via serial port or Ethernet interface
rx2600 block diagram
[Block diagram: two Intel Itanium 2 processors on a 6.4 GB/s bus to the zx1 memory and I/O controller (12 DIMMs); zx1 I/O adapters (4.3 GB/s aggregate, 1 GB/s per slot) serve four PCI-X 133 MHz/64-bit slots, Ultra160 SCSI (channels a and b, 3 internal drives), Gbit LAN, 10/100 LAN, USB 2.0, IDE CD/DVD, 3 serial ports, VGA, and the management-processor card with its service processor]
Benchmarks
Comment: note that 64-bit benchmarks pay a performance penalty for LP64, i.e. 64-bit pointers. We need to wait for AMD systems that can natively run either a 32-bit or a 64-bit OS to understand the exact cost for our benchmarks.
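To make the LP64 point concrete, a minimal sketch (generic C++, not from the slides): under the LP64 model used by 64-bit Linux, long and pointers widen from 4 to 8 bytes, so pointer-rich data structures grow and consume more cache and memory bandwidth.

#include <iostream>

// Minimal illustration of the ILP32 vs. LP64 data models. On a 64-bit
// LP64 platform such as Itanium, long and pointers are 8 bytes; a
// 32-bit ILP32 build prints 4 for both.
struct Node {
    int   value;   // 4 bytes in both models
    Node* next;    // 4 bytes under ILP32, 8 bytes under LP64
};

int main() {
    std::cout << "sizeof(long)  = " << sizeof(long)  << '\n'
              << "sizeof(void*) = " << sizeof(void*) << '\n'
              << "sizeof(Node)  = " << sizeof(Node)  << '\n';  // 8 vs. 16 (with padding)
    return 0;
}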
Benchmark-1: Sixtrack (SPEC)
What we would have liked to see for all CERN benchmarks (small is best; from www.spec.org):

  Itanium 2 @ 1000 MHz (efc7):   122 s
  Pentium 4 @ 3.06 GHz (ifl7):   195 s
  IBM Power4 @ 1300 MHz:         202 s

Projection: Madison @ 1.5 GHz: ~81 s
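The ~81 s projection is consistent with straight clock scaling of the Itanium 2 result (my reading, not stated on the slide): 122 s × (1.0 GHz / 1.5 GHz) ≈ 81 s, i.e. performance is assumed to scale linearly with frequency.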
Benchmark-2: CRN jobs/FTN
Big is best!

                                                              Geom. mean (CU)   CU/MHz
  Itanium @ 800 MHz (efc -O3, -prof_use)                            183          0.23
  Itanium 2 @ 1000 MHz (efc -O3, -ipo, -prof_use)                   387          0.39
  Pentium 4 Xeon @ 2 GHz, 512 KB (ifc -O3, -ipo, -prof_use)         415          0.21

Projections: Madison @ 1.5 GHz: ~585 CU; P4 Xeon @ 3.0 GHz: ~620 CU
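The CU/MHz row is simply the geometric mean divided by the clock frequency: 183/800 ≈ 0.23, 387/1000 ≈ 0.39, 415/2000 ≈ 0.21. The projections again look like linear clock scaling (my reading, not stated on the slide): 387 CU × (1500/1000) ≈ 580 CU, close to the quoted ~585 CU for Madison, and 415 CU × (3.0/2.0) ≈ 620 CU for the P4 Xeon.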
Benchmark-3: Rootmarks/C++
All jobs run in "batch" mode, ROOT 3.05.02. René's own 2.4 GHz P4 is normalized to 600 RM.

                              Itanium 2 @ 1000 MHz   Itanium 2 @ 1000 MHz
                              (gcc 3.2, -O3)         (ecc7 prod, -O2)
  stress -b -q                        423                  476
  bench -b -q                         449                  534
  root -b benchmarks.C -q             344                  325
  Geometric mean                      402                  436

Projections: Madison @ 1.5 GHz: ~660 RM; Pentium 4 @ 3.0 GHz/512 KB: ~750 RM

Stop press: we have just agreed on a compiler-improvement project with Intel.
opencluster - phase 1
Perform cluster benchmarks:
- Parallel ROOT queries (via PROOF): observed excellent scaling across 2, 4, 8, 16, 32, and 64 CPUs; to be reported at CHEP2003 (see the chart and scaling note below)
- "1 GB/s to tape" challenge: network interconnect via 10 GbE switches; the opencluster may act as CPU servers; 50 StorageTek tape drives in parallel
- "10 Gbit/s network challenge": groups together all openlab partners (Enterasys switch, HP servers, Intel processors and network cards, CERN Linux and network expertise)
[Chart: throughput in MB/s (scale 0 to 800) on the vertical axis, plotted against 0 to 70 on the horizontal axis]
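As a reading aid (the standard definitions, not something stated on the slide): if T(n) is the time for a PROOF query on n CPUs, the speedup is S(n) = T(1)/T(n) and the parallel efficiency is E(n) = S(n)/n; "excellent scaling" from 2 to 64 CPUs means E(n) stays close to 1 across that whole range.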
10 GbE Challenge
Network topology in 2002
[Diagram: 96 Fast Ethernet nodes (ports 1-12 through 85-96) and disk servers connected through four Enterasys E1 OAS switches; legend: Gig copper, Gig fiber, 10 Gig]
Enterasys extension 1Q2003
[Diagram: the 1Q2003 extension adds the 32-node Itanium cluster and a 200+ node Pentium cluster to the four E1 OAS switches, alongside the 96 Fast Ethernet nodes and disk servers; legend: Gig copper, Gig fiber, 10 Gig]
Why a 10 GbE Challenge?
- Demonstrate LHC-era technology: all the necessary components are available inside the opencluster
- Identify bottlenecks, and see if we can improve
- We know that Ethernet is here to stay: four years from now, 10 Gbit/s should be commonly available as backbone technology and cluster interconnect, and possibly also for iSCSI and RDMA traffic
- We want to advance the state of the art!
Demonstration of openlab partnership
Everybody contributes:
- Enterasys: 10 Gbit switches
- Hewlett-Packard: server with its PCI-X slots and memory bus
- Intel: 10 Gbit NICs plus driver; processors (i.e. code optimization)
- CERN: Linux kernel expertise, network expertise, project management, IA-32 expertise, and CPU clusters and disk servers on a multi-gigabit infrastructure
“Can we reach 400 – 600 MB/s throughput?”
Bottlenecks could be:
- Linux CPU consumption: kernel and driver optimization (number of interrupts, TCP checksumming, IP packet handling, etc.); we definitely need TCP offload capabilities
- Server hardware: memory banks and speeds; PCI-X slot and overall speed
- Switch: single-transfer throughput
Aim: identify the bottleneck(s), measure the peak throughput, and establish the corresponding cost (processor, memory, switch, etc.). A throughput-measurement sketch follows below.
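To make "measure peak throughput" concrete, here is a minimal sketch of a memory-to-memory TCP sender in C++, the kind of probe one could run between two opencluster nodes. The host/port arguments, the 1 MB buffer, and the 1 GB transfer size are illustrative assumptions, not the actual openlab test setup.

// tcp_blast.cc: minimal memory-to-memory TCP throughput probe (sketch).
// Build:  g++ -O2 -o tcp_blast tcp_blast.cc
// Usage:  ./tcp_blast <receiver-host> <port>
// The receiver can be anything that drains the socket, e.g. netcat
// writing to /dev/null.
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 3) { std::cerr << "usage: tcp_blast <host> <port>\n"; return 1; }

    // Resolve the receiver and open a TCP connection.
    addrinfo hints, *res;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family   = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(argv[1], argv[2], &hints, &res) != 0) {
        std::cerr << "cannot resolve " << argv[1] << '\n'; return 1;
    }
    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        perror("connect"); return 1;
    }
    freeaddrinfo(res);

    const size_t kBuf   = 1 << 20;          // 1 MB send buffer (assumption)
    const size_t kTotal = size_t(1) << 30;  // 1 GB total transfer (assumption)
    std::vector<char> buf(kBuf, 'x');

    timeval t0, t1;
    gettimeofday(&t0, 0);
    size_t sent = 0;
    while (sent < kTotal) {                 // push bytes as fast as TCP allows
        ssize_t n = write(fd, buf.data(), kBuf);
        if (n <= 0) { perror("write"); return 1; }
        sent += static_cast<size_t>(n);
    }
    gettimeofday(&t1, 0);
    close(fd);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    std::cout << (sent / 1e6) / secs << " MB/s application-level throughput\n";
    return 0;
}

At these rates the interesting number is often the CPU cost per byte moved rather than the MB/s itself, which is exactly why the slide lists interrupts, checksumming, and TCP offload as the things to watch.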
Gridification
Opencluster - future
- Port and validate the EDG 2.0 software: a joint project with CMS
- Integrate the opencluster alongside the EDG testbed: port and verify the relevant software packages (hundreds of RPMs); understand the chain of prerequisites; exploit the possibility of leaving the control node as IA-32
- Interoperability with the EDG testbeds, and later with LCG-1: integration into the existing authentication scheme
- Grid benchmarks: to be defined later
Fact sheet: HP joined the openlab mainly because of their interest in Grids.
Opencluster time line
Timeline (Jan 03 to Jan 06):
- Install 32 nodes; start phase 1 (systems expertise in place)
- Complete phase 1; order/install G-2 upgrades and 32 more nodes; start phase 2
- Complete phase 2; start phase 3; order/install G-3 upgrades and add nodes
- Running throughout: opencluster integration, EDG and LCG interoperability
Recap: opencluster strategy
- Demonstrate promising IT technologies (file-system technology to come)
- Deploy the technologies well beyond the opencluster itself
- Focal point for vendor collaborations
- Channel for providing information to vendors
Storage Workshop
Data and Storage Management Workshop (draft agenda), March 17–18, 2003 (Sverre Jarp). Organized by the CERN openlab for DataGrid applications and the LCG.
Aim: understand how to create synergy between our industrial partners and LHC Computing in the area of storage management and data access.

Day 1 (IT Amphitheatre)
Introductory talks:
09:00 – 09:15  Welcome (von Rueden)
09:15 – 09:35  Openlab technical overview (Jarp)
09:35 – 10:15  Gridifying the LHC data: challenges and current shortcomings (Kunszt)
10:15 – 10:45  Coffee break
The current situation:
10:45 – 11:15  Physics data structures and access patterns (Wildish)
11:15 – 11:35  The Andrew File System usage at CERN and in HEP (Többicke)
11:35 – 12:05  CASTOR: CERN's data management system (Durand)
12:05 – 12:25  IDE disk servers: a cost-effective cache for physics data (NN)
12:25 – 14:00  Lunch
Preparing for the future:
14:00 – 14:30  ALICE data challenges: on the way to recording @ 1 GB/s (Divià)
14:30 – 15:00  Lessons learnt from managing data in the European Data Grid (Kunszt)
15:00 – 15:30  Could Oracle become a player in physics data management? (Shiers)
15:30 – 16:00  CASTOR: possible evolution into the LHC era (Barring)
16:00 – 16:30  POOL: LHC data persistency (Duellmann)
16:30 – 17:00  Coffee break
17:00 –        Discussions and conclusion of day 1 (All)

Day 2 (IT Amphitheatre): vendor interventions; one-on-one discussions with CERN
THANK YOU