Clusterix: National IPv6 Computing Facility in Poland
Artur Binczewski [email protected]
Radosław Krzywania [email protected]
Maciej Stroiński [email protected]
Jan Węglarz [email protected]
Agenda
• Clusterix Project
• PIONIER Network
• Clusterix Network Architecture
• Network as a resource
• Dynamic Computing Resources
Clusterix Project
• Initiated in 2003 by 12 Polish computing centers
• Objectives:
– To build a productive and efficient GRID environment
– To provide enhanced security for the created GRID infrastructure
– To introduce IPv6-based communication to GRID applications
– To create a scalable computing infrastructure with dynamic resource attachment
Clusterix Project
• 64-bit Intel computing nodes
• Over 800 processors with computing power of 4.4 TFLOPS
• Linux operating system (Debian distribution)
• IPv6 as primary protocol (with IPv4 coexistence)
• Communication based on dedicated channels within PIONIER network
PIONIER network
• Polish Optical Internet – PIONIER
– Modern fiber-based network
– Connects 21 academic and research centres
– Over 5500 km of fiber is planned (over 3500 km exists by now)
– Built with DWDM infrastructure
– 10 Gbps capacity is available by now
PIONIER network
[Figure: map of the PIONIER network. Nodes (Metropolitan Area Networks): Gdańsk, Poznań, Zielona Góra, Katowice, Kraków, Lublin, Warszawa, Bydgoszcz, Toruń, Częstochowa, Białystok, Olsztyn, Rzeszów, Bielsko-Biała, Koszalin, Szczecin, Wrocław, Łódź, Kielce, Puławy, Opole, Radom. Legend: PIONIER's fibers; 10 Gb/s (1 lambda); 2 x 10 Gb/s (2 lambdas); 1 Gb/s; CBDF 10GE. External links: TELIA 2 x 2.5 Gb/s; GTS 1.2 Gb/s; BASNET 34 Mb/s; CESNET, SANET 10 Gb/s; GÉANT 10+10 Gb/s; DFN 10 Gb/s.]
Clusterix Network Architecture
• Communication to all clusters passes through a router/firewall
• Routing is based on the IPv6 protocol, with IPv4 kept for backward compatibility
• Applications and Clusterix middleware are adapted to use IPv6
• For security reasons, only outgoing connections to the Internet are permitted
• Two 1 Gbps VLANs are used to improve management of network traffic
– Communication VLAN is dedicated to message exchange between nodes
– NFS VLAN is dedicated to file transfer
[Figure: local cluster network architecture. Elements: PIONIER Core Switch, Clusterix Storage Element, Local Cluster Switch, Computing Nodes, Access Node, Router/Firewall, Internet Network Access. Link labels: Communication & NFS VLANs, Internet Network, Backbone Traffic, 1 Gbps.]
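The "IPv6 first, IPv4 for backward compatibility" policy on this slide can be illustrated on the application side. The following is a minimal sketch, not the Clusterix middleware code: the function names `prefer_ipv6` and `connect_dual_stack` are hypothetical, and it assumes a client that simply tries resolved IPv6 addresses before IPv4 ones.

```python
import socket


def prefer_ipv6(addrinfos):
    """Return addrinfo tuples with IPv6 entries first, keeping relative order."""
    # sorted() is stable, so entries of the same family keep resolver order.
    return sorted(addrinfos, key=lambda ai: 0 if ai[0] == socket.AF_INET6 else 1)


def connect_dual_stack(host, port, timeout=5.0):
    """Try each resolved address in IPv6-first order; return a connected socket."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    last_error = None
    for family, socktype, proto, _canon, sockaddr in prefer_ipv6(infos):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:  # e.g. address family not supported on this host
            last_error = exc
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or OSError("no addresses resolved")
```

Only `prefer_ipv6` encodes the policy; `connect_dual_stack` falls back to IPv4 when no IPv6 address is reachable.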
Network as a resource
• Network management application
– Objectives and features
• Tracking and monitoring network status
• Performing measurements
• Locating failures
• Providing network statistics for GRID services
• Layer 3 QoS management
• Automatic measurement session configuration
• Failure resistance
Network as a resource – Measurements
[Figure: measurement architecture. Elements: Network Manager, Measurement Reports, Computing Cluster, Local Cluster Measurements, PIONIER Backbone Measurements, SNMP Monitoring.]
• Measurement architecture
– Distributed 2-level measurement agent mesh (backbone/cluster)
– Centralized control manager (multiple redundant instances)
– Switches are monitored via SNMP
– Reports are stored by the manager (and forwarded to a database)
– IPv6 protocol and addressing scheme are used for measurements
Network as a resource – Architecture
[Figure: manager architecture. Layers: External Entities, System Manager, System Resources. Components: Database, External Clients, GUI, Backup Manager, Controller, External Interfaces, Redundancy Controller, System Logic, Measurement Agents Manager, Device Manager, Devices, Backbone measurements, Local Cluster measurements.]
• Manager architecture
– Statistics are stored in an external database (a short-term backup is kept in the manager)
– GUI shows network status and configures the manager
– Backup managers improve failure recovery (active manager switching)
– External applications can retrieve various network statistics
– Device and agent management modules collect network data
Network as a resource – Protocol
• Active Measurement Protocol
– All agent types use the same communication protocol
– First implementation was OWAMP-based
– One-way measurements were abandoned, and a round-trip measurement approach is used
– Further modifications were made due to non-fixed message length and extra requirements
– Protocol supports both IPv6 and IPv4
– Measurement traffic pattern can be specified for more detailed network examination
– Network metrics:
• RTT
• Jitter
• Packet loss
• Duplicated packets
• Packets out of order
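The listed metrics can all be derived from round-trip probe records. The sketch below is an illustration under stated assumptions, not the Clusterix protocol: the function name `summarize_probes` and the record layout are hypothetical, and jitter is taken here as the mean absolute difference between consecutive RTTs, which is only one of several common definitions.

```python
from statistics import mean


def summarize_probes(sent, received):
    """sent: {seq: send_time}; received: [(seq, recv_time), ...] in arrival order."""
    rtts = []            # RTT of the first copy of each reply, in arrival order
    seen = set()
    duplicates = 0
    out_of_order = 0
    highest_seq = None
    for seq, recv_time in received:
        if seq not in sent:
            continue     # ignore replies to probes that were never sent
        if seq in seen:
            duplicates += 1
            continue
        seen.add(seq)
        if highest_seq is not None and seq < highest_seq:
            out_of_order += 1   # arrived after a reply with a higher sequence number
        else:
            highest_seq = seq
        rtts.append(recv_time - sent[seq])
    # Assumed jitter definition: mean absolute difference of consecutive RTTs.
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "rtt_avg": mean(rtts) if rtts else None,
        "jitter": mean(diffs) if diffs else 0.0,
        "loss": (len(sent) - len(seen)) / len(sent) if sent else 0.0,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
    }


# Example session with times in milliseconds: probe 4 is lost,
# probe 2 arrives late, and probe 3 is answered twice.
report = summarize_probes(
    {1: 0, 2: 1000, 3: 2000, 4: 3000},
    [(1, 10), (3, 2020), (2, 1050), (3, 2030)],
)
```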
Network as a resource – Monitoring
• Monitoring
– Core switches are monitored via SNMP to track:
• Interface status
• Maximum available capacity
• Current link utilization
– An SNMP view is used to improve device security
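Current link utilization is typically derived from two samples of an interface octet counter. The slides do not say which MIB objects are polled, so the sketch below assumes the standard IF-MIB octet counters (for example `ifHCInOctets`) and interface speed; the function name `link_utilization` is hypothetical.

```python
def link_utilization(octets_prev, octets_now, seconds, if_speed_bps, counter_bits=64):
    """Fraction of link capacity used between two octet-counter samples."""
    if seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    # Modular subtraction handles a single counter wrap between samples.
    delta_octets = (octets_now - octets_prev) % (2 ** counter_bits)
    return (delta_octets * 8) / (seconds * if_speed_bps)
```

With 32-bit counters on a fast link the counter can wrap more than once between polls, which this calculation cannot detect; that is the usual reason to prefer 64-bit counters or short polling intervals.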
Network as a resource – Fail Safe
[Figure: regular operation. Elements: Manager, Backup Manager, Synchronization Data, Measurement Network.]
• Only one active manager is allowed (selection is based on the Bully algorithm)
• Required data are exchanged between active and backup managers
• Measurement agents register with the active manager only
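The slide states only that manager selection is based on the Bully algorithm. As a minimal sketch of that idea, the code below assumes each manager has a numeric ID used as its priority and replaces real messages and timeouts with a set of reachable IDs; the function name `bully_elect` is hypothetical.

```python
def bully_elect(initiator, all_ids, alive):
    """Return the ID elected when `initiator` starts a Bully election.

    all_ids: IDs of every configured manager.
    alive:   set of IDs that currently respond.
    """
    if initiator not in alive:
        raise ValueError("a failed manager cannot start an election")
    # The initiator challenges every manager with a higher ID.
    responders = [m for m in sorted(all_ids) if m > initiator and m in alive]
    if not responders:
        return initiator  # nobody higher answered: the initiator wins
    # A live higher-ID manager takes over and runs its own election.
    return bully_elect(responders[0], all_ids, alive)
```

Whichever live manager starts the election, the highest live ID ends up as the active manager, which is what keeps the system at exactly one active instance.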
Network as a resource – Fail Safe
[Figure: failure event. Elements: Manager (failure), New Manager.]
• In case of failure, a new active manager is selected
• Agents do not register until a new active manager is elected
• Measurements are still performed, and results are temporarily stored on the agent side
• The newly elected manager recovers system state and accepts agent registrations
• System is ready to serve information
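The agent-side behaviour during a manager outage can be sketched as a small buffer that stores results until a new manager is elected. This is an assumption-laden illustration, not the Clusterix agent: the class `AgentBuffer`, its method names, and the bounded buffer size are hypothetical, and the manager is modelled as any object with an `append` method (a plain list here).

```python
from collections import deque


class AgentBuffer:
    """Holds measurement results while no active manager is registered."""

    def __init__(self, max_results=1000):
        # Bounded deque (assumed policy): oldest results are dropped in a long outage.
        self.pending = deque(maxlen=max_results)
        self.manager = None

    def on_manager_lost(self):
        self.manager = None

    def on_manager_elected(self, manager):
        """Register with the new manager and flush everything stored so far."""
        self.manager = manager
        while self.pending:
            manager.append(self.pending.popleft())

    def report(self, result):
        if self.manager is None:
            self.pending.append(result)   # keep measuring, store locally
        else:
            self.manager.append(result)


# Example: results r2 and r3 are measured during the outage and delivered later.
old_manager, new_manager = [], []
agent = AgentBuffer()
agent.on_manager_elected(old_manager)
agent.report("r1")
agent.on_manager_lost()
agent.report("r2")
agent.report("r3")
agent.on_manager_elected(new_manager)
agent.report("r4")
```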
Network as a resource – GUI
• GUI
– Provides a view of network status
– Gives access to statistics
– Simplifies network troubleshooting
– Allows configuring measurement sessions
– Useful for topology browsing
Dynamic Computing Resources – Motivation
• External clusters can be easily attached to Clusterix infrastructure in order to:
– Increase computing power with new clusters
– Utilize external clusters at night or during idle periods
– Make Clusterix infrastructure scalable
Dynamic Computing Resources – Architecture
• Dynamic cluster attachment:
– New clusters need to be checked against requirements
• Installed software
• SSL certificates
– Communication through router/firewall
– Network Management System will automatically discover new resources
– New cluster can serve computing power on a regular basis
[Figure: dynamic cluster attachment. Elements: PIONIER Backbone Switch, Local Switch, Router/Firewall, Regular Cluster, Dynamic Resources, Internet.]
Summary
• Fast computing center interconnection through PIONIER
• IPv6 protocol is introduced to GRID environment
• Failure-resistant network monitoring system
• Network is used as a regular GRID resource
• Dynamic architecture allows easy power upgrades