Clusterix: National IPv6 Computing Facility in Poland

Artur Binczewski [email protected]

Radosław Krzywania [email protected]

Maciej Stroiński [email protected]

Jan Węglarz [email protected]


Agenda

• Clusterix Project

• PIONIER Network

• Clusterix Network Architecture

• Network as a resource

• Dynamic Computing Resources

Clusterix Project

Clusterix Project

• Initiated in the year 2003 by 12 Polish computing centers

• Objectives:

– To build a productive and efficient GRID environment

– To provide enhanced security for the resulting GRID infrastructure

– To introduce IPv6-based communication to GRID applications

– To create a scalable computing infrastructure with dynamic resource attachment

Clusterix Project

• 64-bit Intel computing nodes

• Over 800 processors with a combined computing power of 4.4 TFLOPS

• Linux operating system (Debian distribution)

• IPv6 as primary protocol (with IPv4 coexistence)

• Communication based on dedicated channels within the PIONIER network

PIONIER network

PIONIER network

• Polish Optical Internet – PIONIER

– Modern fiber-based network

– Connects 21 academic and research centres

– Over 5500 km of fiber planned (over 3500 km already in place)

– Built on DWDM infrastructure

– 10 Gbps capacity currently available

PIONIER network

[Figure: PIONIER network map. Metropolitan Area Network nodes: Gdańsk, Poznań, Zielona Góra, Katowice, Kraków, Lublin, Warszawa, Bydgoszcz, Toruń, Częstochowa, Białystok, Olsztyn, Rzeszów, Bielsko-Biała, Koszalin, Szczecin, Wrocław, Łódź, Kielce, Puławy, Opole, Radom. Link legend: PIONIER's fibers, 10 Gb/s (1 lambda), 2 x 10 Gb/s (2 lambdas), 1 Gb/s. External connections: GÉANT 10+10 Gb/s, DFN 10 Gb/s, CESNET and SANET 10 Gb/s, CBDF 10GE, TELIA 2 x 2.5 Gb/s, GTS 1.2 Gb/s, BASNET 34 Mb/s.]

Clusterix Network Architecture

Clusterix Network Architecture

• All communication to each cluster passes through a router/firewall

• Routing is based on IPv6, with IPv4 retained for backward compatibility

• Applications and Clusterix middleware are adapted to IPv6

• For security reasons, only outgoing connections to the Internet are permitted

• Two 1 Gbps VLANs are used to improve management of network traffic

– The communication VLAN is dedicated to message exchange between nodes

– The NFS VLAN is dedicated to file transfer

[Figure: cluster network layout. Components: PIONIER core switch (backbone traffic), router/firewall, Internet network access, local cluster switch, access node, computing nodes, Clusterix storage element, communication and NFS VLANs (1 Gbps).]
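The "outgoing connections only" rule amounts to stateful filtering: a connection opened from inside the cluster is tracked so that its replies can come back, while unsolicited inbound traffic is dropped. A minimal Python sketch under that assumption; the `Flow` and `OutboundOnlyFirewall` names and the 2001:db8::/32 documentation prefix are illustrative, and a real deployment would express this as router/firewall rules rather than application code.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """Illustrative 5-tuple identifying one direction of a connection."""
    src: str
    sport: int
    dst: str
    dport: int
    proto: str = "tcp"

    def reversed(self) -> "Flow":
        return Flow(self.dst, self.dport, self.src, self.sport, self.proto)


@dataclass
class OutboundOnlyFirewall:
    """Allow connections opened from inside; drop unsolicited inbound traffic."""
    inside: tuple  # ipaddress network objects covering the cluster (assumed prefixes)
    established: set = field(default_factory=set)

    def _is_inside(self, addr: str) -> bool:
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in self.inside)

    def allow(self, flow: Flow) -> bool:
        src_in, dst_in = self._is_inside(flow.src), self._is_inside(flow.dst)
        if src_in and dst_in:
            return True                      # intra-cluster traffic is not filtered here
        if src_in:
            self.established.add(flow)       # outgoing: permit and track for replies
            return True
        if dst_in:
            return flow.reversed() in self.established  # inbound: replies only
        return False                         # traffic unrelated to the cluster


fw = OutboundOnlyFirewall((ipaddress.ip_network("2001:db8:10::/48"),))
out = Flow("2001:db8:10::5", 40000, "2001:db8:99::1", 443)
fw.allow(out)             # outgoing connection is permitted and tracked
fw.allow(out.reversed())  # the reply is let back in
```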

Network as a resource

Network as a resource

• Network management application

– Objectives and features:

• Tracking and monitoring network status

• Performing measurements

• Discovering failures location

• Providing network statistics for GRID services

• Layer 3 QoS management

• Automatic measurement session configuration

• Failure resistance

Network as a resource – Measurements

[Figure: measurement overview. The network manager collects measurement reports from local cluster measurements in each computing cluster, from PIONIER backbone measurements, and from SNMP monitoring.]

• Measurement architecture

– Distributed two-level mesh of measurement agents (backbone and cluster)

– Centralized control manager (multiple redundant instances)

– Switches are monitored via SNMP

– Reports are collected by the manager and forwarded to a database

– The IPv6 protocol and addressing scheme are used for measurements
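The two-level mesh keeps the number of measurement sessions down: backbone agents measure each other, and cluster agents measure only within their own cluster. A sketch under the assumption that one round-trip session per unordered agent pair suffices; `build_sessions` and the agent names are hypothetical.

```python
from itertools import combinations


def build_sessions(backbone_agents, cluster_agents):
    """Return (agent_a, agent_b, level) for a hypothetical two-level measurement mesh.

    backbone_agents: agents placed at PIONIER backbone level
    cluster_agents:  mapping of cluster name -> agents inside that cluster
    """
    sessions = []
    # Level 1: full mesh between backbone agents (one round-trip session per pair).
    for a, b in combinations(backbone_agents, 2):
        sessions.append((a, b, "backbone"))
    # Level 2: full mesh inside each cluster only; no cross-cluster sessions.
    for cluster, agents in cluster_agents.items():
        for a, b in combinations(agents, 2):
            sessions.append((a, b, cluster))
    return sessions


sessions = build_sessions(
    ["bb-poznan", "bb-warszawa", "bb-gdansk"],
    {"c1": ["c1-n1", "c1-n2"], "c2": ["c2-n1", "c2-n2", "c2-n3"]},
)
print(len(sessions))  # 3 + 1 + 3 = 7
```

A single flat mesh over the same 8 agents would need 28 pairs; the two-level layout needs 7 here.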

Network as a resource – Architecture

[Figure: network manager architecture. System manager: system logic, redundancy controller, backup manager, controller, external interfaces, measurement agents manager, device manager. External entities: database, external clients, GUI. System resources: devices (backbone measurements), local cluster measurements.]

• Manager architecture

– Statistics are stored in an external database (a short-term backup is kept in the manager)

– The GUI shows network status and is used to configure the manager

– Backup managers improve failure recovery (active manager switching)

– External applications can retrieve various network statistics

– Device and agent management modules collect network data

Network as a resource – Protocol

• Active Measurement Protocol

– All agent types use the same communication protocol

– The first implementation was based on OWAMP

– One-way measurements were abandoned in favour of a round-trip measurement approach

– Further modifications were made because of variable-length messages and additional requirements

– The protocol supports both IPv6 and IPv4

– The measurement traffic pattern can be specified for more detailed network examination

– Network metrics:

• RTT

• Jitter

• Packet loss

• Duplicated packets

• Packets out of order
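Given per-probe sequence numbers and timestamps, all five metrics can be derived from one round-trip session. A minimal sketch, assuming each probe carries a sequence number and that jitter is taken as the mean absolute difference between consecutive RTTs; the slides do not define the exact formulas (RFC 3550, for instance, uses a smoothed estimator instead), and `probe_metrics` is a hypothetical name.

```python
from statistics import mean


def probe_metrics(sent, replies):
    """sent: dict seq -> send time (s); replies: list of (seq, receive time) in arrival order."""
    rtts = {}          # first reply per sequence number -> round-trip time
    duplicates = 0
    out_of_order = 0
    highest = -1
    for seq, t_recv in replies:
        if seq in rtts:
            duplicates += 1              # same probe answered more than once
            continue
        if seq < highest:
            out_of_order += 1            # arrived after a later-numbered probe
        highest = max(highest, seq)
        rtts[seq] = t_recv - sent[seq]
    ordered = [rtts[s] for s in sorted(rtts)]
    jitter = mean(abs(b - a) for a, b in zip(ordered, ordered[1:])) if len(ordered) > 1 else 0.0
    return {
        "rtt_avg": mean(ordered) if ordered else None,
        "jitter": jitter,
        "loss": 1 - len(rtts) / len(sent),
        "duplicates": duplicates,
        "out_of_order": out_of_order,
    }


sent = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}                      # probe 3 is never answered
replies = [(0, 0.010), (2, 2.030), (1, 1.020), (2, 2.031)]  # probe 2 duplicated, 1 late
metrics = probe_metrics(sent, replies)
```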

Network as a resource – Monitoring

• Monitoring

– Core switches are monitored via SNMP to track:

• Interface status

• Maximum available capacity

• Current link utilization

– SNMP views restrict which MIB objects the monitoring system can read, improving device security
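Link utilization is usually derived from two successive polls of an interface octet counter. A minimal sketch, assuming the manager polls the standard IF-MIB objects `ifHCInOctets` (64-bit counter) and `ifHighSpeed` (in Mbit/s); the slides do not say which objects Clusterix actually read, and the SNMP polling itself is omitted.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets is a 64-bit counter


def octet_delta(prev: int, curr: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Octets counted between two samples, tolerating a single counter wrap."""
    return (curr - prev) % modulus


def link_utilization(prev_octets: int, curr_octets: int,
                     interval_s: float, if_high_speed_mbps: int) -> float:
    """Fraction of link capacity used between two polls."""
    bits = octet_delta(prev_octets, curr_octets) * 8
    capacity_bits = if_high_speed_mbps * 1_000_000 * interval_s
    return bits / capacity_bits


# 3.75 GB received in 60 s on a 1 Gbps link
print(link_utilization(0, 3_750_000_000, 60, 1000))  # 0.5
```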

Network as a resource – Fail Safe

[Figure: regular operation. The active manager and the backup manager exchange synchronization data; the measurement network reports to the active manager.]

• Only one manager may be active at a time (selection is based on the Bully algorithm)

• The required data are exchanged between the active and backup managers

• Measurement agents register with the active manager only
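The slides state only that manager selection is based on the Bully algorithm. A simplified, sequential sketch of that election, assuming each manager instance has a unique integer id (the ids and `bully_election` are hypothetical): the highest-numbered live manager always wins.

```python
def bully_election(initiator: int, alive: set, all_ids: list) -> int:
    """Simplified, sequential Bully election among manager instances."""
    if initiator not in alive:
        raise ValueError("a failed manager cannot start an election")
    current = initiator
    while True:
        # ELECTION messages go to every manager with a higher id;
        # only live ones answer with OK.
        answered = [p for p in all_ids if p > current and p in alive]
        if not answered:
            # No higher manager is alive: `current` announces itself COORDINATOR.
            return current
        # In the real algorithm every answering manager starts its own election
        # concurrently; following just the lowest responder gives the same winner.
        current = min(answered)


print(bully_election(1, {1, 2, 4}, [1, 2, 3, 4, 5]))  # 4
```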

Network as a resource – Fail Safe

[Figure: manager failure. After a failure event, a new manager takes over as the active manager.]

• In case of failure, a new active manager is elected

• Agents do not register until the new active manager is elected

• Measurements continue, and results are temporarily stored on the agent side

• The newly elected manager recovers the system state and accepts agent registrations

• The system is ready to serve information again
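On the agent side, this failover behaviour can be sketched as a local queue that is flushed when the agent registers with the newly elected manager. This is a hypothetical illustration (class and method names are not from Clusterix); the bounded buffer size is an assumption, since the slides do not say how long an outage agents can absorb.

```python
from collections import deque


class MeasurementAgent:
    """Hypothetical agent that keeps measuring while no manager is active."""

    def __init__(self, max_buffer: int = 10_000):
        self.submit = None                       # active manager's callback; None during failover
        self.pending = deque(maxlen=max_buffer)  # oldest results are dropped if the outage is very long

    def record(self, result) -> None:
        if self.submit is None:
            self.pending.append(result)          # no manager: keep the result locally
        else:
            self.submit(result)

    def manager_lost(self) -> None:
        self.submit = None                       # stop reporting until a new manager is elected

    def register(self, submit) -> None:
        self.submit = submit                     # registration with the newly elected manager
        while self.pending:                      # hand over buffered results in original order
            submit(self.pending.popleft())


received = []
agent = MeasurementAgent()
agent.record("r1")                # no manager yet: buffered
agent.record("r2")
agent.register(received.append)  # new manager elected: buffer flushed
agent.record("r3")                # delivered directly
```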

Network as a resource – GUI

• GUI

– Provides a view of network status

– Displays statistics

– Simplifies network troubleshooting

– Allows measurement sessions to be configured

– Supports topology browsing

Dynamic Computing Resources

Dynamic Computing Resources – Motivation

• External clusters can easily be attached to the Clusterix infrastructure in order to:

– Increase computing power with new clusters

– Utilize external clusters at night or during idle periods

– Make the Clusterix infrastructure scalable

Dynamic Computing Resources – Architecture

• Dynamic cluster attachment:

– Requirements need to be checked for each new cluster:

• Installed software

• SSL certificates

– Communication passes through the router/firewall

– The Network Management System automatically discovers new resources

– The new cluster can then provide computing power on a regular basis

[Figure: dynamic resource attachment. Components: PIONIER backbone switch, local switch, router/firewall, regular cluster, dynamic resources, Internet.]
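Before a dynamic cluster is accepted, its software and certificate can be checked against the Clusterix requirements. A sketch assuming the candidate cluster reports its installed packages and certificate metadata; the package names, CA name and `admission_problems` function are hypothetical, and a real check would verify the certificate chain cryptographically (for example with Python's `ssl` module) rather than trust reported fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

REQUIRED_SOFTWARE = {"clusterix-middleware", "local-batch-system"}  # hypothetical package names
TRUSTED_ISSUERS = {"Clusterix CA"}                                   # hypothetical CA name


@dataclass
class ClusterDescription:
    """What a candidate cluster reports about itself (illustrative fields only)."""
    name: str
    installed: set
    cert_issuer: str
    cert_not_before: datetime
    cert_not_after: datetime


def admission_problems(c: ClusterDescription, now=None) -> list:
    """Return unmet requirements; an empty list means the cluster may attach."""
    now = now or datetime.now(timezone.utc)
    problems = [f"missing software: {p}" for p in sorted(REQUIRED_SOFTWARE - c.installed)]
    if c.cert_issuer not in TRUSTED_ISSUERS:
        problems.append(f"untrusted certificate issuer: {c.cert_issuer}")
    if not (c.cert_not_before <= now <= c.cert_not_after):
        problems.append("certificate outside its validity period")
    return problems
```

Returning a list of problems rather than a single boolean lets the operator see every unmet requirement at once.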

Summary

• Fast interconnection of computing centers through PIONIER

• IPv6 introduced into the GRID environment

• Failure-resistant network monitoring system

• Network used as a regular GRID resource

• Dynamic architecture allows easy upgrades of computing power

Thank you for your attention!

Visit http://www.clusterix.pcz.pl