Clustering and Networking on Linux. Kumaran Kalyanasundaram, SGI Technology Center


Page 1: Clustering and Networking on Linux

Clustering and Networking on Linux

Kumaran Kalyanasundaram, SGI Technology Center

Page 2: Clustering and Networking on Linux

Agenda

• Types of clusters
  – Examples
  – Important technologies
  – Available products

• Cluster interconnects
  – Speeds and feeds

• Compute clusters
  – Hardware layouts
  – Programming considerations

Page 3: Clustering and Networking on Linux

Types of Clusters: Functional View

Availability clusters: For “mission-critical” apps
  – RAS features are essential requirements for all clusters

Throughput clusters: Run multiple jobs on nodes in the cluster
  – Mostly batch-style apps that are not “cluster-aware”
  – Scheduling, load-balancing

Capability clusters: Run a cluster-aware (HPC/MPI) job on multiple nodes

Page 4: Clustering and Networking on Linux

Clustering for High Availability

A collection of systems in a cluster lends itself well to providing significantly higher availability than a standalone system.
  – If one fails, move to the other!

Significantly higher availability with moderate cost overhead
  – All systems are actively engaged in the cluster workload

Page 5: Clustering and Networking on Linux

What is High Availability?

• The system or service is available almost all the time!
• Resiliency against any single point of failure
• Availability at or above 99.9%
  – Accumulated unplanned outages of less than about 9 hrs / year (0.1% of 8,760 hours)
• Minimized downtime: services are made available well before the broken component gets fixed
• Can hide planned downtime as well
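
The availability threshold above maps to a yearly downtime budget by simple arithmetic. The following is a minimal sketch of that calculation; the function name `downtime_hours_per_year` and the 365-day year are assumptions made for illustration and do not come from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8760 hours


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by a given availability level."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the downtime budget for a few common availability levels.
    for level in (99.0, 99.9, 99.99):
        print(f"{level}% -> {downtime_hours_per_year(level):.2f} hours/year")
```

Under these assumptions, 99.9% availability leaves a budget of about 8.76 hours per year, in line with the rough yearly figure quoted above.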

Page 6: Clustering and Networking on Linux

The High-Availability Environment

• Member nodes of the HA cluster

• Services/applications to be made highly available

• Resources they depend upon

• Primary node for each of these applications

• Designated alternate node(s) in case of a failure on primary node

Page 7: Clustering and Networking on Linux

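Elements of HA Infrastructure (HW)

[Figure: replicated data configuration. Server A and Server B, each with its own data copy (A, B), are linked by a heartbeat connection and a public network (Ethernet, FDDI, ATM); changes are replicated between the copies.]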

Page 8: Clustering and Networking on Linux

Elements of HA Infrastructure (HW)

[Figure: Fibre RAID, dual-hosted storage configuration. Server A and Server B are linked by heartbeat and reset connections and a public network (Ethernet, FDDI, ATM); Fibre Loop A and Loop B attach them to Controller A and Controller B of the shared storage.]

Page 9: Clustering and Networking on Linux

Elements of HA Infrastructure (SW)

• Heartbeat: short message exchange to monitor system health

• HA Framework monitors common system-level resources
  – Volumes, file systems, network interfaces

• Application-specific agents monitor application health

• Cluster management tools
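
The heartbeat idea above can be sketched in a few lines. This is an illustrative model only, not the code of any HA product: the class name `HeartbeatMonitor`, its methods, and the timeout value are assumptions, and a real framework would exchange the messages over a dedicated link rather than record them in memory.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Track the last heartbeat seen from each peer node (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, node: str, now: Optional[float] = None) -> None:
        """Note that a heartbeat message arrived from `node`."""
        self._last_seen[node] = time.monotonic() if now is None else now

    def suspected_failed(self, now: Optional[float] = None) -> List[str]:
        """Return the nodes whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if current - seen > self.timeout_seconds
        )
```

Passing `now` explicitly makes the sketch easy to exercise without waiting: record heartbeats for two servers at time 0, refresh one of them at time 4, and with a 5-second timeout only the silent server is reported at time 6.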

Page 10: Clustering and Networking on Linux

In case of a failure

Failure detected by:
• Storage processors within the storage system
• Internode communication failure
• Monitors that detect resource / application malfunction

Page 11: Clustering and Networking on Linux

In case of a failure: Recovery Steps

• Failure notification to administrator

• Storage access attempt via alternate path

• Service and other necessary resource failover based on predefined failover policies

• I/O fencing to prevent any possibility of data corruption from the failing node
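
A predefined failover policy of the kind listed above can be pictured as an ordered list of candidate nodes per service. The sketch below is hypothetical throughout: the policy table, the service and node names, and the choice to fence the failed node before restarting services are assumptions of this example, not the behavior of a specific HA framework.

```python
from typing import Dict, List, Optional, Set

# Hypothetical policy table: service name -> nodes in order of preference.
FAILOVER_POLICY: Dict[str, List[str]] = {
    "database": ["serverA", "serverB"],
    "web": ["serverB", "serverA"],
}


def choose_target(service: str, healthy: Set[str]) -> Optional[str]:
    """Return the first healthy node in the service's preference list."""
    for node in FAILOVER_POLICY.get(service, []):
        if node in healthy:
            return node
    return None


def plan_recovery(
    failed_node: str, placement: Dict[str, str], healthy: Set[str]
) -> List[str]:
    """Return ordered recovery steps for the services placed on the failed node."""
    steps = [
        f"notify administrator: {failed_node} failed",
        # Assumption: fence first so the failed node cannot touch shared data.
        f"fence {failed_node} from shared storage",
    ]
    survivors = healthy - {failed_node}
    for service in sorted(placement):
        if placement[service] != failed_node:
            continue
        target = choose_target(service, survivors)
        if target is None:
            steps.append(f"no healthy node available for {service}")
        else:
            steps.append(f"start {service} on {target}")
    return steps
```

For example, if `database` runs on `serverA` and `serverA` fails, the plan notifies the administrator, fences `serverA`, and starts `database` on `serverB`.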

Page 12: Clustering and Networking on Linux

Issues to consider

Application cleanup and restart time contribute directly to service downtime

False failovers due to timeouts. Mechanisms to avoid them:
  – Conservative timeouts
  – Multiple monitoring methods

Shared vs. non-shared data access
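
One way to picture the two mechanisms against false failovers is a rule that requires several consecutive missed probes (a conservative timeout) from more than one independent monitor before acting. This is a sketch under assumed names and thresholds; `should_fail_over`, the monitor names, and the default values are illustrative and not taken from any product.

```python
from typing import Dict


def should_fail_over(
    missed_probes: Dict[str, int],
    missed_threshold: int = 3,
    min_agreeing_monitors: int = 2,
) -> bool:
    """Decide on failover from per-monitor counts of consecutive missed probes.

    A monitor "votes" for failure only after `missed_threshold` consecutive
    misses, and failover needs at least `min_agreeing_monitors` such votes.
    """
    votes = sum(1 for missed in missed_probes.values() if missed >= missed_threshold)
    return votes >= min_agreeing_monitors
```

With these defaults, a heartbeat monitor that has missed five probes does not trigger a failover on its own while the application probe still answers; two monitors must agree.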

Page 13: Clustering and Networking on Linux

Linux and High Availability

Today:
• Linux is very popular in the Internet server space
• Availability needs of a different flavor
• Software packages addressing HA needs in this segment:
  – Watchdog (www.wizard.de)
  – RSF-1 from RSI (www.rsi.co.uk)
  – Understudy from PolyServe (www.polyserve.com)
  – Some work going on in the Linux community, related to Heartbeat etc.
  – Red Hat 6.1

Page 14: Clustering and Networking on Linux

Linux and High Availability

As Linux matures, it is expected to make its way into:
• Database servers
• File servers and enterprise applications
• and more...

Important Contributing Technologies
  – High Availability Framework
  – Enterprise-class HA features
  – Cluster management: Single System View
  – Journalled Filesystem
    · SGI’s XFS going open-source!

Page 15: Clustering and Networking on Linux

Clustered, Journalled Filesystem

• Seamless sharing of data
• No failover of data required
• Near-local file system performance
  – Direct data channels between disks and nodes

Clustered XFS (CXFS):
• A shareable high-performance XFS file system
  – CXFS sits on top of XFS: Fast XFS features
• A resilient file system
  – Failure of a node in the cluster does not prevent access to the disks from other nodes

Page 16: Clustering and Networking on Linux

Internet server farm

[Figure: the Internet connects through a load-balancing switch to web server farms, download servers, and email server farms.]

Page 17: Clustering and Networking on Linux

Internet server farm: Manageability of a cluster

• Load sharing switch
  – Hybrid HA and Throughput solutions
  – Currently minimal feedback from backend to switch
  – Efforts in place to provide active feedback to switch
    · E.g. Cisco’s Dynamic Feedback Protocol

• Performance monitoring

• Cluster management: Single entity view
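
The difference between a switch with minimal feedback and one with active feedback from the backends can be illustrated with two selection rules. This sketch does not model Cisco's Dynamic Feedback Protocol or any real switch; the function names and the load figures are assumptions chosen for the example.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Rotate through the backends with no feedback from the servers."""
    return cycle(backends)


def pick_least_loaded(reported_load: Dict[str, float]) -> str:
    """Pick the backend reporting the lowest load; ties go to the first name."""
    if not reported_load:
        raise ValueError("no backends reported")
    return min(sorted(reported_load), key=lambda name: reported_load[name])
```

Round robin keeps sending requests to a busy server at the same rate as an idle one, whereas the feedback-based rule steers new requests toward whichever backend currently reports the least load.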

Page 18: Clustering and Networking on Linux

Throughput clusters: Computational workload

Engineering/Marketing scenario analysis servers

EDA server farms

Render Farms

Important Technologies
• Cluster-wide resource management
• Performance monitoring
• Accounting tools
• Cluster management: Single System View

Page 19: Clustering and Networking on Linux

Throughput clusters: Cluster resource management software

Software to optimally distribute independent jobs on independent servers in a cluster
  · a.k.a. job scheduler, load sharing software …

• Portable Batch System (PBS)
  – Open-source software
  – Developed at NASA
  – Jobs submitted specifying resource requirements
  – Jobs run when resources are available, subject to constraints on maximum resource usage
  – Unified interface to all computing resources
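
The scheduling behavior described above (jobs declare resource requirements, run when resources are free, and are bounded by a usage limit) can be sketched as a simple first-fit pass over a queue. This is not PBS's algorithm or interface; the `Job` type, the `schedule` function, and the per-job CPU limit are hypothetical names used only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    name: str
    cpus: int  # resource requirement stated at submission time


def schedule(
    queue: List[Job], free_cpus: Dict[str, int], max_cpus_per_job: int
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """First-fit pass: return (started, waiting, rejected).

    started  -> (job name, node) pairs for jobs placed on a node
    waiting  -> jobs that fit the limit but found no free node this pass
    rejected -> jobs that exceed the maximum resource usage constraint
    """
    free = dict(free_cpus)  # work on a copy so the caller's view is unchanged
    started: List[Tuple[str, str]] = []
    waiting: List[str] = []
    rejected: List[str] = []
    for job in queue:
        if job.cpus > max_cpus_per_job:
            rejected.append(job.name)
            continue
        node = next((n for n in sorted(free) if free[n] >= job.cpus), None)
        if node is None:
            waiting.append(job.name)
        else:
            free[node] -= job.cpus
            started.append((job.name, node))
    return started, waiting, rejected
```

A production scheduler would also handle priorities, memory and time limits, and backfilling; the sketch keeps only the resource-matching step.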

Page 20: Clustering and Networking on Linux

Linux Capability Clusters

Beowulf: A Beowulf parallel computer is a cluster of standard computer components, dedicated to use for parallel computation and running software that allows the system to be viewed as a unified parallel computer system.

• Coarse-grain parallel computation and communication-based applications
• Popular with academic and research communities
• Breaking into commercial environments

Page 21: Clustering and Networking on Linux

Beowulf background

Mileposts
• 1994: Started with 486s and Ethernet
• 1997: Caltech achieves 11 Gflops at SC’97 (140 CPUs)
• 1999: Amerada Hess replaces SP2 with 32P Beowulf cluster of Pentium IIIs
• 1999: SGI shows 132p Beowulf cluster at Supercomputing 99
  – Ohio Supercomputer Center deploys 132p Beowulf cluster from SGI

Page 22: Clustering and Networking on Linux

Linux Capability Clusters

Motivation: Solving huge problems using commodity technologies.

Recent popularity because of technology availability:
• Linux Operating System
• Price/performance hardware
  – Killer Microprocessors
  – Killer Network Interface Cards (NICs)
• Drivers and Usable Protocols
• System and Job Management Software
• Parallel Algorithm Maturation
• Cluster Application Availability

Page 23: Clustering and Networking on Linux

Cluster friendly applications

• Weather analysis: MM5, ARPS
• Engineering analysis: CFD, NVH, Crash
• Bioinformatics and Sciences
  – GAMESS
  – ZEUS-MP: Pure Hydro
  – AS-PCG: 2D Navier-Stokes
  – Cactus: 3-D Einstein GR Equations
  – QMC: Quantum Monte Carlo
  – MILC: QCD

Page 24: Clustering and Networking on Linux

Linux Capability Clusters

Important technologies:

• Parallel programming environment
  – MPI: Widely supported, highly detailed specification of a standard C and Fortran interface for message-passing parallel programming
  – Parallel Debugger
    · TotalView from Etnus

• Fast interconnects
  – Commodity or special-purpose NICs
  – OS bypass implementations of protocols

Page 25: Clustering and Networking on Linux

SGI™ 1400 Beowulf Cluster @ OSC

[Figure: cluster configuration with the labels User Community, Head Node, 32 servers, Interconnect (Myrinet), and MPI Myrinet Bypass.]

Page 26: Clustering and Networking on Linux

SGI Beowulf Cluster: Sample Config

Page 27: Clustering and Networking on Linux

Interconnect considerations

Latency:
• Key Measure: Round Trip Time measured using the API (e.g. MPI), not hardware latency

Bandwidth: Measured using the API

CPU Overhead:
  – How much of the API/Protocol is buried in the NIC
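
Latency and bandwidth are often combined in a first-order model in which the time to deliver a message is a fixed latency plus the message size divided by the bandwidth. The sketch below applies that common approximation; it is not a formula from the slides, the function name is hypothetical, and the example figures are only placeholders for values measured at the API level.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimate one-way delivery time in microseconds.

    Model (an assumption of this sketch): time = latency + size / bandwidth.
    1 Gb/s equals 1000 bits per microsecond.
    """
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    bits = message_bytes * 8
    return latency_us + bits / (bandwidth_gbps * 1000.0)


if __name__ == "__main__":
    # Small messages are dominated by latency, large ones by bandwidth.
    for size in (8, 1024, 1024 * 1024):
        print(size, "bytes ->", round(transfer_time_us(size, 18.0, 1.1), 1), "us")
```

The model makes the point of the list above concrete: for short messages the latency term dominates, so an interconnect with high bandwidth but high API-level latency can still perform poorly for fine-grained communication.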

Page 28: Clustering and Networking on Linux

Interconnect technology: API to Adapter

Cluster API Support, Protocols, Network Drivers and Interfaces:
• MPI-LAM / TCP-IP / Fast Ethernet
• MPICH / GM / Myrinet
• MPI/Pro / VIA / Giganet
• SGI-MPI / ST / GSN: Currently SGI MIPS only

GSN: Gigabyte System Network

ST: Scheduled Transfer (Protocol)

Page 29: Clustering and Networking on Linux

Cluster Interconnects

Network Hardware
• Fast Ethernet
• Gigabit Ethernet
• Myrinet
• Giganet™
• GSN

Choice of network is highly application dependent

Page 30: Clustering and Networking on Linux

Scheduled Transfer Protocol (STP)

• Transaction-based communication protocol for low-latency system area networking
• Extension of DMA to the network
• Low CPU utilization and low latency
• Data link layer independent
• Standard specifies encapsulation for Ethernet, GSN, ATM ...

Page 31: Clustering and Networking on Linux

Cluster Interconnect Comparison: 1 Gbps range

Interconnect    API   Latency (one way)   Cost of NIC   Switch cost
Myrinet         MPI   18 us               $1400         $300/port
Giganet         MPI   25 us               $800          $800/port
GigE (Copper)   MPI   >100 us             $500          $300/port
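
The comparison figures above can be turned into a rough per-node and per-cluster cost. The sketch assumes one NIC and one switch port per node and ignores cabling and any multi-stage switching; the dictionary layout and function names are illustrative, and only the dollar amounts come from the comparison.

```python
from typing import Dict

# Prices from the comparison above, in US dollars.
INTERCONNECT_COSTS: Dict[str, Dict[str, int]] = {
    "Myrinet": {"nic_usd": 1400, "switch_port_usd": 300},
    "Giganet": {"nic_usd": 800, "switch_port_usd": 800},
    "GigE (Copper)": {"nic_usd": 500, "switch_port_usd": 300},
}


def cost_per_node(name: str) -> int:
    """Interconnect cost for one node: one NIC plus one switch port (assumed)."""
    entry = INTERCONNECT_COSTS[name]
    return entry["nic_usd"] + entry["switch_port_usd"]


def cluster_cost(name: str, nodes: int) -> int:
    """Total interconnect cost for a cluster of `nodes` nodes."""
    return nodes * cost_per_node(name)


if __name__ == "__main__":
    for name in INTERCONNECT_COSTS:
        print(f"{name}: ${cost_per_node(name)} per node, "
              f"${cluster_cost(name, 32)} for 32 nodes")
```

On these numbers Myrinet and Giganet land close together per node, so the latency column, rather than price, is what separates them.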

Page 32: Clustering and Networking on Linux

Myrinet

• High-performance packet-switching interconnect technology for clusters

• Bandwidth up to 1.1 Gb/s

• Myricom-supported GM is provided for NT/Linux/Solaris/Tru64/VxWorks

• Supports MPI

• Low CPU overhead and low latency (MPI: 18 us)

Page 33: Clustering and Networking on Linux

Really Fast Interconnects! API to Adapter

MPI/ST/GSN
• The technology used on the ASCI Blue Mountain and other ultra-high-end clusters
• ST Protocol: Light-weight ANSI standard protocol specialized for high-performance clusters
• We are working hard to bring ST to Linux and Gigabit Ethernet.

Page 34: Clustering and Networking on Linux

What Is GSN?

• ANSI standard interconnect
  – Highest-bandwidth and lowest-latency interconnect standard

• Gigabyte-per-second links, switches, adapters
  – Provides full-duplex, dual 6.4 Gbps (800 MB/s) of error-free, flow-controlled data

• Multi-vendor, multi-protocol interoperability
  – IBM, Compaq, and others to provide full connectivity

Page 35: Clustering and Networking on Linux

Compute cluster components: Nodes

• Node Width: ?-way SMP

• Special Service Nodes:
  – I/O Nodes
  – Visualization
  – Front End
  – Cluster Management

• Node Qualities:
  – Form Factor
  – Remote Monitoring
  – I/O Performance

Page 36: Clustering and Networking on Linux

Thin Node Cluster

[Figure: four thin nodes, each with two CPUs and I/O joined by X, all connected to a central switch. X: Internal Bus or Crossbar.]

Page 37: Clustering and Networking on Linux

Scalable Hybrid Architecture: Fat Node Cluster

[Figure: fat nodes, each with multiple CPUs and I/O joined by X, all connected to a central switch. X: Internal Bus, Crossbar or Scalable Interconnect.]

Page 38: Clustering and Networking on Linux

Scalable Hybrid Architecture

• Scalable Internal Interconnect

• Scalable External Network
  – Switch
  – Multiple External Network Devices

• Cluster API: MPI
  – Use Internal Interconnect ONLY on communication within a node
  – Support Use of Multiple External Network devices
  – Multiple Threads Communicating Increases Message Passing Bandwidth.

Page 39: Clustering and Networking on Linux

Fat Node Advantages

Larger Shared Memory environment:
• More Applications
• Higher Performance Potential
  – Shared Memory Latencies/Bandwidths
  – Add parallelism to supporting applications
• Easier System Administration

Less complex, Fatter network:
• Fewer wires, higher bandwidth

Requires MPI mixed mode support

Page 40: Clustering and Networking on Linux

Hybrid Programming Models

Parallelism Granularity and Implementation Layers for Hybrid MPI/OpenMP

• OpenMP -> MPI -> MPI/OpenMP
  – Automatic Parallelization: Based on OpenMP Library
  – OpenMP: Loop Level: Fine Grain
  – OpenMP: Threaded Implementations: Coarse Grain
  – MPI: Coarse Grain
  – MPI/OpenMP: Coarse Grain/Fine Grain
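
The coarse-grain/fine-grain split above can be pictured as a two-level partition of a loop's index range: first across processes (the MPI level), then across threads inside each process (the OpenMP level). The sketch below only computes the index ranges in plain Python; it calls neither MPI nor OpenMP, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi)


def split_range(start: int, stop: int, parts: int) -> List[Range]:
    """Split [start, stop) into `parts` contiguous, nearly equal blocks."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(stop - start, parts)
    blocks: List[Range] = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)  # first `extra` blocks get one more
        blocks.append((lo, hi))
        lo = hi
    return blocks


def hybrid_decomposition(
    n_items: int, n_ranks: int, threads_per_rank: int
) -> Dict[int, List[Range]]:
    """Coarse split across ranks, then fine split across threads in each rank."""
    return {
        rank: split_range(lo, hi, threads_per_rank)
        for rank, (lo, hi) in enumerate(split_range(0, n_items, n_ranks))
    }
```

With 16 items, 2 ranks and 4 threads per rank, rank 0 owns items 0 to 7 and rank 1 owns items 8 to 15, and each thread handles two consecutive items within its rank's block.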

Page 41: Clustering and Networking on Linux

SGI Advanced Cluster Environment (ACE): Comprehensive compute cluster package

• Programming Environment

• Load Sharing and Scheduling Tools

• Administration Tools - Single System View

• Performance Management Tools

• Interconnect drivers

• Cluster-wide accounting

• Shared File System

• High Availability Framework

• Professional/Managed Services

Page 42: Clustering and Networking on Linux

Summary

• Linux clusters provide
  – A wide range of computing options
    · High Availability
    · Throughput
    · Capacity
  – Flexibility
  – Price/performance
  – Expandability

• ‘Best solution’ requires integration of commodity systems, open source solutions and specific value-add components