-- OSS for High-Availability April, 2005
Linux in High-Availability Environments
Alan Robertson
IBM Linux Technology Center
OSS in HA Environments
Why OSS for High Availability Environments?
What is High-Availability (HA) Clustering?
What can HA do for me?
DRBD Data Replication
The Linux Virtual Server Load Balancer
The Linux-HA project
Linux-HA applications and customers
Thoughts about cluster security
Why OSS In High-Availability Environments?
Openness
Broad Range Of Environments
Breadth of Support Options
Lack of Vendor Lock-In
Openness
Extensive Peer Review System
Source code freely available
Source code reviewed by outside parties
Changes discussed openly – often in great detail
Ability to obtain uncensored product information
Mailing list archives contain uncensored comments from
Users with deep expertise
Users with little expertise
Users who are very happy
Users with problems
Broad Range of Environments
OSS typically runs on many platforms, often on different OSes too
Users often find very creative uses for the software
Freedom to try something at low cost decreases perceived risks and encourages this behavior
Creative uses find their way into mailing list (archives) and sometimes into the OSS product
Users help with testing – providing more breadth in test environment than might otherwise occur
Support for OSS Systems
Mailing lists consist of hundreds to thousands of users who are very knowledgeable and helpful – usually regarded as very responsive – typically located in most time zones across the world
Can choose support vendor freely:
Hardware, OS or OSS supplier
Independent consulting/support organizations
In-house expertise (most motivated)
OSS mailing lists
Any combination of the above
No Vendor Lock-In
Does not rely on a vendor's future plans being compatible with yours (risk mitigation)
Obsolescence more readily manageable
Does not rely on a single vendor in another company or country
Contributing to the product (or paying someone else to) provides you a voice in future direction
Compatibility with other systems typically better
What Is HA Clustering?
A group of computers which cooperate and trust each other to provide a service even when cluster components fail
When one machine goes down, others take over its work
This involves IP address takeover, service takeover, etc.
New work comes to the “takeover” machine
Not primarily designed for high performance
What Can HA Clustering Do For You?
It cannot achieve 100% availability – nothing can.
HA clustering is designed to recover from single faults
It can make your outages very short
From about a second to a few minutes
It is like a Magician's (Illusionist's) trick:
When it goes well, the hand is faster than the eye
When it goes not-so-well, it can be reasonably visible
A good HA clustering system adds a “9” or two to your availability
99->99.9, 99.9->99.99, 99.99->99.999, etc.
Complexity is the enemy of reliability!
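To make the "nines" concrete, the short sketch below converts an availability percentage into expected downtime per year. It assumes a 365-day year and treats availability as a simple fraction of total time; the function name is illustrative only.

```python
# Downtime per year implied by an availability percentage.
# Assumption: a 365-day year (8760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the expected downtime in hours per year."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% available: {hours:.2f} h/year ({hours * 60:.1f} min/year)")
```

Each added "9" cuts the yearly downtime by a factor of ten: roughly 87.6 hours at 99%, and a little over 5 minutes at 99.999%.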
The Desire for HA systems
Who wants low-availability systems?
Why are so few systems High-Availability?
Why isn't everything HA?
Cost
Complexity
Single Points of Failure (SPOFs)
A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
Good HA design eliminates single points of failure
How Does HA work?
Manage redundancy to improve service availability
Like a cluster-wide-super-init on steroids
Even complex services are now “respawn”
on node (computer) death
on “impairment” of nodes
on loss of connectivity
for services that aren't working (not necessarily stopped)
managing very complex dependency relationships
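The "respawn on node death" idea can be sketched as a toy model: a node whose heartbeat has not been seen within some timeout is declared dead, and its services are restarted elsewhere. This is a simplified illustration, not how Linux-HA is implemented; the `Node` class, the `fail_over` function, the `deadtime` parameter, and the "fewest resources" placement policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float                      # when a heartbeat was last seen (seconds)
    resources: list = field(default_factory=list)


def fail_over(nodes, now, deadtime):
    """Move resources off nodes whose last heartbeat is older than `deadtime`.

    Returns the names of the nodes declared dead. Hypothetical policy: each
    orphaned resource goes to the live node currently running the fewest.
    """
    live = [n for n in nodes if now - n.last_heartbeat <= deadtime]
    dead = [n for n in nodes if now - n.last_heartbeat > deadtime]
    if not live:
        return [n.name for n in dead]          # no survivor to take over
    for node in dead:
        while node.resources:
            target = min(live, key=lambda n: len(n.resources))
            target.resources.append(node.resources.pop(0))
    return [n.name for n in dead]


if __name__ == "__main__":
    a = Node("node-a", last_heartbeat=100.0, resources=["ip", "web"])
    b = Node("node-b", last_heartbeat=129.0)
    print(fail_over([a, b], now=130.0, deadtime=10.0), b.resources)
```

A real cluster manager also has to handle impaired nodes, lost connectivity, and services that are running but not working, which this sketch leaves out.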
DRBD – RAID over the LAN
Block-device (filesystem) level replication
Clever synchronization methods make resyncs faster, decrease latency, preserve integrity
Useful for both HA and Disaster Recovery
NO single point of failure
Extremely cost-effective
$200 (max) instead of $20,000 (min) ($USD)
Probably not suitable for some high-end write-intensive applications
Supportable by IBM Support Line
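One general way block-level replication can keep resyncs fast is to remember which blocks changed while the peer was unreachable and copy only those afterwards. The toy model below illustrates that idea in memory. It is a sketch of the generic technique under that assumption, not DRBD's actual implementation, and the class and method names are hypothetical.

```python
class ReplicatedDisk:
    """Toy primary/secondary block device pair with dirty-block tracking."""

    def __init__(self, num_blocks: int):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        self.connected = True
        self.dirty = set()      # blocks written while the peer was unreachable

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.connected:
            self.secondary[block] = data    # mirror the write to the peer
        else:
            self.dirty.add(block)           # remember it for a later resync

    def resync(self) -> int:
        """Reconnect and copy only the dirty blocks; return how many were copied."""
        self.connected = True
        copied = len(self.dirty)
        for block in sorted(self.dirty):
            self.secondary[block] = self.primary[block]
        self.dirty.clear()
        return copied


if __name__ == "__main__":
    disk = ReplicatedDisk(1000)
    disk.write(1, b"a")
    disk.connected = False          # simulate losing the replication link
    disk.write(2, b"b")
    disk.write(3, b"c")
    print(disk.resync(), disk.primary == disk.secondary)
```

Here only the two blocks written during the outage are transferred, rather than all 1000.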
LVS – The Linux Virtual Server Project
LVS is the standard Linux Load Balancer
Called "ipvs" in the standard Linux kernel
Stable, fast, flexible
Especially suitable for large "server farms"
LVS in Action
“Plays Well With Others”
Each of these independent services can work together to scale to large systems
All single points of failure can be eliminated
High availability and load balancing work together nicely
Linux Virtual Server, Linux-HA and DRBD
The Linux-HA Project
Linux-HA is the oldest high-availability project for Linux, with the largest associated community
The core piece of Linux-HA is called “heartbeat” (though it does much more than heartbeat)
Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites
Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
Linux-HA is shipped with every major Linux distribution except one.
Linux-HA Release 1 Applications
Database Servers
Load Balancers
Web Servers
Custom Applications
Firewalls, routers, DNS, DHCP
Retail Point of Sale Solutions
Authentication
File Servers
Proxy Servers
Medical Imaging
Almost any type of server application you can think of – except SAP
Selected Linux-HA customers
Los Alamos (US) National Labs – linear accelerator badge reader
Emageon – medical imaging for hospitals and clinics
ISO New England manages power grid using ≈ 20 Linux-HA clusters
Various Firewall, DNS, DHCP products use Linux-HA basically embedded
Karstadt, Circuit City, Autozone use Linux-HA in each of several hundred stores
MAN Nutzfahrzeuge AG – truck manufacturing division of MAN AG
Autostrada – 230 clusters across Italy
BBC – Internet Infrastructure
Citysavings Bank in Munich (infrastructure)
Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City
The Weather Channel (weather.com)
Sony (manufacturing)
Incredimail bases their mail service on Linux-HA on IBM hardware
University of Toledo (US) – 20k student Computer Aided Instruction system
Linux-HA Release 1 capabilities
Supports 2-node clusters
Can communicate over serial links or UDP broadcast, multicast, or unicast
Fails over on node failure
Fails over on loss of IP connectivity
Capability for failing over on loss of SAN connectivity
Limited command line administrative tools to fail over, query current status, etc.
Active/Active or Active/Passive
Simple resource group dependency model
Requires external tool for resource monitoring
SNMP monitoring
Linux-HA Release 2 capabilities
Built-in resource monitoring
Support for the OCF resource standard
Much larger clusters supported (>= 8 nodes)
Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)
XML-based resource configuration
Configuration and monitoring GUI
Support for GFS cluster filesystem
Multi-state (master/slave) resource support
Initially: no IP or SAN connectivity monitoring
Resource Objects in Release 2
Release 2 supports “resource objects” which can be any of the following:
Primitive Resources – OCF, heartbeat-style, or LSB resource agent scripts
Resource Incarnations – need “n” resource objects running somewhere in the cluster
Resource groups – a group of resources with implied co-location and linear ordering constraints
Multi-state resources (master/slave) – designed to model master/slave (replication) resources (DRBD, et al)
Basic Dependencies in Release 2
Ordering Dependencies
start before (implies stop after)
start after (implies stop before)
Mandatory Co-location Dependencies
must be co-located with
cannot be co-located with
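The two kinds of dependencies above can be illustrated with a small sketch: a topological sort gives a valid start order from "start after" rules (stopping runs in reverse), and a simple check validates a placement against co-location rules. This does not reflect the Release 2 configuration format or its internals; the function names and the resource names in the example are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(start_after):
    """`start_after` maps each resource to the resources it must start after."""
    return list(TopologicalSorter(start_after).static_order())


def stop_order(start_after):
    """Stopping runs in the reverse of the start order."""
    return list(reversed(start_order(start_after)))


def placement_ok(placement, together, apart):
    """Check mandatory co-location rules against a resource -> node mapping.

    `together` lists pairs that must share a node;
    `apart` lists pairs that must not share a node.
    """
    same = all(placement[a] == placement[b] for a, b in together)
    different = all(placement[a] != placement[b] for a, b in apart)
    return same and different


if __name__ == "__main__":
    rules = {"filesystem": ["drbd"], "database": ["filesystem"], "ip": ["database"]}
    print(start_order(rules))
    print(stop_order(rules))
    print(placement_ok({"ip": "node-a", "database": "node-a"},
                       together=[("ip", "database")], apart=[]))
```

With this chain of rules, the replicated device starts first and the IP address last, and shutdown proceeds in the opposite direction.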
Resource Incarnations
Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster
This is useful for managing
Load-balancing clusters where you want “n” of the nodes to be slave servers
Cluster filesystems
Cluster Alias IP addresses
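As a rough picture of what running a resource "n" times means, the sketch below spreads n copies of one resource across the available nodes in round-robin fashion. The placement policy, the function name, and the `name:index` labels are assumptions for illustration and are not taken from Linux-HA.

```python
def place_incarnations(resource, n, nodes):
    """Assign `n` copies of `resource` to `nodes` round-robin.

    Returns a dict mapping each node name to the copies placed on it.
    """
    if not nodes:
        raise ValueError("no nodes available")
    placement = {node: [] for node in nodes}
    for i in range(n):
        placement[nodes[i % len(nodes)]].append(f"{resource}:{i}")
    return placement


if __name__ == "__main__":
    print(place_incarnations("web", 3, ["node-a", "node-b"]))
```

If a node fails, the same function could be called again with the surviving nodes to decide where the copies should run.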
Security Considerations
Cluster: A computer whose backplane is the Internet
If this isn't scary, you don't understand...
You may think you have a secure cluster network
You're probably mistaken now
You will be in the future
Secure Networks are Difficult Because...
Security is not often well-understood by admins
Security is well-understood by “black hats”
Network security is easy to breach accidentally
Users bypass it
Hardware installers don't fully understand it
Most security breaches come from “trusted” staff
Staff turnover is often a big issue
Virus/Worm/P2P technologies will create new holes especially for Windows machines
Security Advice
Good HA software should be designed to assume insecure networks
Not all HA software assumes insecure networks
Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
Crossover cables are reasonably secure – all else is suspect
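One way cluster software can "assume insecure networks" is to authenticate every intra-cluster message with a shared secret, so that forged or altered packets are rejected. The sketch below does this with HMAC-SHA256 from the Python standard library. The packet layout (payload followed by a 32-byte tag), the function names, and the key are assumptions for the example and are not the Linux-HA wire format.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign(key: bytes, payload: bytes) -> bytes:
    """Return the payload followed by its HMAC-SHA256 tag."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, packet: bytes):
    """Return the payload if the tag is valid, otherwise None."""
    if len(packet) < TAG_LEN:
        return None
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    return payload if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    key = b"example-shared-secret"          # hypothetical key, for illustration
    packet = sign(key, b"node-a alive seq=42")
    print(verify(key, packet))
    print(verify(key, b"X" + packet[1:]))   # tampered packet is rejected
```

This only shows message authentication; it does not encrypt the payload or protect against replayed packets.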
References
http://linux-ha.org/
http://linux-ha.org/download/
http://wiki.linux-ha.org/NewHeartbeatDesign
New Web site content (a work in progress)
http://wwnew.linux-ha.org/ (prettier)
http://wiki.linux-ha.org/ (editable)
http://wwnew.linux-ha.org/SuccessStories
www.linux-mag.com/2003-11/availability_01.html
http://www.linuxvirtualserver.org/
http://drbd.org/
Legal Statements
IBM is a trademark of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or service marks of others.
This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.