
12 30 0EURO-PAR

EURO-PAR CONFERENCE

Carrier Grade Linux Platforms: Characteristics and Ongoing Efforts

Ibrahim.Haddad@Ericsson.com

2003


Presentation Objective

In this presentation, we will discuss the characteristics of telecom platforms and present the ongoing development efforts to build Carrier Grade Linux Platforms, both by the industry and in Open Source.


The Open System Lab

The Open System Lab is under the IP Network Organization at the Ericsson Research Corporate Unit.

The lab is located in Montreal, Canada.


The Open System Lab

People
• Networking Competence
• System Competence
• Linux and Open Source Competence
• Web Services Competence
• Security (Network, GPRS, GTP, Platform, Application)

Universities
• High involvement from Graduates (Master and PhD)
• Professors on Sabbatical

Lab
• Cluster technology (over 300 Processors)
• Security System Test Lab
• IPv6 Lab
• Router Research Lab
• End-to-end simulation and emulation Lab


What is Carrier Grade?

• Carrier Grade is a term for public network telecommunications products that require a reliability of 5 or 6 nines, that is, 99.999 to 99.9999 percent.
– This translates into about 5 minutes (5 nines) to about 30 seconds (6 nines) of downtime per year.

– 5 nines is usually associated with Carrier Grade servers

– 6 nines is usually associated with Carrier Grade switches

• Carrier Grade Linux is a new flavor of Linux that is more robust than the garden-variety enterprise Linux. It promises to provide a standards-based, open-architecture software platform for converging telecommunications.
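The downtime figures quoted on this slide follow directly from the availability percentage. A minimal sketch in Python, assuming a 365-day year; the function name is illustrative and not part of the presentation:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (5, 6):
    secs = downtime_seconds_per_year(n)
    print(f"{n} nines: {secs:.1f} s/year ({secs / 60:.2f} min/year)")
```

Under these assumptions, 5 nines allows roughly 315 seconds (a little over 5 minutes) per year and 6 nines roughly 31.5 seconds, which matches the rounded figures on the slide.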


Faster Than You Think

10 minute Multimedia Presentation


Outline

• The Mobile Internet & Next Generation Networks

• Characteristics of Telecom Platforms

• Why Linux?

• The Server Platform (TSP)

• Commercial Efforts

• Open Source Development Lab

• Service Availability Forum

• Open Cluster Framework

• Open System Lab Experiences

• Conclusions & Challenges


The Mobile Internet


The Mobile Internet

• The Mobile Internet will become a part of our everyday business and personal lives: shopping, gaming, messaging, multimedia, entertainment, banking, …


“Always-on” Revolution

• “Always-on” Capability
– Every device needs its own permanent IP address
– Today’s dynamically assigned IP addresses will no longer be available

• The Future
– Millions and millions of:
  • new Internet devices
  • new Internet users
– Internet available everywhere, all of the time


Classes of Applications

• M-Commerce
• Multimedia & Entertainment
• Location Services
• Messaging
• Personalization


The Internet – Today vs. Tomorrow

• The Internet Today
– Millions of users
– Web, email, audio & video, …

• The Internet Tomorrow (NGN)
– Billions of users and devices
– New technologies leading to novel applications
– Building a real information society that is ‘Always-on’
– Global village is a reality
– Convergence of applications and services


A networked society with always-on connectivity

• Different types of network infrastructures are linked through a common protocol (IP).

• Novel multimedia applications (using SIP) have different requirements that the underlying IP layer must meet.



Next Generation Networks


Current Generation Network Architecture

[Figure layers: Services; Access, Transport & Switching Networks]

• Multiple, single purpose networks.

• Bundled services are difficult.

[Figure columns: Wireline Networks; Data/IP Networks; Cable TV Networks; Wireless Networks]


Next Generation Network Architecture

• Single, multipurpose backbone.
• Bundled services.
• Easy service creation.
• Network management.
• Consolidate billing.
– Ex: mobile + fixed + DSL

[Figure layers: Clients; User applications; Service Networks & Control & Gateways; Communications control; Connectivity (wireless, narrowband, broadband)]


Next Generation Platforms


Requirements for the New Telecom World

• Access independent
• High capacity
• Scalable
• Reliable
• Real-time with very low latency
• QoS enabled
• Support for multi-media services
• Manageable


Market Requirements (1/2)

• 99.999% Availability (Five nines).
• Global Platform with a Full Range of Access Gateways.
• Open API for rapid creation of new services.
• New Multi-Media Services for Business and Residence.
• Full voice/data feature transparency with existing circuit networks.
• PSTN voice quality and reliability.
• ITU Compliant; Standards Based Interfaces.


Market Requirements (2/2)

• ATM and IP Protocols with Inter-Working.
• Scalable up to Millions of Lines.
• Programmable Feature Server Supporting 3rd Party Development.
• Can Link to Any Vendor’s ATM Switch or IP Router.
• Web Based Service Customization.
• …


Drivers for IP Technology


Carrier Class IP Telephony and Multimedia Services

• Traditional Internet service is Best Effort
– Packets suffer from loss, delay, etc.
– Packet loss is mainly caused by router congestion, not line transmission errors
– One-way packet delay is more correlated with number of hops than geographical distance.
– Best Effort service is enhanced by TCP through retransmission and sequencing.

• Real-time applications cannot be supported by TCP


IP Transport Benefits

• Lower cost infrastructure

• Faster provisioning of new features

• Easy integration of network elements

• Stimulate creation of new services

• Simplify harmonization of standards


Drivers of IP Technology (1/2)

• IP is gradually becoming a dominating transport technology thanks to recent advances in optics and routing technology and the impact that these have had on price/performance.

• When combined with other key technologies, such as IP-based virtual private networks (VPN), IP enables a new generation of advanced multi-service networks.

• The use of a common infrastructure based on a single technology simplifies network implementation and operation and helps reduce costs.


Drivers of IP Technology (2/2)

• There are two main arguments that drive the integration of IP technology into mobile core networks:
– Support for (new) IP applications to generate (new) revenues;
– A common transport technology to reduce costs.

• The entire mobile telecommunications industry is funded out of the end-user’s pocket. To ensure future growth in the industry, the end-user value needs to be enhanced.
– Service and application offerings are the prime drivers of the entire network and terminal evolution.


Telecom Platforms Characteristics

Hardware & Software Features


General Hardware Features (1/2)

• Telecom equipment has to deliver dependable and reliable performance under the conditions encountered in both normal and abnormal circumstances (such as natural disasters).

• Telecom equipment has high safety requirements and must not cause any risks or hazards to personnel, other equipment, or the physical structure where it is placed.

• Telecom equipment must comply with NEBS testing and design requirements to help make the equipment operate properly and safely.

– Network Equipment-Building System (NEBS) is a set of generic standard requirements to provide safety and reliability.

– NEBS requirements allow operators to use a single, uniform set of rules to evaluate the telecom equipment they plan to deploy fairly and impartially.

• Other standards are commonly used; however, they are either unidirectional or limited in scope.


General Hardware Features (2/2)

• Comply with standard telecom rack dimensions
• Provide redundant hot-swap/hot-insertion power supplies
• Provide hot-swap/hot-insertion disk drives
• Comply with -48 Volts
• Provide hot-swap/hot-insertion processors with multiple Ethernet ports
• Provide hot-swap/hot-insertion network cards
• Provide hot-swap/hot-insertion tape drives
• Multiple boot options
– Processors should be able to boot through the network (network 1, network 2 for redundancy), Flash, floppy and CD-ROM (when connected)
• Provide remote diagnostics support and alarms to monitor temperature
• …


Clustered Telecom Platforms

The current trend in the telecom platform space is to move towards clustered platforms for the benefits they offer:

• High Availability: Isolate or reduce the impact of a failure in the node, resources, or device through redundancy and failover techniques.
• Scalability: Expand the capacity of servers in terms of processors, memory, storage, or other resources to support subscriber and traffic growth.
• Improved processing speed: High performance, fast access time and response time.
• Efficient resource utilization: through load balancing and traffic distribution among all nodes of the cluster server.
• Manageability: Reduce system management costs through appropriate system management facilities / middleware.


Clustering in Telecom Platforms

Clustering is the use of multiple “loosely coupled, nothing shared” nodes to form what appears to users as a single highly available system.

[Figure: each node is a stack of Processor, Operating System, Middleware and Application; nodes are joined by a reliable & fault-tolerant processor interconnect]


Clustering in Telecom Platforms

• N + M redundancy of processors

• Mated pairs

• Fast inter-processor communication (TCP/IP is not fast enough)

• Single view of data

• Single view of platform (cluster)


Uptime

• The main operator requirement remains at 30 seconds of system interruption per year, including hardware and software upgrades.

• Target 99.9999% uptime
– Applies to an overall solution that involves integrated high-availability hardware, software (OS and middleware), and the application.

• Ensure uptime of mission critical applications, software subsystems, and hardware platforms.


High Availability (1/2)

• High availability (HA) is a term for the technology that enhances the uptime of computer-based communications systems by distributing functionality across multiple CPUs.

• In response to hardware and software failures, HA systems facilitate the rapid transfer of control (failover) from a faulty CPU, peripheral, or software component to a functional one, while preserving operations or transactions in-progress at the time of failure.
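The failover idea described above can be illustrated with a heartbeat check. This is a minimal sketch in Python, not the mechanism of any particular HA product: the `Node` class, the `fail_over` function, the node names and the one-second timeout are all hypothetical, and the sketch moves services only, without preserving in-progress transactions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float                     # time of last heartbeat, in seconds
    services: list = field(default_factory=list)


def fail_over(nodes, standby, now, timeout=1.0):
    """Move services from nodes with a stale heartbeat onto the standby node.

    Returns the names of the nodes that were declared faulty.
    """
    failed = [n for n in nodes
              if n is not standby and now - n.last_heartbeat > timeout]
    for node in failed:
        standby.services.extend(node.services)  # transfer of control
        node.services = []                      # isolate the faulty node
    return [n.name for n in failed]


cpu_a = Node("cpu-a", last_heartbeat=10.0, services=["call-control"])
cpu_b = Node("cpu-b", last_heartbeat=7.5, services=["billing"])
spare = Node("spare", last_heartbeat=10.0)

failed = fail_over([cpu_a, cpu_b, spare], spare, now=10.2)
print(failed, spare.services)
```

Passing `now` explicitly keeps the example deterministic; a real monitor would read a clock and would also need to replicate state so that operations in progress survive the switch.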


High Availability (2/2)

• Error Detection
• Damage Containment
• Error Recovery
• Fault Treatment (incl. dynamic reconfiguration)

• Assumption: We are dealing with systems comprising clusters of processors which share nothing.


Availability Defined

• Availability is best defined as:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

where MTBF is the Mean Time Between Failures and MTTR is the Mean Time To Repair.

• Example:
– If a system offered an MTBF of 20,000 hours with an MTTR of 2 hours, then its availability would be 99.99%, “4-nines.”
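The example on this slide can be checked numerically. The sketch below applies the MTBF / (MTBF + MTTR) definition to the slide's figures; the helper names `availability` and `count_nines` are illustrative only.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def count_nines(a: float) -> int:
    """Number of leading nines in an availability value below 1.0."""
    return math.floor(-math.log10(1.0 - a))


a = availability(20_000, 2)
print(f"availability = {a:.6f}, nines = {count_nines(a)}")
```

With an MTBF of 20,000 hours and an MTTR of 2 hours the result is about 0.9999, which is the "4-nines" figure quoted above. Reducing MTTR (for example through hot swap) raises availability without changing MTBF.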


Achieving High Availability

• A complete high availability solution that demonstrates 5-nines requires close integration of:

– High-availability hardware,

– A robust high-availability software solution,

– High-availability middleware, and

– Application software that can cause failover to redundant systems.


High Availability Levels (1/2)

Source: IMEXHA Report


High Availability Levels (2/2)

Source: Intel


Redundancy in HA Systems

• One important characteristic of HA systems is the redundancy of key subsystems.

• A highly available system includes:
– Redundant Ethernet to ensure constant networking connections
– Disk mirroring to ensure high levels of data reliability
– Redundant power supplies
– …

• Example: CGL includes an enhanced kernel with hardened device drivers and fault response behaviors, such as panic handler improvements, that ensure the application logs appropriate messages and sends notifications before a kernel panic.


Other HA Features (1/2)

• When defining the high availability platform, some requirements are to support hot swap (reduce MTTR), remote boot, diskless operation, …

• Concepts:
– hot insert (adding cards to the system not originally in place at boot time),
– hot remove, and
– identity maintenance (maintaining device identities across hot swaps and system boots).

• Different implementations of Linux support these features.


Other HA Features (2/2)

Some other high availability challenges are:

• Flexible options for booting compressed and remotely hosted kernel images.

• Support of compressed r/w and read-only Flash file systems.

• Accelerated boot and daemon start times from several minutes to seconds

• Speeding shutdown

• Eliminating costly file system operations with journaling file systems.


Non-Stop Operations

• No single point of failure
• No scheduled downtime
• HA failover software
• Hot plugged components
• Software Configuration Control:
– Automatic restart, on working processors, of processes that originally executed on a faulty processor
– Self healing
• In-service upgrade of software with no disturbance to operation


Zero Downtime

• Zero Downtime Operation is achieved through

– Data Distribution

– On-line Process Replication

– Software Fault Tolerance


Fast Recovery of Applications

• Maximize availability of application and services through application of specific monitoring and detection of failures.

• Provide automatic failover and recovery capabilities with very minimum interruptions to the users.


Failover

• Transparent failover capabilities: applications and users are automatically and transparently reconnected to another system. Update transactions are rolled back.

• Failover performance: depends on hardware configuration, instance recovery time and workload at time of failover.

• Multiple Failure Robustness: Should be able to survive multiple node failures and still provide protection for the mission critical applications.


Fault Tolerance (1/2)

• Hardware Fault Tolerance
– Relies on redundant processors, power supplies, Ethernet cards, disk storage, …
– If an active component fails:
  • It is isolated,
  • A standby component becomes operational,
  • The load is shared among available active components.

• Software Fault Tolerance
– Uses a combination of software and hardware redundancy to provide the necessary backup hardware in case of failure.
– The software becomes responsible for duplication, update and synchronization of information across redundant hardware components.


Fault Tolerance (2/2)

• Allows single site to expand to multi-site for disaster tolerant solutions.


Load Balancing

• Have a mechanism to balance workloads after a node failure, to minimize the performance impact on other applications running on the platform.
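One simple way to rebalance after a node failure is to hand each orphaned workload to whichever surviving node currently carries the fewest. The sketch below assumes that all workloads cost the same; the `rebalance` function and the node and workload names are hypothetical.

```python
def rebalance(load, failed):
    """Reassign the failed node's workloads to the least-loaded survivors.

    `load` maps node name -> list of workloads. A new mapping is returned;
    the input is left untouched.
    """
    survivors = {n: list(w) for n, w in load.items() if n != failed}
    if not survivors:
        raise ValueError("no surviving nodes to take over the load")
    for workload in load.get(failed, []):
        # Pick the survivor with the fewest workloads at this moment.
        target = min(survivors, key=lambda n: len(survivors[n]))
        survivors[target].append(workload)
    return survivors


cluster = {"n1": ["a", "b"], "n2": ["c"], "n3": ["d", "e", "f"]}
after = rebalance(cluster, "n3")
print(after)
```

Spreading the orphaned work one item at a time keeps any single survivor from absorbing the whole load, which is what limits the performance impact on the applications already running there.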


Concurrent Maintenance

• Allow scheduled maintenance to be performed on one node of a cluster while other nodes continue to provide service without noticeable degradation.


Online Configurability

• Allow online configurability to reduce downtime.

• Allow recovery to be in either active/standby or active/active mode, to avoid leaving systems idle.


Manageability

• Provide policy based management across entire spectrum of applications, software, and hardware.

• Single point of control for data management:
– Data movement
– Security
– Backup
– Recovery

• Assure availability and performance using Service Level Agreement type business approach.


Scalability

• Support linear growth as modular addition of HW and SW components happens.

• If we double the number of processors, we should expect to almost double the throughput of the system.
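The "almost double" expectation can be expressed as a scaling efficiency: measured throughput divided by the throughput that perfect linear scaling would give. A small sketch follows; the function name and the throughput numbers are hypothetical and chosen only for illustration.

```python
def scaling_efficiency(base_procs, base_tput, new_procs, new_tput):
    """Ratio of measured throughput to ideal linear-scaling throughput."""
    ideal = base_tput * new_procs / base_procs
    return new_tput / ideal


# Hypothetical measurement: 4 processors handle 1000 requests/s,
# 8 processors handle 1900 requests/s.
eff = scaling_efficiency(4, 1000.0, 8, 1900.0)
print(f"scaling efficiency: {eff:.2f}")
```

A value close to 1.0 indicates near-linear growth; a value that drops as processors are added points to a shared bottleneck.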


Database High Availability Requirements

• If a node or interconnect failure occurs, the surviving nodes perform recovery as follows:

– Cluster reorganization: OS cluster monitor determines cluster membership

– DLM Lock Rebuild: Time is proportional to the number of locks at crash

– Database Recovery: Cache recovery & transaction recovery


High Availability Storage

• HA storage is a critical and necessary part of a fault tolerant system.

• Hardware and software RAID support


Hardware Architecture

• Hardware redundancy
• Automatic software error recovery
• On-line backup
• Hot swap hardware replacement
• Adaptive hardware configuration
• Geographical node redundancy


Online OS and Applications Upgrade

• Support remote upgrade of operating systems and application software without any system interruption or disturbance.

• Support smooth software upgrades in which old and new versions of the same process can coexist.
– Provide mechanisms to upgrade an application while it is running;
– The system will deal with old and new running versions of the application simultaneously.

• Possibility for application to arrange state transfer between old and new static process.


Capacity Control

• Overload protection by selectively rejecting messages when message queues become too long.

• Load regulation by API asking kernel if it is ok to accept traffic.
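Both ideas on this slide can be sketched with a bounded queue. The example below is a user-space illustration in Python, not the kernel API the slide refers to: the `RegulatedQueue` class, its two limits and the `"high"`/`"normal"` priority labels are assumptions made for the example.

```python
from collections import deque


class RegulatedQueue:
    """Message queue that sheds load selectively as it fills up."""

    def __init__(self, soft_limit, hard_limit):
        self.soft_limit = soft_limit   # above this, only high priority is accepted
        self.hard_limit = hard_limit   # above this, everything is rejected
        self._queue = deque()

    def ok_to_accept(self, priority="normal"):
        """Load regulation: callers ask before submitting traffic."""
        depth = len(self._queue)
        if depth >= self.hard_limit:
            return False
        if depth >= self.soft_limit:
            return priority == "high"
        return True

    def offer(self, message, priority="normal"):
        """Overload protection: enqueue the message or reject it."""
        if not self.ok_to_accept(priority):
            return False
        self._queue.append(message)
        return True


q = RegulatedQueue(soft_limit=2, hard_limit=3)
results = [
    q.offer("m1"),
    q.offer("m2"),
    q.offer("m3"),                    # rejected: queue is at the soft limit
    q.offer("m3", priority="high"),   # accepted: high priority
    q.offer("m4", priority="high"),   # rejected: queue is at the hard limit
]
print(results)
```

Rejecting low-priority messages first keeps queue delay bounded for the traffic that matters most, at the cost of dropping some work during overload.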


High Reliability

• The database used is distributed among all the processors in the system.
– If a processor crashes, another processor in the system takes over.

• The same mechanisms are also used for identifying and handling software faults.

• There is no risk of a software fault hanging the system: if a fault occurs, the system is automatically restarted in real time.


High Performance

• Must be fast enough to meet requirements
– Design cost/hardware cost trade-off
– Beware of unnecessary sub-optimization!

• Real-time performance is critical to many carrier grade applications.

• The ability to respond quickly and predictably to external events is a key feature of availability.


Software Management

• The software management layer provides the following services:

− Initial software loading

− Software reloading

− Software upgrade handling

− Hardware Configuration Management

− Loading configuration


Security

• Access control
• IPsec, etc.


Key Software Issues (1/2)

• Provide a fast low level protocol for inter-processor communication

• Provide high-level support for high availability in middleware/OS
– Cluster management / equipment management / alarm management
– Application start / restart / hand-over / fail-over / fail-back
– Replication of data (configuration data, state data)
– Mechanisms for software update
– Naming and addressing

• Use middleware/OS mechanisms in a consistent way


Key Software Issues (2/2)

• The design of the software architecture of the application should, from the start, take into account:
– Scalability
– Failure handling
– Error (software bug) handling
– Future modification
– Hot software upgrade

• Standard O&M interfaces
– SNMP, CORBA, HTTP
– Common LCT components


Common Traps (1/2)

1. Designing a non-scalable, non-HA-aware application and leaving scalability and HA issues to later phases.

2. An OS and Middleware can help provide scalability and HA but they can’t provide it alone!

Scalability and HA must be designed into the application from the start, or the application will have to go through a major redesign later.


Common Traps (2/2)

3. Converting an existing non-scalable, non-HA-aware application to use an HA framework (middleware/OS) requires a major redesign!


Linux for Telecom Platforms


Motivations

• Networks are converging for multimedia communications services.

• Future multimedia-type data services will require substantially greater bandwidth, requiring new architectures to reduce delivery cost.

• Commercial off-the-shelf (COTS) software components are a necessary part of these new architectures.

– Proprietary platforms, the mainstay of current architectures, are more expensive to develop than open-standards based ones.

– Open-standards based solutions expand solution options and reduce time to market.


Vision

Next generation and multimedia communication services are delivered using Linux-based open standard platforms for Carrier Grade infrastructure equipment.


Moving from Proprietary to Open Solutions

From proprietary solutions to standardized, modular solutions

[Figure layers: Service and Application Layer; NEP proprietary application stacks and platforms; Middleware (SA Forum APIs); Carrier-Grade OS; AdvancedTCA (ATCA)]


Towards Open Solutions

Utilization of old infrastructures and technologies based on obsolete standards for modern requirements:

– Proprietary, closed system worlds
– Insufficient support of current and upcoming standards
– High administration and management costs
– Ongoing utilization only possible with continuously increasing costs
– Expired product life cycles
– Proprietary infrastructures did not take today's requirements into consideration

How can the requirements of tomorrow be met?


Economic and Strategic Challenges

Focus on generation of revenues with new data services:
– Networks supporting various service types are needed
– Verifiable ROI in legacy networks and equipment
– Expandable, flexible, modular solution approaches for new services
– Telecommunications platforms from modular HW/SW kits
– Reusability of these platforms
– Simplified, central, uniform administration
– Need for flexible, stable, powerful standard platforms

Cost reduction, standardization, reusability


Linux and HA

Linux seems a good choice of OS for HA systems for the following reasons:
– Availability of source code
– Standard APIs and other interfaces
– Stable and robust platform
– Integrated, high performance networking
– Support for a broad range of processors and peripherals
– No runtime royalties
– Excellent performance in terms of throughput and real-time response

Benefits: lower overall cost of deployment and faster time to market.


Linux: The Alternative to Proprietary OS (1/2)

• Linux was developed in networks for networks
• Open development (Open Source)
• High innovation rate
• Scalable for all kinds of requirements and infrastructures
• Independence from manufacturers and service providers
• New technologies and requirements are adapted in a more efficient and standardized manner than for any other operating system


Linux: The Alternative to Proprietary OS (2/2)

• New IP features are introduced 6 to 18 months later on Solaris compared to Linux (or FreeBSD).

• Linux is available on all hardware/processor architectures (not dependent on a single hardware/processor vendor).

• We have access to source code in order to rapidly fix faults or add features to the kernel when required.

• We can contribute to Open Source the “hooks” required for efficient integration of the upper-layer HA middleware.


Linux: General Characteristics

• Availability and cost
• (Soft) Real Time characteristics
• Performance & Scalability
• Reliability
• Flexibility
• Openness
– Hardware
– Languages
– Interoperability
– 3rd party software
– Open Development Environment


Open Hardware – Commercial HW Solutions

Commercially Available:
- cPCI based Processor cards
- Standard 100BASE-TX Ethernet Switches
- Off-the-shelf Peripherals
– Ethernet Switches
– CD ROM Drives
– Tape Drive
– Hard Drives

Future Proof Architecture:
- Follow industry Price/Performance Curve (Moore’s Law): Faster CPUs at Lower prices
- Quick Availability of New Processors


Ericsson TSP Platform

The Server Platform: A Case Scenario


Characteristics (1/2)

• Very high availability and robustness, implemented in SW that handles both HW errors and most SW errors. Modular HW with no single point of failure (achieved five nines, 99.999% uptime).

• Soft Real Time, scheduling, communication and Database Management System (DBMS) tailored for high performance.

• Linear Scalability by using loosely coupled processors.

• Zero downtime operation


Characteristics (2/2)

• Open hardware solution, currently Intel/Pentium processors, and easily portable to other architectures and processors

• Languages: C++, Java

• Open interfaces
– Interoperability: CORBA/IIOP, TCP/IP, Java RMI
– 3rd party SW: Java and Linux


Processors Running Linux

[Figure: a cluster of processors, each drawn as a stack on standard PC hardware (Std PC HW). Some processors run the TelORB OS (= DICOS) with TelORB MW+; others run Linux with TelORB Middleware. Every stack carries O&M&P + #7 and the Application on top. A generic pair of stacks contrasts a proprietary OS with MW+ against Linux with Middleware.]


Linear Scalability

Capacity grows linearly as processors are added.

New processors can be added without disturbance.

[Figure: capacity plotted against number of processors, axis from 2 to 30 and beyond]


Sample TSP Applications

• HLR: Home Location Register
• AC: Authentication Center
• SCP: Service Control Point
• MG: Mobility Gateway
• SCS: Service Capability Server
• AAA: Authentication, Authorization and Accounting
• CCN: Cost Control Node
• …


Advantages

• Standard interfaces through CORBA, TCP/IP, Java RMI
• Linux in the cluster means openness for third-party software
• Based on commercial processors
• Fault tolerance implemented in software
• Standard languages: C++, Java
• Fully scalable architecture
• Soft Real-time OS
• Includes powerful middleware: a database management system and functions for software management
• Fully compatible simulated environment for development on Solaris/Linux workstations


Why Open Standards?


Open Standards (1/2)

• Open standards are a key reason why equipment providers are moving toward Linux-based solutions

• Creating platforms based on open standards:
– Ensures interoperability with third-party software, and
– Makes maintenance and application development much easier

• Therefore, utilizing the standard Linux kernel and adhering to key Linux standards is essential


Open Standards (2/2)

• There are many standards-related activities in the industry to define hardware and software high availability:
– PICMG (PCI Industrial Computer Manufacturers Group): defining standards for high availability hardware. PICMG specifications include CompactPCI, rackmount applications and PCI/ISA for passive backplane, standard format cards.
– Service Availability Forum (SA Forum): focusing on APIs for hardware platform management and for application failover in the application API.
– Open Source Development Lab (OSDL): defining specifications for Carrier Grade Linux.


Ongoing Efforts

• Open Source Development Lab
• Service Availability Forum
• Open Cluster Framework


Open Source Development Lab: Towards Carrier Grade Linux


OSDL Lab Facts

• OSDL founded to support Linux community
– Industry supporting developers
– Vendor neutral virtual lab
– Non-Profit
• First lab opened in Beaverton, OR, USA
• Second facility in Yokohama, Japan
• Carrier Grade Linux working group formed January, 2002
• Data Centre Linux working group formed August, 2002


OSDL Sponsors


What OSDL is and is not

• OSDL is:
– A group of companies and individuals working together
– Working within the current open source processes
– A resource for the Linux development community

• OSDL is not:
– A Linux distributor
  • OSDL works with the distributors
  • OSDL benefits the community
– An ISV
  • OSDL only works on open source
  • OSDL produces no products, just specs & code


CGL Working Group

A forum of industry leaders to support and accelerate the development of Linux functionality for telecommunication applications

[Figure: logos of member companies]


Working Group Organization

• Committee and board members must either:

– Represent an OSDL member company

– Represent an OSDL affiliated organization directly involved with the technology

– Be an OSDL employee

[Organization chart: Steering Committee; Technical Board (Specifications, Proof of Concept, Validation); Marketing Board (Collateral, Events, Tech Marketing)]

• Chairperson must represent an OSDL member company

• Team members must either:
– Represent an OSDL member company
– Represent an OSDL affiliated organization directly involved with the technology
– Be an OSDL employee
– Act as an individual with no employer recognition


CGL Architecture

[Figure: CGL architecture stack]
– Legend: solution-specific components to be defined by vendors; scope of the Carrier Grade Linux Working Group
– Applications
– Middleware Components: Java, CORBA, Databases, ...
– High Availability Components: HA Platform Interfaces, HA Application Interfaces
– Linux OS with Carrier Grade Enhancements: Standard Interfaces (LSB, POSIX...), High Availability Interfaces, Service Interfaces, Hardened Device Drivers, Co-Processor Interfaces, Hardware Configuration and Management Interfaces
– High Availability Hardware Platforms


Target CGL Applications

• Gateways
– Bridges between two different technologies or administrative domains (e.g., converting time-division multiplexed networks to IP-based).
– A gateway maintains a large number of connections in real-time over a large number of interfaces without losing a frame or packet.
– Gateways are implemented on dedicated platforms with replicated (rather than clustered) systems for redundancy.

• Signalling Servers
– Handle call control, session control and radio resources; handle routing and maintain the status of calls over the network.
– Tens of thousands of connections.

• Management servers
– Handle customer data (configuration, local and personal preferences) and traditional network management operations, as well as service and customer management.
– Far less stringent response time requirements.


CGL Process

Gather Requirements

Publish Specifications

Identify and Evaluate OSS Projects

Build Proof of Concept Implementations

Construct Validation Suite

Establish Certification Criteria

Certify Distros / Implementations


Specifications Characteristics

• The specification promotes:
– Portability
– Ease of programming
– Software availability for telecom developers looking to implement Linux in an equipment design.

• The specification focuses on:
– High availability
– Performance
– Adopting technologies that promote high availability and service availability.


Specifications Characteristics

• The working group hopes to develop “standards” that make it easier to avoid problems in coding, and thus improve the reliability of systems.

• These standards will ensure that companies have choices for Carrier Grade Linux but that all of them will:
– Meet the specification
– Support a rich set of high availability features
– Have consistent interfaces and functionality


CGL Summary - To Date

• CGL Working Group Started January, 2002

• Latest revision of the official specs and white papers released August 2002
– CGL Technical Scope White Paper
– CGL Requirements Definition v1.1
– CGL Architecture Specification v1.1
– CGL Validation Framework v1.1


CGL Specs

• CGL specs are divided into Categories
– Standards
– Platform
– Availability
– Serviceability
– Tools
– Performance
– Security
– Scalability

• Assigned a Priority
– Priority 1 - Current requirements
– Priority 2 - For inclusion in the next version of specs
– Priority 3 - For future inclusion (no version implied)


CGL Specs …

• Given a Version Assignment (1.x)
– Core: Required feature - must be present and functional in any implementation.
– Configurable: Must be available, end implementation has option to disable.
– Not Applicable: Features not requiring code to be satisfied.

• Given a Version Assignment (2.x)
– Default Setting: specifies whether the functional implementation is expected to be functionally enabled when a CGL system is configured.
   • On -- the function must be functionally enabled by default.
   • Off -- the function must be functionally disabled by default but must still be available to be enabled.
– Toggle Control: specifies whether the functional implementation is required to support being toggled between the enabled and disabled states without rebuilding.
   • Yes -- the function must support the capability of being toggled (moved from disabled to enabled or enabled to disabled).
   • No -- the function is not required to support the capability of being toggled.
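The 2.x attributes above can be read as a small record per requirement plus a conformance check. The following is a minimal sketch of that reading, not part of the OSDL specification: the class, field, and function names are hypothetical, and the example values used with it are made up.

```python
from dataclasses import dataclass
from enum import Enum


class DefaultSetting(Enum):
    ON = "on"    # must be enabled by default
    OFF = "off"  # must be disabled by default, but still available


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one CGL 2.x requirement (illustrative names)."""
    name: str
    category: str          # e.g. "Availability", "Performance"
    priority: int          # 1, 2, or 3
    default: DefaultSetting
    toggle_required: bool  # Toggle Control: Yes (True) or No (False)


def violations(req, enabled_by_default, available, can_toggle):
    """List the ways an observed system conflicts with the requirement."""
    problems = []
    if not available:
        problems.append("function is not available")
    if req.default is DefaultSetting.ON and not enabled_by_default:
        problems.append("must be enabled by default")
    if req.default is DefaultSetting.OFF and enabled_by_default:
        problems.append("must be disabled by default")
    if req.toggle_required and not can_toggle:
        problems.append("must be toggleable without rebuilding")
    return problems
```

An empty list from `violations` means the observed configuration is consistent with the Default Setting and Toggle Control attributes as modelled here.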


Standards Requirements

• OSDL CGL does not create standards - but it specifies those required for compliance.

• Requirements that reference specifications controlled outside of the OSDL CGL workgroup that are important to carrier server systems.
– Compliance with standards is key to the adoption of open-standards, off-the-shelf components.

• OSDL CGL v1.1 discusses 4 broad categories of standards:
– Linux Standard Base Compliance
– POSIX interfaces
– IETF RFCs - involving IPv6 and protocols like SCTP …
– Service Availability Forum Compliance


Standards Requirements - Priority 1

• Linux Standard Base 1.2 Compliance
– The goal of the LSB is to develop and promote a set of standards that will increase compatibility among Linux distributions and enable software applications to run on any compliant system.
– Using the version 1.1 test suite - with certain defined and limited exceptions, acknowledging the variety of carrier platforms.

• POSIX Interface Compliance
– Following the Austin Group specs, a.k.a. IEEE Std 1003.1-2001
– Specific compliance: Timers, Signals, Message Queuing, Semaphores, Event Logging, and Threads

• SNMP - including agent versions 1, 2, and 3

• IPv6, IPSECv6, MIPv6 + a long list of IETF RFCs


Platform Requirements

• Requirements that support interactions with the hardware platforms making up carrier server systems.

• "Platform" capabilities are vital building blocks, innately closer to the hardware than the "availability" and "serviceability" categories.

• OSDL CGL does not specify platforms and architectures, rather it specifies platform capabilities.
– Platform capabilities are not tied to a particular vendor's implementation.
– The specification may suggest model implementations which are tied to platforms - but do not require them.


Platform Requirements - Priority 1

• Hot swap, hot insert, hot remove, hot device identity
– When devices are capable of being changed on a running system, OSDL CGL will support it.

• Remote boot, no console operation, diskless systems, boot cycle detection
– Ensure that remote systems can be managed without a human presence when trouble occurs.

• Proprietary (binary-only) modules will be permitted by default


Availability and Serviceability Requirements

• Availability: Requirements that support heightened availability of carrier server systems, such as improving the robustness of SW components or by supporting recovery from failure of HW or SW.

• Serviceability: Requirements that support servicing and managing HW and SW on carrier server systems.

• This is a wide-ranging set of requirements that, put together, help support the availability of applications and the operating system.


Availability Requirements - Priority 1

• Watchdog timer helps detect OS failures.

• Application heartbeat helps detect application failures.

• RAID Support: Mirroring will be supported.

• Resilient file system support: support existing and future file systems that can quickly recover when required.

• Disk and volume management: File systems can span physical disks and be enlarged without un-mounting or rebooting.

• Multiple Ethernet NIC bonding and failover: Be able to aggregate bandwidth over multiple NICs and support failover of IP addresses from NIC to NIC.

• Hardened driver support: OSDL CGL will encourage the development of robust drivers.
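The application heartbeat idea listed above can be sketched in a few lines: each application reports in periodically, and a monitor flags any application that has stayed silent longer than a timeout. This is only an illustration under assumed names (`HeartbeatMonitor`, `beat`, `failed` are hypothetical), not an interface defined by OSDL CGL.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per application and reports silent ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock      # injectable so the monitor can be tested
        self._last_seen = {}

    def beat(self, app):
        """Record a heartbeat from the named application."""
        self._last_seen[app] = self._clock()

    def failed(self):
        """Return the applications whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(app for app, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)
```

A supervisor would call `failed()` on a timer and restart or fail over whatever it returns. A watchdog timer applies the same pattern one level down, with the OS as the reporting party and hardware as the monitor.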


Serviceability Requirements - Priority 1

• Resource monitoring
– OSDL CGL will introduce specifications and frameworks to monitor a system's resources and their health.
– Such a comprehensive standard does not yet exist.

• Kernel dumps
– OSDL CGL will support producing and storing kernel dumps.
– (Kernel dumps are not a standard part of Linux.)

• Kernel message structuring
– Provide an event log mechanism that permits more detailed and consistent application error logging.

• Platform signal handler
– Provide a handler so that hardware errors are logged in the above event log.

• Remote access to event log
– Provide a mechanism to access the event log remotely.

• Dynamic debug/probe insertion
– Provide a specification to permit instrumentation of live applications and the kernel.


Tools Requirements

Requirements that support auxiliary capabilities not directly involved in normal execution of carrier server systems, for example debuggers used to develop modules, drivers or applications.

• Provide capabilities to facilitate diagnosis

• User-level (gdb) debugging support for threads
– The GNU debugger needs enhancements for threaded processes; these are underway as part of the Native POSIX Thread Library (NPTL) implementation.

• Kernel Debugging
– Linux does not natively support a kernel debugger, but one is required for debugging CGL systems.


Performance Requirements

Requirements that support performance levels necessary for the environments expected to be encountered by carrier server systems.

• Soft real time support: Specifies a target scheduling latency of 10ms or greater.

• Pre-emptible kernel: The kernel will provide support for pre-emption, which reduces the latency of the kernel. It allows processes to be preempted even if in kernel mode.
– The resulting system responsiveness is greatly improved.

• RAID 0 support: Striping is required for enhanced disk throughput.

• Application (pre) loading: Applications can be pre-loaded and pinned to prevent demand paging during execution.


Other Requirements Categories

• Scalability:
– Requirements that support vertical and horizontal scaling of carrier server systems such that addition of HW resources results in acceptable increases in capacity.

• Clustering:
– Requirements that support the use of multiple carrier server systems.
– This is to support higher levels of service availability through redundant resources and recovery capabilities, and to provide a horizontally-scaled environment supporting increased throughput.


CGL V2.0 Status

• V2.0 started October, 2002

• 3 major components:
– Clustering Specification
– Security Specification
– General System Requirements

• Public drafts of each will be released early and often
– Security and General OS Requirements released as public drafts in April 2003.
– Clustering public draft released May 2003.

• Final release of V2.0 specifications: October 2003


Ericsson Contributions to Carrier Grade Linux

• Asynchronous Event Mechanism (AEM)
• Telecom IPC (TIPC)
• Distributed Security Infrastructure (DSI)

All released to Open Source under the GPL License.


CGL Road Map (tentative)

03/10 v2.0 Specs release
04/1 v3 Specs kick-off in NY LWE
04/5 v3 Specs feature freeze
04/7 v3 Specs release


Vanilla Linux* vs. CGL

* kernel.org


Linux Kernel


Implementations Status

• Started - A project exists, but is not considered a current candidate to fulfill a requirement as it is too new, inactive or inattentive (to OSDL CGL directions).

• Experimental - One or more projects exist and are viable candidate(s) but more work is needed to fully satisfy a requirement.

• Production - A project fully satisfies a requirement and is ready for deployment.


Integration with Mainstream Linux

• Takes time

• Some of the enhancements will be pushed to be integrated with the kernel 2.7

• Others will follow in later kernel releases

• Meanwhile, all enhancements will be available from SourceForge or the projects' web sites


Service Availability Forum


Service Availability Forum (SAF)

• A consortium of communications industry leaders and startups dedicated to producing standards to enable the development of carrier-grade communications systems from off-the-shelf hardware platforms and middleware.

• A carrier-grade system provides uninterrupted user access to the services it is designed to deliver, with no loss to the continuity of those services.

• To meet this goal, the SAF is developing two layers of standard, carrier-grade interfaces: an application interface and a platform interface.


Open Standards Context

[Figure: layered stack. Layers shown: Applications; Application Interface; Service Availability Middleware; Platform Interface; Carrier Grade OS; High Availability Hardware Platforms. HA middleware components shown: databases, application servers, communication protocols, directory. Standards bodies shown: PICMG, OSDL CG Linux, SAF, DMTF, OMG.]


SA Forum Interfaces

[Figure: the two SA Forum interfaces, Application Interface and Platform Interface.]


Application Interface

• The Application Interface provides access to a standard set of tools for application software to use

– to distribute processing over multiple computing elements, and

– to respond to failures of those elements without loss of service delivery or continuity to any user.


Application Interface Specification Objectives

• Promote the rapid development of applications that deliver highly dependable voice, data and multimedia services over fixed and wireless IP networks.

• Target adopters
– Application developers
– Value added component suppliers
– Platform integrators

• Lower costs by enabling and ensuring
– Portability
– Choice of service availability middleware vendors
– Choice of adopter products and services


SAF Approach

• By using a standard interface to manage the physical platform, developers can write the service availability middleware independently of any particular hardware.

• This independence allows application developers to choose the best hardware platform and the best service availability middleware to fit their needs.


Partial Scope of Application Interface Specification Areas

HA Framework
• Availability Management
• Health Monitoring
• Error reporting

HA Services
• Checkpointing
• Events
• Messaging
• Membership
• Synchronisation

System Management
• Configuration
• Provisioning
• Administration

Application Services
• Database
• Java interoperability
• CORBA interoperability
• External I/O: Signaling

[Figure legend: Release 1 Scope; Scope for Future Releases]


Conclusion

• Final release 1 Spec

• Expected publication of release 1 Spec: Q2 CY03.

• Released spec will be open for comments and contributions.


Open Cluster Framework


OCF

• An open group of industry and users of clustering services who are defining some standard APIs for clustering.

• Most OCF APIs are generally intended to be usable by both high-performance and high-availability clustering platforms.

• A working group of the Free Standards Group.

• Two-pronged approach (both proceed together)
– Define standard cluster APIs
– Create component-based reference implementation


Approach

• API Definition
– Select Areas of Interest
– Create Sub teams
– Define APIs
– Reach agreement
– Publish APIs for review
– Refine APIs

• Reference Implementation
– Create Plumbing/Infrastructure
– Coordinate with API definition
– Define Framework components
– Implement components
– Test result
– Provide as Open Source


Properties of the APIs

• Implementation Neutral (agnostic)
• Royalty-Free
• For OSS or proprietary software
• Creates opportunities for interoperability
• Focused on Linux, but not limited to Linux


API Areas of Interest

• Event services
• Node services
• Resource Services
• Recovery
• Group Services
• Low Level Communication Services
• Fencing
• DLM
• External Interfaces (GUI, CLI, SNMP, logging, etc.)


OCF Cluster Conceptual Model

• A cluster is a collection of nodes (computers).
• Failures in the cluster occur asynchronously and are observed stochastically and independently.
• Each cluster is divided into zero or more partitions (by communication failures, etc.).
• Each active node belongs to exactly one partition at a time.
• One of these partitions may be named the “primary” partition. This partition is said to “have quorum”.


OCF Cluster Conceptual Model

• The method of determining membership is defined by the implementation – not by OCF standards.

• The method of electing a primary partition is defined by the implementation – not by OCF standards.

• The OCF generally defines the properties an implementation must have, not how they are achieved.
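To make the conceptual model concrete, here is a minimal sketch of one possible way to pick a primary partition. It assumes strict-majority voting, which is only one choice an implementation could make; OCF itself leaves the election method open. The function name and node names are hypothetical.

```python
def primary_partition(partitions, cluster_size):
    """Return the partition holding a strict majority of nodes, or None.

    Majority voting is an assumption of this sketch; OCF leaves the
    election method to the implementation.
    """
    seen = set()
    for part in partitions:
        # The model requires each active node to be in exactly one partition.
        if seen & part:
            raise ValueError("a node may belong to only one partition")
        seen |= part
    for part in partitions:
        if len(part) * 2 > cluster_size:
            return part
    return None  # no partition has quorum
```

With five nodes split three and two, the three-node partition has quorum. With four nodes split evenly, no partition does, which is why majority-based schemes usually add a tie-breaker.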


OCF Current Status

• Active participation by IBM, SuSE, OSDL, Sun, HP, Intel, Steeleye, Oracle, BigStorage, Linux-HA, University of Delft.

• Effort also endorsed by Free Standards Group, Conectiva, MSC Software, OSCAR, Red Hat, SGI, Bald Guy Software, UnitedLinux

• Now a working group of the Free Standards Group

• Preliminary Draft APIs available for Event Services, Membership, and Resource agents.


OCF Plans

• Draft of current APIs by 1Q 2003.

• Next areas: recovery, group services, fencing.

• Refine and add APIs => official spec.

• Formal review period.

• Complete and release reference implementation.


Carrier Grade Linux Implementations


Who is building Carrier Grade Linux?

• Red Hat
• United Linux (SuSE, SCO, Turbo Linux, Conectiva)
• MontaVista


Red Hat

Red Hat Linux Enterprise Server


SuSE/United Linux

SuSE Linux Enterprise Server


United Linux / SuSE SLES

• A globally available Linux operating system for industry and enterprises, based on standards.

• Involvement of industry and customers in the development via United Linux Technical Advisory Board.

• Developed by the SCO Group, Conectiva, Turbo Linux, and SuSE Linux AG.


SuSE SLES


SuSE: The CGL Platform

• SuSE offers global support for development of applications and services.

• Version 8 of the powerful, reliable, and stable industry OS.
• Scalable for up to 32 processors (Intel).
• Proven HA solutions from third-party manufacturers (Steeleye).
• Long-term support by leading application developers, product life cycle 5+ years.


SuSE & CGL

• Standards Requirements:
– Linux Standard Base
– POSIX Timer Interface
– POSIX Signal Interface
– POSIX Queue Interface
– POSIX Semaphore Interface
– POSIX Event Logging
– IPv6 RFCs compliance
– IPsecv6 RFCs compliance
– MIPv6 RFCs compliance
– SNMP support
– POSIX threads standards compliance

• Platform Requirements:
– Hot Insert & Remove
– Remote boot support
– Boot cycle detection
– Loading of proprietary modules
– Diskless Systems
– Serial console connection


SuSE & CGL

• Availability Requirements:
– Watchdog Timer Interface
– Application heartbeat monitor
– Ethernet link aggregation
– Ethernet link failover
– RAID 1 support
– Resilient file system support
– Disk and volume management

• Serviceability Requirements:
– Kernel dump targets
– Kernel summary dump
– Kernel message structuring
– Dynamic debug/probe insertion
– Platform signal handler
– Remote access to event log

• Tools Requirements:
– User level (gdb) debug support for threads
– Kernel dump analysis
– Kernel debugger

• Performance Requirements:
– Soft-real time performance
– Kernel preemption
– RAID 0 support
– Application Loading
– Concurrent timers scaling behavior and reporting


Evolution of Distribution

• The SuSE release cycle is 18-24 months.
– Service Packs within a version include new features as well as bug and security fixes for the kernel.

• New features are mostly back-ported from newer kernel versions to the current stable kernel, so that customers do not need to replace their systems.

• Because a new kernel version (e.g., a move to kernel 2.6) normally affects the whole system, it will be included only in a new version of SLES. An upgrade from 2.4.n to 2.4.n+1 may be released within a service pack, but only if the risk, the effort, and the need justify it.


MontaVista

Linux Carrier Grade Edition


MontaVista Linux Carrier Grade Edition (CGE)

• First Linux vendor to distribute an industry standards-based COTS Carrier Grade Linux.

• CGE Release 3.0 based on Linux Kernel 2.4.18.

• Fully complies with Open Source Development Labs Carrier Grade Linux Specification Release 1.1.

• MontaVista is shipping all priority 1 features of OSDL and many priority 2.


MontaVista CGE Architecture

[Figure: MontaVista CGE architecture. Blocks shown: IA-32 Based Reference Hardware; MontaVista Linux Carrier Grade Kernel (POSIX, Hardened Drivers, Real Time Pre-emption) with Serviceability, High Availability, and Performance Enhancements; 200+ Networking & Application Packages; High Availability Services (HW Mgmt, Fail-over); Middleware & Application Services (Databases, Java, CORBA, Protocol); Carrier Applications (Soft Switch); Tools.]

Target Tools
• Runtime App Patcher
• Field-safe App Debugger
• Enhanced kernel dump

App Tools
• Kdevelop IDE
• gdb
• Gcc
• KDB
• KGDB
• Trace
• Debug

Config Tools
• I/O Latency
• Target Config
• Lib Optimization


HA Hardware Support

• Key Standards Support
– PICMG 2.12
– PICMG 2.16
– ATCA (May 2003)

• Hardware Redundancy
– I/O Processing Hot Swap, Hot Insert
– Redundant Storage
– Redundant Networking
– CPU Redundant System Slot

• Remote boot across LAN/WAN


POSIX Compatible Interfaces

• Microsecond Timers
– MontaVista ported to 2.4
– Accepted to 2.5 kernel

• POSIX event log

• Signaling and message queues

• POSIX threads
– NGPT, migrating to NPTL


HA Hardware Support

• Hardened device driver support
• Support bonding of multiple Ethernet NICs
• RAID 1 support and user-level volume mgmt utilities
• Boot cycle detection
• Hyper-threading Support
• IPMI Driver Support


Performance Features

• Journaling filesystem(s)
• O(1) real-time scheduler with CPU affinity
• Preemptive real-time kernel
• Scalability improvements – kernel locks, locking primitives
• RAID 0 striping
• Application loading/locking into memory
• Fast boot up time
• Forced unmount of file systems


MontaVista Real Time Linux

• Real Time Pre-emptible Kernel
– MontaVista developed technology
– Added to the 2.5 kernel
– Provides a stable standard real-time solution
– Preserves the Linux programming model: user-level applications and standard APIs

• Fixed priority scheduler
– Enhanced version of the Order-One, or O(1), priority scheduler
– Added CPU affinity support


Serviceability Features

• Kernel resource monitoring framework.
• System event log with remote access capability.
• Produce and store kernel dumps across LAN, in-memory, disk.
• Multithreaded core dump.
• Dynamic debugging and on-line probe insertion.


Open System Lab

Experience and Contributions Towards Carrier Grade Linux


Hardware

• Compact PCI design
• -48V Central Office Powered
• NEBS Compliant Ready
• 16 P3 500MHz 512 Mbytes RAM
• 6 Ethernet Ports / Processor
• 8 SCSI Disk banks (3x18GBytes)
• Fully Redundant and hot swap

• Off-the-shelf 1U
• Celeron/Pentium III
• 256/512 MB RAM
• 20/40 GB IDE HD
• Floppy/CD-ROM
• 2 fast Ethernet ports
• 2 USB ports


ARIES 2000

Find and prototype the necessary technology to prove the feasibility of an Internet Server that has guaranteed availability, response time, and scalability using both TelORB and Linux.

Focus Areas: HA Clusters as Internet Servers using TelORB and Linux
• HA Linux Cluster
• Linux Diskless Booting
• Linux NFS Redundancy
• Linux Ethernet Redundancy
• …


ARIES 2001

Enhance clustering capabilities of TelORB and Linux clusters as Mobile Internet Servers.

Focus Areas:
- Alternate Scalability Technologies
- Cluster Dimensioning
- Load Balancing
- IPv6
- Security


ARIES 2002/2003

- Convergence of specific elements of the different platform technologies to be used in Server and Router nodes.
- Focus on the clustering capabilities of different OSs supporting highly available, secured applications in an all-IP world.

Focus Areas:
- IPv6
- Asynchronous Event Mechanism (AEM)
- Security on clustered servers (DSI)
- Carrier Grade Linux
- TIPC port for Linux


Ericsson Contributions to Carrier Grade Linux

• Asynchronous Event Mechanism (AEM)
• Distributed Security Infrastructure (DSI)
• Telecom IPC (TIPC)

All released to Open Source under the GPL License.


Asynchronous Event Mechanism (AEM)

• Advantages of Carrier-Grade Linux:
– Openness (open source software, third party software),
– Stability on a wide variety of architectures (UP, SMP, HP, NUMA,…)
– Capability to run in different environments (large-scale distributed, embedded, real-time,…)

• Carrier-Grade Environments (must be supported)
– Media Gateway and MG Controller,
– Authentication controller,
– SS7 gateway,
– (IP) Application servers (transactional, streaming),
– IP gateway,
– Other cross-network gateways…

⇒ These imply different software requirements


AEM …

• Carrier-Grade Software Requirements
– A high response rate,
– Minimal down-time,
– Scalability w.r.t. external requests and hardware,
– Ranging from Soft Real-Time ⇒ Hard Real-Time capabilities.

• Carrier-Grade Platform Requirements
– Live software upgrade, hardware hot-swap,…
– Large database, fail-over, memory utilization…
– Huge number of processes, fault detection and prevention, application restart, process reload,…

⇒ Many system events to handle quickly


AEM …

• AEM is a Linux kernel patch and a set of modules providing the asynchronous execution of processes. It implements a native support for asynchronous events in the Linux kernel. Its aim is to bring carrier-grade characteristics to Linux: scalability and soft real-time responsiveness.

• AEM offers an event-based development framework, scalability, flexibility, and extensibility.

• AEM is part of the OSDL Carrier-Grade Linux 2.0 specifications.
• AEM will be included in the HA OSCAR implementation.
• AEM: http://sourceforge.net/projects/aem
• AEM Contact: Frederic.Rossi@Ericsson.ca
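The event-based programming style that AEM promotes can be illustrated with a user-space analogy: handlers are registered per event type and run when a matching event is posted, instead of the application polling for state changes. The sketch below is purely conceptual. It is not AEM's kernel interface, and `EventDispatcher`, `register`, `post`, and the event names are all hypothetical.

```python
import asyncio
from collections import deque


class EventDispatcher:
    """Toy user-space dispatcher: handlers run when their event is posted."""

    def __init__(self):
        self._handlers = {}      # event type -> list of coroutine handlers
        self._pending = deque()  # posted events awaiting delivery

    def register(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def post(self, event_type, payload):
        self._pending.append((event_type, payload))

    async def run_until_idle(self):
        """Deliver queued events in posting order; unhandled types are dropped."""
        while self._pending:
            event_type, payload = self._pending.popleft()
            for handler in self._handlers.get(event_type, []):
                await handler(payload)
```

An application would register, say, a `link_down` handler once and then be called back whenever that event is posted, which is the shape of code an event-driven framework encourages.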


Distributed Security Infrastructure (DSI)

• The telecom industry uses clusters for telecom platforms … but there isn’t any coherent & homogeneous security framework dedicated to clusters
– DSI was started as an internal Ericsson research project
– DSI was released as Open Source in July 2002: http://sourceforge.net/projects/disec

• Security framework for real-time distributed applications on large-scale carrier-class Linux clusters

• During 2002-2003:
– Completed the design
– Implemented general infrastructure
– Implemented distributed security services
– Exposed our ideas & received feedback


DSI Overview

One primary Security Server (SS)

Multiple Security Managers (SM) (one per node)

SS and SMs communicate through an encrypted and authenticated channel (SSL/TLS over CORBA)

Security policy is enforced at kernel level

Implemented: Distributed Access Control service, Distributed Security Policy

Future: Distributed Confidentiality and Integrity (DisCI)

[Figure: DSI cluster diagram. Labels shown: Primary Security Server Node; Secondary; Node 1, Node 2, Node 3; processes Proc123, Proc978, Proc222; Kernel; Security Broker; Data Traffic; Inside the Cluster; Outside the Cluster; Security and O&M/IDS; Authenticated Encrypted Communications. Legend: SS = Security Server; SM = Security Manager; DSM = Distributed Security Module.]


TIPC

• Inter process and processor communication service
– Specially designed for efficient intra cluster communication.
– But also good support for inter cluster communication.

• Framework for supervising and reporting topology changes.

• Portable source code package, ~15000 lines C code

– Complex, but generic 'core' implementation.

– Easy to port

– Easy to use

• Quality product

– Has been deployed as part of Ericsson products for years.


RAISON D’ÊTRE

• Complete location transparency
– 'The cluster is the computer'
– Automatic, fast and transparent reconfiguration

• Application support level
– In-sequence, loss free delivery both in connectionless and connection oriented mode
– Lightweight, agile connection concept
– Immediate feedback to application on events
– Function availability/unavailability (synchronization/event channel)

• Configuration support level
– Immediate feedback on processor/signalling link availability events
– Self configuring links
– Remote topology subscription, configuration and management
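In-sequence, loss-free delivery is commonly built on sequence numbers plus a receive-side reorder buffer. The sketch below shows that generic technique only; it makes no claim about how TIPC implements it internally, and the class and method names are hypothetical.

```python
class InOrderReceiver:
    """Releases messages strictly in sequence-number order."""

    def __init__(self):
        self._next = 0   # next sequence number owed to the application
        self._held = {}  # out-of-order messages waiting for a gap to fill

    def receive(self, seq, message):
        """Accept one message; return the messages now deliverable in order."""
        if seq < self._next or seq in self._held:
            return []  # duplicate: already delivered or already buffered
        self._held[seq] = message
        ready = []
        while self._next in self._held:
            ready.append(self._held.pop(self._next))
            self._next += 1
        return ready

    def missing(self):
        """Sequence numbers to re-request: gaps below the highest one held."""
        if not self._held:
            return []
        return [s for s in range(self._next, max(self._held))
                if s not in self._held]
```

A sender that retransmits whatever `missing()` reports gives loss-free delivery; releasing messages only from `receive()` gives in-sequence delivery.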


RAISON D’ÊTRE

• Performance
– <2/3 time/message. <1/2 time/transaction (vs TCP/IP)

• Robustness
– Overload protection support
– Network redundancy with transparent and disturbance free failover
– Releases resources after process or processor crash

• Resource utilization
– Load sharing among up to 4 bearers
– Package bundling to optimize bandwidth utilization

• Portability
– Adaptive towards different bearer types
– Adaptive towards different user interfaces
– Requires only a minimal set of services from the OS environment


TIPC info

• TIPC was released to Open Source early 2003.

• TIPC is part of the OSDL Carrier-Grade Linux 2.0 specifications.

• For more on TIPC: http://www.linux.ericsson.se/tipc

• Contact: Jon.Maloy@Ericsson.ca


Challenges


Challenges

• Moving from proprietary technologies to Open Technologies while maintaining technological competitiveness.

• Contributions to Open Source should be seen as a strategic investment rather than being perceived as a cost.

• As part of interacting with Open Source and following more Open working methods, people must be open to share their ideas and source code and be willing to communicate that with others.


Challenges (2/2)

• Changing ways of working

• Avoid duplicated efforts

• Harmony and synergy among all efforts

• Working with competitors

• Building enablers together


Working with Open Source community

• Listen to Free Software/Open Source/Kernel developers
– They know their stuff, especially regarding integration and distribution.

• Keep an eye on user forums
– Users work with the software every day, so listen to their reports and monitor the mailing lists.

• Re-use
– Free software people are famous for being extremely lazy and not prone to doing something twice if it can be automated.


Working with Open Source Community …

• Be open
– Disclose problems fast, along with immediate workarounds and early fixes. Pretending everything is right will buy you trouble.

• Release early and often
– Gives a good estimate of progress and helps catch bugs early.

• Don’t be perfect
– Ask around in mailing lists or public forums; let it be known that your company is interested in finding The Right Way to do things. Don’t just hack something together and hope for the best.

• Don’t reinvent the wheel
– Use best known methods. If you find a better one, make sure it becomes best known.


Conclusion


Linux in the Carrier Grade Space

• Many efforts currently exist to move Linux forward towards Carrier Grade characteristics.

• “Moving slowly but surely”

• Our challenge: When was the last time you picked up a phone and did not have a dial tone available? We want to achieve this with Mobile Phones over IP with Linux-based platforms.


Thanks to:

• Alexander Larruy [Ericsson]
• Denis Monette [Ericsson]
• Andre Beliveau [Ericsson]
• Jon Maloy [Ericsson]
• Makan Pourzandi [Ericsson]
• Frederic Rossi [Ericsson]
• Peter Badovinatz [IBM]
• Alan Robertson [IBM]
• Glenn Seiler [MontaVista]
• Mika Kukkonen [OSDL]
• Doug Kolb [OSDL]
• Silke Hirneiss [SuSE]
• Oliver Nachtrab [SuSE]
• Lars Marowsky-Bree [SuSE]


Thank you.

Ibrahim Haddad
Ericsson Research – Corporate Unit

Ericsson Canada Inc.
8400 Decarie Blvd
Town of Mount Royal
Quebec H4P 2N2
Canada

Phone: 1.514.345.7900 x5484
Mobile: 1.513.577.0345
Fax: 1.514.345.6105
Email: Ibrahim.Haddad@Ericsson.com

This version of the slide show is available from http://www.linux.ericsson.ca/visibility


QUESTIONS & ANSWERS

