© 2002-2003 Hein Meling and Alberto Montresor

The Jgroup/ARM Dependable Computing Toolkit

Hein Meling, Stavanger University College, Norway

Department of Electrical and Computer Engineering

Alberto Montresor, University of Bologna, Italy

Department of Computer Science

Context

(Distributed) systems that require

• Reliable and high-availability operation

• Fault tolerance

• (Load balancing)

Based on “cheap” hardware and software

• Commercial off-the-shelf components, not custom hardware

• Heterogeneous software (OS) architectures

Middleware architectures for distributed computing

• Middleware: the layer between the application and the OS

Types of Failures

Processor failures

• Crash failures

• Value failures (very expensive to tolerate)

Network failures

Operating system hangs

Memory leaks

Software design errors (tolerating these is beyond the state of the art)

Overview

Jgroup

• A toolkit aimed at supporting the development of reliable and highly available applications.

Autonomous Replication Management (ARM)

• A framework for server replica deployment and recovery without user intervention.

History

• Formal specification (1996-97)

• Algorithm description and Jgroup implementation

• Integration with existing technologies (Java RMI / Jini)

• The ARM framework (2000-03)

• Development of Jgroup-based applications

Summary

1. Introduction

2. Object Group Communication

3. The ARM framework

4. Integration with Java RMI / Jini

5. Conclusions

The Problem

Some environments supporting distributed computing:

• CORBA (OMG)

• DCOM / .NET (Microsoft)

• Java RMI / Jini / EJB (Sun)

Characteristics:

• Object-oriented

• Based on client-server remote method invocations

• Promote modularity, reusability, interoperability, portability

Java Remote Method Invocations

Java RMI protocol:

• enables objects residing in different JVMs to communicate through remote method invocations

[Figure: a client in JVM1 calls method() on a local stub; the call crosses the network to the server-side RMI runtime in JVM2, which invokes the server and returns x.]

The Problem

Distributed computing environments did not provide adequate support for developing reliable and highly available applications

Lack of reliable “one-to-many” interaction primitives

• From the client’s point of view: non-transparent access to replicated servers

• From the server’s point of view: no support for maintaining consistency

The Solution: The Object Group Paradigm

Object group:

• A dynamic collection of server objects that cooperate in order to deliver some service and maintain shared state

Group method invocations:

• The act of invoking a method on an object group

• The method is executed by a certain number of servers in the object group, depending on the invocation semantics

[Figure: a client interacting with an object group made up of three servers.]

The Solution: The Object Group Paradigm

From the client’s point of view:

• Groups must be transparent, like standard remote objects

• Clients need not be aware that they are interacting with an object group instead of a single server

From the server’s point of view:

• Server implementation: as transparent as possible

• Servers forming a group must cooperate to maintain shared state and to appear as a single object

Group Communication

Group communication has been shown to be a powerful paradigm for supporting the development of dependable applications in distributed systems

• Management of dynamic groups (join/leave operations)

• Failure monitoring (crashes / partitionings)

• “One-to-many” communication

• Ordering of events (FIFO, Causal, Atomic)

• State synchronization tools

[Figure: the building blocks of group communication: Group Membership Service, Reliable Multicast Service and State Transfer Service.]
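Of the ordering guarantees listed above, FIFO is the simplest to illustrate. The sketch below is a hypothetical hold-back buffer, not Jgroup's implementation: it assumes each sender stamps its messages with a sequence number starting at 0, and it delivers a message only after every earlier message from the same sender has been delivered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical FIFO hold-back buffer: each sender numbers its messages 0, 1, 2, ...
// and messages are delivered per sender in that order, whatever the arrival order.
class FifoBuffer {
    // Next sequence number expected from each sender.
    private final Map<String, Integer> nextExpected = new HashMap<>();
    // Messages that arrived too early, per sender, sorted by sequence number.
    private final Map<String, TreeMap<Integer, String>> held = new HashMap<>();
    // Messages handed to the application, in delivery order.
    private final List<String> delivered = new ArrayList<>();

    void receive(String sender, int seq, String payload) {
        int expected = nextExpected.getOrDefault(sender, 0);
        if (seq < expected) {
            return; // duplicate of an already delivered message: ignore it
        }
        TreeMap<Integer, String> pending = held.computeIfAbsent(sender, s -> new TreeMap<>());
        pending.put(seq, payload);
        // Deliver every consecutive message starting from the expected one.
        while (pending.containsKey(expected)) {
            delivered.add(pending.remove(expected));
            expected++;
        }
        nextExpected.put(sender, expected);
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Causal and atomic (total) ordering need more machinery, such as vector clocks or a sequencer, and are not shown here.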

Other Object Group Systems

CORBA

• Electra [Cornell, Zurich]

• Object Group Service (OGS) [EPFL, Lausanne]

• Eternal [UC Santa Barbara, Eternal Systems]

• Newtop [Newcastle, UK]

Java RMI

• Filterfresh [Bell Labs]

• JavaGroups [Cornell]

• Aroma [UC Santa Barbara]

DCOM

• Quintet [Cornell]

Jgroup: “Yet Another Object Group Service”?

Support for partition-awareness:

• Modern wide-area communication networks are often characterized as highly partitionable

• Jgroup supports the development of reliable and highly available applications in partitionable systems

Moreover, Jgroup:

• Extends modern technologies like Java RMI and Jini

• Is written entirely in Java (portability)

• Provides a state merging service for complex merging scenarios

• Is extensible: deployment, recovery and upgrade facilities

Autonomous Replication Management

Support for transparent replica deployment

• Placing server replicas on machines in the network

• Selecting machines so that each application can tolerate both network and machine failures

Support for replica recovery

• Jgroup detects and reports failures

• ARM replaces any crashed server replica with a new instance

2. Object Group Communication

Group Membership

Group membership service tracks both voluntary and involuntary changes in the group’s membership

Variations are reported to group members through the installation of views

Installed views

• Consist of a collection of members

• Correspond to the group’s current membership as perceived by the members included in the view

Group Membership: A Simple Scenario

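[Figure: S1, S2 and S3 join the group one at a time, and a new view is installed after each join; when S3 crashes, S1 and S2 install a view without it.]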
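The join/crash scenario above can be mimicked with a small membership tracker. It is a hypothetical sketch, not Jgroup's API: it simply assumes that a view is an identifier plus a member list, and that every join, leave or detected crash installs a new view with the next identifier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A view: a monotonically increasing identifier plus the members it contains.
final class View {
    final int id;
    final List<String> members;

    View(int id, List<String> members) {
        this.id = id;
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    @Override
    public String toString() {
        return "v" + id + members;
    }
}

// Hypothetical membership tracker: each membership change installs a new view.
final class MembershipTracker {
    private final List<View> installed = new ArrayList<>();
    private View current = new View(0, Collections.<String>emptyList());

    void join(String server) {
        List<String> next = new ArrayList<>(current.members);
        if (!next.contains(server)) {
            next.add(server);
            install(next);
        }
    }

    // Used both for voluntary leaves and for crashes reported by failure monitoring.
    void remove(String server) {
        List<String> next = new ArrayList<>(current.members);
        if (next.remove(server)) {
            install(next);
        }
    }

    private void install(List<String> members) {
        current = new View(current.id + 1, members);
        installed.add(current);
    }

    View current() {
        return current;
    }

    List<View> history() {
        return installed;
    }
}
```

In Jgroup the members agree on each view through a view agreement protocol; this tracker skips the agreement and models only the resulting sequence of views.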

Partition-awareness

What kind of behavior can we expect from fault-tolerant applications in the presence of network partitioning?

The primary-partition approach:

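[Figure: under the primary-partition approach, servers in the primary partition answer “How can I help you?”, while servers in the two other partitions reply “No service available!”]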

Support for partition-awareness

Jgroup supports dependability in partitionable systems:

• Development of applications that are aware of the existence of partitions (on the server side)

• Partition-aware applications exploit their own semantics in order to be more available

• Computations continue in all partitions of the system

[Figure: under the partition-aware approach, servers in every partition answer “How can I help you?”]

Group Membership: A Partitioning Scenario

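[Figure: S1, S2 and S3 join the group; when S1 and S2 become partitioned from S3, each side installs its own view; once communication with S3 is restored, a view containing all three servers is installed again.]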

Example: Task Execution Service

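[Figure: clients submitting tasks to a replicated task execution service. With the primary-partition approach, only the primary partition accepts tasks and a client in the other partition receives a warning; with the partition-aware approach, clients in both partitions can submit tasks.]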

Comparison

Primary-partition approach

+ Easy to maintain a single, coherent shared state (strong consistency)

- Servers in non-primary partitions are unable to serve requests (low availability)

Partition-aware approach

+ Servers in multiple partitions may be able to serve requests (high availability)

- Partitions evolve independently, possibly leading to inconsistent states (loose consistency)
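The trade-off can be made concrete for the task execution example. This sketch rests on assumptions: it takes the common majority rule as the definition of the primary partition (a partition is primary only if it contains a strict majority of the group's servers), and the TaskServer class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task server that can run under either partition policy.
final class TaskServer {
    enum Policy { PRIMARY_PARTITION, PARTITION_AWARE }

    private final Policy policy;
    private final int groupSize;   // total number of servers in the group
    private int viewSize;          // servers in the current view (this partition)
    private final List<String> accepted = new ArrayList<>();

    TaskServer(Policy policy, int groupSize) {
        this.policy = policy;
        this.groupSize = groupSize;
        this.viewSize = groupSize;
    }

    // Called when the membership service installs a new view.
    void installView(int size) {
        this.viewSize = size;
    }

    // Assumed majority rule: strictly more than half of the group is reachable.
    boolean inPrimaryPartition() {
        return 2 * viewSize > groupSize;
    }

    // Returns true if the task is accepted, false if the client gets a warning.
    boolean submit(String task) {
        if (policy == Policy.PRIMARY_PARTITION && !inPrimaryPartition()) {
            return false; // no service outside the primary partition
        }
        // A partition-aware server keeps accepting work; these tasks may have
        // to be reconciled with other partitions when the network heals.
        accepted.add(task);
        return true;
    }

    List<String> accepted() {
        return accepted;
    }
}
```

With an even split (2 of 4 servers on each side) neither partition is primary under this rule, which illustrates the low availability of the primary-partition approach.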

Comparison (Cont.)

Primary-partition approach

+ Development of fault-tolerant applications is simpler (active replication of existing non-fault-tolerant servers)

- Developers cannot exploit application semantics in order to provide a more available service

Partition-aware approach

+ Applications adapt their behavior and remain available in many partitions (perhaps by reducing their quality of service)

- Development of fault-tolerant applications is more complex (case-by-case design is needed)

The State Merging Problem

During partitioning, the state of servers belonging to distinct partitions may become inconsistent

When the partition disappears, an application-specific state merging protocol may be needed

Servers participating in the protocol try to define a new shared state that reconciles (when possible) the divergences

[Figure: two partitions, each with its own servers and tasks, whose states have diverged.]

The State Merging Problem

State merging protocols are based on the exchange of information among servers that have been partitioned

Jgroup provides a state merging service (SMS) that simplifies the development of state merging protocols

NOTE

Determining

• what information needs to be exchanged

• how to use it to construct a new consistent shared state

is an application-dependent problem

General Schema for State Merging Protocols

• In each of the merging partitions, a coordinator is selected

• SMS interrogates each coordinator to obtain information about its current state

• State information from a coordinator is passed to servers that used to be partitioned from it

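• Each server merges the information received from the coordinators with its own state

[Figure: servers S1 to S4; SMS calls getState() on each partition's coordinator and delivers the result to the formerly partitioned servers through putState().]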
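The general schema can be sketched in a few lines. Everything here is hypothetical: the MergeableServer and StateMergingService classes, the choice of the server with the smallest name as coordinator, and a state consisting of a set of task identifiers merged by set union. Only the getState()/putState() names come from the schema above, and their signatures are guessed. Real merging rules are application dependent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical server whose shared state is a set of task identifiers.
final class MergeableServer {
    final String name;
    private final Set<String> tasks = new TreeSet<>();

    MergeableServer(String name, String... initialTasks) {
        this.name = name;
        Collections.addAll(tasks, initialTasks);
    }

    // Called on a coordinator: returns a snapshot of its current state.
    Set<String> getState() {
        return new TreeSet<>(tasks);
    }

    // Called on servers that were partitioned from the coordinator;
    // the application-specific merge rule here is plain set union.
    void putState(Set<String> coordinatorState) {
        tasks.addAll(coordinatorState);
    }
}

// Hypothetical state merging service driving the four steps of the schema.
final class StateMergingService {
    // Each inner list is one of the partitions that are now merging.
    static void merge(List<List<MergeableServer>> partitions) {
        List<Set<String>> states = new ArrayList<>();
        // Steps 1 and 2: select a coordinator per partition and ask for its state.
        for (List<MergeableServer> partition : partitions) {
            MergeableServer coordinator = Collections.min(partition,
                    Comparator.comparing((MergeableServer s) -> s.name));
            states.add(coordinator.getState());
        }
        // Steps 3 and 4: pass each coordinator's state to the servers of the
        // other partitions, which merge it with their own state.
        for (int i = 0; i < partitions.size(); i++) {
            for (int j = 0; j < partitions.size(); j++) {
                if (i == j) {
                    continue;
                }
                for (MergeableServer server : partitions.get(j)) {
                    server.putState(states.get(i));
                }
            }
        }
    }
}
```

All snapshots are taken before any putState() call, so no coordinator forwards state it has only just received.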

Full Object-Orientation

[Figure: a client invokes the group through a stub using remote method invocations, while the servers communicate with each other through message multicasting.]

Existing object group systems fail to provide a completely object-oriented environment for software developers

View Synchrony

View synchrony (1)

If a correct server S executes an invocation during a view, then

• all servers within the view will also execute the invocation,

• or S will install a new view

View synchrony does not admit executions like this:

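[Figure: timelines of S1, S2, S3 and S4 contrasting an execution that view synchrony does not admit with one that it admits.]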

View Synchrony

View synchrony (2)

All servers that survive from one view to the same next view execute the same set of invocations in the original view

View synchrony does not admit executions like this:

[Figure: timelines of S1 to S4 contrasting an execution that violates this property with one that satisfies it.]
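Property (2) can be checked mechanically on a recorded execution. In the hypothetical checker below, which is not part of Jgroup, each record states which invocations one server executed while in a given view and which view it installed next; the execution satisfies the property if all servers with the same (view, next view) pair executed identical invocation sets.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// What one server did while in one view (hypothetical trace record).
final class ViewRecord {
    final String server;
    final int view;       // view in which the invocations were executed
    final int nextView;   // view the server installed afterwards
    final Set<String> executed;

    ViewRecord(String server, int view, int nextView, Set<String> executed) {
        this.server = server;
        this.view = view;
        this.nextView = nextView;
        this.executed = new HashSet<>(executed);
    }
}

final class ViewSynchronyChecker {
    // Property (2): servers that survive from view v to the same next view w
    // must have executed the same set of invocations in v.
    static boolean sameInvocationSets(List<ViewRecord> trace) {
        Map<String, Set<String>> firstSeen = new HashMap<>();
        for (ViewRecord r : trace) {
            String transition = r.view + "->" + r.nextView;
            Set<String> earlier = firstSeen.putIfAbsent(transition, r.executed);
            if (earlier != null && !earlier.equals(r.executed)) {
                return false; // two survivors disagree on what happened in v
            }
        }
        return true;
    }
}
```

Servers that install different next views (for instance, because a partition split them) are never compared, which is exactly the freedom the property allows.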

Internal Group Method Invocations

Synchronous invocations:

• The method invocation terminates by returning a vector of return values, one from each server at which the method was executed

Asynchronous invocations:

• The method invocation terminates immediately; replies (if any) are returned to a callback object

• Can be used to simulate message multicasting through void methods (one-way)

Internal Invocations: example

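Synchronous invocation

[Figure: a server invokes getValue() on its group; S1, S2 and S3 each execute it and return their value.]

Invoking server:

    int[] values = group.getValue();

Each group member (S1, S2, S3):

    int getValue() {
        return value;
    }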

Internal Invocations: example

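Asynchronous invocation

[Figure: a server invokes getValue() on its group and continues immediately; the replies of S1, S2 and S3 are delivered to a callback object.]

Invoking server:

    ValuesCallback cb = new ValuesCallback();
    group.getValue(cb);   // returns immediately
    …
    int[] values = cb.getResults();

Callback object:

    public class ValuesCallback implements Callback {
        public void result(Object value) { … }
        public int[] getResults() { … }
    }

Each group member (S1, S2, S3):

    int getValue() {
        return value;
    }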
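The slide leaves the body of ValuesCallback open. One possible implementation collects one reply per member and lets getResults() block until every expected reply has arrived. It assumes a hypothetical Callback interface with a single result(Object) method (Jgroup's real callback interface may differ), and passing the expected number of replies to the constructor is also an assumption of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical callback interface standing in for Jgroup's own.
interface Callback {
    void result(Object value);
}

// Collects one integer reply per group member.
class ValuesCallback implements Callback {
    private final int[] values;
    private final AtomicInteger next = new AtomicInteger();
    private final CountDownLatch done;

    ValuesCallback(int expectedReplies) {
        values = new int[expectedReplies];
        done = new CountDownLatch(expectedReplies);
    }

    // May be called concurrently by the invocation layer, once per reply.
    @Override
    public void result(Object value) {
        int slot = next.getAndIncrement();
        if (slot < values.length) {
            values[slot] = (Integer) value;
            done.countDown(); // publishes the write to threads waiting in getResults()
        }
    }

    // Blocks until every expected reply has been delivered.
    int[] getResults() {
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replies", e);
        }
        return values.clone();
    }
}
```

Replies are stored in arrival order, which need not match the order of the members in the view.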

External Group Method Invocations

Anycast invocations:

• Are executed by at least one server in the object group (unless the client is partitioned from the group)

• Efficiency (same cost as standard RMI interactions)

• Useful for “read” methods on replicated databases

Multicast invocations:

• Are executed by all servers in a view, following the view synchrony semantics

• More costly (involve several servers)

• Useful for “write” methods on replicated databases

External invocations: example

[Figure: clients C1 and C2 invoke a registry replicated on servers S1, S2 and S3.]

Multicast invocation:

registry.bind("name", obj);

Anycast invocation:

registry.lookup("name");
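The difference between the two semantics can be sketched with a toy replicated registry. This is a hypothetical, single-process model, not Jgroup's replicated registry: replicas are plain objects, a crashed replica throws an exception, bind() is multicast to every reachable replica, and lookup() is anycast, trying replicas one at a time until one answers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One replica of a hypothetical registry service.
final class RegistryReplica {
    private final Map<String, Object> bindings = new HashMap<>();
    boolean crashed = false;

    void bind(String name, Object obj) {
        check();
        bindings.put(name, obj);
    }

    Object lookup(String name) {
        check();
        return bindings.get(name);
    }

    private void check() {
        if (crashed) {
            throw new IllegalStateException("replica unreachable");
        }
    }
}

// Client-side group reference offering the two external invocation semantics.
final class RegistryGroup {
    private final List<RegistryReplica> replicas;

    RegistryGroup(List<RegistryReplica> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    // Multicast: every reachable replica executes the "write".
    int bind(String name, Object obj) {
        int executed = 0;
        for (RegistryReplica r : replicas) {
            try {
                r.bind(name, obj);
                executed++;
            } catch (IllegalStateException unreachable) {
                // in this toy model a crashed replica simply misses the update
            }
        }
        return executed;
    }

    // Anycast: the first replica that answers executes the "read".
    Object lookup(String name) {
        for (RegistryReplica r : replicas) {
            try {
                return r.lookup(name);
            } catch (IllegalStateException unreachable) {
                // try the next replica
            }
        }
        throw new IllegalStateException("no replica reachable");
    }
}
```

The toy ignores views entirely; in Jgroup, multicast invocations additionally follow view synchrony, so the set of servers executing a "write" is tied to the current view.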

3. The ARM framework

Replication Management – The Problem

Object Group Systems support replication transparency:

• Membership management

• Reliable multicast

But they do not support full failure transparency:

• Distributing replicas requires application support or manual intervention

• Recovering from replica failures requires application support or manual intervention

These are complicated tasks:

• Application-level implementations are prone to errors

• These tasks should not be left to the application developer

Solution: Autonomous Replication Management

Support for creating object groups

• By placing individual members on distinct machines

• Each application may specify a replication policy, for example redundancy level = 3

Support for failure recovery

• Jgroup detects and reports failures to ARM

• ARM reacts by creating a replacement member for each failed member, perhaps on a different machine

• Each application may specify a recovery policy
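The replication and recovery policies can be sketched as follows. The ReplicationPolicyManager class, the host/domain model and the placement rule (prefer the domain with the fewest replicas, never put two replicas on one host, never reuse a host whose replica failed) are hypothetical illustrations. Only the createGroup()/createReplica()/notifyViewChange() vocabulary comes from ARM, and the signatures used here are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of ARM-style placement and recovery for a single group.
final class ReplicationPolicyManager {
    private final int redundancy;                  // replication policy, e.g. 3
    private final Map<String, String> hostDomain;  // candidate host -> network domain
    private final List<String> replicaHosts = new ArrayList<>();
    private final Set<String> failedHosts = new HashSet<>();

    ReplicationPolicyManager(int redundancy, Map<String, String> hostDomain) {
        this.redundancy = redundancy;
        this.hostDomain = new LinkedHashMap<>(hostDomain); // keeps host order stable
    }

    // Initial deployment: create replicas until the redundancy level is reached.
    void createGroup() {
        while (replicaHosts.size() < redundancy) {
            createReplica();
        }
    }

    // Recovery policy: called with the hosts of the members in the new view.
    // A host whose replica disappeared is not reused, since it may have crashed.
    void notifyViewChange(List<String> liveReplicaHosts) {
        for (String host : new ArrayList<>(replicaHosts)) {
            if (!liveReplicaHosts.contains(host)) {
                replicaHosts.remove(host);
                failedHosts.add(host);
            }
        }
        while (replicaHosts.size() < redundancy) {
            createReplica();
        }
    }

    // Places one replica on an unused host in the domain with the fewest replicas,
    // spreading the group over domains (network failures) and hosts (crashes).
    private void createReplica() {
        Map<String, Integer> perDomain = new HashMap<>();
        for (String host : replicaHosts) {
            perDomain.merge(hostDomain.get(host), 1, Integer::sum);
        }
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (String host : hostDomain.keySet()) {
            if (replicaHosts.contains(host) || failedHosts.contains(host)) {
                continue;
            }
            int load = perDomain.getOrDefault(hostDomain.get(host), 0);
            if (load < bestLoad) {
                best = host;
                bestLoad = load;
            }
        }
        if (best == null) {
            throw new IllegalStateException("no host available for a new replica");
        }
        // Only the placement is recorded; actually starting the replica is not modeled.
        replicaHosts.add(best);
    }

    List<String> replicaHosts() {
        return replicaHosts;
    }
}
```

For example, with hypothetical hosts h1, h2 and h5 in ux.his.no and h3, h4 in item.ntnu.no, a redundancy level of 3 places replicas on h1, h3 and h2; if the replica on h3 crashes, the replacement goes to h4, keeping both domains covered.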

ARM: Replica Distribution

[Figure: ExecDaemons run on hosts in two domains, ux.his.no and item.ntnu.no, connected by a router; ReplicationManager replicas run in the domains, and a ManagementClient calls createGroup() and createReplica() on the ReplicationManager.]

ARM: Replica Distribution

[Figure: the same network after deployment: three NettBankServer replicas have been created on different ExecDaemon hosts.]

ARM: Recovery from Crash Failure

[Figure: one NettBankServer replica crashes; the view agreement protocol installs a new view, and the group leader reports it to the ReplicationManager through notifyViewChange().]

ARM: Recovery from Crash Failure

[Figure: the ReplicationManager reacts by calling createReplica(), and a new NettBankServer replica is started on another ExecDaemon host, restoring three replicas.]

4. Integration with Java RMI / Jini

Introduction to Jini

Jini is an API built on top of the Java 2 platform:

• enables spontaneous networks of devices/software services to assemble into federations of objects

• addresses the distribution problems in these federations through a set of simple interfaces and protocols

Jini Architecture

The components of the Jini architecture may be divided into three categories:

• Infrastructure, i.e. the components that enable building a federated Jini system

• Model that “supports and encourages the production of reliable distributed services”

• Services that can be made part of a federated Jini system and that offer functionality to any other member of the federation, e.g. JavaSpaces

Jini Infrastructure

The infrastructure is composed of:

• Java RMI protocol: enables objects residing in different JVMs to communicate through remote method invocations

Jini Infrastructure

The infrastructure is composed of:

• Lookup Service: defines how services may become part of a Jini system and how clients retrieve services by their types and attributes.

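[Figure: the server discovers the lookup service and joins it by registering its stub; the client discovers the lookup service, looks up the service to obtain a copy of the stub, and invokes the server through it.]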

The Jini Programming Model

The programming model is based on three distinct paradigms for distributed computing:

• Leases extend the Java programming model by adding time to the notion of holding a reference to a resource

• Transactions allow a set of operations on one or more remote participants to be grouped in such a way that either all succeed or all fail

• Events enable objects to register interest in changes of the abstract state of remote objects
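The "all succeed or all fail" guarantee of transactions is usually obtained with a two-phase commit protocol, which is also the basis of the Jini transaction specification. The generic sketch below is not the Jini transaction API: the Participant, Coordinator and RecordingParticipant types are hypothetical. Note that a single coordinator is a single point of failure, which is the weakness pointed out next.

```java
import java.util.List;

// Hypothetical participant in a distributed transaction.
interface Participant {
    boolean prepare();   // phase 1: vote to commit (true) or abort (false)
    void commit();       // phase 2, if every participant voted yes
    void abort();        // phase 2, otherwise
}

final class Coordinator {
    // Returns true if the transaction committed at every participant.
    static boolean run(List<? extends Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allYes = false;
                break; // a single "no" vote is enough to abort
            }
        }
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.abort(); // also sent to participants that were never asked to prepare
            }
        }
        return allYes;
    }
}

// Simple participant that records the outcome it was told about.
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String outcome = "pending";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    public boolean prepare() { return vote; }
    public void commit() { outcome = "committed"; }
    public void abort() { outcome = "aborted"; }
}
```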

Jini and Fault Tolerance

Jini fault tolerance is based on leases and transactions

• leases enable the detection of service failures

• transactions provide consistency by guaranteeing “all-or-nothing” semantics

Unfortunately, Jini provides no support for high availability

• No support for replication

• If the transaction manager fails, clients and participants must wait for it to recover before serving further requests

Enhancing Jini with Fault-Tolerance

Extending Jini with the Object Group Paradigm:

• Infrastructure

  • Extending Java RMI for group method invocations

  • Extending the Lookup Service to deal with group proxies

• Programming model

  1. The object group paradigm as an alternative programming model

  2. Integration between the transactional and object group models

• Services

  • Replicated JavaSpaces

Extending Java RMI

The RMI group at JavaSoft designed Java RMI to be extensible

• The RemoteRef interface enables programmers to write their own references to remote objects on the client-side

Unfortunately, RemoteRefs are not sufficient

• There is no possibility to modify the behavior of RMI on the server side

[Figure: on the client side, the stub delegates to a RemoteRef; on the server side, the RMI runtime dispatches the call to the server.]

The Jgroup Approach (Current Version)

[Figure: the client calls a ClientProxy, statically or dynamically generated, which implements the remote interface; it reaches each ServerProxy through a fixed RMI stub and the server-side RMI runtime. Each ServerProxy invokes its server through method dispatchers, and the ServerProxies communicate with one another by multicast.]

Designing a New Java RMI API

We have cooperated with Sun Microsystems to design a new RMI API:

• Fully customizable, on both the client-side and the server-side

• Based on Dynamic Proxy Classes (JDK 1.3), so no static stub generator is needed

• Two different versions:

  • One-to-one (remote method invocations)

    • Voted down in JSR-078

    • Being included in the "Davis" release of Jini

  • One-to-many (group method invocations)
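Dynamic proxy classes let one generic invocation handler stand in for any remote interface, which is why no stub generator is needed. The sketch below uses the standard java.lang.reflect.Proxy API to build a client-side group proxy that forwards each call to every member and returns the first reply. The Counter interface, the GroupProxy helper and the use of local objects instead of RMI references are illustrative assumptions, not the API designed with Sun.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Example remote interface (hypothetical).
interface Counter {
    int increment();
}

final class GroupProxy {
    // Builds a proxy implementing `type` that forwards each call to all members
    // and returns the reply of the first member.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> type, List<? extends T> members) {
        List<T> group = new ArrayList<>(members);
        InvocationHandler handler = (proxy, method, args) -> {
            Object first = null;
            boolean gotReply = false;
            for (T member : group) {
                try {
                    Object reply = method.invoke(member, args);
                    if (!gotReply) {
                        first = reply;
                        gotReply = true;
                    }
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the member's own exception
                }
            }
            return first;
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

Calling a single member instead of looping over all of them would give anycast semantics; in Jgroup the forwarding happens over the network rather than to local objects.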

Jgroup with 1-to-1 Customizable RMI

[Figure: the same ClientProxy / ServerProxy architecture as the current version, with method dispatchers in each ServerProxy and multicast among ServerProxies, built on the customizable 1-to-1 RMI.]

Jgroup with 1-to-Many Customizable RMI

[Figure: the ClientProxy issues a multicast RMI to all ServerProxies, each of which invokes its server; the proxies are customizable objects.]

Extending the Lookup Service

Jini enables the registration of customized proxies for services

• this feature can be used to register group proxies using any implementation of the lookup service

Group proxies, however, differ from standard proxies as their contents may be dynamic

• Server registration: the server reference is added to the group proxy

• Server removal or lease expiration: the server reference is removed from the group proxy

We have developed an alternative implementation of the lookup specification capable of dealing with group proxies
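The dynamic contents of a group proxy can be sketched as a lookup table keyed by group name, where each registration carries a lease. GroupLookupService, its methods and the explicit clock argument (used to keep the example deterministic) are hypothetical; they illustrate the behavior described above rather than the Jini lookup specification or Jgroup's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup service holding one dynamic group proxy per group.
final class GroupLookupService {
    // group name -> (server reference -> lease expiration time in ms)
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    // Server registration: add the server reference to the group proxy.
    void register(String group, String serverRef, long now, long leaseMillis) {
        groups.computeIfAbsent(group, g -> new LinkedHashMap<>())
              .put(serverRef, now + leaseMillis);
    }

    // Lease renewal pushes the expiration time forward.
    void renew(String group, String serverRef, long now, long leaseMillis) {
        Map<String, Long> members = groups.get(group);
        if (members != null && members.containsKey(serverRef)) {
            members.put(serverRef, now + leaseMillis);
        }
    }

    // Explicit removal by the server.
    void unregister(String group, String serverRef) {
        Map<String, Long> members = groups.get(group);
        if (members != null) {
            members.remove(serverRef);
        }
    }

    // Lookup returns the current contents of the group proxy,
    // after dropping servers whose lease has expired.
    List<String> lookup(String group, long now) {
        Map<String, Long> members = groups.getOrDefault(group, new LinkedHashMap<>());
        members.values().removeIf(expiry -> expiry <= now);
        return new ArrayList<>(members.keySet());
    }
}
```

A client looking up a group receives only servers with valid leases, so a crashed server disappears from the group proxy once its lease runs out.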

The Jgroup Lookup Service

[Figure: several servers join the Jgroup lookup service, each adding its stub to the same group proxy; the client looks up the group proxy and uses it to invoke the group.]

Extending the Jini Programming Model

Jgroup + Jini programming model for fault-tolerance

• Leases + transactions

• Object group communication

Problem:

• transactions and group communication considered as separate aspects of fault-tolerance

• their composition does not result in any meaningful combination of their respective strengths

We need the possibility of using replication in transactions:

• Transaction managers

• Participants

• Clients

5. Conclusions

Applications (Research)

Jgroup/ARM is being used for

• A distributed auction system

  • Partitionable auctions

  • [Panzieri, Amoroso et al., University of Bologna, 2002]

• An online-upgrade service for active replication

  • [Solarski, GMD Fokus]

• A replication management framework

  • Application-specific replication and recovery strategies

  • [Meling, HiS]

• A dependable naming service

  • Support for extensible group proxies (JERI)

  • [Meling et al., HiS]

Applications (Education)

Jgroup is being used at the

• Stavanger University College in the “Advanced Programming” course

• University of Bologna in the “Distributed System” course

• Norwegian University of Science and Technology in the “Dependable Systems” course

Source for several projects and theses:

• Low-level communication protocols (Bologna)

• Replication services (Bologna)

• Wide-area distributed services (Padova)

• Management and deployment issues (HiS)

Thank You!

http://jgroup.sourceforge.net/