State Machine Replication Project Presentation Ido Zachevsky Marat Radan Supervisor: Ittay Eyal Winter Semester 2010


State Machine Replication

Project Presentation

Ido Zachevsky, Marat Radan

Supervisor: Ittay Eyal

Winter Semester 2010


Goals

• Learn and understand Paxos and Python.

• Design a program for a fault-tolerant distributed system using the Paxos algorithm.

• Test on a real internet-scale system, Planet-Lab.


The Problem – Distributed Storage

• Using Distributed Algorithms on a network has many advantages

• It also has many problems

• This project focuses on the Synchronization Problem


Synchronization

• The task: Successfully run a state machine that involves all the computers of a network

• All the computers need to be in sync regarding the Current State and the Next States.

• All the computers need to know the transitions.


Problems?

• Can any computer choose the next state?

• What if a computer disconnects ungracefully?

• What if a message is delayed due to congestion?

• Other problems…

• Solution: Use a dedicated algorithm


A Solution – Paxos

• Meeting the Safety requirements ensures that the chosen value is agreed upon by all computers

• Meeting the Liveness requirements ensures that a value will eventually be chosen


Paxos - Background

• Paxos Made Simple, Leslie Lamport, 01 Nov 2001

• Paxos Made Live


Principles

• The system consists of three agent classes:
– Proposers
– Acceptors
– Learners

• Some of them are distinguished

• Communicate via messages


Principles – continued

• A single computer – a Leader – is in charge

• Decision cycle in two phases:
1. A majority must promise to commit to a recent proposal.
2. Once a majority has committed, all computers are informed of the Decision.
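
The two-phase decision cycle can be illustrated with a minimal in-memory sketch. This is not the project's code: the `Acceptor` class, the `propose` function, and the use of direct method calls in place of network messages are all assumptions made for illustration, following the single-decision description in Paxos Made Simple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical in-memory acceptor; real nodes would exchange messages."""
    promised: int = -1                           # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None   # (number, value) last accepted

    def prepare(self, n: int):
        # Phase 1: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        # Phase 2: accept unless a higher-numbered promise was made meanwhile.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one decision cycle; return the decided value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.prepare(n) for a in acceptors]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None

    # If some acceptor already accepted a value, adopt the highest-numbered one.
    earlier = [prev for prev in granted if prev is not None]
    if earlier:
        value = max(earlier, key=lambda p: p[0])[1]

    # Phase 2: the value is decided once a majority has accepted it.
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "A")` decides `"A"`, and a later `propose(acceptors, 2, "B")` also returns `"A"`, because the second proposer must adopt the value that was already accepted.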


Safety requirements

• Only a value that has been proposed may be chosen,

• Only a single value is chosen, and

• A process never learns that a value has been chosen unless it actually has been.


Liveness requirements

• Some proposed value is eventually chosen.

• A process can eventually learn the value which has been chosen.


Implementing a State Machine

• Collection of servers, each implementing a state machine.

• The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm.

• A pre-decided set of commands is necessary.
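
As a rough sketch of the points above, each server can keep the commands chosen by successive Paxos instances and apply them strictly in instance order. The `Replica` class and the three commands in `COMMANDS` are hypothetical; the slides do not list the project's actual command set.

```python
from typing import Callable, Dict

# Hypothetical pre-decided command set: each command maps a state to a new state.
COMMANDS: Dict[str, Callable[[int], int]] = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
    "reset": lambda s: 0,
}


class Replica:
    """One server's copy of the state machine (illustrative only)."""

    def __init__(self) -> None:
        self.state = 0
        self.decided: Dict[int, str] = {}  # Paxos instance number -> chosen command
        self.next_index = 0                # next instance whose command will be applied

    def learn(self, instance: int, command: str) -> None:
        # Only commands from the pre-decided set are valid.
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {command}")
        self.decided[instance] = command
        # Apply every command whose predecessors have all been applied,
        # so decisions that arrive out of order still execute in order.
        while self.next_index in self.decided:
            self.state = COMMANDS[self.decided[self.next_index]](self.state)
            self.next_index += 1
```

Two replicas that learn the same decisions in different orders end in the same state, which is the property the replication relies on.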


Planet-Lab

• Planet-Lab is a global research network that supports the development of new network services.

• Understanding the system is required

• Monitoring is necessary
– Generally, implemented via NSSL-lab.


Project Design

• Chosen language for implementation: Python

• Network framework: Twisted Matrix

• Implementation stages:
– Single Decision on NSSL
– Multiple Decisions on NSSL
– Single Decision on Planet-Lab
– Multiple Decisions on Planet-Lab


[Figure: architecture diagram. Clients 1 to N and Servers 1 to N communicate over the network. Components shown: Reactor Loop, Listening Socket, Transport and Protocol objects, two ProtocolFactory objects, and the Paxos Algorithm.]


Implementation

• Use Cases
– Acceptor disconnects?
– Leader disconnects?
  • At which stage?
– Acceptor message fails to deliver?
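
For the acceptor-disconnect case, the arithmetic behind majority decisions gives a quick check of how many acceptors can be lost before a decision becomes impossible. The helper names below are illustrative and do not come from the project.

```python
def majority(n: int) -> int:
    """Smallest number of acceptors that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Largest number of acceptors that may disconnect while a majority remains."""
    return n - majority(n)


def can_decide(n: int, disconnected: int) -> bool:
    """True if the acceptors still connected can form a majority of all n."""
    return n - disconnected >= majority(n)
```

For example, with 5 acceptors a majority is 3, so up to 2 may disconnect; with 4 acceptors a majority is still 3, so only 1 may disconnect.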


Implementation

• Leader Election
– In fact an inherent part of the algorithm

• Output and monitoring
– Actual output not visible in general
– Only via monitoring


Flow

1. Register Nodes
2. Verify and install necessary files
3. Upload
4. Initiate Monitor
5. Run and wait for activity
6. Review results


Implementation – File Structure

• Initial Installation
– Installation: my_install (csh)
– Initial Communication: send_install (py)
– Alive Machines Server: install_serv (py)

• Uploading and Running
– Deployment: my_deploy (csh)
– Multi-Run: my_multirun (csh)
– Multi-Stop: my_multistop (csh)

• Core Paxos Program
– Paxos Instance: paxos_inst (py)
– Paxos Algorithm: paxos_alg (py)
– Network Data: paxos_net_data (txt)

• Service Scripts and Files
– Alive Nodes list: nodes (txt)
– Paxos Monitor: paxos_mon_serv (py)
– Additional files: combine_nodes (csh), conv_nodes (csh), remove_done (csh)


Results

• Everything works at the NSSL

• In Real-Life, not necessarily

• Communication phenomena: messages arriving unordered, in large chunks, etc.

• Works well for up to 20-30 Nodes

• Use cases tested in Lab
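
The slides do not say how the project coped with messages arriving in large chunks. One common technique, shown here purely as an assumption, is to prefix each message with its length and reassemble complete messages from whatever bytes the stream delivers. The `FrameBuffer` class and the 4-byte header are hypothetical choices.

```python
import struct
from typing import List


class FrameBuffer:
    """Reassembles length-prefixed messages from arbitrary stream chunks."""

    HEADER = struct.Struct("!I")  # 4-byte big-endian payload length

    def __init__(self) -> None:
        self._buf = bytearray()

    @classmethod
    def encode(cls, payload: bytes) -> bytes:
        # Sender side: prepend the payload length.
        return cls.HEADER.pack(len(payload)) + payload

    def feed(self, data: bytes) -> List[bytes]:
        # Receiver side: buffer the new bytes, then extract every complete message.
        self._buf.extend(data)
        messages = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # message incomplete; wait for more data
            messages.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]
        return messages
```

Whether two messages arrive merged in one chunk or one message is split across chunks, `feed` returns each message whole. This handles chunking only; reordering between messages would still have to be handled by the protocol itself, for instance through proposal numbers.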


Conclusions

• Preliminary work needed to understand Twisted Matrix and Planet-Lab

• Dealing with network problems
– SSH Tunnel instead of “real” monitoring

• Requirements fulfilled


Further work

• Optimize networking protocol
– Improve client-server interface
– Inefficient startup: N(N-1) for N machines

• Partition Decision processes
– Only few nodes decide each resolution
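
To see how quickly the N(N-1) startup cost grows, the short sketch below enumerates every ordered pair of machines. It assumes that the figure counts one connection from each machine to every other machine, which the slides do not state explicitly; `startup_links` is a hypothetical helper.

```python
from itertools import permutations
from typing import Iterable, List, Tuple


def startup_links(nodes: Iterable[int]) -> List[Tuple[int, int]]:
    """All ordered (source, destination) pairs of distinct nodes."""
    return list(permutations(nodes, 2))


def startup_cost(n: int) -> int:
    """Number of ordered pairs among n machines, i.e. n * (n - 1)."""
    return len(startup_links(range(n)))
```

Under this assumption, 4 machines need 12 links, while 30 machines, the upper end of the range reported in the results, need 870.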


Thank you