Self Stabilizing Distributed File System Department of Computer Science, Ben-Gurion University A BGU – IBM joint project



DFS Motivation

• Performance.

• Fault tolerance: any server can take responsibility for any role.

• Place files closer to users (local file access).

What Is Self-stabilizing?

A self-stabilizing system is a system that can automatically recover following the occurrence of (transient) faults.

The idea is to design a system that can be started in an arbitrary state and still converge to a desired behavior.

Self-Stabilization/S. Dolev

Self Stabilization Motivation

• The combination and type of faults cannot be totally anticipated in on-going systems.

• Any on-going system must be self-stabilizing (or manually monitored).

• A self-stabilizing algorithm can recover from any arbitrary state reached due to the occurrence of faults.

Design

• File system replication servers are coordinated using a spanning tree.

• The tree is constructed by a self-stabilizing update algorithm using multicast messages.

• Updates are propagated using a self-stabilizing β-synchronizer.

Design (Cont’)

• Clients join the replication tree and form a caching tree.

• File leases are used to provide cache consistency.

Replication Tree

• Using a layered self-stabilizing algorithm, we construct a single spanning tree consisting of the file system servers.

Leader Election

• A single leader coordinates the construction of the spanning tree.

• If no leader exists, a server becomes a leader.

• If more than one leader exists, the server with the minimal ID survives.

• Messages are sent periodically using global multicast (or broadcast).

Leader Election Algorithm

• Every T1 do:
  – If (p = leader) then send-multicast(‘I’m a leader’)
  – Leader-exists = true

• Every T1+Td do:
  – If (not leader-exists) then leader = p
  – Leader-exists = false

• Upon arrival of message do:
  – If (p.volume = volume) then
    • If (p = leader) then leader = min(leader, sender)
    • Else leader = sender
  – Leader-exists = true
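
The pseudocode above can be tried out with a small in-memory simulation. This is only a sketch under several assumptions: the `Server` class, the `run_rounds` helper, and the lock-step rounds that stand in for the T1 and T1+Td timers are all hypothetical, and no real multicast is used. It also reads each `Leader-exists = true` line as belonging to the branch before it (the leader branch in the T1 step, the matching-volume branch in the message handler). That reading is an interpretation, not something the slide states.

```python
class Server:
    """One file server taking part in the election (in-memory sketch)."""

    def __init__(self, server_id, volume, leader=None, leader_exists=False):
        self.id = server_id
        self.volume = volume
        self.leader = leader                # may start with any, even bogus, value
        self.leader_exists = leader_exists  # may also start arbitrary

    def every_t1(self):
        """Periodic step: a leader announces itself. Returns a message or None."""
        if self.leader == self.id:
            # Assumption: the flag is set as part of the leader branch.
            self.leader_exists = True
            return (self.id, self.volume)
        return None

    def every_t1_plus_td(self):
        """Timeout step: claim leadership if no leader was heard, then reset."""
        if not self.leader_exists:
            self.leader = self.id
        self.leader_exists = False

    def on_message(self, sender, volume):
        """Handle an "I'm a leader" announcement from `sender`."""
        if volume != self.volume:
            return  # announcement for a different volume: ignored (assumption)
        if self.leader == self.id:
            self.leader = min(self.leader, sender)  # competing leaders: lower ID wins
        else:
            self.leader = sender
        self.leader_exists = True


def run_rounds(servers, rounds):
    """Lock-step stand-in for the timers: announce, deliver, then time out."""
    for _ in range(rounds):
        messages = [m for m in (s.every_t1() for s in servers) if m is not None]
        for sender, volume in messages:
            for s in servers:
                if s.id != sender:
                    s.on_message(sender, volume)
        for s in servers:
            s.every_t1_plus_td()
    return [s.leader for s in servers]
```

Starting three servers with arbitrary `leader` values and calling `run_rounds(servers, 5)` leaves all of them agreeing on a single leader. When several servers initially claim leadership, the survivor is the claimant with the lowest ID, as the Leader Election slide describes.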

Spanning Tree Construction

• A network version of the self-stabilizing update algorithm.

• Multicast messages with a limited, local TTL.

• Define a neighboring relation for the update algorithm.

• Keep the communication graph connected.

Induced Graph Example

Update Algorithm

• Collect routing tables from all neighbors in the induced graph.

• Build a distributed BFS spanning tree from the tables.

• Select a manager (local leader) for the tree, a server with the minimal ID.
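
The outcome of these steps can be illustrated with a centralized sketch. It is not the distributed, self-stabilizing update algorithm itself. It only computes the kind of tree that algorithm is meant to converge to, assuming the induced graph is already known as an adjacency map. The function name `bfs_tree`, the `neighbors` input format, and the choice to root the tree at the manager are assumptions made for illustration.

```python
from collections import deque


def bfs_tree(neighbors):
    """Return (manager, parent) for one connected induced graph.

    neighbors: dict mapping a server ID to an iterable of neighbor IDs.
    parent:    dict mapping each reachable server to its tree parent
               (the manager maps to None).
    """
    manager = min(neighbors)          # minimal ID acts as manager (local leader)
    parent = {manager: None}
    queue = deque([manager])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):  # sorted for a deterministic tree
            if nxt not in parent:
                parent[nxt] = node           # first discovery gives the BFS parent
                queue.append(nxt)
    return manager, parent
```

For example, with `neighbors = {4: [2, 9], 2: [4, 7], 7: [2, 9], 9: [4, 7]}` the manager is server 2, servers 4 and 7 hang directly below it, and server 9 is attached through server 4.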

Tree Optimization

• The update algorithm creates connected components of the communication graph that is induced by the radius.

• Goal: find the minimal radius that keeps connectivity.

• Increase the radius by a factor of 2 until a single component spans the system.

• Run a second instance of the update algorithm with a smaller radius and compare outputs; if they are the same, decrease the radius.

• Search for the minimal radius using binary search.
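
The doubling phase followed by a binary search can be sketched as follows. The sketch assumes integer radii and a known table of pairwise hop distances, where two servers are neighbors when their distance is at most the radius. The names `is_connected`, `minimal_radius`, and `hops`, and the `limit` safety bound, are hypothetical. The real system compares the outputs of two running update instances instead of inspecting a global table.

```python
from collections import deque


def is_connected(hops, radius):
    """True if linking every pair within `radius` hops yields one component."""
    nodes = list(hops)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in nodes:
            if v not in seen and hops[u][v] <= radius:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def minimal_radius(hops, limit=256):
    """Smallest integer radius that keeps the induced graph connected."""
    radius = 1
    while not is_connected(hops, radius):   # doubling phase
        radius *= 2
        if radius > limit:
            raise ValueError("no connecting radius within the limit")
    # radius // 2 is known to disconnect the graph (or radius is 1),
    # so the answer lies in (radius // 2, radius].
    lo, hi = radius // 2 + 1, radius
    while lo < hi:                          # binary search phase
        mid = (lo + hi) // 2
        if is_connected(hops, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Binary search is valid here because connectivity is monotone in the radius: enlarging the radius only adds edges to the induced graph.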

Tree Structure

Replication Consistency

• A self-stabilizing β-synchronizer verifies that the signatures of accessed files are identical in all servers.

• If more than a single signature exists, there is a conflict.

• The leader decides (using a user-defined algorithm) on the correct file content and notifies the servers.
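
A minimal sketch of the conflict check, under stated assumptions: the slides do not say how signatures are computed, so SHA-256 is used here purely as a stand-in, and `majority_policy` is one hypothetical example of a user-defined decision rule. The names `signature` and `check_file` are also illustrative.

```python
import hashlib
from collections import Counter


def signature(content: bytes) -> str:
    """Stand-in signature: SHA-256 of the file content (an assumption)."""
    return hashlib.sha256(content).hexdigest()


def majority_policy(replicas):
    """Example user-defined rule: keep the most common content.

    Ties are broken by byte order so the result is deterministic.
    """
    counts = Counter(replicas.values())
    return max(counts, key=lambda content: (counts[content], content))


def check_file(replicas, policy=majority_policy):
    """Compare signatures across servers.

    replicas: dict mapping a server ID to that server's copy (bytes).
    Returns (conflict, content): `conflict` is True when more than one
    signature exists, and `content` is what the leader would announce.
    """
    signatures = {signature(data) for data in replicas.values()}
    if len(signatures) <= 1:
        return False, next(iter(replicas.values()))
    return True, policy(replicas)
```

With `replicas = {1: b"v2", 2: b"v2", 3: b"v1"}`, `check_file(replicas)` reports a conflict and selects `b"v2"`. Any other rule can be passed as `policy`.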

Caching Tree

• Clients extend the replication tree to a caching tree.

• The same update algorithm constructs both the replication and caching trees (minor modifications are required).

Cache Tree Diagram

File Access

• A read request is sent to the tree parent (either a server or a cache).

• A write request travels to the replication tree root (the leader) and is propagated by the β-synchronizer.

• Caching consistency depends on the propagation mechanism.
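
The read and write paths can be sketched with a toy tree. Everything here is a simplification made for illustration: the `Node` class is hypothetical, a node answers a read from its own copy when it has one, and the root pushes each write to every node below it, which stands in for the β-synchronizer and ignores leases.

```python
class Node:
    """A server or client cache in the tree (in-memory sketch)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.files = {}
        if parent is not None:
            parent.children.append(self)

    def read(self, path):
        """Serve a local copy if present, otherwise ask the tree parent."""
        if path in self.files:
            return self.files[path]
        if self.parent is None:
            raise FileNotFoundError(path)
        data = self.parent.read(path)
        self.files[path] = data      # keep a cached copy (assumption)
        return data

    def write(self, path, data):
        """Forward the write up to the root, which then propagates it down."""
        if self.parent is not None:
            self.parent.write(path, data)
        else:
            self._propagate(path, data)

    def _propagate(self, path, data):
        # Simplified push to the whole subtree; stands in for the synchronizer.
        self.files[path] = data
        for child in self.children:
            child._propagate(path, data)
```

A client two levels below the leader can call `write("/a", b"x")`. The request climbs to the root, and a node that joins later under the same server obtains the file by reading from its parent.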

Read/Write Example

Linux Based bguFS (1)

[Architecture diagram. Labels: Application; User Level / Kernel Level boundary; VFS; bguFS Module; Cache: valid data?; Local file system; Kernel update; SyncDaemon: Cache manager & Server; Upcalls; Network Communication; Updates.]

Linux Based bguFS (2)

[Architecture diagram. Labels: Application; User Level; Linux libc library; Library File Commands; New implementation for “C” commands: fopen, fclose, fread, fwrite, etc.; SyncDaemon: Cache manager & Server; Upcalls; Network Communication.]

Tasks

• Leader election and a radius-based spanning tree.

• Optimal radius (binary) search and beta-synchronizer.

• Distributed file R/W (operations) implementation.

• Kernel VFS module (1).

• C library “hacking” solution (2).