Petal and Frangipani

Page 1: Petal and Frangipani

Petal and Frangipani

Page 2: Petal and Frangipani

Petal/Frangipani

[Figure: three layers, NFS (“NAS”), Frangipani, and Petal (“SAN”).]

Page 3: Petal and Frangipani

Petal/Frangipani

NFS

• Untrusted

• OS-agnostic

Frangipani

• FS semantics

• Sharing/coordination

Petal

• Disk aggregation (“bricks”)

• Filesystem-agnostic

• Recovery and reconfiguration

• Load balancing

• Chained declustering

• Snapshots

• Does not control sharing

Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?

Page 4: Petal and Frangipani

Remaining Slides

The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.

For CPS 212, several issues are important:

• Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).

• Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.

• Understand the similarities/differences between Petal and the other reconfigurable cluster service work we have studied: DDS and Porcupine.

• Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.

• Understand the nature, purpose, and role of the three key design elements added for Frangipani: leased locks, a write-ownership consistent caching protocol, and server logging for recovery.

Page 5: Petal and Frangipani

Petal: Distributed Virtual Disks

Systems Research Center

Digital Equipment Corporation

Edward K. Lee

Chandramohan A. Thekkath

04/20/23

Page 6: Petal and Frangipani

Logical System View

[Figure: client file systems (AdvFS, NT FS, PC FS, UFS) use virtual disks /dev/vdisk1 through /dev/vdisk5 over a scalable network; Petal provides the virtual disks.]

Page 7: Petal and Frangipani

Physical System View

[Figure: a parallel database or cluster file system accesses /dev/shared1 over a scalable network backed by four Petal servers.]

Page 8: Petal and Frangipani

Virtual Disks

Each virtual disk provides a 2^64-byte address space.

Created and destroyed on demand.

Allocates disk storage on demand.

Snapshots via copy-on-write.

Online incremental reconfiguration.
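
As a rough illustration of on-demand allocation and copy-on-write snapshots, here is a minimal Python sketch. The class ToyVirtualDisk, the 64 KB block size, and the dictionary-based block map are assumptions made for illustration only; they are not Petal's actual data structures.

```python
BLOCK_SIZE = 64 * 1024          # assumed allocation unit, for illustration only
ADDRESS_SPACE = 2 ** 64         # each virtual disk exposes a 2^64-byte space


class ToyVirtualDisk:
    """Sparse virtual disk: a block gets storage only when it is written."""

    def __init__(self):
        self.blocks = {}        # block number -> bytes (the live view)
        self.snapshots = []     # frozen block maps, oldest first

    def write(self, offset, data):
        if not 0 <= offset < ADDRESS_SPACE:
            raise ValueError("offset outside the virtual address space")
        if offset % BLOCK_SIZE or len(data) != BLOCK_SIZE:
            raise ValueError("this toy only accepts whole, aligned blocks")
        # A new bytes object replaces the map entry, so data that a
        # snapshot still refers to is never overwritten in place.
        self.blocks[offset // BLOCK_SIZE] = bytes(data)

    def read(self, offset, snapshot=None):
        source = self.blocks if snapshot is None else self.snapshots[snapshot]
        # Unwritten regions read back as zeros and consume no storage.
        return source.get(offset // BLOCK_SIZE, bytes(BLOCK_SIZE))

    def snapshot(self):
        # Copy only the map; block contents are shared with the live view.
        self.snapshots.append(dict(self.blocks))
        return len(self.snapshots) - 1

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE
```

Writing one block at a very large offset allocates one block of storage, not the whole range below it, and a snapshot taken before a later write keeps returning the old contents.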

Page 9: Petal and Frangipani

Virtual to Physical Translation

[Figure: a (vdiskID, offset) pair is translated to (server, disk, diskOffset). The vdiskID and offset pass through the Virtual Disk Directory and the GMap; a per-server PMap (PMap0 through PMap3 on Server 0 through Server 3) produces (disk, diskOffset).]
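
The translation from (vdiskID, offset) to (server, disk, diskOffset) can be sketched as a chain of three lookups. The roles given to each structure below are an interpretation of the diagram, and the round-robin striping rule, the 64 KB block size, and all table contents are invented for illustration.

```python
BLOCK_SIZE = 64 * 1024   # assumed translation granularity

# Virtual Disk Directory: vdiskID -> global map identifier (hypothetical values).
vdisk_directory = {"vdisk1": "gmap-A"}

# GMap: the servers a virtual disk is spread over.
gmaps = {"gmap-A": [0, 1, 2, 3]}

# PMap, one per server: (global map, block number) -> (disk, diskOffset).
pmaps = {server: {} for server in range(4)}
pmaps[1][("gmap-A", 5)] = (2, 7 * BLOCK_SIZE)


def translate(vdisk_id, offset):
    """Resolve (vdiskID, offset) to (server, disk, diskOffset)."""
    gmap_id = vdisk_directory[vdisk_id]
    servers = gmaps[gmap_id]
    block, within = divmod(offset, BLOCK_SIZE)
    # Assumed placement rule: stripe blocks round-robin over the servers.
    server = servers[block % len(servers)]
    # The last step uses only state that is local to the chosen server.
    disk, disk_offset = pmaps[server][(gmap_id, block)]
    return server, disk, disk_offset + within
```

In this sketch the first two lookups use small, globally shared tables, while the large per-block table stays on the server that owns the block. That split is one way to answer where the indirection lives.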

Page 10: Petal and Frangipani

Global State Management

Based on Leslie Lamport’s Paxos algorithm.

Global state is replicated across all servers.

Consistent in the face of server & network failures.

A majority is needed to update global state.

Any server can be added/removed in the presence of failed servers.
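
The majority rule can be made concrete with a little arithmetic. The sketch below is not Paxos; it only shows how large a majority is and how many failed servers still leave one. The function names are hypothetical.

```python
def majority(n_servers):
    """Smallest group that is more than half of n_servers."""
    return n_servers // 2 + 1


def can_update(n_servers, reachable):
    """True if the reachable servers form a majority."""
    return reachable >= majority(n_servers)


def tolerated_failures(n_servers):
    """How many servers may be down while a majority still exists."""
    return n_servers - majority(n_servers)
```

With 8 servers a majority is 5, so global state can still be updated with 3 servers down. Any two majorities share at least one server, which is why two disconnected groups cannot both update the state.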

Page 11: Petal and Frangipani

Fault-Tolerant Global Operations

Create/Delete virtual disks.

Snapshot virtual disks.

Add/Remove servers.

Reconfigure virtual disks.

Page 12: Petal and Frangipani

Data Placement & Redundancy

Supports non-redundant and chained-declustered virtual disks.

Parity can be supported if desired.

Chained-declustering tolerates any single component failure.

Tolerates many common multiple failures.

Throughput scales linearly with additional servers.

Throughput degrades gracefully with failures.

Page 13: Petal and Frangipani

Chained Declustering

Server0   Server1   Server2   Server3
D0        D1        D2        D3
D3        D0        D1        D2
D4        D5        D6        D7
D7        D4        D5        D6
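
The placement on this slide follows a simple rule that can be generated in a few lines. The sketch assumes, as in the usual description of chained declustering, that each block has a primary copy on one server and a backup copy on the next server in the chain; the function name and the "D<n>" labels are chosen here to match the slide.

```python
def chained_layout(n_servers, n_stripes):
    """Return {server: [block labels, top to bottom]} for chained declustering."""
    layout = {}
    for server in range(n_servers):
        column = []
        for stripe in range(n_stripes):
            base = stripe * n_servers
            # Primary copy of this server's own block in the stripe.
            column.append(f"D{base + server}")
            # Backup copy of the block whose primary is on the previous server.
            column.append(f"D{base + (server - 1) % n_servers}")
        layout[server] = column
    return layout
```

Calling chained_layout(4, 2) reproduces the four columns of the slide: every block appears exactly twice, always on two neighbouring servers.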

Page 14: Petal and Frangipani

Chained Declustering

Server0   Server1   Server2   Server3
D0                  D2        D3
D3                  D1        D2
D4                  D6        D7
D7                  D5        D6

[Server1's blocks (D1, D0, D5, D4) are drawn apart from the servers.]
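
To see why losing one server does not lose data, the sketch below routes a read to whichever copy of a block is still reachable. It assumes the placement rule from the previous slide (primary on server block mod n, backup on the next server); the function names are hypothetical, and the real system's load balancing across the chain is not modeled.

```python
def replicas(block, n_servers):
    """Servers holding a block: its primary, then the next server in the chain."""
    primary = block % n_servers
    return [primary, (primary + 1) % n_servers]


def route_read(block, n_servers, failed=frozenset()):
    """Pick the first live replica, or raise if both copies are unreachable."""
    for server in replicas(block, n_servers):
        if server not in failed:
            return server
    raise RuntimeError(f"D{block} unavailable: both replicas failed")


def survives(n_servers, failed):
    """True if every block still has at least one live replica."""
    # The placement repeats every n_servers blocks, so one stripe suffices.
    return all(
        any(s not in failed for s in replicas(b, n_servers))
        for b in range(n_servers)
    )
```

With four servers and Server1 down, reads of D1 and D5 go to Server2 and reads of D0 and D4 stay on Server0. Two failed servers are survivable when they are not neighbours in the chain, which matches the claim that many common multiple failures are tolerated.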

Page 15: Petal and Frangipani

The Prototype

Digital ATM network.

• 155 Mbit/s per link.

8 AlphaStation Model 600.

• 333 MHz Alpha running Digital Unix.

72 RZ29 disks.

• 4.3 GB, 3.5 inch, fast SCSI (10MB/s).

• 9 ms avg. seek, 6 MB/s sustained transfer rate.

Unix kernel device driver.

User-level Petal servers.

Page 16: Petal and Frangipani

The Prototype

[Figure: hosts src-ss1, src-ss2, ..., src-ss8 and petal1, petal2, ..., petal8 attached to the Digital ATM Network (AN2), with /dev/vdisk1 shown on multiple hosts.]

Page 17: Petal and Frangipani

Throughput Scaling

[Figure: throughput scale-up (0 to 8) versus number of servers (0 to 8), with a LINEAR reference line and series for 512B, 8KB, and 64KB reads and writes.]

Page 18: Petal and Frangipani

Virtual Disk Reconfiguration

[Figure: throughput in MB/s (0 to 30) versus elapsed time in minutes (0 to 6), with series for 6 servers and 8 servers. Virtual disk w/ 1GB of allocated storage; 8KB reads & writes.]

Page 19: Petal and Frangipani

Frangipani: A Scalable Distributed File System

C. A. Thekkath, T. Mann, and E. K. Lee

Systems Research Center

Digital Equipment Corporation

Page 20: Petal and Frangipani

Why Not An Old File System on Petal?

Traditional file systems (e.g., UFS, AdvFS) cannot share a block device

The machine that runs the file system can become a bottleneck

Page 21: Petal and Frangipani

Frangipani

Behaves like a local file system

• multiple machines cooperatively manage a Petal disk

• users on any machine see a consistent view of data

Exhibits good performance, scaling, and load balancing

Easy to administer

Page 22: Petal and Frangipani

Ease of Administration

Frangipani machines are modular

• can be added and deleted transparently

Common free space pool

• users don’t have to be moved

Automatically recovers from crashes

Consistent backup without halting the system

Page 23: Petal and Frangipani

Components of Frangipani

File system core

• implements the Digital Unix vnode interface

• uses the Digital Unix Unified Buffer Cache

• exploits Petal’s large virtual space

Locks with leases

Write-ahead redo log

Page 24: Petal and Frangipani

Locks

Multiple reader/single writer

Locks are moderately coarse-grained

• protects entire file or directory

Dirty data is written to disk before lock is given to another machine

Each machine aggressively caches locks

• uses lease timeouts for lock recovery
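
The interplay of reader/writer locks, write-back before handoff, and lease timeouts can be sketched with a toy lock service. Everything here is an assumption for illustration: the class ToyLockService, the 30-second lease, the flush callback, and the synchronous revocation. In particular, the real system must also recover a crashed machine's log before its locks are released, which this toy skips.

```python
class ToyLockService:
    """Multiple-reader/single-writer locks that lapse unless leases are renewed."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds   # assumed lease length
        self.holders = {}                    # lock name -> {machine: mode}
        self.lease_expiry = {}               # machine -> time its lease runs out

    def renew(self, machine, now):
        self.lease_expiry[machine] = now + self.lease_seconds

    def _drop_expired(self, now):
        expired = {m for m, t in self.lease_expiry.items() if t <= now}
        for held in self.holders.values():
            for machine in expired & set(held):
                del held[machine]            # lock recovery by lease timeout
        for machine in expired:
            del self.lease_expiry[machine]

    def acquire(self, machine, name, mode, now, flush):
        """mode is 'r' or 'w'; flush(holder, name) writes the holder's dirty data."""
        self._drop_expired(now)
        self.renew(machine, now)
        held = self.holders.setdefault(name, {})
        # Two holders conflict when at least one of them wants to write.
        conflicting = [m for m, held_mode in held.items()
                       if m != machine and "w" in (mode, held_mode)]
        for other in conflicting:
            if held[other] == "w":
                flush(other, name)           # dirty data reaches disk first
            del held[other]                  # then the lock changes hands
        held[machine] = mode
```

Because a machine keeps a lock until someone else asks for a conflicting one, repeated access to the same file needs no further traffic to the lock service. The lease bounds how long a crashed holder can block others.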

Page 25: Petal and Frangipani

Logging

Frangipani uses a write ahead redo log for metadata

• log records are kept on Petal

Data is written to Petal

• on sync, fsync, or every 30 seconds

• on lock revocation or when the log wraps

Each machine has a separate log

• reduces contention

• independent recovery
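
A write-ahead redo log for metadata can be illustrated in a few lines. The sketch below is a toy under stated assumptions: the class ToyRedoLog, the record layout, and the use of per-block version numbers to make replay safe to repeat are all choices made for illustration, and the in-memory list stands in for log space that would be kept on Petal.

```python
class ToyRedoLog:
    """Per-machine write-ahead redo log for metadata updates."""

    def __init__(self):
        self.records = []                     # stand-in for log space on Petal

    def log_update(self, block, version, value):
        # Write-ahead: the record is appended before the block itself changes.
        self.records.append((block, version, value))


def replay(log, metadata):
    """Redo logged updates; metadata maps block -> (version, value)."""
    for block, version, value in log.records:
        current_version, _ = metadata.get(block, (0, None))
        if version > current_version:         # skip updates already applied
            metadata[block] = (version, value)
    return metadata
```

Since each machine appends only to its own log, machines do not contend for log space, and any machine that can read the log from shared storage can run replay on behalf of a crashed one.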

Page 26: Petal and Frangipani

Recovery

Recovery is initiated by the lock service

Recovery can be carried out on any machine

• log is distributed and available via Petal