Peer-to-peer Information Systems
Universität des Saarlandes
Max-Planck-Institut für Informatik – AG5: Databases and Information Systems Group
Prof. Dr.-Ing. G. Weikum
Jörg Diesinger
WS 2003/04 - 25.11.2003
Load Management
Introducing
FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
WS 2003/04 - 25.11.2003 FARSITE: Federated, Available and Reliable Storage for an Incompletely Trusted Environment
Motivation
Centralized file server system
• Vulnerable to geographically localized faults
• "Single point of failure"
• Expensive hardware components (high-performance I/O, RAID, CPU, etc.)
• Central administration required
  • System reliability depends on the administrators' competence
  • System security depends on the administrators' trustworthiness
• Backups are expensive and time-consuming
• Target for malicious attacks and data theft
• Not scalable
What is FARSITE?
• Serverless, distributed file system
  • Built on an existing set of desktop workstations
  • Runs entirely on clients
• Logically: a centralized file server
• Physically: distributed among a set of desktop machines
• Symbiotic: works among cooperating but not completely trusting clients
• Enabling technology trends
  • Increase in unused disk capacity on client desktop machines
  • Decrease in the computational cost of cryptographic operations relative to I/O operations
Federated, Available, and Reliable Storage for an Incompletely Trusted Environment
Systems and Networking Research Group
Outline
• Objectives
• Design Assumptions
• Implementation
  • Fundamental Concepts
  • System Architecture
  • System Enhancements
  • Request Example
• Features
• Manageability
• Summary
FARSITE – Objectives
• Harness the resources of loosely coupled, insecure, unreliable machines
• Reliable file storage service
• Protect and preserve file data and directory metadata
• Heterogeneous software and hardware environment
• Data availability, data reliability
• Data security and privacy without a centrally trusted authority
• Data consistency
• Data integrity
• Self-tuning, automatically configuring system
FARSITE – Design Assumptions
• Desktop workstations of large corporations and universities
• High-bandwidth network
• Total scale: ~10^5 machines
• Total files: ~10^10
• Total bytes: ~10^16
• A large fraction of users may try to read data to which they have not been granted access
• No user-sensitive data persists beyond user logoff or system reboot (not realized by the prototype's operating system, MS Windows!)
FARSITE – Implementation
• Namespace roots
  • Hierarchical directory tree representing the file repository
  • A specified set of machines (the directory group) manages the root
  • Multiple roots are allowed (multiple virtual file servers)
• Certificates
  • Semantically meaningful data structures
  • Signed with a private key
  • Namespace certificates: associate a namespace root with the set of machines managing it
  • User certificates: associate a user with his personal public key
  • Machine certificates: associate a machine with its public key
• Trust
  • Machines accept the authorizations of any certificate that can be validated with one or more of the trusted (certification authority) public keys
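The trust rule can be sketched as "accept a certificate if any trusted key validates its signature". This is a minimal sketch that assumes textbook RSA with tiny primes as a stand-in for a real signature scheme; the `Certificate` layout, `accept` and the other helper names are hypothetical, and the key is far too small to be secure.

```python
import hashlib
from dataclasses import dataclass

# Toy textbook-RSA key of a hypothetical certification authority.
# Tiny primes and no padding: this illustrates the idea only and is NOT secure.
P, Q, E = 61, 53, 17
N = P * Q                          # public modulus (3233)
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest(data: bytes, n: int) -> int:
    """SHA-256 of the certificate body, reduced into the RSA message space."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, d: int, n: int) -> int:
    """Sign with the private exponent."""
    return pow(digest(data, n), d, n)

def verify(data: bytes, signature: int, e: int, n: int) -> bool:
    """Check the signature with the public key (e, n)."""
    return pow(signature, e, n) == digest(data, n)

@dataclass(frozen=True)
class Certificate:
    body: bytes      # e.g. b"user=alice; pubkey=hypothetical" (hypothetical encoding)
    signature: int

def accept(cert: Certificate, trusted_keys) -> bool:
    """Accept the certificate if ANY trusted public key (e, n) validates it."""
    return any(verify(cert.body, cert.signature, e, n) for e, n in trusted_keys)

if __name__ == "__main__":
    body = b"user=alice; pubkey=hypothetical"
    cert = Certificate(body, sign(body, D, N))
    print(accept(cert, [(E, N)]))
```

Checking against a list of keys mirrors "one or more" trusted keys: adding a second certification authority only means appending another `(e, n)` pair.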
FARSITE – System Architecture
Every machine in FARSITE may perform three roles:
• Client
  • Interacts directly with a user
• Member of a directory group
  • Manages file metadata using a Byzantine-fault-tolerant protocol (fewer than one third of the members may fail)
• File host
  • Manages file content
(Figure: one client's perspective)
FARSITE – Implementation (Enhancements)
• Local caching of file content on the client
  • Improves read performance
• The directory group assigns leases on requested files to clients for a specified period of time
  • The client operates locally on its cached copy of the file
• Delayed pushing of updates from the client to the directory group
  • Reduces network traffic
• The client encrypts file data with the public keys of all authorized users
  • Read-access control (user privacy)
• The directory group cryptographically validates user requests
  • Write-access control
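A minimal sketch of how caching, leases and delayed update pushing fit together on the client. `DirectoryGroupStub`, `Client`, the 60-second lease length and the round-trip counter are all hypothetical; the stub replaces the Byzantine-fault-tolerant directory group, and lease types and encryption are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    path: str
    expires_at: float  # absolute time in seconds (hypothetical time base)

class DirectoryGroupStub:
    """Hypothetical stand-in for a directory group: no BFT protocol, no crypto, no lease types."""
    def __init__(self, files, lease_seconds=60.0):
        self.files = dict(files)
        self.lease_seconds = lease_seconds
        self.round_trips = 0  # messages exchanged with clients

    def acquire(self, path, now):
        self.round_trips += 1
        return Lease(path, now + self.lease_seconds), self.files[path]

    def commit(self, updates):
        self.round_trips += 1  # one message carries the whole batch of updates
        self.files.update(updates)

class Client:
    def __init__(self, group):
        self.group = group
        self.leases, self.cache, self.dirty = {}, {}, set()

    def _ensure_lease(self, path, now):
        lease = self.leases.get(path)
        if lease is None or now >= lease.expires_at:
            if path in self.dirty:
                # this sketch refuses rather than silently dropping buffered updates
                raise RuntimeError("lease expired before buffered update was pushed")
            lease, self.cache[path] = self.group.acquire(path, now)
            self.leases[path] = lease

    def read(self, path, now):
        self._ensure_lease(path, now)
        return self.cache[path]   # served from the local cache

    def write(self, path, data, now):
        self._ensure_lease(path, now)
        self.cache[path] = data   # applied to the cached copy only
        self.dirty.add(path)

    def flush(self):
        """Lazily push all buffered updates to the directory group in one message."""
        if self.dirty:
            self.group.commit({p: self.cache[p] for p in self.dirty})
            self.dirty.clear()
```

One read followed by five writes under a single lease costs two round trips in total: one to obtain the lease and content, one for the batched commit.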
FARSITE – Implementation (Enhancements)
• Lower replication factor for file hosts
  • Byzantine-fault-tolerant replication in the directory group needs 3f + 1 members to tolerate f failures; file host replication tolerates failures of all but one machine, because replicas are validated against the secure hash held by the directory group
• The directory group stores only indirection pointers to the file hosts and a secure hash of the file content
• The directory group can delegate parts of its namespace to other (randomly selected) known machines
  • Reduces storage and/or operation load
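The saving can be made concrete with a little arithmetic: Byzantine agreement needs 3f + 1 replicas to tolerate f faulty machines, while hash-validated file replicas need only f + 1. The function names below are illustrative.

```python
def bft_members_needed(f: int) -> int:
    """Directory group size needed to tolerate f Byzantine-faulty members."""
    return 3 * f + 1

def bft_faults_tolerated(rd: int) -> int:
    """Faulty members tolerated by a directory group of size RD: (RD - 1) / 3, rounded down."""
    return (rd - 1) // 3

def file_hosts_needed(f: int) -> int:
    """File hosts needed when any single correct replica can be validated by its secure hash."""
    return f + 1

def file_host_faults_tolerated(rf: int) -> int:
    """Unavailable file hosts tolerated with RF replicas: all but one."""
    return rf - 1

if __name__ == "__main__":
    for f in range(1, 4):
        print(f"f={f}: directory group needs {bft_members_needed(f)}, "
              f"file hosts need {file_hosts_needed(f)}")
```

For f = 1, 2, 3 this prints 4 vs. 2, 7 vs. 3 and 10 vs. 4 machines, which is why bulk file content is kept off the directory group.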
FARSITE – Read/Write Request Example
(Figure: message exchange between client, directory group and file hosts. Labels: file request; namespace certificate, lease, secure hash of file content, list of file hosts; file request; encrypted file content; validate file with secure hash, decrypt with private key; updated secure hash; verify write permission; address of client)
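The "validate file with secure hash" step can be sketched as follows: the client tries the file hosts from the directory group's list and keeps the first replica whose hash matches. SHA-256 is assumed as the secure hash, each host is modeled as a hypothetical zero-argument callable, and decryption is not shown.

```python
import hashlib
from typing import Callable, Iterable

def fetch_validated(expected_hash: bytes, hosts: Iterable[Callable[[], bytes]]) -> bytes:
    """Return the first replica whose SHA-256 digest matches the hash from the directory group.

    Each host is a hypothetical callable that returns the encrypted file content
    or raises ConnectionError when the host is unavailable.
    """
    for fetch in hosts:
        try:
            data = fetch()
        except ConnectionError:
            continue                  # unavailable host: try the next one
        if hashlib.sha256(data).digest() == expected_hash:
            return data               # one correct replica suffices
        # mismatching content: corrupted or malicious host, ignore it
    raise LookupError("no file host returned a valid replica")
```

Because a bad replica is detected by the hash comparison, any single correct and reachable host is enough.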
FARSITE – Features
Reliability / Availability
• File data replicated on RF file hosts
  • RF – 1 file hosts can be unavailable
• Metadata replicated among the RD members of a directory group
  • (RD – 1) / 3 members can be unavailable (Byzantine-fault-tolerant protocol)
• Migration of one machine's functionality to one or more other machines
  • Prevents permanent data loss
• Continuous relocation of file replicas at a sustainable background rate
  • Swap machine locations of replicas of high- and low-availability files
  • Equalizes file availability
• Caching of file data on client machines
  • Data is kept for a specified time interval of ~1 week ("cache retention period")
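A minimal sketch of one replica swap between a high- and a low-availability file. It assumes independent machine failures (file availability = probability that at least one replica host is up) and a greedy rule that swaps the best host of the most available file with the worst host of the least available one; both the rule and the names are assumptions, not FARSITE's exact policy. A background task would call `swap_step` repeatedly at a limited rate.

```python
from math import prod

def file_availability(hosts, avail):
    """P(at least one replica host is up), assuming independent machine failures."""
    return 1 - prod(1 - avail[m] for m in hosts)

def swap_step(placement, avail):
    """Try one swap between the least- and most-available files; return True if applied."""
    score = {f: file_availability(h, avail) for f, h in placement.items()}
    lo = min(score, key=score.get)
    hi = max(score, key=score.get)
    if lo == hi:
        return False
    best_of_hi = max((m for m in placement[hi] if m not in placement[lo]),
                     key=avail.get, default=None)
    worst_of_lo = min((m for m in placement[lo] if m not in placement[hi]),
                      key=avail.get, default=None)
    if best_of_hi is None or worst_of_lo is None:
        return False
    new_hi = [worst_of_lo if m == best_of_hi else m for m in placement[hi]]
    new_lo = [best_of_hi if m == worst_of_lo else m for m in placement[lo]]
    # only swap if the less available of the two files ends up better off
    if min(file_availability(new_hi, avail), file_availability(new_lo, avail)) <= score[lo]:
        return False
    placement[hi], placement[lo] = new_hi, new_lo
    return True
```

With two reliable machines (0.9) holding one file and two flaky ones (0.5) holding another, a single swap moves both files from 0.99 and 0.75 to 0.95.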
FARSITE – Features
Security
• Access control
  • Metadata includes an "access control list (ACL)" containing the public keys of all users authorized to write the file/directory
• Privacy
  • Encryption of all user-sensitive file content and metadata: "convergent encryption"
    (1) The secure hash of each data block of a file is used as the key to encrypt that block
    (2) A randomly generated file key is encrypted using the public keys of all authorized readers
    (3) The file key encrypts the block hashes
  • Enables a client to write individual file blocks without rewriting the entire file or waiting for the download to finish
• Integrity
  • Integrity of directory metadata is maintained by the Byzantine-fault-tolerant protocol
  • Integrity of file data is ensured by computing a hash tree over the file data blocks, stored in the file itself and in the directory group
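Steps (1) and (3) of convergent encryption, plus a hash tree for integrity, can be sketched as below. Python's standard library has no block cipher, so a toy SHA-256 counter-mode keystream stands in for a real cipher (NOT secure); step (2), wrapping the file key with the readers' public keys, is not shown. Computing the hash tree over the encrypted blocks is an assumption of this sketch.

```python
import hashlib
import os

HASH_LEN = 32  # SHA-256 digest size in bytes

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR with SHA-256(key || counter). Stand-in only, NOT secure.
    Applying it twice with the same key restores the input."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out += bytes(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def convergent_encrypt(blocks, file_key):
    """(1) encrypt each block under its own secure hash; (3) encrypt the hashes under the file key."""
    block_hashes = [hashlib.sha256(b).digest() for b in blocks]
    ciphertexts = [keystream_xor(h, b) for h, b in zip(block_hashes, blocks)]
    encrypted_hashes = keystream_xor(file_key, b"".join(block_hashes))
    return ciphertexts, encrypted_hashes

def decrypt(ciphertexts, encrypted_hashes, file_key):
    hashes = keystream_xor(file_key, encrypted_hashes)
    keys = [hashes[i:i + HASH_LEN] for i in range(0, len(hashes), HASH_LEN)]
    return [keystream_xor(k, c) for k, c in zip(keys, ciphertexts)]

def merkle_root(blocks):
    """Root of a binary hash tree over the blocks (last node duplicated on odd levels)."""
    level = [hashlib.sha256(b).digest() for b in blocks] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

if __name__ == "__main__":
    file_key = os.urandom(32)  # randomly generated file key
    blocks = [b"hello world", b"farsite", b"hello world"]
    ciphertexts, encrypted_hashes = convergent_encrypt(blocks, file_key)
    assert decrypt(ciphertexts, encrypted_hashes, file_key) == blocks
    print(merkle_root(ciphertexts).hex())
```

Since every block is encrypted independently, rewriting one block only changes that block's ciphertext, its entry in the encrypted hash list and the hash-tree path above it. Identical plaintext blocks also yield identical ciphertext, which is the defining property of convergent encryption.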
FARSITE – Features
Consistency (1)
Control over files and directories can be loaned to client machines. Directory groups implement four lease mechanisms for consistency:
• Content leases (data consistency): client machines can control file content
  • Read/write control
  • Read-only control
• Name leases (namespace consistency): client machines can control the name of a file or directory in the namespace
  • Create a new file (or sub-directory)
  • Rename a file (or directory)
FARSITE – Features
Consistency (2)
• Mode leases: client machines can control file-sharing semantics; six types of mode leases are provided
  • Read, Write, Delete
  • Exclude-Read, Exclude-Write, Exclude-Delete
• Access leases: client machines can control file-deletion semantics; three types of access leases are provided
  • Public (indicates an opened file)
  • Protected (public, plus no other client is granted an access lease without first contacting the holder)
  • Private (protected, plus no other access lease is active)
  • Deletion is not performed until the file has been closed by all lease holders
• Leases include expiration times that depend on the type of lease
• The number of leases per file is limited for performance
FARSITE – Mode Lease Request Example
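(Figure: a read access request is answered with a Read mode lease with read-sharing, i.e. with Exclude-Write and Exclude-Delete mode leases. A subsequent write access request conflicts with these leases, so the directory group asks the holder to revoke or downgrade them, and then either grants a Write mode lease or returns information about the conflict.)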
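The conflict check in this example can be sketched as a small table of mode leases on one file. The rule "a mode conflicts with the matching exclude mode held by another client" is an assumption consistent with the example; `ModeLeaseTable` and its methods are hypothetical names, and revocation is modeled by the holder calling `release`.

```python
from collections import defaultdict

# Each access mode conflicts with the matching exclude mode held by ANOTHER client.
EXCLUDE_OF = {"read": "exclude-read", "write": "exclude-write", "delete": "exclude-delete"}
MODE_OF = {v: k for k, v in EXCLUDE_OF.items()}

class ModeLeaseTable:
    """Mode leases on ONE file, as a directory group might track them (hypothetical model)."""
    def __init__(self):
        self.held = defaultdict(set)  # client -> set of mode leases

    def conflicting_clients(self, client, requested):
        blockers = set()
        for other, leases in self.held.items():
            if other == client:
                continue
            for lease in requested:
                opposite = EXCLUDE_OF.get(lease) or MODE_OF.get(lease)
                if opposite in leases:
                    blockers.add(other)
        return blockers

    def request(self, client, requested):
        """Grant all requested leases, or return the clients that must revoke/downgrade first."""
        requested = set(requested)
        blockers = self.conflicting_clients(client, requested)
        if blockers:
            return False, blockers
        self.held[client] |= requested
        return True, set()

    def release(self, client, leases):
        self.held[client] -= set(leases)
```

Replaying the figure: client A opens for reading with read-sharing and gets `{"read", "exclude-write", "exclude-delete"}`; client B's request for `{"write"}` returns `(False, {"A"})`; once A drops `exclude-write`, B's retry is granted.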
FARSITE – Features
Scalability
Mechanisms to keep computation, communication and storage from growing with the system size:
• Hint-based pathname translation
Problem:
  • A client requests a file with a particular pathname
  • Which directory group manages the file's information?
Solution:
  • The client caches file pathnames with a mapping to the responsible directory group
  • Algorithm: translate the file path by finding the longest matching path prefix in the cache and contacting the responsible directory group
    (1) The directory group manages the pathname -> STOP
    (2) The directory group manages only a path prefix: it responds with all its delegation certificates, which the client adds to its cache -> REPEAT
    (3) The directory group does not manage the path prefix: it informs the client, which removes the pathname hint from its cache -> REPEAT
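The translation loop can be sketched as follows, assuming each directory group is responsible for one subtree (`root`) minus the prefixes it has delegated, and that the cache always contains the namespace root "/". `DirectoryGroup`, `lookup` and `translate` are hypothetical names; removing a hint that lies inside a newly learned delegated subtree is an addition of this sketch that keeps step (2) from looping.

```python
from dataclasses import dataclass, field

def is_prefix(prefix: str, path: str) -> bool:
    """Component-wise prefix test: '/a' is a prefix of '/a/b' but not of '/ab'."""
    return prefix == "/" or path == prefix or path.startswith(prefix + "/")

@dataclass
class DirectoryGroup:
    root: str                                        # subtree this group is responsible for
    delegations: dict = field(default_factory=dict)  # delegated prefix -> group name

    def lookup(self, path: str):
        if not is_prefix(self.root, path):
            return "not_managed", {}
        if any(is_prefix(d, path) for d in self.delegations):
            return "delegated", dict(self.delegations)  # all delegation certificates
        return "manages", {}

def translate(path: str, cache: dict, groups: dict, max_steps: int = 32) -> str:
    """Resolve the directory group for `path`; `cache` maps path prefixes to group names
    and must contain the namespace root '/'."""
    for _ in range(max_steps):
        prefix = max((p for p in cache if is_prefix(p, path)), key=len)  # longest match
        group = cache[prefix]
        status, certificates = groups[group].lookup(path)
        if status == "manages":                   # (1) STOP
            return group
        if status == "delegated":                 # (2) learn certificates, REPEAT
            cache.update(certificates)
            if any(is_prefix(d, prefix) and d != prefix for d in certificates):
                del cache[prefix]                 # hint lies inside a delegated subtree
        else:                                     # (3) drop the stale hint, REPEAT
            del cache[prefix]
    raise RuntimeError("pathname translation did not converge")
```

With groups G0 ("/", delegating "/a" to G1), G1 ("/a", delegating "/a/b" to G2) and G2 ("/a/b"), a cold cache `{"/": "G0"}` resolves "/a/b/c.txt" via G0, G1 and G2 and keeps both delegations as hints, so a later lookup under "/a/b" goes straight to G2.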
FARSITE – Manageability
Autonomic System Operations (1)
Self-administration in a Byzantine-fault-tolerant way
• Initiated as lazy follow-ups after client operations (e.g. file/metadata updates)
• Initiated as continuously performed background tasks (e.g. file replication/relocation, directory delegation/migration)
• Conception
  (1) A single remote machine initiates an operation
  (2) The operation is performed by a Byzantine-fault-tolerant directory group
  (3) The group modifies the shared state of its members and returns a result to the client machine
FARSITE – Manageability
Autonomic System Operations (2)
• Timed Byzantine Operations
Problem:
  • Operations must be initiated in response to a timer
  • Clocks of directory group members cannot be perfectly synchronized
Solution:
  • The replicated state includes one member time for each of the RD group members
  • The largest member time is regarded as the group time
  (1) The client's local time indicates that a timed operation should be performed
  (2) The Byzantine protocol is invoked to update the replicated member time to the client machine's local time
  (3) The update changes the group time
  (4) All operations with scheduled time <= new group time are performed
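Steps (2) to (4) can be sketched on a single copy of the replicated state. In FARSITE each update would be agreed on through the Byzantine-fault-tolerant protocol; here it is applied directly. `TimedGroupState` and its methods are hypothetical, the machine whose clock triggers the update is assumed to be one of the RD members, and keeping member times monotonic is an assumption of this sketch.

```python
import heapq

class TimedGroupState:
    """Replicated state of one directory group for timed operations (hypothetical model)."""
    def __init__(self, members):
        self.member_time = {m: 0.0 for m in members}  # one member time per group member
        self.scheduled = []                            # heap of (time, seq, operation)
        self.executed = []
        self._seq = 0                                  # tie-breaker for equal times

    @property
    def group_time(self):
        return max(self.member_time.values())  # largest member time = group time

    def schedule(self, when, operation):
        heapq.heappush(self.scheduled, (when, self._seq, operation))
        self._seq += 1

    def update_member_time(self, member, local_time):
        # (2) set the member's replicated time to its local clock reading (kept monotonic)
        self.member_time[member] = max(self.member_time[member], local_time)
        # (3) the group time may have advanced; (4) run every operation that is now due
        while self.scheduled and self.scheduled[0][0] <= self.group_time:
            _, _, operation = heapq.heappop(self.scheduled)
            self.executed.append(operation)
```

A member whose clock lags behind cannot hold operations back: as soon as any member reports a later time, the group time advances and every operation scheduled up to that time runs.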
Summary
FARSITE is a
• Scalable, decentralized network file system
• Loosely coupled collection of insecure, unreliable machines
• Secure, reliable virtual file server
FARSITE provides
• Availability and reliability through replication
• Privacy and authentication through cryptography
• Integrity through Byzantine-fault-tolerant techniques
• Consistency through leases
• Scalability through namespace delegation
• Performance through local file caching, hint-based pathname translation and lazy update commit
FARSITE manages the workload of directory groups through
• Hint-based pathname translation
• Local caching of file content
• Lazy update commit
Questions