Pastiche: Making Backup Cheap and Easy Presented by: Boon Thau Loo CS294-4

Page 1: Pastiche: Making Backup Cheap and Easy

Pastiche: Making Backup Cheap and Easy

Presented by: Boon Thau Loo

CS294-4

Page 2: Pastiche: Making Backup Cheap and Easy

Outline
• Motivation and Goals
• Enabling Technologies
• System Design
• Implementation and Evaluation
• Conclusion

Page 3: Pastiche: Making Backup Cheap and Easy

Motivation
• The majority of users do not back up their data.
• Those who do:
  • Don’t back up very often.
  • Don’t back up everything.
• Backup is a significant cost in large organizations.
• Why not use excess disk space for backups?
  • File systems are only half-full on average.
  • Disks are cheap.

Page 4: Pastiche: Making Backup Cheap and Easy

Pastiche Goals
• P2P backup system
• Target environment
  • Cooperation through untrusted machines.
  • End-user machines
• Leverage common data when possible for space efficiency (backup “buddies”).
• Preserve privacy.
• Efficient, cost-free, administration-free

Page 5: Pastiche: Making Backup Cheap and Easy

Enabling Technologies
• Pastry for self-organizing routing and object location.
• Content-based indexing (Manber94, LBFS)
  • Identify boundary regions (anchors) that divide a file into chunks.
  • Rabin fingerprinting
  • Isolate changes in each chunk.
  • SHA-1 hash of each chunk
• Convergent encryption (used by FARSITE)
  • Encrypt the file using a key derived from the file’s contents.
  • Further encrypt that key using the client’s key.
  • The encrypted key is stored with the file in FARSITE.
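The content-based indexing bullets can be illustrated with a small sketch. It is not the Pastiche or LBFS implementation: a simple polynomial rolling hash stands in for true Rabin fingerprinting, and the window width, anchor mask, base, and modulus are all assumed values chosen for illustration. The point it shows is that anchors depend only on nearby bytes, so an edit disturbs only the chunk it touches.

```python
import hashlib

WINDOW = 48             # sliding-window width in bytes (assumed)
MASK = (1 << 11) - 1    # anchor when the low 11 hash bits are zero (assumed)
BASE = 257              # polynomial base of the rolling hash (assumed)
MOD = (1 << 61) - 1     # large prime modulus (assumed)


def split_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined anchors (stand-in for Rabin fingerprints)."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # shift in the new byte
        if i + 1 >= WINDOW and (h & MASK) == 0:      # window hash marks an anchor
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """SHA-1 hash of each chunk, as on the slide."""
    return [hashlib.sha1(c).hexdigest() for c in split_chunks(data)]
```

Prepending a byte to the input changes only the first chunk under this scheme; every later anchor falls after the same bytes as before, so the remaining chunk hashes are shared.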

Page 6: Pastiche: Making Backup Cheap and Easy

System Design
• Data Chunks
• File meta-data
• Abstracts
• Joining Pastry
• Finding Backup buddies
• Backup Protocol
• Restoration
• Failures and Malicious Nodes
• Greed Prevention

Page 7: Pastiche: Making Backup Cheap and Easy

Data Chunks
• Data is stored on disk as immutable chunks.
  • Content-based indexing + convergent encryption
• Chunks are stored for the local host and/or for backup clients.
• Each chunk carries an owner list and maintains a reference count.
• When a newly written file is closed, it is scheduled for chunking, which yields for each chunk c:
  • Hc: handle
  • Ic: chunk ID
  • Kc: encryption key
• The list of chunk IDs forms the file signature.
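A sketch of how the three per-chunk values might relate under convergent encryption. The slides do not spell out the derivations, so everything here is an assumption: the handle is taken to be the SHA-1 of the plaintext, the key is derived from the handle, and the chunk ID is a hash of the handle. The cipher is a toy SHA-256 keystream used only because the standard library has no block cipher; it is not secure and is not what Pastiche uses.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR with SHA-256(key || offset)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal_chunk(plaintext: bytes):
    """Return (Hc, Ic, Kc, ciphertext) for one chunk; derivations are assumed."""
    handle = hashlib.sha1(plaintext).digest()         # Hc: hash of the contents
    key = hashlib.sha256(b"key" + handle).digest()    # Kc: derived from Hc
    chunk_id = hashlib.sha1(handle).digest()          # Ic: public name of the chunk
    return handle, chunk_id, key, keystream_xor(key, plaintext)
```

Because every value is a function of the plaintext alone, two hosts holding the same chunk produce the same ciphertext and the same chunk ID, which is what lets buddies share storage without reading each other's data.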

Page 8: Pastiche: Making Backup Cheap and Easy

Data Chunks (Cont…)
• Backup request:
  • Remote hosts must supply a public key with the backup request.
  • If the chunk exists, add the requesting host to the owner list.
  • The local reference count is incremented.
• Delete request:
  • Requests from remote hosts must be signed by the secret key.
  • Check against the public key (cached from the earlier backup request).
  • When the reference count = 0, the chunk is removed.
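The backup and delete rules above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not the Pastiche chunkstore: `ChunkStore`, `StoredChunk`, and the `verify` callback are hypothetical names, the reference count is simplified to one reference per owning host, and real signature verification is left to whatever function the caller passes in.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical verifier signature: (public_key, message, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass
class StoredChunk:
    data: bytes
    owners: set = field(default_factory=set)

    @property
    def ref_count(self) -> int:
        return len(self.owners)   # assumption: one reference per owning host


class ChunkStore:
    def __init__(self, verify: Verifier):
        self.verify = verify
        self.chunks = {}        # chunk ID -> StoredChunk
        self.public_keys = {}   # host -> public key cached at backup time

    def backup(self, host: str, public_key: bytes, chunk_id: bytes,
               data: Optional[bytes] = None) -> bool:
        """Add host as an owner; return False if the chunk data is still needed."""
        self.public_keys[host] = public_key
        chunk = self.chunks.get(chunk_id)
        if chunk is None:
            if data is None:
                return False
            chunk = self.chunks[chunk_id] = StoredChunk(data)
        chunk.owners.add(host)
        return True

    def delete(self, host: str, chunk_id: bytes, signature: bytes) -> bool:
        """Drop host's ownership if the request is signed; free the chunk at zero."""
        key = self.public_keys.get(host)
        chunk = self.chunks.get(chunk_id)
        if key is None or chunk is None or host not in chunk.owners:
            return False
        if not self.verify(key, chunk_id, signature):
            return False
        chunk.owners.discard(host)
        if chunk.ref_count == 0:
            del self.chunks[chunk_id]
        return True
```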

Page 9: Pastiche: Making Backup Cheap and Easy

File Meta-data
• File meta-data
  • List of handles Hc for the chunks comprising the file.
  • Ownership, permissions, creation and modification times.
• Mutable with fixed Hc, Kc and Ic
• File system root meta-data: Hc generated based on a host-specific passphrase.

Page 10: Pastiche: Making Backup Cheap and Easy

Abstracts
• The initial backup of a freshly installed machine is the most expensive.
• Goal: Find a good buddy that owns all or most of your data chunks.
• Naïve solution: Ship the full signature of the new node around.
  • Expensive: 20 bytes per chunk for a 16KB chunk.
• Solution: Send a random subset of the signature, called an abstract.
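To make the cost argument concrete, here is a short sketch of the arithmetic and of drawing an abstract. The 20-byte ID and 16 KB chunk size come from the slide; the function names, the uniform sampling, and the `seed` parameter are assumptions for illustration.

```python
import random

CHUNK_ID_BYTES = 20            # one SHA-1 chunk ID, per the slide
AVG_CHUNK_BYTES = 16 * 1024    # 16 KB chunk, per the slide


def signature_size(data_bytes: int) -> int:
    """Bytes needed to ship the full signature for data_bytes of file data."""
    return (data_bytes // AVG_CHUNK_BYTES) * CHUNK_ID_BYTES


def make_abstract(signature, size, seed=None):
    """Pick a uniform random subset of the chunk IDs in the signature."""
    ids = sorted(set(signature))
    return random.Random(seed).sample(ids, min(size, len(ids)))
```

Under these numbers, `signature_size(2**30)` is 1,310,720 bytes, so a full signature costs about 1.25 MiB per GiB of data, while an abstract has a fixed size regardless of how much data the node holds.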

Page 11: Pastiche: Making Backup Cheap and Easy

Joining Pastry
• Pastry:
  • Self-organizing, p2p overlay
  • Each node maintains
    • Leaf set: the L/2 closest smaller and L/2 closest larger nodeIDs
    • Neighborhood set: closest nodes according to the proximity metric
    • Routing table: prefix routing
• Join the Pastry overlay with nodeID set to Hash(hostname)
• Find backup buddies…

Page 12: Pastiche: Making Backup Cheap and Easy

Finding Backup Buddies
• After joining the network, route a Pastry message carrying the abstract to a random nodeID.
• Each node along the route returns its coverage (the fraction of chunks in the abstract that it stores locally).
• Lighthouse sweep: if the candidate set is insufficient, repeat the probe, rotating through destinations by varying the first digit of the original nodeID.
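The coverage measure and the lighthouse sweep can be sketched as follows. This is a minimal illustration: it assumes nodeIDs are written as hexadecimal strings (so a digit has 16 values) and that SHA-1 is the hash applied to the hostname; `node_id`, `coverage`, and `lighthouse_targets` are hypothetical helper names, and no actual Pastry routing is performed.

```python
import hashlib


def node_id(hostname: str) -> str:
    """nodeID = Hash(hostname); SHA-1 in hex is an assumed choice."""
    return hashlib.sha1(hostname.encode()).hexdigest()


def coverage(abstract, local_chunk_ids) -> float:
    """Fraction of the abstract's chunk IDs that this node already stores."""
    if not abstract:
        return 0.0
    return sum(1 for cid in abstract if cid in local_chunk_ids) / len(abstract)


def lighthouse_targets(first_target: str):
    """Yield further probe destinations by rotating the first hex digit."""
    first = int(first_target[0], 16)
    for step in range(1, 16):
        yield format((first + step) % 16, "x") + first_target[1:]
```

For example, `coverage(["a", "b", "c", "d"], {"a", "c", "x"})` is 0.5, and a sweep from one destination produces 15 further destinations that differ from it only in the first digit, so successive probes take routes through different regions of the nodeID space.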

Page 13: Pastiche: Making Backup Cheap and Easy

Not Enough Buddies?
• Each node tries to find 5 buddies.
• What if you can’t find enough buddies?
  • A real possibility for rare installations
• Create a coverage-rate Pastry overlay
  • Replace the network proximity distance metric with coverage rate.
  • Pastry neighborhood set: the set of nodes encountered during join with the best coverage available.
  • Find buddies in the neighborhood set.
• A may be a buddy for B, but not necessarily vice versa (no symmetry).
• Malicious nodes may misreport coverage.

Page 14: Pastiche: Making Backup Cheap and Easy

Backup Protocol
• Each Pastiche node controls its own archival plan.
• Snapshot: a discrete backup event.
• Meta-data skeleton for each snapshot is stored in per-file logs.
• State necessary for a new snapshot: add set, delete set, meta-data list

Page 15: Pastiche: Making Backup Cheap and Easy

Backup Protocol (Cont..)
• Snapshot process (A stores a snapshot on B):
  • A sends its public key to B (for future validation).
  • A forwards the chunkIDs of the add set to B.
  • B fetches the chunks not already stored locally.
  • A sends the delete list (signed with A’s private key).
  • A sends updated meta-data.
  • A sends a commit request; B responds when all changes are persistent.
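The six snapshot steps can be walked through with an in-memory stand-in for B. This is only a sketch of the message order: `Buddy` and `take_snapshot` are hypothetical names, networking, signatures, owner lists, and durable storage are omitted, and staging changes until commit is an assumed way to model "B responds when all changes are persistent".

```python
class Buddy:
    """In-memory stand-in for host B; network, signatures and disks are omitted."""

    def __init__(self):
        self.chunks = {}      # chunk ID -> data (committed state)
        self.metadata = {}    # owner -> latest committed meta-data
        self.pending = {}     # owner -> staged, uncommitted changes

    def begin(self, owner, public_key):
        self.pending[owner] = {"key": public_key, "add": {}, "delete": [], "meta": None}

    def missing(self, chunk_ids):
        return [cid for cid in chunk_ids if cid not in self.chunks]

    def stage_chunk(self, owner, cid, data):
        self.pending[owner]["add"][cid] = data

    def stage_delete(self, owner, chunk_ids):
        self.pending[owner]["delete"] = list(chunk_ids)

    def stage_metadata(self, owner, meta):
        self.pending[owner]["meta"] = meta

    def commit(self, owner) -> bool:
        staged = self.pending.pop(owner)
        self.chunks.update(staged["add"])
        for cid in staged["delete"]:
            self.chunks.pop(cid, None)    # simplification: ignores other owners
        self.metadata[owner] = staged["meta"]
        return True


def take_snapshot(owner, public_key, buddy, add_set, delete_set, meta):
    """Run one snapshot of `owner` (host A) against `buddy` (host B)."""
    buddy.begin(owner, public_key)                  # 1. A sends its public key
    wanted = buddy.missing(list(add_set))           # 2. A forwards add-set chunk IDs
    for cid in wanted:                              # 3. B fetches chunks it lacks
        buddy.stage_chunk(owner, cid, add_set[cid])
    buddy.stage_delete(owner, delete_set)           # 4. delete list (signature omitted)
    buddy.stage_metadata(owner, meta)               # 5. updated meta-data
    return buddy.commit(owner), wanted              # 6. commit; B acknowledges
```

Note that only the chunks B reports as missing are transferred, which is where the savings from shared data come from.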

Page 16: Pastiche: Making Backup Cheap and Easy

Restoration
• Partial restores are straightforward: obtain the chunks from a buddy.
• Recovering an entire machine:
  • Keep a copy of the root meta-data object in each member of the leaf set.
  • Rejoin with the same nodeID (based on hostname).
  • Retrieve the root meta-data object from any node in the leaf set.
  • The root block contains the list of buddies.

Page 17: Pastiche: Making Backup Cheap and Easy

Detecting Failure and Malice
• Failures:
  • A buddy can drop chunks if it runs out of disk space.
  • A buddy may crash or leave the network.
  • A malicious buddy may pretend to store your chunks.
• Solutions:
  • Before taking a new snapshot, query buddies for a random subset of chunks. Provides instantaneous assurance.
  • Periodic probing of buddies: analysis shows that checking 0.1% of all chunks is enough.
• Sybil attack? A malicious party occupies a substantial fraction of the nodeID space.
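Why a small random probe can be effective follows from a standard sampling argument, sketched below. This is not the analysis the slide refers to (which is not reproduced here); it assumes each probe independently hits a missing chunk with probability equal to the fraction the buddy has dropped, which approximates sampling from a large chunk set.

```python
import math


def detection_probability(dropped_fraction: float, probes: int) -> float:
    """Chance that at least one of `probes` random queries hits a dropped chunk."""
    return 1.0 - (1.0 - dropped_fraction) ** probes


def probes_needed(dropped_fraction: float, target: float) -> int:
    """Smallest probe count whose detection probability reaches `target`."""
    if not 0.0 < dropped_fraction < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - dropped_fraction))
```

Under this model, a buddy that has silently dropped 10% of the chunks is caught with at least 99% probability after 44 probes, independent of how many chunks are stored in total.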

Page 18: Pastiche: Making Backup Cheap and Easy

Greed Prevention
• A greedy host can consume storage.
• Three solutions:
  • Group backup clients based on resources consumed.
  • Cryptographic puzzles according to storage consumed.
  • Electronic currency
    • Currency accounting: requires atomicity between the exchange of currency and backup.
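The slide does not say which puzzle is meant, so the following is a hashcash-style proof of work swapped in purely for illustration. The difficulty policy (one extra bit of work each time the storage consumed doubles), the base difficulty, and all function names are assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def difficulty_for(stored_chunks: int, base_bits: int = 4) -> int:
    """Assumed policy: one extra bit of work each time storage use doubles."""
    return base_bits + max(stored_chunks, 1).bit_length() - 1


def puzzle_hash(challenge: bytes, nonce: int) -> bytes:
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, bits: int) -> int:
    """Search for a nonce whose hash starts with `bits` zero bits (client side)."""
    for nonce in count():
        if leading_zero_bits(puzzle_hash(challenge, nonce)) >= bits:
            return nonce


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """A single hash checks the client's work (storage-host side)."""
    return leading_zero_bits(puzzle_hash(challenge, nonce)) >= bits
```

The asymmetry is the design point: solving takes about 2^bits hash evaluations on average, while verifying takes one, so the cost of requesting storage grows with the amount already consumed.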

Page 19: Pastiche: Making Backup Cheap and Easy

Implementation
• Chunkstore file system
  • Container files: LRU cache of decrypted, recently used files for performance.
  • Chunks increase internal fragmentation.
• Backup daemon
  • Server: Manages remote requests for storage and restoration.
  • Client: Supervises selection of buddies and snapshots.

Page 20: Pastiche: Making Backup Cheap and Easy

Evaluation
• Compare ext2fs with chunkstore on a modified Andrew benchmark:
  • Total overhead of 7.4% is reasonable.
  • Overheads are due to meta-data management and Rabin fingerprint computation (for finding anchors).
• Backup and restore compare favorably to NFS cross-machine copy.
• Conclusion: the service does not penalize file system performance unduly.

Page 21: Pastiche: Making Backup Cheap and Easy

Evaluation (Cont…)
• Question: How large must the abstract be?
  • Compare machines with a freshly installed machine.
  • Abstract size does not seem to matter much.

Page 22: Pastiche: Making Backup Cheap and Easy

Evaluation (Cont…)
• Question: How effective is the lighthouse sweep in discovering buddies?
  • Simulation: 50000 Pastiche nodes with 11 types of nodes.
  • Lighthouse is good enough for common nodes (>=10%).
  • Rare nodes would require the coverage-rate overlay.

Page 23: Pastiche: Making Backup Cheap and Easy

Evaluation (Cont…)
• Question: How effective is the coverage-rate overlay in discovering buddies?
  • 10000 nodes, 3 types of nodes
    • One of a thousand species (same species share 70% of content)
    • One of a hundred genera (30%)
    • One of ten orders (20%)
  • Only same species can back each other up.
• For a neighborhood size of 256, 85% were able to find at least one buddy. 72% found at least 5.
• Neighborhood size matters!

Page 24: Pastiche: Making Backup Cheap and Easy

Conclusion
• Pastiche: P2P backup mechanism.
• What is Pastiche engineered mostly for? What do end-users back up?
  • Data files (overlap is minimal)
  • Applications (lots of overlap, but would you back up your apps?)
• Privacy?
• Closely coupled with Pastry
  • Lighthouse sweep.
  • Needs a large neighborhood set.