Peer-to-peer archival data trading
Brian Cooper
Joint work with Hector Garcia-Molina (and others)
Stanford University




2 Data trading

Problem: Fragile Data

Data: easy to create, hard to preserve
  • Broken tapes
  • Human deletions
  • Going out of business


3 Data trading

Replication-based preservation



5 Data trading

Motivation

Several systems use replication
  • Preserve digital collections
  • SAV, others

Archival part of digital library
  • Individual organizations cooperate
  • Not a lot of money to spend


6 Data trading

Goal: reliable replication of digital collections, given that
  • Resources are limited
  • Sites are autonomous
  • Not all sites are equal

Traditional methods
  • Central control
  • Random
  • Replicate popular

Metric
  • Reliability
  • Not necessarily “efficiency”


7 Data trading

Our solution

Data trading: “I’ll store a copy of your collection if you’ll store a copy of mine”

Sites make local decisions
  • Who to trade with
  • How many copies to make
  • How much space to provide
  • Etc.


8 Data trading

Trading network
  • A series of binary, peer-to-peer trading links

[Figure: trading network connecting sites A through H]


9 Data trading

Architecture

[Figure: SAV architecture. Components: users, filesystem, InfoMonitor, and SAV Archive sites holding archived data; a local archive and a remote archive connected over the Internet. Layers labeled: service layer, reliability layer.]

This architecture developed with Arturo Crespo


10 Data trading

Overview

  • Trading model
  • Trading algorithm
  • Optimizing (and simulating) trading
  • Some results
  • Some stuff we are still working on


11 Data trading

Trading model


16 Data trading

Trading model
  • Archive site: an autonomous archiving provider
  • Digital collection: a set of related digital materials
  • Archival storage: stores locally and remotely owned digital collections
  • Archiving client: deposits and retrieves materials
  • Data reliability: probability that data is not lost


17 Data trading

Deeds
  • A right to use space at another site
  • Bookkeeping mechanism for trades
  • Used, saved, split, or transferred

Trading algorithm
  • Sites trade deeds
  • Sites exercise deeds to replicate collections

[Figure: sample deed. “Deed for space. For use by: Library of Congress, or for transfer. 623 gigabytes. Stanford University”]
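The deed operations listed above (used, saved, split, or transferred) can be modeled as a small immutable value. This is a minimal sketch, not SAV's actual representation: the `Deed` class, its fields (`issuer`, `holder`, `size_gb`), the `split`/`transfer` helpers, and the site name "Site C" are hypothetical; only the sample values (Stanford University, Library of Congress, 623 GB) come from the pictured deed.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deed:
    """Hypothetical deed: a right to use `size_gb` of space at `issuer`."""
    issuer: str    # site that provides the space
    holder: str    # site currently entitled to use (or transfer) the space
    size_gb: int

    def split(self, amount_gb: int) -> tuple[Deed, Deed]:
        """Split into two deeds at the same issuer whose sizes sum to the original."""
        if not 0 < amount_gb < self.size_gb:
            raise ValueError("split amount must be strictly between 0 and the deed size")
        return (replace(self, size_gb=amount_gb),
                replace(self, size_gb=self.size_gb - amount_gb))

    def transfer(self, new_holder: str) -> Deed:
        """Hand the right to another site; issuer and size are unchanged."""
        return replace(self, holder=new_holder)


deed = Deed(issuer="Stanford University", holder="Library of Congress", size_gb=623)
part, rest = deed.split(200)      # keep 423 GB, carve off 200 GB
given = part.transfer("Site C")   # hypothetical third site receives the 200 GB piece
print(given)
print(rest)
```

Making the deed immutable means every split or transfer yields new records, which keeps the bookkeeping auditable: the original deed's total is always recoverable from its pieces.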


18 Data trading

Deed trading

[Figure: sites A, B, and C trading deeds; two copies each of Collections 1, 2, and 3]
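A trade followed by deed exercise can be sketched in a few lines. This assumes a symmetric trade in which each site reserves the same amount of space for the other; that rule, the `Site` record, and the `trade`/`exercise` helpers are hypothetical illustrations, not the authors' protocol (the slides also allow deeds to be saved, split, or transferred, which this sketch omits).

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical archive site tracking free space, deeds, and hosted replicas."""
    name: str
    free_gb: int
    deeds: list = field(default_factory=list)    # (issuer_name, size_gb) rights held
    stored: dict = field(default_factory=dict)   # hosted collection -> owning site


def trade(a: Site, b: Site, size_gb: int) -> bool:
    """Symmetric trade: each site reserves size_gb for the other, or nothing happens."""
    if a.free_gb < size_gb or b.free_gb < size_gb:
        return False
    a.free_gb -= size_gb
    b.free_gb -= size_gb
    a.deeds.append((b.name, size_gb))  # a may now use size_gb at b
    b.deeds.append((a.name, size_gb))  # b may now use size_gb at a
    return True


def exercise(owner: Site, host: Site, collection: str, size_gb: int) -> bool:
    """Use a deed owner holds at host to place one copy of a collection there."""
    for i, (issuer, deed_gb) in enumerate(owner.deeds):
        if issuer == host.name and deed_gb >= size_gb:
            remaining = deed_gb - size_gb
            if remaining:
                owner.deeds[i] = (issuer, remaining)  # keep the unused part of the deed
            else:
                del owner.deeds[i]
            host.stored[collection] = owner.name
            return True
    return False


a = Site("A", free_gb=300)
b = Site("B", free_gb=250)
trade(a, b, 100)                      # each site now holds a 100 GB deed at the other
exercise(a, b, "Collection 1", 60)    # A places a copy of Collection 1 at B
exercise(b, a, "Collection 2", 100)   # B uses its whole deed at A
print(a.free_gb, b.free_gb, a.deeds, b.deeds)
```

Space is deducted when the deed is created rather than when it is exercised, so a site can never promise more than it has.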


19 Data trading

The challenge

[Figure: sites A, B, and C; two copies each of Collections 1, 2, and 3]


20 Data trading

The challenge

[Figure: sites A, B, and C; three copies of Collection 3, two copies each of Collections 1 and 2]


21 Data trading

Alternative solutions

Are there other ways besides trading?


22 Data trading

Other solutions: central control

[Figure: sites A, B, and C; three copies of Collection 3, two copies each of Collections 1 and 2]


23 Data trading

Other solutions: client-based

[Figure: sites A, B, and C; three copies of Collection 3, two copies each of Collections 1 and 2]


24 Data trading

Other solutions: random

[Figure: sites A, B, and C; three copies of Collection 3, two copies each of Collections 1 and 2]


25 Data trading

Why is trading good?

High reliability
  • Framework for replication

Site autonomy
  • Make local decisions
  • No submission to external authority

Fairness
  • Contribute more = more reliability
  • Must contribute resources

[Figure: trading network connecting sites A through H]


26 Data trading

Decisions facing an archive
  • Who to trade with
  • How much to trade
  • When to ask for a trade
  • Providing space
  • Advertising space
  • Picking a number of copies
  • Coping with varying site reliabilities
  • What to do with acquired resources
  • How to deliver other services

Many, many degrees of freedom!


27 Data trading

Our approach
  • Define a basic trading protocol
      • Deed trading
      • Assume all sites follow same rules
      • Basic system for trading
  • Extend: not all sites are equal
      • Some are more reliable or trusted
  • Extend: sites have freedom to negotiate
      • Bid trading
  • Extend: some sites are malicious
      • Ensure documents survive despite evildoers
  • For each model, what policies are best?


28 Data trading

How do we evaluate policies?

Trading simulator
  • Generate scenario
  • Simulate trading with different policies
  • Evaluate reliability for each policy
  • Compare the policies
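The "evaluate reliability" step can be sketched as a Monte Carlo estimate: sample which sites fail, check whether every collection still has a surviving copy, and repeat. This assumes independent site failures and a fixed placement per policy; the site reliabilities and the two placements ("spread" and "doubled-up") are made up for illustration and are not the authors' simulator or scenarios.

```python
import random


def estimate_reliability(placement, site_rel, trials=50_000, seed=0):
    """Fraction of trials in which every collection keeps at least one surviving copy.

    placement: collection -> list of sites holding a copy
    site_rel:  site -> probability that the site survives (independent failures assumed)
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        alive = {s for s, r in site_rel.items() if rng.random() < r}
        if all(any(s in alive for s in sites) for sites in placement.values()):
            ok += 1
    return ok / trials


site_rel = {"A": 0.8, "B": 0.6, "C": 0.5}
policies = {
    # Hypothetical placements that two different trading policies might produce.
    "spread": {"c1": ["A", "B"], "c2": ["B", "C"], "c3": ["C", "A"]},
    "doubled-up": {"c1": ["A", "B"], "c2": ["A", "B"], "c3": ["A", "B"]},
}
for name, placement in policies.items():
    print(name, estimate_reliability(placement, site_rel))
```

With these numbers the exact values are 0.70 for "spread" (it survives only if at most one site fails) and 0.92 for "doubled-up" (it survives unless both A and B fail). That ranking is an artifact of the invented inputs, not a result from the talk; it does show why comparing policies needs many generated scenarios.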


29 Data trading

Simulation parameters

  Parameter                   Range
  Number of sites             2 to 15
  Site reliability            0.5 to 0.8
  Collections per site        4 to 25
  Data per collection         50 GB to 1000 GB
  Space per site              2x data to 7x data
  Replication goal            2 to 15 copies
  Scenarios per simulation    200
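One way to "generate scenario" from the ranges in this table is to draw each value independently and uniformly. Uniform sampling, integer GB sizes, reading "space per site" as a multiple of that site's own data, and the `Scenario` record are all assumptions made for this sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    site_reliability: list   # one survival probability per site
    collections_gb: list     # per site: sizes of the collections it owns
    space_gb: list           # per site: storage it can contribute
    replication_goal: int    # copies each site tries to make (may exceed the site count)


def generate_scenario(rng: random.Random) -> Scenario:
    n_sites = rng.randint(2, 15)
    rel = [rng.uniform(0.5, 0.8) for _ in range(n_sites)]
    colls = [[rng.randint(50, 1000) for _ in range(rng.randint(4, 25))]
             for _ in range(n_sites)]
    # Assumed reading of "2x data to 7x data": a multiple of the site's own data.
    space = [round(sum(c) * rng.uniform(2, 7)) for c in colls]
    return Scenario(rel, colls, space, rng.randint(2, 15))


rng = random.Random(42)
scenarios = [generate_scenario(rng) for _ in range(200)]  # 200 scenarios per simulation
```

Seeding a single `random.Random` instance makes every simulation run reproducible, so two policies can be compared on exactly the same 200 scenarios.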


30 Data trading

Reliability

Site reliability: will a site fail?
  • Example: 0.9 = 10% chance of failure

Data reliability: how safe is the data, despite site failures?
  • Example: 320-year MTTF
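Under the common simplifying assumption that sites fail independently, a collection is lost only if every site holding a copy fails, so with site reliabilities r_1, ..., r_n the data reliability is R_data = 1 - (1 - r_1)(1 - r_2)...(1 - r_n). For example, three copies at sites of reliability 0.9 give 1 - 0.1^3 = 0.999. Converting to an MTTF needs a further assumption not stated on the slide: here each reliability is read as the probability of surviving one period, with lost replicas restored before the next period, so the time to loss is geometric with mean 1 / (1 - R_data) periods. The function names are hypothetical.

```python
from math import prod


def data_reliability(site_reliabilities):
    """P(at least one replica survives), assuming independent site failures."""
    return 1.0 - prod(1.0 - r for r in site_reliabilities)


def mttf_periods(reliability):
    """Mean periods until loss if each period is an independent trial (geometric model)."""
    p_loss = 1.0 - reliability
    return float("inf") if p_loss == 0 else 1.0 / p_loss


r3 = data_reliability([0.9, 0.9, 0.9])
print(round(r3, 6))                # 0.999
print(round(mttf_periods(r3), 1))  # 1000.0 periods
```

Each extra copy multiplies the loss probability by (1 - r), which is why a modest number of copies at only moderately reliable sites can already yield very long MTTFs.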


31 Data trading

Basic trading approach

How does trading work, assuming all sites follow “the rules”?
  • Example: advertising policy

[Figure: site A asks site B, “Let’s trade. How much space do you have?”]


32 Data trading

Advertising policy

[Figure: three possible answers from site B to site A]
  • “I have 120 GB” (120 GB)
  • Space fractional policy: “I have 60 GB” (60 GB)
  • Data proportional policy: “I have 40 GB” (40 GB; figure also marks 40 GB of data)
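The two named advertising policies can be sketched as simple functions. The slide gives only the numbers, so this reading is an assumption: 120 GB is taken as B's free space, "space fractional" as advertising a fixed fraction of free space (0.5 reproduces 60 GB), and "data proportional" as advertising an amount proportional to a data size (factor 1.0 reproduces 40 GB), capped by free space. Whose data the 40 GB refers to is not stated on the slide; the function names and parameters are hypothetical.

```python
def space_fractional(free_gb: float, fraction: float = 0.5) -> float:
    """Advertise a fixed fraction of currently free space."""
    return fraction * free_gb


def data_proportional(free_gb: float, data_gb: float, factor: float = 1.0) -> float:
    """Advertise space in proportion to an amount of data, never more than is free."""
    return min(free_gb, factor * data_gb)


free_gb = 120  # assumed: B's free space, as in the first panel
data_gb = 40   # assumed: the data amount marked in the data-proportional panel
print(space_fractional(free_gb))            # 60.0
print(data_proportional(free_gb, data_gb))  # 40.0
```

The cap in `data_proportional` keeps a site from advertising space it does not have when the data amount is large.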


33 Data trading

Result

[Figure: global reliability (probability of no data loss) vs. global FG (storage space as a multiple of data size, 2 to 7) for the Space-fractional and Data-proportional policies]


34 Data trading

Extend: some sites > others
  • May prefer certain sites
      • More reliable
      • Better reputation
      • Part of same system
  • Example: who to trade with?

[Figure: site A choosing among candidate trading partners]


35 Data trading

Who to trade with?

[Figure: average local data MTTF (log scale, 1 to 10000) vs. local site reliability (0.5 to 0.9) for the Clustering, MostReliable, and ClosestReliability policies]
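Two of the partner-selection policies in this figure can be sketched from their names alone; the slides do not define them, so these readings are assumptions. MostReliable is taken to pick the candidate with the highest site reliability, and ClosestReliability the candidate whose reliability is nearest the local site's own. Clustering is not sketched because its rule cannot be inferred from the name. Site names and reliabilities below are hypothetical.

```python
def most_reliable(candidates: dict) -> str:
    """Pick the candidate partner with the highest site reliability."""
    return max(candidates, key=candidates.get)


def closest_reliability(own: float, candidates: dict) -> str:
    """Pick the candidate whose reliability is nearest to the local site's own."""
    return min(candidates, key=lambda s: abs(candidates[s] - own))


candidates = {"B": 0.9, "C": 0.7, "D": 0.55}  # hypothetical partners and reliabilities
print(most_reliable(candidates))             # B
print(closest_reliability(0.6, candidates))  # D
```

If every site follows MostReliable, demand concentrates on a few sites whose space runs out quickly; pairing sites of similar reliability is one way to spread trades more evenly.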


36 Data trading

Extend: freedom to negotiate
  • Bid for trades

[Figure: site A asks, “How much do I pay for 100 GB of your space?”; bids of “80 GB”, “95 GB”, and “120 GB” come back]
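A minimal sketch of how site A might settle this auction, assuming each bid is the amount of A's own space the bidder wants in exchange for its 100 GB, and that A takes the cheapest bid it can afford. The winner rule, the affordability check, and the bidder names B, C, D are assumptions; only the bid amounts come from the slide.

```python
from typing import Optional, Tuple


def choose_bid(bids: dict, free_gb: int) -> Optional[Tuple[str, int]]:
    """Return (bidder, price_gb) for the lowest bid A can pay, or None if none fit."""
    affordable = {site: price for site, price in bids.items() if price <= free_gb}
    if not affordable:
        return None
    winner = min(affordable, key=affordable.get)
    return winner, affordable[winner]


bids = {"B": 80, "C": 95, "D": 120}   # hypothetical bidders; amounts from the slide
print(choose_bid(bids, free_gb=200))  # ('B', 80)
print(choose_bid(bids, free_gb=50))   # None
```

Choosing the cheapest bid is the simplest rule; the questions on the next slide (when to call auctions, how much to bid, whether clever bidding pays) are exactly what such a rule leaves open.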


37 Data trading

Bid trading

Questions
  • When do I call auctions?
  • How much do I bid?
  • Can I take advantage of the system by being clever?


38 Data trading

Extend: some sites are malicious

Secure services
  • Publish: makes copies to survive failures
  • Search: find documents
  • Retrieve: get a copy of a document

Challenges
  • Attacker may delete a copy
  • Attacker may provide fake search results
  • Attacker may provide an altered document
  • Attacker may disrupt message routing
  • …

Joint work with Mayank Bawa and Neil Daswani


39 Data trading

Current and future work

Access
  • Support searching over collections
  • Distribute indexes via trading

Prototype implementation
  • Basic SAV architecture implemented
  • Trading protocol/policies must be added

Develop security techniques further


40 Data trading

Current and future work: other topics of interest
  • Designing peer-to-peer primitives
      • Building other p2p services
  • Other ways of acquiring data
      • How to archive active systems
  • Semantic archiving
      • Managing “format obsolescence”
      • Finding data once it is archived


41 Data trading

Other parts of SAV project
  • SAV data model
      • Write-once objects
      • Signature-based naming
  • How to get objects into SAV
      • InfoMonitor – filesystem
      • Other inputs (Web, DBMS, etc.)
  • Modeling archival repositories (Arturo Crespo)
      • Choose best components and design
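Signature-based naming of write-once objects also helps with the "altered document" threat listed under secure services: if an object's name is derived from its bytes, any retrieved copy can be checked against the name it was requested under. This sketch assumes "signature" means a cryptographic digest and uses SHA-256 from Python's `hashlib`; the actual SAV signature scheme is not given in the slides, and `object_name`/`verify` are hypothetical helpers.

```python
import hashlib


def object_name(content: bytes) -> str:
    """Name a write-once object by a digest of its bytes (SHA-256 assumed here)."""
    return hashlib.sha256(content).hexdigest()


def verify(name: str, content: bytes) -> bool:
    """Check that retrieved bytes match the name they were requested under."""
    return object_name(content) == name


doc = b"archived report, version 1"
name = object_name(doc)
print(verify(name, doc))                     # True
print(verify(name, b"archived report, v2"))  # False: altered copy detected
```

Because objects are write-once, a name never has to follow changing content; an update is simply a new object with a new name.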


42 Data trading

Related work
  • Peer-to-peer replication: SAV, Intermemory, LOCKSS, OceanStore…
  • Fault tolerant systems: RAID, mirrored disks, replicated databases
  • Caching systems (Andrew, Coda)
  • Deep storage (Tivoli)
  • Barter/auction based systems: ContractNet
  • Distributed resource allocation: File Allocation Problem


43 Data trading

Conclusion
  • Important, exciting area
      • Preservation critical
      • Difficult to accomplish
  • Many decisions are ad hoc today
      • An effective framework is needed
      • Scientific evaluation of decisions
  • Trading networks replicate data
      • Model for trading networks
      • Trading algorithm
      • Simulation results

[Figure: trading network connecting sites A through H]


44 Data trading

For more information

[email protected]
http://www-diglib.stanford.edu/