17
PeerDB: A P2P-based System for Distributed Data Sharing Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou Shawn Jeffery CS294-4 Peer-to-Peer Systems 11/05/03

PeerDB: A P2P-based System for Distributed Data Sharing

  • Upload
    neylan

  • View
    30

  • Download
    0

Embed Size (px)

DESCRIPTION

PeerDB: A P2P-based System for Distributed Data Sharing. Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou. Shawn Jeffery CS294-4 Peer-to-Peer Systems 11/05/03. Overview. A P2P “database” system Allows content-based search No global schema Utilizes mobile agents - PowerPoint PPT Presentation

Citation preview

Page 1: PeerDB: A P2P-based System for Distributed Data Sharing

PeerDB: A P2P-based System for Distributed Data Sharing

Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou

Shawn Jeffery

CS294-4 Peer-to-Peer Systems

11/05/03

Page 2: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 2

Overview

A P2P “database” system Allows content-based search

No global schema Utilizes mobile agents

Provides flexibility and extensibility Dynamically adjusts topology

Page 3: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 3

Background: P2P vs Distributed Databases

P2P DDBMS

Membership Ad-hoc Controlled

Schema No global schema

Shared (or at least some way to mediate)

Query result set Incomplete Complete

Content location “Word of mouth”

Shared catalog

Page 4: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 4

BestPeer Generic P2P platform Mobile Agents

Carry code and data Collect stats Security issues?

Dynamic Reconfiguration How does this compare to Gia?

Location Independent Global Names Lookup (LIGLO) Servers Small number Provides a global identity for peers and peer status Why not use a DHT/KBR/DOLR?

Page 5: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 5

BestPeer Security

Private and sharable data Agents only able to access sharable data Does this adequately restrict the power

of mobile agents? Communications on the wire also

encrypted What’s missing?

Page 6: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 6

Architecture

Sharable Data

Local Data

Database

Page 7: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 7

Schema “Mediation” Problems with supporting SQL queries:

No global schema information Different nodes could name the same

table/attribute differently (“len”, “length”) Solution: User supplies metadata for each

relation name and attribute Users expected to do a lot

Formula based on matching relation keywords and attribute keywords to determine if a query matches a table

What about other schema mediation work (such as Piazza)?

Page 8: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 8

Local Query Processing – Phase I

“Master Agent” coordinates the entire affair Check Local Dictionary for matching

relations Use the relation matching strategy even for the

local DB Create “Relation Matching Agents” and

flood to all neighbors Wait for responses

Display results to user as they arrive

Page 9: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 9

Local Query Processing – Phase II

User selects the relations he/she wants Create a “Data Retrieval Agent” Rewrite query in terms of new relations

If local, submit SQL to local db Contact remote nodes directly to

access the data Creates remote join plans locally -

optimization?

Page 10: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 10

Remote Query Processing Phase I: Find relations

Relation Matching Agents flood with TTL Check Export Dictionary for a match

Return matches directly Phase II: Get data

Data Retrieval Agent submits SQL to DBMS Return data to the requesting node directly Run further data processing before returning

Again, security issues

Page 11: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 11

Statistics Master Agents monitor stats in the network Keywords for some relations returned

during Phase I Update metadata

Number of objects returned for selected relations Can be used for topology change decisions

Use most recently returned results as metric to determine who to connect with Frequent updates – might need to change

neighbors after each result returned

Page 12: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 12

Caches

Cache all query results locally Soft state LRU replacement Users choose which copy they want

Only provided with peer id and an indication of which is the source

What about timestamp, etc? Again, user heavily involved

Page 13: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 13

Relation Matching Performance

Significant tradeoff between precision and recall

Which is more important? Is their approach acceptable?

Page 14: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 14

Experimental Methodology

Compare P2P Model vs Client/Server model CS returns via the search path (?)

Compare static vs reconfigurable networks

Compare agent vs message based approach

32 Nodes Is this enough?

Page 15: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 15

Evaluation Scenarios (Metrics?)

Fixed set of nodes Easily test P2P protocols, Reconfiguration

strategies Latency Quality and Quantity What else is important?

Page 16: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 16

Performance As you increase the amount of storage on

each node, latency decrease Due to caching

In general, reconfiguration performs better Response times O(1 Minute)

Is this acceptable? Agent based shown to be better

What if agent produces more data than it processes?

Page 17: PeerDB: A P2P-based System for Distributed Data Sharing

11/05/03 Shawn Jeffery PeerDB 17

Discussion: A P2P DBMS?

PeerDB represents a tiny step towards a P2P DB (also PIER, Piazza) What does it do right? What else is needed? Is it ideal to have a P2P DB? Is it feasible?