PeerDB: A P2P-based System for Distributed Data Sharing
Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou
Shawn Jeffery
CS294-4 Peer-to-Peer Systems
11/05/03
11/05/03 Shawn Jeffery PeerDB 2
Overview
A P2P “database” system
Allows content-based search
No global schema
Utilizes mobile agents
Provides flexibility and extensibility
Dynamically adjusts topology
Background: P2P vs Distributed Databases
                  P2P                DDBMS
Membership        Ad-hoc             Controlled
Schema            No global schema   Shared (or at least some way to mediate)
Query result set  Incomplete         Complete
Content location  “Word of mouth”    Shared catalog
BestPeer
Generic P2P platform
Mobile Agents
  Carry code and data
  Collect stats
  Security issues?
Dynamic Reconfiguration
  How does this compare to Gia?
Location Independent Global Names Lookup (LIGLO) Servers
  Small number
  Provides a global identity for peers and peer status
  Why not use a DHT/KBR/DOLR?
BestPeer Security
Private and sharable data
  Agents only able to access sharable data
  Does this adequately restrict the power of mobile agents?
Communications on the wire also encrypted
What’s missing?
Architecture
[Figure: node architecture diagram; labels: Sharable Data, Local Data, Database]
Schema “Mediation”
Problems with supporting SQL queries:
  No global schema information
  Different nodes could name the same table/attribute differently (“len”, “length”)
Solution: User supplies metadata for each relation name and attribute
  Users expected to do a lot
  Formula based on matching relation keywords and attribute keywords to determine if a query matches a table
What about other schema mediation work (such as Piazza)?
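
Keyword-based matching, as a rough sketch. The slides do not give the actual formula, so the scoring below (fraction of query terms found among the relation and attribute keywords) is only an assumption for illustration. The function name and the sample metadata are hypothetical.

```python
def match_score(query_keywords, relation_keywords, attribute_keywords):
    """Return a score in [0, 1]: the fraction of query terms found in the metadata.

    Assumed scoring for illustration only; not the formula from the PeerDB paper.
    """
    query = {k.lower() for k in query_keywords}
    if not query:
        return 0.0
    # Pool the relation-level keywords with every attribute's keywords.
    metadata = {k.lower() for k in relation_keywords}
    for keywords in attribute_keywords.values():
        metadata.update(k.lower() for k in keywords)
    return len(query & metadata) / len(query)


# Hypothetical user-supplied metadata for one remote relation.
relation_keywords = ["protein", "sequence"]
attribute_keywords = {"len": ["length", "size"], "name": ["name", "identifier"]}

# The query says "length" while the remote attribute is called "len";
# the keyword metadata bridges the naming difference.
score = match_score(["protein", "length"], relation_keywords, attribute_keywords)
```

A node would compare such a score against a threshold to decide whether to report the relation as a candidate match.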
Local Query Processing – Phase I
“Master Agent” coordinates the entire affair
Check Local Dictionary for matching relations
  Use the relation matching strategy even for the local DB
Create “Relation Matching Agents” and flood to all neighbors
Wait for responses
  Display results to user as they arrive
Local Query Processing – Phase II
User selects the relations he/she wants
Create a “Data Retrieval Agent”
  Rewrite query in terms of new relations
If local, submit SQL to local DB
Contact remote nodes directly to access the data
Creates remote join plans locally - optimization?
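
Query rewriting, as a rough sketch. This assumes the rewrite amounts to substituting the remote node's relation and attribute names for the local ones; the slides do not say how PeerDB actually does it. `rewrite_query` and the name mapping are hypothetical, and the substitution is a naive token replace that does not parse SQL (it would also rewrite matching words inside string literals).

```python
import re


def rewrite_query(sql, name_map):
    """Replace whole-word occurrences of local names with the remote names."""
    if not name_map:
        return sql
    # Longest names first so a shorter name never shadows a longer one.
    names = sorted(name_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: name_map[m.group(1)], sql)


# Hypothetical mapping chosen after the user picked a matching remote relation.
name_map = {"protein": "prot_tbl", "length": "len"}
rewritten = rewrite_query(
    "SELECT name, length FROM protein WHERE length > 100", name_map
)
# rewritten == "SELECT name, len FROM prot_tbl WHERE len > 100"
```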
Remote Query Processing
Phase I: Find relations
  Relation Matching Agents flood with TTL
  Check Export Dictionary for a match
  Return matches directly
Phase II: Get data
  Data Retrieval Agent submits SQL to DBMS
  Return data to the requesting node directly
  Run further data processing before returning
    Again, security issues
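
Phase I flooding, as a rough sketch. It simulates TTL-limited flooding over an in-memory neighbor graph and checks each peer's export dictionary for a keyword. The graph, the dictionary layout, and the function name are assumptions for illustration; real agents would travel over the network and reply to the requester directly.

```python
from collections import deque


def flood_relation_match(neighbors, export_dictionaries, origin, keyword, ttl):
    """Breadth-first flood from origin; returns {peer: [matching relation names]}."""
    matches = {}
    visited = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the agent
        for nxt in neighbors.get(peer, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            exported = export_dictionaries.get(nxt, {})
            hits = [rel for rel, kws in exported.items() if keyword in kws]
            if hits:
                matches[nxt] = hits
            frontier.append((nxt, remaining - 1))
    return matches


# Hypothetical chain topology A - B - C - D and export dictionaries.
neighbors = {"A": ["B"], "B": ["C"], "C": ["D"]}
export_dictionaries = {
    "B": {"genes": ["gene", "dna"]},
    "C": {"proteins": ["protein"]},
    "D": {"protein_structures": ["protein"]},
}
two_hops = flood_relation_match(neighbors, export_dictionaries, "A", "protein", ttl=2)
three_hops = flood_relation_match(neighbors, export_dictionaries, "A", "protein", ttl=3)
```

With `ttl=2` the agent reaches B and C only, so D's matching relation is missed; this is one way the result set ends up incomplete.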
Statistics
Master Agents monitor stats in the network
Keywords for some relations returned during Phase I
  Update metadata
Number of objects returned for selected relations
  Can be used for topology change decisions
Use most recently returned results as metric to determine who to connect with
  Frequent updates – might need to change neighbors after each result returned
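
Statistics-driven reconfiguration, as a rough sketch. It assumes the metric is simply the number of objects each peer returned for the most recent query and that a node keeps a fixed number of direct neighbors. The function name and the sample counts are hypothetical.

```python
def choose_neighbors(results_per_peer, max_neighbors):
    """Keep the peers that returned the most objects; ties broken by peer id."""
    ranked = sorted(results_per_peer.items(), key=lambda item: (-item[1], item[0]))
    return [peer for peer, _ in ranked[:max_neighbors]]


# Hypothetical object counts collected by the master agent for the last query.
last_query_results = {"B": 3, "C": 12, "D": 0, "E": 7}
new_neighbors = choose_neighbors(last_query_results, max_neighbors=2)
# new_neighbors == ["C", "E"]
```

Because only the latest results are used, the neighbor set can change after every query, which is the churn the last bullet worries about.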
Caches
Cache all query results locally
  Soft state
  LRU replacement
Users choose which copy they want
  Only provided with peer id and an indication of which is the source
  What about timestamp, etc?
  Again, user heavily involved
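
LRU replacement of cached query results, as a rough sketch. The class name, the capacity measured in number of cached queries, and the row format are assumptions; the slides state only that results are cached locally as soft state with LRU replacement.

```python
from collections import OrderedDict


class QueryResultCache:
    """Fixed-capacity cache that evicts the least recently used query result."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, query):
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)  # mark as most recently used
        return self._entries[query]

    def put(self, query, rows):
        self._entries[query] = rows
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used


cache = QueryResultCache(capacity=2)
cache.put("q1", [("kinase", 350)])
cache.put("q2", [("actin", 375)])
cache.get("q1")                     # q1 becomes the most recently used entry
cache.put("q3", [("myosin", 1900)])  # evicts q2
```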
Relation Matching Performance
Significant tradeoff between precision and recall
Which is more important?
Is their approach acceptable?
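
For reference, precision and recall under their standard definitions, as a small sketch. The relation names are made up; the slides report no concrete numbers here.

```python
def precision_recall(returned, relevant):
    """Precision: share of returned relations that are relevant.
    Recall: share of relevant relations that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical outcome of one relation-matching round.
returned = ["r1", "r2", "r3", "r4"]  # relations the matcher reported
relevant = ["r1", "r2", "r5"]        # relations that truly answer the query
precision, recall = precision_recall(returned, relevant)
# precision is 2/4, recall is 2/3
```

A looser match threshold returns more relations, which tends to raise recall and lower precision; a stricter one does the opposite.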
Experimental Methodology
Compare P2P Model vs Client/Server model
  CS returns via the search path (?)
Compare static vs reconfigurable networks
Compare agent vs message based approach
32 Nodes
  Is this enough?
Evaluation Scenarios (Metrics?)
Fixed set of nodes
  Easily test P2P protocols, Reconfiguration strategies
Latency
Quality and Quantity
What else is important?
Performance
As you increase the amount of storage on each node, latency decreases
  Due to caching
In general, reconfiguration performs better
Response times O(1 Minute)
  Is this acceptable?
Agent based shown to be better
  What if agent produces more data than it processes?
Discussion: A P2P DBMS?
PeerDB represents a tiny step towards a P2P DB (also PIER, Piazza)
What does it do right?
What else is needed?
Is it ideal to have a P2P DB?
Is it feasible?