PeerDB: A P2P-based System for Distributed Data Sharing
Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou
Course Number: CSI 5311
Course Name: Distributed Databases and Transaction Processing
Professor: Iluju Kiringa
Presented by
Zhihong Li, Rasha Tawhid
Winter 2004
2
Outline
● Introduction
● What is PeerDB?
● Architecture of PeerDB
● Features of PeerDB
● A Performance Study
● Related work
● Conclusion and future work
3
Introduction
PeerDB and related concepts: P2P systems vs. distributed database systems vs. BestPeer vs. PeerDB.
Architecture of PeerDB and features of PeerDB:
● Sharing data without a shared schema: information retrieval and a matching strategy.
● Agent-assisted query processing: mobile agent technology.
● Monitoring statistics for reconfiguration.
● Cache management.
A performance study shows the effectiveness of PeerDB.
4
What is PeerDB?
A P2P-based system for distributed data sharing.
A database application implemented on top of BestPeer.
Concepts to review first: P2P systems, distributed database systems, BestPeer.
5
Concepts review: peer-to-peer (P2P) systems
A large number of nodes are pooled together to share their resources (providing and consuming data or services).
These nodes can join and leave the P2P network at any time.
Limitations:
● They provide only file-level sharing (coarse granularity).
● There is no easy way to extend their applications quickly to fulfill new user needs.
● A node's peers are statically defined.
6
Concepts review: distributed database systems
● Nodes are added to and removed from the network in a controlled manner.
● Data is shared under a shared schema.
● They provide the complete set of answers that satisfy a query.
● The exact location to which a query should be directed is known.
7
The Features of BestPeer Systems
● An adaptive platform for P2P applications.
● P2P applications can be developed easily and efficiently on BestPeer.
● Integrates two technologies: mobile agents and P2P.
● Facilitates a finer granularity of data sharing.
● Also shares computational power.
● A node can dynamically reconfigure its own neighbors in the network.
● Introduces a Location Independent Global Names Lookup Server (LIGLO) to provide each node with a unique global identity.
8
The Features of PeerDB
● Each node is a data management system.
● Supports a finer granularity of data sharing.
● Data can be shared without a shared global schema.
● Combines the power of mobile agents with P2P systems to perform operations at peers' sites.
● A node in the network can dynamically reconfigure its neighbors by itself.
9
Architecture of PeerDB
Four components:
● Data management system: DBMS, Local Dictionary, Export Dictionary.
● DBAgent: mobile agents (master agent and worker agents).
● Cache manager: caches remote data in secondary storage.
● User interface: users search for data using SQL-like queries.
10
Features of PeerDB: Sharing Data Without Shared Schema
Objective
● Users manage their (private and sharable) data using a DBMS.
● Users share the sharable data they are interested in without sharing a schema.
Problem
● There is no predetermined, uniform schema that all nodes share.
● At the relation level: different users name a "protein" relation by protein name (e.g., Kinases, annexin) or after the species (e.g., human, zebrafish).
● At the attribute level: some users call the length of a sequence "length", while others use the term "len".
Solution
● Adopt an Information Retrieval (IR) based approach.
11
Information Retrieval (IR) Based Approach
Create meta-data for each relation
● The meta-data (schema, keywords, etc.) should be provided by the user when the table is created.
● Meta-data should be maintained for each relation name and for each attribute.
● Relevant data are likely to share the same keywords.
Locate matching relations
● Apply the relation-matching strategy to determine the relevant relations.
● The relations and their meta-data are returned to the user first, who then decides which relations will be queried further.
12
Relation-matching Strategy:
● Given a query Q(R, A, C), where R: relations, A: attributes, C: conditions.
● Also given a relation D with attributes T.
● The relations that potentially contain answers to Q are those whose Match(Q, D) is above a certain threshold value.
● wtr: relation weight; wta: attribute weight.
● r = 1 if the relations match; r = 0 otherwise.
● Nmatch(A ∪ C, T): the number of matching keywords between the attributes of Q and the attributes T.
● N(A ∪ C): the number of distinct keywords for the attributes in Q.
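The quantities defined above can be combined into a score in several ways. The sketch below assumes a normalized weighted sum, Match(Q, D) = (wtr·r + wta·Nmatch/N) / (wtr + wta); this form, the default weights, the threshold, and the names `Query`, `Relation`, `match` and `candidates` are illustrative assumptions, not the formula from the PeerDB paper.

```python
from dataclasses import dataclass


@dataclass
class Query:
    relation_keywords: set    # keywords describing R
    attribute_keywords: set   # distinct keywords for the attributes in A and C


@dataclass
class Relation:
    name: str
    relation_keywords: set    # keywords in the relation's meta-data
    attribute_keywords: set   # keywords for the relation's attributes T


def match(query, relation, wt_r=0.5, wt_a=0.5):
    """Assumed scoring form: normalized weighted sum of relation and attribute agreement."""
    # r is 1 when the relation-level keywords overlap, 0 otherwise.
    r = 1 if query.relation_keywords & relation.relation_keywords else 0
    distinct = query.attribute_keywords
    if distinct:
        # Nmatch(A ∪ C, T) / N(A ∪ C)
        ratio = len(distinct & relation.attribute_keywords) / len(distinct)
    else:
        ratio = 0.0
    return (wt_r * r + wt_a * ratio) / (wt_r + wt_a)


def candidates(query, relations, threshold=0.5):
    """Relations whose score is above the threshold, best match first."""
    scored = [(match(query, d), d.name) for d in relations]
    return sorted([s for s in scored if s[0] > threshold], reverse=True)
```

With equal weights, a relation that matches at the relation level and on two of three attribute keywords scores about 0.83, while one that matches on a single attribute keyword only scores about 0.17 and is filtered out by the 0.5 threshold.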
13
Illustrate the Strategy with an Example
Suppose we have peers P1, P2, P3 and P4, and a query Q is issued from P1:
SELECT SeqId, ProteinSeq FROM Kinases WHERE length > 30;
Apply the matching strategy:
● P2, P3 and P4 all match query Q.
● P4 will be ranked lower than P2 and P3.
● Semantically, P2's data are not of interest to P1, so the user needs to make the selection.
● Multiple relations are returned from P3 (example figure not reproduced).
14
Features of PeerDB: Agent Assisted Query Processing
Two-phase query processing strategy.
Phase I:
● Locate potential relations using the relation-matching strategy.
● The user selects the more relevant relations.
● Benefits: minimizes information overload and makes better use of network bandwidth.
Phase II:
● Begins after the user has selected the desired relations.
● Directs the query to the nodes containing the desired relations.
● Answers are finally returned.
Mobile agents perform operations at peers' sites.
15
Query Processing on PeerDB nodes
The DBAgent component is responsible for query processing.
● Local query: a query is local to a node if it is initiated there.
● Remote query: otherwise.
Query processing is completely assisted by mobile agents.
● Master agent: when a query is issued, a master agent is created on the user's node to oversee the evaluation of the query. The master agent clones worker agents (relation matching agents or information retrieval agents) and dispatches them to all neighbors of the node.
● Worker agent: a worker agent works on a neighbor node and returns its results to the master agent.
16
Processing Local Query Phase I
[Diagram: a PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary) and its neighbor PeerDB nodes, each with its own DBAgent]
1. The user query is sent to the DBAgent.
2. A master agent (MA) is created.
3. The MA extracts the Q(R, A, C) list.
4.1 Match(Q, D) is applied to the local dictionary.
4.11 Matching relations are returned.
4.2 The MA clones relation matching agents (RMAs).
4.21 The RMAs are dispatched to all neighbors, each carrying (a) the IP address of the query node and (b) a TTL (time-to-live) that indicates the lifetime of the agent.
4.22 Relevant relations and meta-data are returned by the RMAs.
4.23 Answers are returned to the user.
17
Processing Local Query Phase II
[Diagram: a PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary) and its neighbor PeerDB nodes, each with its own DBAgent]
1. The user selects the semantically relevant relations.
2. The selected relations are sent to the MA.
3. The MA clones a data retrieval agent (DRA) for each selected relation.
4. Each DRA reformulates the query for its selected relation.
4.1 The DRA retrieves data from the local DBMS if the selected relation is local.
4.2 Data is returned to the agent.
4.3 The formulated data is returned to the user.
5. The DRA is dispatched to the relevant nodes, carrying the IP address of the query node.
6. Data is returned.
7. Data is returned to the user.
18
Processing Remote Query Phase I: Relation Matching Agent
[Diagram: the query PeerDB node, the visited PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary), and its neighbor PeerDB nodes]
1. A relation matching agent (RMA) arrives from the query node (TTL - 1, first-time visit).
2. The RMA searches the export dictionary.
3. Matched relations are returned.
4. Matched relations are returned to the query node.
5. If TTL > 0, the RMA clones more RMAs and dispatches them to the current node's neighbors; otherwise, the RMA is dropped.
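The TTL-bounded spread of relation matching agents can be simulated in a few lines. This is a minimal single-process sketch: the function name `dispatch_rmas`, the dictionary-based topology and the keyword-overlap test are assumptions made for illustration, and real agents would travel between nodes rather than be iterated in a queue.

```python
from collections import deque


def dispatch_rmas(neighbors, export_dicts, query_node, keywords, ttl):
    """Return {peer: [matching relation names]} for peers reached within `ttl` hops.

    neighbors:    {peer: [directly connected peers]}
    export_dicts: {peer: {relation name: set of keywords}}
    """
    results = {}
    visited = {query_node}
    # One agent per direct neighbor of the query node, each carrying the TTL.
    frontier = deque((peer, ttl) for peer in neighbors.get(query_node, []))
    while frontier:
        peer, remaining = frontier.popleft()
        if peer in visited:
            continue  # only a first-time visit searches the export dictionary
        visited.add(peer)
        remaining -= 1  # the TTL is decremented on arrival
        matched = [name for name, kws in export_dicts.get(peer, {}).items()
                   if keywords & kws]
        if matched:
            results[peer] = matched  # reported straight back to the query node
        if remaining > 0:
            # Clone further agents for this node's own neighbors.
            frontier.extend((n, remaining) for n in neighbors.get(peer, []))
        # Otherwise the agent is dropped here.
    return results
```

In a chain P1 → P2 → P3 → P4, a TTL of 1 reaches only P2 and a TTL of 2 reaches P2 and P3, so the TTL directly bounds how far a query spreads.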
19
Processing Remote Query Phase II: Data Retrieval Agent
[Diagram: the query PeerDB node and the visited PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS), Local Dictionary, Export Dictionary)]
1. A data retrieval agent (DRA) arrives from the query node.
2. The DRA formulates an SQL query and submits it to the DBMS.
3. Answers are retrieved and processed.
4. Answers are returned to the query node.
5. The DRA is dropped.
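How a data retrieval agent might rewrite the user's query against a peer's own table and column names can be sketched as follows. The helper names `reformulate` and `run_dra`, the attribute mapping, and the use of SQLite are assumptions for illustration; the slides do not specify how the mapping is derived or which DBMS a node runs.

```python
import sqlite3


def reformulate(local_table, attribute_map, select_attrs, condition):
    """Build a local SQL query from the user's attribute names.

    attribute_map: {query attribute name: local column name}
    condition:     (query attribute, comparison operator, value)
    """
    columns = ", ".join(attribute_map[a] for a in select_attrs)
    attr, op, value = condition
    if op not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    # The value is passed as a parameter rather than spliced into the SQL text.
    sql = f"SELECT {columns} FROM {local_table} WHERE {attribute_map[attr]} {op} ?"
    return sql, (value,)


def run_dra(conn, local_table, attribute_map, select_attrs, condition):
    """Submit the reformulated query to the local DBMS and collect the answers."""
    sql, params = reformulate(local_table, attribute_map, select_attrs, condition)
    return conn.execute(sql, params).fetchall()
```

For a peer that stores protein sequences in a table `human` with columns `id`, `seq` and `len`, a query selecting `SeqId` and `ProteinSeq` where `length > 30` becomes `SELECT id, seq FROM human WHERE len > ?` with the parameter 30.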
20
Features of PeerDB: Monitoring Statistics for Reconfiguration
Performed by the master agent on the query node, for reconfiguring the network.
Monitors two types of statistics:
● Relation information (schemas, keywords) obtained from the relation matching agents, used for exchanging the keywords of selected relations.
● The number of answer objects obtained from the data retrieval agents, used for determining which nodes should be connected directly.
Reconfiguration policy:
● Favored nodes are those that have most recently provided answers.
● The notion of stack distance is used to measure temporal locality.
● The top K peers in the stack are retained as the K directly connected peers.
[Figure: a stack of peers P2, P3, P4, …, Pk, with the top K entries marked]
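The reconfiguration policy amounts to a most-recently-used stack of answer providers. The sketch below is a minimal illustration under that reading; the class name `PeerStack` and its methods are hypothetical and do not come from the PeerDB implementation.

```python
class PeerStack:
    """Most-recently-used stack of peers that have provided answers."""

    def __init__(self, k):
        self.k = k          # number of directly connected peers to retain
        self._stack = []    # index 0 is the top (most recent provider)

    def record_answer(self, peer):
        """Move a peer that just provided answers to the top of the stack."""
        if peer in self._stack:
            self._stack.remove(peer)
        self._stack.insert(0, peer)

    def stack_distance(self, peer):
        """Depth of the peer in the stack, or None if it never answered."""
        return self._stack.index(peer) if peer in self._stack else None

    def direct_peers(self):
        """The top K peers, kept as the directly connected neighbors."""
        return self._stack[: self.k]
```

If answers arrive from P2, P3, P4 and then P2 again, the stack reads P2, P4, P3 from the top, so with K = 2 the node keeps P2 and P4 as direct peers.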
21
Features of PeerDB: Cache Management
The Cache Manager component caches answers returned from remote nodes.
This reduces the response time for subsequent queries.
However, caching raises complicated issues:
● Problem: the cached copy may be outdated. Solution: answers are kept only for a fixed period of time.
● Problem: cache storage space is limited. Solution: the least recently used data is replaced when space runs out.
● Problem: several PeerDB nodes may be caching the same data. Solution: all relations but one with the same keywords from the same source node are pruned away during Phase I of query processing.
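The first two solutions, a fixed retention period and least-recently-used replacement, fit in one small structure. This is a sketch only: the class name `AnswerCache`, the in-memory storage (the slides describe caching in secondary storage) and the injectable clock are assumptions made to keep the example short and testable.

```python
from collections import OrderedDict
import time


class AnswerCache:
    """Cache of remote answers with a fixed retention period and LRU replacement."""

    def __init__(self, capacity, retention_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.retention = retention_seconds
        self.clock = clock
        self._entries = OrderedDict()  # key -> (stored_at, answers), oldest use first

    def put(self, key, answers):
        self._entries.pop(key, None)
        self._entries[key] = (self.clock(), answers)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answers = entry
        if self.clock() - stored_at > self.retention:
            del self._entries[key]  # kept longer than the fixed period: discard
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return answers
```

A `get` refreshes an entry's recency but not its age, so a frequently read answer still expires once the retention period has passed.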
22
A Performance Study
The experimental environment:
● 32 PCs, each with an Intel Pentium 200 MHz processor and 64 MB of RAM.
● All PCs run the Windows NT 4.0 operating system.
● The physical network layout is shown in the figure.
23
A Performance Study (cont'd)
Studies the relation-matching strategy.
● The lifetime of a worker agent is 1.
Looks at the performance of PeerDB.
● Effect of storage capacity on caching
● PeerDB vs. CS (client-server system)
● Benefits of agent-based querying
Remark: the extensive experimental studies show that PeerDB is a promising system for distributed processing.
24
Conclusion and future work
PeerDB is a P2P-based distributed data sharing system with several useful features:
● Employs a data management system and shares data without a shared schema.
● Query processing is assisted by mobile agents.
● A node reconfigures its peers dynamically by itself.
● Cache management for efficiency.
Experimental studies show that PeerDB is a promising system for distributed processing.
Future work extends PeerDB in two directions:
● Making a node more intelligent by adopting code-shipping or data-shipping technology.
● Looking for "similar" schemas by integrating keyword-based search into PeerDB.
25
References
[1] W. Ng, B. Ooi, K. Tan and A. Zhou. PeerDB: A P2P-based System for Distributed Data Sharing. The 19th International Conference on Data Engineering (ICDE 2003), 2003.
[2] N. Karnik. Security in Mobile Agent System. http://www.cs.umn.edu/Ajanta/defense/
[3] C. Rijsbergen. Information Retrieval. London: Butterworths, 1979. http://www.dcs.gla.ac.uk/Keith/Preface.html
26
Thanks
Welcome Questions?