PeerDB: A P2P-based System for Distributed Data Sharing Wee Siong Ng Beng Chin Ooi Kian-lee Tan Aoying Zhou Course Number: CSI 5311 Course Name: Distributed Databases and Transaction Processing Professor: Iluju Kiringa Presented by Zhihong Li Rasha Tawhid Winter 2004


PeerDB: A P2P-based System for Distributed Data Sharing

Wee Siong Ng Beng Chin Ooi Kian-lee Tan Aoying Zhou

Course Number: CSI 5311
Course Name: Distributed Databases and Transaction Processing
Professor: Iluju Kiringa

Presented by

Zhihong Li Rasha Tawhid

Winter 2004


Outline

● Introduction
● What is PeerDB?
● Architecture of PeerDB
● Features of PeerDB
● A Performance Study
● Related works
● Conclusion and future work


Introduction

PeerDB and related concepts: P2P systems vs. distributed database systems vs. BestPeer vs. PeerDB.

Architecture and features of PeerDB:
● Sharing data without a shared schema: information retrieval and the relation-matching strategy.
● Agent-assisted query processing: mobile agent technology.
● Monitoring statistics for reconfiguration.
● Cache management.

A performance study shows the effectiveness of PeerDB.


What is PeerDB?

A P2P-based system for distributed data sharing.

A database application implemented on top of BestPeer.

Concepts to review first: P2P systems, distributed database systems, BestPeer.


Concepts review: peer-to-peer (P2P) systems

A large number of nodes are pooled together to share their resources (each node both provides and consumes data or services).

These nodes can join and leave the P2P network at any time.

Limitations:
● They provide only file-level sharing (coarse granularity).
● There is no easy way to extend an application quickly to meet new user needs.
● A node's peers are statically defined.


Concepts review: distributed database systems

● Nodes are added to and removed from the network in a controlled manner.
● Data is shared under a shared schema.
● They provide the complete set of answers that satisfy a query.
● The exact location to which a query should be directed is known.


The Features of BestPeer Systems

● An adaptive platform for P2P applications.
● P2P applications can be developed easily and efficiently on BestPeer.
● Integrates two technologies: mobile agents and P2P.
● Facilitates a finer granularity of data sharing.
● Also shares computational power.
● A node can dynamically reconfigure its own neighbors in the network.
● Introduces a Location Independent Global Names Lookup Server (LIGLO) to provide each node with a unique global identity.


The Features of PeerDB

● Each node is a data management system.
● Supports a finer granularity of data sharing.
● Data can be shared without a shared global schema.
● Brings the power of mobile agents into P2P systems to perform operations at peers' sites.
● A node in the network can dynamically reconfigure its own neighbors.


Architecture of PeerDB

Four components:
● Data management system: DBMS, Local Dictionary, Export Dictionary.
● DBAgent: mobile agents (master agent and worker agents).
● Cache manager: caches remote data in secondary storage.
● User interface: users search for data using SQL-like queries.


Features of PeerDB: Sharing Data Without Shared Schema

Objective
● Users manage their (private and sharable) data using a DBMS.
● Users share the sharable data they choose without sharing a schema.

Problem
There is no predetermined, uniform schema that all nodes share.
● Relation names differ: different users name a "protein" relation after the protein (e.g., Kinases, annexin) or after the species (e.g., human, zebrafish).
● Attribute names differ in the same way: some users call the length of a sequence "length", while others use "len".

Solution
Adopt an Information Retrieval (IR) based approach.


Information Retrieval (IR) Based Approach

Create meta-data for each relation
● The meta-data (schema, keywords, etc.) should be provided by the user when the table is created.
● Meta-data should be maintained for each relation name and for each attribute.
● Relevant data is expected to share the same keywords.

Locate matching relations
● Apply the relation-matching strategy to determine the relevant relations.
● The relations and their meta-data are returned to the user first, who then decides which relations will be queried further.


Relation-matching Strategy:

● Given a query Q(R, A, C), where R is the set of relations, A the attributes and C the conditions.
● Also given a relation D with attributes T.
● The relations that potentially contain answers to Q are those for which Match(Q, D) is above a certain threshold value.
● wtr: relation weight; wta: attribute weight.
● r = 1 if the relation names match; r = 0 otherwise.
● N_match(A ∪ C, T): the number of matching keywords between the attributes of Q and the attributes of D.
● N(A ∪ C): the number of distinct keywords for the attributes in Q.
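The exact expression for Match(Q, D) is not spelled out in the text above, so the following Python sketch assumes one plausible form: a weighted average of the relation indicator r and the attribute ratio N_match(A ∪ C, T) / N(A ∪ C). The `Query` and `Relation` classes, the default weights, and the combining formula are illustrative assumptions, not the system's actual definition.

```python
from dataclasses import dataclass


@dataclass
class Query:
    relations: set   # R: keywords naming the queried relations
    attributes: set  # A: keywords for the selected attributes
    conditions: set  # C: keywords for attributes used in conditions


@dataclass
class Relation:
    name_keywords: set       # keywords describing relation D
    attribute_keywords: set  # T: keywords describing D's attributes


def match(q, d, wtr=0.5, wta=0.5):
    """Assumed scoring form: weighted average of relation and attribute agreement."""
    r = 1 if q.relations & d.name_keywords else 0
    wanted = q.attributes | q.conditions          # A ∪ C
    n_total = len(wanted)                         # N(A ∪ C)
    n_match = len(wanted & d.attribute_keywords)  # N_match(A ∪ C, T)
    attr_score = n_match / n_total if n_total else 0.0
    return (wtr * r + wta * attr_score) / (wtr + wta)


q = Query({"kinases"}, {"seqid", "proteinseq"}, {"length"})
d = Relation({"kinases", "protein"}, {"seqid", "length", "organism"})
score = match(q, d)  # compared against a threshold chosen by the system
```

With this form the score always lies between 0 and 1, which makes a single threshold easy to apply across relations.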


Illustrating the Strategy with an Example

Suppose we have peers P1, P2, P3 and P4, and a query Q issued from P1:

    SELECT SeqId, ProteinSeq FROM Kinases WHERE length > 30;

Applying the matching strategy:
● P2, P3 and P4 all match query Q.
● P4 is ranked lower than P2 and P3.
● Semantically, P2's data is not of interest to P1, so the user has to make the selection.
● P3 returns multiple matching relations.


Features of PeerDB: Agent Assisted Query Processing

Two-phase query processing strategy.

Phase I:
● Locate potential relations using the relation matching strategy.
● The user selects the more relevant relations.
Benefits: minimizes information overload and makes better use of network bandwidth.

Phase II:
● Begins after the user has selected the desired relations.
● Directs the query to the nodes containing the desired relations.
● Answers are finally returned.

Mobile agents perform the operations at the peers' sites.
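The two phases can be sketched in Python as follows. This is a minimal in-memory illustration, not PeerDB code: the `PEERS` table, its contents, and the `choose` callback standing in for the user's selection are all hypothetical.

```python
# Hypothetical in-memory peers: peer name -> {relation name: (keywords, rows)}
PEERS = {
    "P2": {"Kinases": ({"kinases", "protein"}, [("k1", "MKV")])},
    "P3": {
        "Annexin": ({"annexin", "protein"}, [("a1", "MAL")]),
        "Human": ({"human", "protein"}, [("h1", "MSS")]),
    },
}


def phase_one(keywords):
    """Phase I: return (peer, relation) candidates whose keywords overlap the query's."""
    return [
        (peer, rel)
        for peer, rels in PEERS.items()
        for rel, (kws, _rows) in rels.items()
        if kws & keywords
    ]


def phase_two(selected):
    """Phase II: fetch rows only from the relations the user selected."""
    rows = []
    for peer, rel in selected:
        rows.extend(PEERS[peer][rel][1])
    return rows


def run_query(keywords, choose):
    """Run both phases; `choose` plays the role of the user's selection step."""
    candidates = phase_one(keywords)
    return phase_two(choose(candidates))


# The "user" keeps only the relations offered by P3.
answers = run_query({"protein"}, lambda cands: [c for c in cands if c[0] == "P3"])
```

Only meta-data travels in Phase I; rows are moved in Phase II and only for the selected relations, which is where the bandwidth saving comes from.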


Query Processing on PeerDB nodes

The DBAgent component is responsible for query processing.
● Local query: a query is local to a node if it is initiated there.
● Remote query: any other query.

Query processing is carried out entirely with the help of mobile agents.
● Master agent: when a query is issued, a master agent is created on the user's node to oversee the evaluation of the query. The master agent clones worker agents (relation matching agents or data retrieval agents) and dispatches them to all neighbors of the node.
● Worker agent: a worker agent works on a neighbor node and returns its results to the master agent.


Processing Local Query Phase I

[Figure: a PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS) with Local Dictionary and Export Dictionary) and its neighbor PeerDB nodes, each with its own DBAgent]

1. The user query is sent to the DBAgent.
2. A master agent (MA) is created.
3. The MA extracts the Q(R, A, C) list.
4.1 Match(Q, D) is applied to the local dictionary.
4.11 Matching relations are returned.
4.2 The MA clones relation matching agents (RMAs).
4.21 The RMAs are dispatched to all neighbors, carrying (a) the IP address of the query node and (b) a TTL (time-to-live) that indicates the lifetime of an agent.
4.22 Relevant relations and meta-data are returned by the RMAs.
4.23 Answers are returned to the user.


Processing Local Query Phase II

[Figure: a PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS) with Local Dictionary and Export Dictionary) and its neighbor PeerDB nodes, each with its own DBAgent]

1. The user selects the semantically relevant relations.
2. The selected relations are sent to the MA.
3. The MA clones a data retrieval agent (DRA) for each selected relation.
4. The DRA reformulates the query for its selected relation.
4.1 The DRA retrieves data from the local DBMS if the selected relation is local.
4.2 Data is returned to the agent.
4.3 Formatted data is returned to the user.
5. The DRA is dispatched to the relevant nodes, carrying (a) the IP address of the query node.
6. Data is returned.
7. Data is returned to the user.
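Step 4, in which the DRA reformulates the query for a selected relation, can be pictured with a short Python sketch. The `reformulate` function and its `attr_map` argument are hypothetical; the sketch assumes the mapping from the user's attribute names to the target relation's names is already known, for example from keyword matching.

```python
def reformulate(select_attrs, relation, condition, attr_map):
    """Rewrite a query for one selected relation using that relation's attribute names.

    attr_map maps the querying user's attribute names to the target relation's
    names (hypothetical input; unmapped names are kept unchanged).
    """
    cols = ", ".join(attr_map.get(a, a) for a in select_attrs)
    attr, op, value = condition
    target_attr = attr_map.get(attr, attr)
    # Plain string building is for illustration only; real code should use
    # parameterized queries for the condition value.
    return f"SELECT {cols} FROM {relation} WHERE {target_attr} {op} {value}"


# The user's query uses "length", while the selected relation calls it "len".
sql = reformulate(
    ["SeqId", "ProteinSeq"], "Annexin", ("length", ">", 30), {"length": "len"}
)
```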


Processing Remote Query Phase I: Relation Matching Agent

[Figure: the query node, the visited PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS) with Local Dictionary and Export Dictionary) and that node's neighbor PeerDB nodes]

1. A relation matching agent (RMA) arrives from the query node; on its first visit, the TTL is decremented by 1.
2. The RMA searches the export dictionary.
3. Matched relations are returned.
4. The matched relations are returned to the query node.
5. If TTL > 0, the RMA clones more RMAs and dispatches them to the current node's neighbors. Otherwise, the RMA is dropped.
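The TTL-bounded spreading of RMAs described in steps 1 to 5 can be simulated with a short Python sketch. The `NEIGHBORS` overlay is hypothetical, and the rule that a repeat visit to a node is ignored is an assumption drawn from the "first time visit" remark.

```python
from collections import deque

# Hypothetical overlay: node -> directly connected neighbors
NEIGHBORS = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4"],
    "P3": ["P1", "P4"],
    "P4": ["P2", "P3", "P5"],
    "P5": ["P4"],
}


def dispatch_rmas(query_node, ttl):
    """Return the nodes visited by relation matching agents, in visiting order."""
    visited = {query_node}
    order = []
    # Each queue entry is an agent heading to `node` with `remaining` hops left.
    queue = deque((n, ttl) for n in NEIGHBORS[query_node])
    while queue:
        node, remaining = queue.popleft()
        if node in visited:
            continue  # assumption: a repeat visit is ignored
        visited.add(node)
        remaining -= 1      # TTL is decremented on the first visit
        order.append(node)  # here the agent would search the export dictionary
        if remaining > 0:
            # Clone agents for the current node's neighbors.
            queue.extend((n, remaining) for n in NEIGHBORS[node])
        # Otherwise the agent is dropped.
    return order
```

With a TTL of 1 the agents reach only the direct neighbors of the query node; each extra unit of TTL extends the search by one hop.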


Processing Remote Query Phase II: Data Retrieval Agent

[Figure: the query node and the visited PeerDB node (User Interface, DBAgent, Cache Manager, Object Management System (DBMS) with Local Dictionary and Export Dictionary)]

1. A data retrieval agent (DRA) arrives from the query node.
2. The DRA formulates an SQL query and submits it to the DBMS.
3. Answers are retrieved and processed.
4. Answers are returned to the query node.
5. The DRA is dropped.


Features of PeerDB: Monitoring Statistics for Reconfiguration

Performed by the master agent on the query node, in order to reconfigure the network.

Monitors two types of statistics:
● Relation information (schemas, keywords) obtained from the relation matching agents, used for exchanging the keywords of selected relations.
● The number of answer objects obtained from the data retrieval agents, used for determining which nodes are to be connected directly.

Reconfiguration policy:
● Favored nodes are those that have most recently provided answers.
● The notion of stack distance is used to measure temporal locality.
● The top K peers in the stack are retained as the K directly connected peers.

[Figure: a stack of peers P2, P3, P4, ..., Pk, with the top K entries marked]
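The stack-based policy can be sketched in Python as a move-to-front list. The `PeerStack` class and its method names are hypothetical, and the sketch tracks recency only; it ignores the number of answer objects each peer returned.

```python
class PeerStack:
    """Peers that answered most recently sit on top; the top K stay directly connected."""

    def __init__(self, k):
        self.k = k
        self._stack = []  # index 0 is the top of the stack

    def record_answer(self, peer):
        """Move a peer that just returned answers to the top of the stack."""
        if peer in self._stack:
            self._stack.remove(peer)
        self._stack.insert(0, peer)

    def direct_peers(self):
        """Return the K peers to keep as directly connected neighbors."""
        return self._stack[: self.k]


stack = PeerStack(k=2)
for p in ["P2", "P3", "P4", "P2"]:
    stack.record_answer(p)
```

After this sequence P2 is back on top because it answered last, so P2 and P4 are retained and P3 falls out of the directly connected set.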


Features of PeerDB: Cache Management

The Cache Manager component caches answers returned from remote nodes.

This reduces the response time for subsequent queries. However, caching raises complicated issues:
● Problem: the cached copy may be outdated. Solution: answers are kept only for a fixed period of time.
● Problem: cache storage space is limited. Solution: the least recently used data is replaced when space runs out.
● Problem: several PeerDB nodes may be caching the same data. Solution: all relations, except one, with the same keywords from the same source node are pruned away during Phase I of query processing.
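The first two solutions (a fixed retention period and least-recently-used replacement) can be combined in one small Python sketch. The `AnswerCache` class, its parameters, and the injectable `clock` are illustrative assumptions; the sketch counts capacity in entries rather than bytes and does not cover the pruning of duplicate relations.

```python
import time
from collections import OrderedDict


class AnswerCache:
    """LRU cache whose entries also expire after a fixed period."""

    def __init__(self, capacity, max_age, clock=time.monotonic):
        self.capacity = capacity     # maximum number of cached answers
        self.max_age = max_age       # seconds an answer stays valid
        self.clock = clock           # injectable so expiry can be tested
        self._items = OrderedDict()  # key -> (stored_at, answer)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self.clock() - stored_at > self.max_age:
            del self._items[key]  # outdated copy: discard it
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return answer

    def put(self, key, answer):
        self._items[key] = (self.clock(), answer)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

A lookup that returns `None` means the answer must be fetched from the remote node again, either because it was never cached, was evicted, or has expired.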


A Performance Study

The experimental environment:
● 32 PCs, each with a 200 MHz Intel Pentium processor and 64 MB of RAM.
● All the PCs run the Windows NT 4.0 operating system.
● The physical network layout is shown in the figure.


A Performance Study (cont'd)

The study examines the relation matching strategy:
● The lifetime of a worker agent is 1.

It also looks at the performance of PeerDB:
● Effect of storage capacity on caching
● PeerDB vs. CS (client-server system)
● Benefits of agent-based querying

Remark: The extensive experimental studies show that PeerDB is a promising system for distributed processing.


Conclusion and future work

PeerDB is a P2P-based distributed data sharing system with several useful features:
● It employs a data management system at each node and shares data without a shared schema.
● Query processing is assisted by mobile agents.
● A node reconfigures its own peers dynamically.
● Cache management improves efficiency.

Experimental studies show that PeerDB is a promising system for distributed processing.

Future work extends PeerDB in two directions:
● Making a node more intelligent by adopting code-shipping or data-shipping technology.
● Looking for "similar" schemas by integrating keyword-based search into PeerDB.


References

[1] W. Ng, B. Ooi, K. Tan and A. Zhou. PeerDB: A P2P-based System for Distributed Data Sharing. The 19th International Conference on Data Engineering 2003. (ICDE 2003).

[2] N. Karnik. Security in Mobile Agent System. http://www.cs.umn.edu/Ajanta/defense/

[3] C. Rijsbergen. Information Retrieval. London: Butterworths, 1979. http://www.dcs.gla.ac.uk/Keith/Preface.html


Thanks

Questions are welcome.