IDEAS 2011

International Database Engineering & Applications Symposium

September 21-23, Lisbon – Portugal

Aggregates and Priorities in P2P Data Management Systems

Luciano Caroprese - Ester Zumpano

DEIS – University Of Calabria - Italy

P2P Systems

[Figure: a P2P system of peers P1, P2, P3, P4 (labels: Peer, D, Query, Import, Export, IC)]

• Each peer is an autonomous system

• Peers import/export data from/to other peers

• Imported data should not ‘violate’ the local integrity constraints (IC)

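AN EXAMPLE

P1 (mapping rule and integrity constraint):
q(X) ← r(X)
X=Y ⇐ q(X), q(Y)

P2 (data):
r(a)
r(b)

FOL semantics: since r(a) and r(b) hold in P2, the rule q(X) ← r(X) forces both q(a) and q(b) into P1, whereas the constraint X=Y ⇐ q(X), q(Y) allows at most one q-atom.

The whole system is inconsistent!

To “isolate” the inconsistency…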

The P2P system is consistent after removing inconsistent P2

AN EXAMPLE

P1: q(X) ← r(X)
X=Y ⇐ q(X), q(Y)

P2: r(a), r(b)

Which are the ‘true’ atoms?

2 possible scenarios:

M1 = { r(a), r(b), q(a) }

M2 = { r(a), r(b), q(b) }

The first step is…

…modeling mapping rules to capture this semantics.

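Our proposed semantics for mapping rules

P1: q(X) ← r(X)
X=Y ⇐ q(X), q(Y)

P2: r(a), r(b)

FOL semantics: q(X) ← r(X) is satisfied if Val(q(X)) ≥ Val(r(X))

New semantics: q(X) ← r(X) is satisfied if Val(q(X)) ≤ Val(r(X))

Under the new semantics, q(X) can be imported only if r(X) is true, but r(X) being true does not force the import of q(X).

Possible scenarios:

{ r(a), r(b), q(b) }

{ r(a), r(b), q(a) }

{ r(a), r(b) }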
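A minimal Python sketch contrasting the two readings of the mapping rule on this example. It assumes truth values are encoded as 1 (true) and 0 (false) and represents atoms as (predicate, constant) pairs; all helper names are hypothetical. It enumerates P2's facts plus every subset of q-atoms and keeps the candidates that satisfy the rule and the constraint.

```python
from itertools import combinations

CONSTANTS = ["a", "b"]
FACTS = {("r", "a"), ("r", "b")}  # data stored in P2


def val(atom, model):
    # Val(A) = 1 if atom A is true in the model, 0 otherwise (assumed encoding)
    return 1 if atom in model else 0


def fol_ok(model):
    # FOL reading of q(X) <- r(X): Val(q(X)) >= Val(r(X)) for every X
    return all(val(("q", c), model) >= val(("r", c), model) for c in CONSTANTS)


def new_ok(model):
    # Proposed reading: Val(q(X)) <= Val(r(X)), i.e. q(X) may be imported only if r(X) holds
    return all(val(("q", c), model) <= val(("r", c), model) for c in CONSTANTS)


def ic_ok(model):
    # X=Y <= q(X), q(Y): P1 may contain at most one q-atom
    return sum(val(("q", c), model) for c in CONSTANTS) <= 1


def candidates():
    # P2's facts plus every subset of the q-atoms P1 could import
    q_atoms = [("q", c) for c in CONSTANTS]
    for k in range(len(q_atoms) + 1):
        for chosen in combinations(q_atoms, k):
            yield FACTS | set(chosen)


fol_scenarios = [m for m in candidates() if fol_ok(m) and ic_ok(m)]
new_scenarios = [m for m in candidates() if new_ok(m) and ic_ok(m)]
print(len(fol_scenarios), len(new_scenarios))  # 0 3
```

Under the FOL reading no candidate survives (the system is inconsistent), while the new reading yields exactly the three scenarios listed above.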

Maximal weak model semantics

Our system PS:

P1: q(X) ← r(X)
X=Y ⇐ q(X), q(Y) (equivalently ← q(X), q(Y), X≠Y)

P2: r(a), r(b)

Its weak models…

M1 = { r(a), r(b) }

M2 = { r(a), r(b), q(a) }

M3 = { r(a), r(b), q(b) }

In each weak model we look for imported atoms. Maximal weak models are those that contain a maximal subset of imported atoms (M2, M3).

Its maximal weak models: MWM(PS) = { M2, M3 }
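A small Python sketch of the selection of maximal weak models, specialized to this example: because r(a) and r(b) both hold, every q-atom is supported, so a weak model here is simply P2's data plus a subset of q-atoms that respects the constraint. This is an assumption that simplifies the general definition; the atom strings and helper names are hypothetical.

```python
from itertools import combinations

FACTS = frozenset({"r(a)", "r(b)"})  # P2's data
IMPORTABLE = ["q(a)", "q(b)"]  # atoms q(X) <- r(X) may import, since r(a) and r(b) hold


def satisfies_ic(model):
    # X=Y <= q(X), q(Y): at most one q-atom in P1
    return sum(1 for atom in model if atom.startswith("q(")) <= 1


def weak_models():
    # Every subset of supported imports is allowed, as long as the
    # integrity constraint is not violated.
    result = []
    for k in range(len(IMPORTABLE) + 1):
        for chosen in combinations(IMPORTABLE, k):
            model = FACTS | frozenset(chosen)
            if satisfies_ic(model):
                result.append(model)
    return result


def imported(model):
    return model - FACTS


def maximal_weak_models(models):
    # Keep the models whose set of imported atoms is maximal w.r.t. set inclusion
    return [m for m in models
            if not any(imported(m) < imported(other) for other in models)]


wm = weak_models()
mwm = maximal_weak_models(wm)
print(len(wm), len(mwm))  # 3 2
```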

Modeling a P2P system with a disjunctive logic program with priorities

An equivalent characterization

A (positive) exclusive disjunctive Datalog rule is of the form:

A1 ⊕ … ⊕ Am ← B1, …, Bn

A priority rule is of the form: a ≥ b

The program:

a ⊕ b ← c

c

has two minimal models: M1 = { a, c }, M2 = { b, c }.

The preference rule a ≥ b intuitively reads: a is preferable over b. Thus M1 is the preferred minimal model.
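A brute-force Python sketch of this example: it computes the minimal models of the program and then keeps those not beaten under the priority a ≥ b. It assumes the exclusive reading of ⊕ (exactly one head atom is true when the body holds) and uses a simplified dominance test between two models; the formal definition of preferred models may be more refined, so treat this as an illustration only.

```python
from itertools import combinations

ATOMS = ["a", "b", "c"]
PRIORITIES = {("a", "b")}  # a >= b: a is preferable over b


def is_model(m):
    # Fact c, and rule a ⊕ b <- c: if c holds, exactly one of a, b holds
    # (exclusive reading of the disjunction; assumed here)
    return "c" in m and ("a" in m) != ("b" in m)


def subsets(atoms):
    for k in range(len(atoms) + 1):
        for s in combinations(atoms, k):
            yield frozenset(s)


def minimal_models():
    models = [m for m in subsets(ATOMS) if is_model(m)]
    return [m for m in models if not any(o < m for o in models)]


def dominates(n, m, prio):
    # Simplified test: n beats m if some atom only in n is preferred over some atom
    # only in m, and no atom only in m is preferred over an atom only in n.
    better = any((x, y) in prio for x in n - m for y in m - n)
    worse = any((y, x) in prio for y in m - n for x in n - m)
    return better and not worse


def preferred(models, prio):
    return [m for m in models
            if not any(dominates(n, m, prio) for n in models if n != m)]


mm = minimal_models()
print(sorted(sorted(m) for m in mm))                   # [['a', 'c'], ['b', 'c']]
print([sorted(m) for m in preferred(mm, PRIORITIES)])  # [['a', 'c']]
```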

Modeling a P2P system with a disjunctive logic program with priorities

An equivalent characterization

If r(X) is true in the source peer, then it is possible either to import or not to import q(X) into the target peer. The mapping rule

q(X) ← r(X)

is therefore rewritten as

q(X) ⊕ q’(X) ← r(X)

Obviously, we prefer to import as much knowledge as possible into each peer. Thus…

q(X) ≥ q’(X)
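The rewriting step can be sketched as a string transformation in Python. The function name and the ASCII notation (`<-` for the rule arrow, `>=` for priorities) are hypothetical choices for illustration; the sketch only handles a single head atom written as `pred(args)`.

```python
def rewrite_mapping_rule(head: str, body: str) -> tuple[str, str]:
    """Rewrite the mapping rule  head <- body  into
    (1) head ⊕ head' <- body   (import head, or its primed 'do not import' twin)
    (2) head >= head'          (prefer importing)."""
    predicate, rest = head.split("(", 1)  # e.g. "q", "X)"
    primed = f"{predicate}'({rest}"
    return f"{head} ⊕ {primed} <- {body}", f"{head} >= {primed}"


rule, priority = rewrite_mapping_rule("q(X)", "r(X)")
print(rule)      # q(X) ⊕ q'(X) <- r(X)
print(priority)  # q(X) >= q'(X)
```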

Preferred Minimal Model Semantics

P1: q(X) ← r(X)
X=Y ⇐ q(X), q(Y)

P2: r(a), r(b)

Our system becomes… PS:

q(X) ⊕ q’(X) ← r(X)
← q(X), q(Y), X≠Y
r(a)
r(b)
q(X) ≥ q’(X)

Its minimal models…

M1 = { r(a), r(b), q’(a), q’(b) }

M2 = { r(a), r(b), q(a), q’(b) }

M3 = { r(a), r(b), q’(a), q(b) }

The priority q(X) ≥ q’(X) is used to select the preferred models…

Its preferred minimal models: PMM(PS) = { M2, M3 }

Deleting the primed atoms from the models in PMM(PS), we obtain exactly MWM(PS).
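A brute-force Python sketch that grounds the rewritten program PS over the constants a and b, computes its minimal models, selects the preferred ones under q(X) ≥ q’(X), and strips the primed atoms. It assumes the exclusive reading of ⊕ and a simplified pairwise dominance test (an illustration, not the formal definition); primes are written with an ASCII apostrophe.

```python
from itertools import combinations

CONSTS = ["a", "b"]
ATOMS = [f"{p}({c})" for p in ("r", "q", "q'") for c in CONSTS]
PRIO = {(f"q({c})", f"q'({c})") for c in CONSTS}  # ground instances of q(X) >= q'(X)


def is_model(m):
    if not {"r(a)", "r(b)"} <= m:  # facts r(a), r(b)
        return False
    for c in CONSTS:  # q(X) ⊕ q'(X) <- r(X): exactly one of q(c), q'(c) when r(c) holds
        if f"r({c})" in m and (f"q({c})" in m) == (f"q'({c})" in m):
            return False
    # denial <- q(X), q(Y), X != Y
    return sum(f"q({c})" in m for c in CONSTS) <= 1


def subsets(atoms):
    for k in range(len(atoms) + 1):
        for s in combinations(atoms, k):
            yield frozenset(s)


def dominates(n, m):
    # Simplified test: n beats m if some atom only in n is preferred over some atom
    # only in m, and no atom only in m is preferred over an atom only in n.
    better = any((x, y) in PRIO for x in n - m for y in m - n)
    worse = any((y, x) in PRIO for y in m - n for x in n - m)
    return better and not worse


models = [m for m in subsets(ATOMS) if is_model(m)]
minimal = [m for m in models if not any(o < m for o in models)]
pmm = [m for m in minimal if not any(dominates(n, m) for n in minimal if n != m)]
unprimed = {frozenset(a for a in m if "'" not in a) for m in pmm}
print(len(minimal), len(pmm))  # 3 2
```

M1 is beaten because q(a) ≥ q’(a), while M2 and M3 are incomparable (each wins on one constant), so both survive; removing primed atoms gives the two maximal weak models.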

The problem with this framework is that it does not allow setting preferences among maximal weak models.

Example…

“In the case of conflicting information, it is preferable to import data from the neighbor peer that can provide the maximum number of tuples”

“In the case of conflicting information, it is preferable to import data from the neighbor peer such that the sum of the values of an attribute is minimum”

P1:
cons(1,N,S) ← emp(N,S)
cons(2,N,S) ← emp(N,S)
Pa=Pb ⇐ cons(Pa,Na,Sa), cons(Pb,Nb,Sb) (all imported tuples must come from the same source)

Source peers (P2, P3):
DB1 = { emp(john,200), emp(mary,50), emp(tom,50) }
DB2 = { emp(dan,200), emp(lucy,50) }

M1 = { cons(1,john,200), cons(1,mary,50), cons(1,tom,50) } ∪ DB1 ∪ DB2

M2 = { cons(2,dan,200), cons(2,lucy,50) } ∪ DB1 ∪ DB2

We introduce in our framework:

1) aggregate functions

2) priorities

New Framework…

b(Source,<<Salary>>) ← cons(Source,Name,Salary) (Bag)

s(Source,<Salary>) ← cons(Source,Name,Salary) (Set)

M1 = { cons(1,john,200), cons(1,mary,50), cons(1,tom,50), b(1, {200,50,50}), s(1, {200,50}) } ∪ DB1 ∪ DB2

M2 = { cons(2,dan,200), cons(2,lucy,50), b(2, {200,50}), s(2, {200,50}) } ∪ DB1 ∪ DB2
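A short Python sketch of the difference between the bag atom b(Source,<<Salary>>) and the set atom s(Source,<Salary>): the bag keeps duplicate salaries, the set removes them. Representing a bag as a list is an assumption made for illustration; tuples from M1 and M2 are grouped together only for brevity.

```python
from collections import defaultdict

M1_CONS = [(1, "john", 200), (1, "mary", 50), (1, "tom", 50)]
M2_CONS = [(2, "dan", 200), (2, "lucy", 50)]


def group_salaries(cons):
    """Collect, per source, the salaries as a bag (list, duplicates kept)
    and as a set (duplicates removed)."""
    bags, sets = defaultdict(list), defaultdict(set)
    for source, _name, salary in cons:
        bags[source].append(salary)
        sets[source].add(salary)
    return dict(bags), dict(sets)


bags, sets = group_salaries(M1_CONS + M2_CONS)
print(bags)                                     # {1: [200, 50, 50], 2: [200, 50]}
print({k: sorted(v) for k, v in sets.items()})  # {1: [50, 200], 2: [50, 200]}
```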

New Framework…

To bags and sets we can apply many aggregate functions:

1) MIN / MAX

2) AVG (average)

3) COUNT

4) SUM

S(Source,SUM<<Salary>>) ← cons(Source,Name,Salary)

M1 = { cons(1,john,200), cons(1,mary,50), cons(1,tom,50), S(1,300) } ∪ DB1 ∪ DB2

M2 = { cons(2,dan,200), cons(2,lucy,50), S(2,250) } ∪ DB1 ∪ DB2

Using aggregate functions we can derive aggregate data. Then we can apply priority rules.
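A Python sketch of evaluating an aggregate atom such as S(Source,SUM<<Salary>>) per source, over either the bag or the set of salaries. The `aggregate` helper and its `distinct` flag are hypothetical; tuples from both models are pooled only for brevity (within each model the constraint lets cons hold tuples from a single source). Note how SUM over the set would lose the duplicate salary 50 of source 1.

```python
from collections import defaultdict
from statistics import mean

AGGREGATES = {"MIN": min, "MAX": max, "AVG": mean, "COUNT": len, "SUM": sum}


def aggregate(cons, function, distinct=False):
    """Evaluate A(Source, F<<Salary>>) <- cons(Source, Name, Salary) per source.
    distinct=False aggregates over the bag, distinct=True over the set."""
    groups = defaultdict(list)
    for source, _name, salary in cons:
        groups[source].append(salary)
    fn = AGGREGATES[function]
    return {src: fn(sorted(set(vals)) if distinct else vals) for src, vals in groups.items()}


CONS = [(1, "john", 200), (1, "mary", 50), (1, "tom", 50), (2, "dan", 200), (2, "lucy", 50)]
print(aggregate(CONS, "SUM"))                 # {1: 300, 2: 250}
print(aggregate(CONS, "SUM", distinct=True))  # {1: 250, 2: 250}
print(aggregate(CONS, "COUNT"))               # {1: 3, 2: 2}
```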

Our goal…

LP:
cons(1,N,S) ← emp(N,S)
cons(2,N,S) ← emp(N,S)
S(Source,SUM<<Salary>>) ← cons(Source,Name,Salary)

IC:
Pa=Pb ⇐ cons(Pa,Na,Sa), cons(Pb,Nb,Sb)

Priorities:
<{S(P1,Sum1) ≥ S(P2,Sum2) | Sum1 < Sum2}>

DB1 = { emp(john,200), emp(mary,50), emp(tom,50) }
DB2 = { emp(dan,200), emp(lucy,50) }

M1 = { cons(1,john,200), cons(1,mary,50), cons(1,tom,50), S(1,300) } ∪ DB1 ∪ DB2

M2 = { cons(2,dan,200), cons(2,lucy,50), S(2,250) } ∪ DB1 ∪ DB2

Since 250 < 300, M2 is the preferred model.
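A Python sketch of how the priority {S(P1,Sum1) ≥ S(P2,Sum2) | Sum1 < Sum2} selects among the two models: a model is discarded if another model has a strictly smaller salary total. Here P1 and P2 in the rule are variables over sources, not peer names. Reducing the priority to a pairwise comparison of aggregate values is a simplifying assumption, and the helper names are hypothetical.

```python
MODELS = {
    "M1": [(1, "john", 200), (1, "mary", 50), (1, "tom", 50)],
    "M2": [(2, "dan", 200), (2, "lucy", 50)],
}


def salary_sum(cons):
    # S(Source, SUM<<Salary>>): each model imports from a single source (IC Pa=Pb)
    return sum(salary for _src, _name, salary in cons)


def sum_better(sum1, sum2):
    # {S(P1,Sum1) >= S(P2,Sum2) | Sum1 < Sum2}: the smaller total is preferred
    return sum1 < sum2


def preferred(models):
    sums = {name: salary_sum(cons) for name, cons in models.items()}
    return [name for name in models
            if not any(sum_better(sums[other], sums[name]) for other in models if other != name)]


print(preferred(MODELS))  # ['M2']
```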

The complete framework allows defining many levels of preferences!

Levels of preferences…

LP:
cons(1,N,S) ← emp(N,S)
cons(2,N,S) ← emp(N,S)
C(Source,COUNT<<Salary>>) ← cons(Source,Name,Salary)
S(Source,SUM<<Salary>>) ← cons(Source,Name,Salary)

IC:
Pa=Pb ⇐ cons(Pa,Na,Sa), cons(Pb,Nb,Sb)

Priorities:
<{C(P1,Count1) ≥ C(P2,Count2) | Count1 > Count2}, {S(P1,Sum1) ≥ S(P2,Sum2) | Sum1 < Sum2}>

Source data: emp(john,200), emp(mary,50)

We prefer to import tuples from the peer that can provide the maximum number of tuples. If the peers provide the same number of tuples, we prefer to import tuples from the peer whose tuples have the minimum total salary.

M1 = { cons(1,john,200), cons(1,mary,50), C(1,2), S(1,250) } ∪ DB1 ∪ DB2

M2 = { cons(2,dan,150), cons(2,lucy,50), C(2,2), S(2,200) } ∪ DB1 ∪ DB2

Both models import two tuples, so the first level does not discriminate between them; at the second level 200 < 250, so M2 is preferred.
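A Python sketch of applying the two priority levels in sequence: level 1 keeps the models with the largest COUNT, and level 2 compares SUM only among the survivors. The sequential-filter reading follows the text; the pairwise comparison of aggregate values and all names are simplifying assumptions. The first model set reuses the data of the earlier example, where the COUNT level alone decides.

```python
MODELS_EX1 = {  # earlier example: 3 tuples (sum 300) vs 2 tuples (sum 250)
    "M1": [(1, "john", 200), (1, "mary", 50), (1, "tom", 50)],
    "M2": [(2, "dan", 200), (2, "lucy", 50)],
}
MODELS_EX2 = {  # this slide: 2 tuples (sum 250) vs 2 tuples (sum 200)
    "M1": [(1, "john", 200), (1, "mary", 50)],
    "M2": [(2, "dan", 150), (2, "lucy", 50)],
}


def count(cons):
    return len(cons)  # C(Source, COUNT<<Salary>>)


def total(cons):
    return sum(s for _src, _name, s in cons)  # S(Source, SUM<<Salary>>)


LEVELS = [
    lambda a, b: count(a) > count(b),  # level 1: Count1 > Count2 is preferred
    lambda a, b: total(a) < total(b),  # level 2: Sum1 < Sum2 is preferred
]


def select(models, levels):
    """Apply the priority levels one after another, each time keeping only
    the candidates not beaten by another surviving candidate."""
    candidates = list(models)
    for better in levels:
        candidates = [m for m in candidates
                      if not any(better(models[o], models[m]) for o in candidates if o != m)]
    return candidates


print(select(MODELS_EX1, LEVELS))  # ['M1']
print(select(MODELS_EX2, LEVELS))  # ['M2']
```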

We allow many levels of priorities.

Extended Prioritized Logic Program

The program:

a ⊕ b ← c

d ⊕ e ← c

c

has the minimal models M1 = { a, c, d }, M2 = { b, c, d }, M3 = { a, c, e }, M4 = { b, c, e }.

A preference rule is of the form: <{a ≥ b}, {d ≥ e}>

We first apply the first level, containing a ≥ b, and then the second level, containing d ≥ e. The first level keeps M1 and M3; the second level selects M1.
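A brute-force Python sketch of this two-level selection. It assumes the exclusive reading of ⊕ and a simplified pairwise dominance test applied level by level to the surviving models; it illustrates the idea and is not the formal semantics.

```python
from itertools import combinations

ATOMS = ["a", "b", "c", "d", "e"]
LEVELS = [{("a", "b")}, {("d", "e")}]  # <{a >= b}, {d >= e}>


def is_model(m):
    # c.   a ⊕ b <- c.   d ⊕ e <- c.   (exclusive disjunction: exactly one head atom)
    return "c" in m and (("a" in m) != ("b" in m)) and (("d" in m) != ("e" in m))


def minimal_models():
    models = [frozenset(s) for k in range(len(ATOMS) + 1)
              for s in combinations(ATOMS, k) if is_model(frozenset(s))]
    return [m for m in models if not any(o < m for o in models)]


def dominates(n, m, prio):
    # Simplified test: n beats m w.r.t. one level of priorities
    better = any((x, y) in prio for x in n - m for y in m - n)
    worse = any((y, x) in prio for y in m - n for x in n - m)
    return better and not worse


def preferred(models, levels):
    candidates = models
    for prio in levels:  # levels are applied sequentially
        candidates = [m for m in candidates
                      if not any(dominates(n, m, prio) for n in candidates if n != m)]
    return candidates


print([sorted(m) for m in preferred(minimal_models(), LEVELS)])  # [['a', 'c', 'd']]
```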

We extend the previous rewriting by allowing levels of priorities.

Let us suppose that our P2P system has just one mapping rule and n levels of priorities, numbered 1, ..., n.

The mapping rule

q(a) ← r(a)

is rewritten as

q(a) ⊕ q’(a) ← r(a)

and the priorities become

<{q(a) ≥ q’(a)}, level 1, ..., level n>

Computation

The levels of priorities are used sequentially to select the preferred stable models of the logic program. The priorities derived from mapping rules are the most important (the first level)!

Conclusions

This work enhances a previous semantics for P2P systems by introducing aggregate functions and priorities in order to define preferences among maximal weak models.

It also presents an alternative characterization of the proposed semantics, rewriting the P2P system into an extended prioritized logic program.

Complexity Results

The problem of deciding whether an atom is true in some preferred weak model is Σ₂ᵖ-complete.

The problem of deciding whether an atom is true in all preferred weak models is Π₂ᵖ-complete.
