Skyline Snippets Markus Endres and Werner Kießling

Skyline Snippets - informatik.uni-augsburg.de · • The Skyline Operator (Börzsönyi et. al, 2001) ... LOWEST, HIGHEST, POS, NEG, ... • • The d-parameter allows the partitioning

Outline

1. Skyline and Preference Queries
2. Skyline Snippets
3. Performance Benchmarks
4. Summary and Outlook

1. Skyline Queries

Skyline Queries and Pareto Preferences

Beverages with lowest calories and lowest fat?

[Figure: scatter plot of beverages over the axes Cal and Fat [mg]; labelled points Drink 1 and Drink 3 to Drink 9.]

Skyline / Preference SQL query

    SELECT *
    FROM Beverage B
    PREFERRING B.cal LOWEST AND B.fat LOWEST

Literature:
• On Finding the Maxima of a Set of Vectors (Kung et al., 1975)
• The Skyline Operator (Börzsönyi et al., 2001)
• Foundations of Preferences in Database Systems (Kießling, 2002)
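To make the query semantics concrete, here is a minimal Python sketch of what a two-dimensional LOWEST/LOWEST skyline returns. The `Beverage` type, the sample rows and their values are hypothetical and not taken from the slides; the nested-loop evaluation is only the naive definition, not one of the algorithms discussed later.

```python
from typing import NamedTuple


class Beverage(NamedTuple):
    name: str
    cal: float
    fat: float


def dominates(a: Beverage, b: Beverage) -> bool:
    """a dominates b: not worse in cal and fat, strictly better in at least one (lower is better)."""
    return a.cal <= b.cal and a.fat <= b.fat and (a.cal < b.cal or a.fat < b.fat)


def skyline(rows):
    """Naive skyline: keep every row that no other row dominates."""
    return [t for t in rows if not any(dominates(u, t) for u in rows)]


# Hypothetical sample data (assumption, for illustration only).
BEVERAGES = [
    Beverage("A", 0.5, 15),
    Beverage("B", 1.0, 10),
    Beverage("C", 1.5, 5),
    Beverage("D", 1.5, 15),   # dominated by A
    Beverage("E", 2.0, 20),   # dominated by every other row
]

print([b.name for b in skyline(BEVERAGES)])
```

Rows D and E drop out because at least one other beverage is no worse in both attributes and strictly better in one.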

Skyline Queries: Motivation

‣ Skyline results become large for
  • high dimensionality (up to 10 dimensions are not uncommon)
  • large database relations

‣ Computing the full Skyline is time and memory consuming

‣ In many applications a fraction of the full Skyline is sufficient, e.g. Web Services, Mobile Internet

‣ State of the art:
  • Full Skyline: BNL, LESS, Hexagon / Lattice Skyline, ...
    Algorithms with and without indexes.
  • Progressive Skyline: BBS, Bitmap, PDS, ...
    Highly specialized indexes necessary.

Skyline Queries: Preference Background (Kießling)

‣ Skyline queries are a subset of Pareto preference queries

‣ Preference $<_P$: a strict partial order on $\mathrm{dom}(A)$; $x <_P y$ means: I like $y$ more than $x$

‣ Preference selection of a preference $P$ (Skyline / BMO-set / Winnow):

  $\sigma[P](R) := \{ t \in R \mid \neg\exists\, t' \in R : t <_P t' \}$

Skyline Queries: Preference Background (Kießling)

‣ Weak Order Preference (WOP)
  Dominance test by a numerical utility function, which depends on the type of preference:

  $f_P : \mathrm{dom}(A) \to \mathbb{R}_0^+$

  $x <_P y \iff f_P(x) > f_P(y)$

‣ Base preference constructors: LOWEST, HIGHEST, POS, NEG, ...
  The d-parameter allows the partitioning of the range of domain values:

  $P := \mathrm{LOWEST}_d(A)$: $f_P(x) := \left\lceil \frac{x - \min}{d} \right\rceil$

  $P := \mathrm{HIGHEST}_d(A)$: $f_P(x) := \left\lceil \frac{\max - x}{d} \right\rceil$
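The effect of the d-parameter is easiest to see numerically. The following is a small sketch of the two utility functions, assuming the rounding in the formulas is a ceiling; the function names and the example domain [0, 100] with d = 10 are illustrative choices, not part of Preference SQL.

```python
import math


def lowest_d(x, minimum, d):
    """Utility for LOWEST_d: distance from the domain minimum, in steps of d (0 is best)."""
    return math.ceil((x - minimum) / d)


def highest_d(x, maximum, d):
    """Utility for HIGHEST_d: distance from the domain maximum, in steps of d (0 is best)."""
    return math.ceil((maximum - x) / d)


def is_worse(fx, fy):
    """x <_P y holds iff f_P(x) > f_P(y), i.e. x has the larger utility value."""
    return fx > fy


# Assumed domain [0, 100] with d = 10: the values 3 and 9 fall into the same partition.
print(lowest_d(3, 0, 10), lowest_d(9, 0, 10), lowest_d(11, 0, 10))
print(highest_d(100, 100, 10), highest_d(95, 100, 10))
```

Values that map to the same utility are treated as equally good, which is what turns the base preference into a weak order with a coarser domain.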

Skyline Queries: Preference Background (Kießling)

‣ Complex preference constructors, e.g. Pareto (Skyline)

  For weak order preferences $P_1 = (A_1, <_{P_1}), \dots, P_m = (A_m, <_{P_m})$, a Pareto preference is defined as

  $P := \otimes(P_1, \dots, P_m) = (A_1 \times \dots \times A_m, <_P)$

  $(x_1, \dots, x_m) <_P (y_1, \dots, y_m) \iff \exists i \in \{1, \dots, m\} : f_{P_i}(x_i) > f_{P_i}(y_i) \;\wedge\; \forall j \in \{1, \dots, m\}, j \neq i : f_{P_j}(x_j) \geq f_{P_j}(y_j)$

  A tuple is said to dominate another tuple if it is better in at least one dimension and not worse in all other dimensions.
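The Pareto order can be written directly on vectors of utility values. This is a minimal sketch, assuming each tuple has already been mapped to its utility vector $(f_{P_1}(x_1), \dots, f_{P_m}(x_m))$; the function name `pareto_worse` is hypothetical.

```python
def pareto_worse(fx, fy):
    """Return True iff x <_P y for the utility vectors fx and fy.

    y dominates x when x is worse (larger utility) in at least one dimension
    and not better (smaller utility) in any dimension.
    """
    if len(fx) != len(fy):
        raise ValueError("utility vectors must have the same dimension")
    pairs = list(zip(fx, fy))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


print(pareto_worse((1, 1), (0, 1)))   # second vector is better in dimension 1, equal in dimension 2
print(pareto_worse((0, 1), (1, 0)))   # incomparable: each vector wins one dimension
print(pareto_worse((1, 1), (1, 1)))   # equal vectors do not dominate each other
```

The `all(...) and any(...)` form is equivalent to the quantified definition: one strictly worse dimension exists, and every other dimension is at least as bad.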

Skyline Queries: Preference Constructors - An Overview

‣ Taxonomy of base preference constructors
  [Figure: taxonomy diagram with the constructors POS, NEG, LOWESTd, HIGHESTd, EXPLICIT, POS/POS, POS/NEG, AROUNDd, LAYEREDm, BETWEENd, SCOREd, CONTAINS, GEO PREFERENCE, NEARBYd, WITHINd, BUFFERd, ONROUTEd.]

‣ Complex preference constructors
  • Equal importance: Pareto
  • More important: Prioritization
  • Weighted importance: Rank, ...

Skyline Queries: High Dimensional Preference Query

A demo of Preference SQL is available at www.trial.PreferenceSQL.com

A high dimensional preference query

    SELECT r.id, r.name
    FROM restaurant r, city_map c
    PREFERRING
      c.location NEARBY <lat>, <lon>, 1000 AND
      c.ascent LESS THAN 200, 20 AND
      r.cuisine IN ('Italian', 'Mexican') NOT IN ('German') AND
      r.priceCategory NOT IN ('Expensive', 'Luxury') AND
      r.rating BETWEEN '2star' AND '3star' AND
      r.ambient IN ('pleasant') AND
      r.waitingTime LOWEST AND
      r.customerFriendly HIGHEST

2. Skyline Snippets

Skyline Snippets are a general method to compute a fraction of the full Skyline without any index structure.
2.1 Pareto k-partition

Skyline Queries: Sub-Preferences

‣ Sub-preference: a lower-dimensional Pareto preference (similar to the concept of subspace Skylines)

‣ Example: for $P := \otimes(P_1, P_2, P_3)$, sample sub-preferences are
  • $P^{\{P_1,P_2\}} := \otimes(P_1, P_2)$
  • $P^{\{P_1,P_3\}} := \otimes(P_1, P_3)$
  • $P^{\{P_2,P_3\}} := \otimes(P_2, P_3)$

Skyline Snippets: Pareto k-Partition

‣ A k-partition of $P := \otimes(P_1, \dots, P_m)$ is a decomposition of $P$ into k disjoint Pareto sub-preferences $P^{I_1}, \dots, P^{I_k}$ such that

  $\otimes(P_1, \dots, P_m) = \otimes(P^{I_1}, \dots, P^{I_k})$

‣ Example: a few partitions of $P = \otimes(P_1, \dots, P_4)$ for $k = 2$ are
  • $P = \otimes(P^{\{P_1,P_2\}}, P^{\{P_3,P_4\}})$
  • $P = \otimes(P^{\{P_1,P_3\}}, P^{\{P_2,P_4\}})$
  • $P = \otimes(P^{\{P_1\}}, P^{\{P_2,P_3,P_4\}})$
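The three partitions listed are only a sample. As a sketch of how all candidates could be enumerated, the following Python generator yields every way to split the preference indices into exactly k non-empty, disjoint blocks; the function name `k_partitions` and the use of plain integer indices for $P_1, \dots, P_m$ are assumptions made for illustration.

```python
def k_partitions(items, k):
    """Yield all partitions of `items` into exactly k non-empty, disjoint blocks."""
    items = list(items)
    if k < 1 or k > len(items):
        return
    if k == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a block of its own next to a (k-1)-partition of the rest.
    for part in k_partitions(rest, k - 1):
        yield [[first]] + part
    # Case 2: `first` joins one of the k blocks of a k-partition of the rest.
    for part in k_partitions(rest, k):
        for i in range(k):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


# All 2-partitions of the indices of P1..P4.
for blocks in k_partitions([1, 2, 3, 4], 2):
    print(blocks)
```

For m = 4 and k = 2 this produces seven partitions, among them the three shown on the slide, which illustrates why a heuristic for choosing a partition becomes relevant as m grows.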

2.2 The Skyline Snippets Algorithm

Skyline Snippets

‣ The Skyline Snippets Theorem
  • Given a Pareto preference $P := \otimes(P_1, \dots, P_m)$ and a k-partition $\otimes(P^{I_1}, \dots, P^{I_k})$.
  • Let $S := \sigma[P](R)$ be the Skyline on a relation $R$.

  1. Let $S_k = \bigcup_{i=1}^{k} \sigma[P^{I_i}](R)$, then
     • $\sigma[P](S_k) \neq \emptyset$
     • $\sigma[P](S_k) \subseteq S$
     $\sigma[P](S_k)$ is called a k-snippet of the Skyline $S$.

  2. Let $L_k = \bigcap_{i=1}^{k} \sigma[P^{I_i}](R)$. If $L_k \neq \emptyset$, then $L_k \subseteq S$.

Skyline Snippets: Example

‣ Example 1: $P := \otimes(P_1, \dots, P_4)$, only LOWEST preferences on $R$

Table 1: Sample data set R.

  ID  A1  A2  A3  A4
  t1   0   0   1   0
  t2   0   1   0   0
  t3   1   0   0   1
  t4   0   0   1   1
  t5   1   0   1   0

• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition $\otimes(P^{\{P_1,P_2\}}, P^{\{P_3,P_4\}})$:
  – $\sigma[P^{\{P_1,P_2\}}](R) = \{t_1, t_4\}$
  – $\sigma[P^{\{P_3,P_4\}}](R) = \{t_2\}$
• The 2-snippet is: $\sigma[P](\{t_1, t_4\} \cup \{t_2\}) = \{t_1, t_2\} \subseteq S$

Skyline Snippets: Example

‣ Example 2: $P := \otimes(P_1, \dots, P_4)$, only LOWEST preferences on $R$ (the sample data set of Table 1)

• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition $\otimes(P^{\{P_1\}}, P^{\{P_2,P_3,P_4\}})$:
  – $\sigma[P^{\{P_1\}}](R) = \{t_1, t_2, t_4\}$
  – $\sigma[P^{\{P_2,P_3,P_4\}}](R) = \{t_1, t_2, t_3, t_5\}$
• $L_k = \{t_1, t_2, t_4\} \cap \{t_1, t_2, t_3, t_5\} = \{t_1, t_2\} \neq \emptyset \Rightarrow L_k \subseteq S$
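The numbers in this example can be checked mechanically. Below is a minimal Python sketch that recomputes the two sub-skylines and their intersection on the sample data set; attribute positions 0 to 3 stand for A1 to A4, and the helper names `dominates` and `sub_skyline` are hypothetical.

```python
# Sample data set R from Table 1: ID -> (A1, A2, A3, A4); all preferences are LOWEST.
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def dominates(u, t, dims):
    """u dominates t on the attribute positions in dims (lower is better)."""
    return all(u[i] <= t[i] for i in dims) and any(u[i] < t[i] for i in dims)


def sub_skyline(rel, dims):
    """IDs of all tuples not dominated by any other tuple with respect to dims."""
    return {tid for tid, t in rel.items()
            if not any(dominates(u, t, dims) for u in rel.values())}


full = sub_skyline(R, (0, 1, 2, 3))                      # sigma[P](R)
l_k = sub_skyline(R, (0,)) & sub_skyline(R, (1, 2, 3))   # intersection of the sub-skylines
print(sorted(full), sorted(l_k), l_k <= full)
```

Running it reproduces $S = \{t_1, t_2, t_3\}$ and $L_k = \{t_1, t_2\}$; note that the intersection needs no final dominance filtering, in contrast to the union used for the k-snippet.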


Skyline Snippets: The Skyline Snippets Algorithm (SSA)

[Algorithm listing not preserved in this transcript.]

Note: Line 4 of the listing can be executed in parallel on multi-core architectures.
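Part 1 of the Skyline Snippets Theorem suggests a direct procedure: compute the skyline of each sub-preference, take the union, and filter the union with the full preference. The Python sketch below follows that reading. It is not the authors' SSA listing; the function names, the naive nested-loop skyline and the encoding of a partition as tuples of attribute positions are all assumptions.

```python
# Sample data set R from Table 1: ID -> (A1, A2, A3, A4); all preferences are LOWEST.
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def dominates(u, t, dims):
    """u dominates t on the attribute positions in dims (lower is better)."""
    return all(u[i] <= t[i] for i in dims) and any(u[i] < t[i] for i in dims)


def skyline(rel, dims, candidates=None):
    """Skyline of the tuples named in `candidates` (default: all of rel) w.r.t. dims."""
    ids = set(rel) if candidates is None else set(candidates)
    return {a for a in ids if not any(dominates(rel[b], rel[a], dims) for b in ids)}


def skyline_snippet(rel, partition):
    """k-snippet sigma[P](S_k) for a k-partition given as blocks of attribute positions."""
    all_dims = sorted(i for block in partition for i in block)
    s_k = set()
    for block in partition:
        # The per-block skylines do not depend on each other.
        s_k |= skyline(rel, block)
    # Final filter: evaluate the full Pareto preference on the (small) union only.
    return skyline(rel, all_dims, candidates=s_k)


print(sorted(skyline_snippet(R, [(0, 1), (2, 3)])))
```

On the sample data this returns the 2-snippet $\{t_1, t_2\}$ of Example 1. Because the per-block computations are independent, they are a natural candidate for the parallel execution the note refers to, for instance via a thread or process pool.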

3. Performance Benchmarks

Performance Benchmarks

‣ SSA algorithm vs.
  • Hexagon (Lattice Skyline): Preisinger, Kießling: The Hexagon Algorithm for Pareto Preference Queries (2007)
  • Progressive Hexagon

‣ Implementation in Preference SQL
  • Java framework for preference queries on conventional databases
  • Oracle 11g database

‣ Experiments
  • Synthetic data sets: ANTI, COR, IND (data generator, Börzsönyi 2001)
  • Vary data cardinality, number of distinct values, d-parameter

Benchmark 1: Computation time Hexagon vs. SSA

• Pareto preference, only LOWEST preferences (MIN)
• Hexagon computes the full Skyline, whereas SSA computes a few Skyline points
• n = 500K tuples, domain size c = 100K, d-value d = 10K

[Figure: two panels, ANTI and COR, each plotting runtime in sec over dimension m (2 to 10) for Hexagon and SSA. One panel's runtime axis spans 0 to 90 sec, the other's 1.4 to 3.2 sec.]

Benchmark 2: Progressive Hexagon vs. SSA

• Pareto preference: $\otimes(P_1, \dots, P_8)$
• Stop progressive Hexagon after it has computed as many Skyline points as SSA
• k-partitions with k = 2, 4, 8 to evaluate the influence of the partitions
• n = 500K tuples, domain size c = 100K, d-value d = 10K
• Full Skyline size: 5902

Table 1: Hexagon (prog.) vs. SSA (ANTI).

                  k = 2              k = 4              k = 8
             #Skylines   sec    #Skylines   sec    #Skylines   sec
  Hexagon_p     3801    6.22       1075    5.95       419     5.29
  SSA           3801    3.81       1075    0.812      419     0.198
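The table is easier to read as ratios. The short Python sketch below derives, for each k, the speedup of SSA over progressive Hexagon and the share of the full Skyline (5902 points) that the snippet covers. All input numbers are copied from the table; the names `ROWS`, `FULL_SKYLINE` and `summarize` are made up for this illustration.

```python
FULL_SKYLINE = 5902  # full Skyline size reported for this benchmark

# k -> (number of Skyline points, Hexagon_p runtime in sec, SSA runtime in sec)
ROWS = {
    2: (3801, 6.22, 3.81),
    4: (1075, 5.95, 0.812),
    8: (419, 5.29, 0.198),
}


def summarize(rows, full_size):
    """Return k -> (speedup of SSA over Hexagon_p, fraction of the full Skyline found)."""
    return {k: (hexagon_sec / ssa_sec, points / full_size)
            for k, (points, hexagon_sec, ssa_sec) in rows.items()}


for k, (speedup, coverage) in summarize(ROWS, FULL_SKYLINE).items():
    print(f"k = {k}: speedup {speedup:.1f}x, coverage {coverage:.1%}")
```

The ratios make the trade-off explicit: a larger k gives a larger speedup but a smaller snippet.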

Benchmark 3: Number of Skyline points computed by Hexagon and SSA

• Pareto preference: $\otimes(P_1, \dots, P_m)$, m = 4, 6, 8
• m/2-partitions
• Hexagon computes the full Skyline
• n = 500K tuples, domain size c = 100K, d-value d = 10K

Table 1: Skyline points computed by Hexagon and SSA (ANTI).

  m   # Skyline points   σ[P](Sk)   P{P1,P2}   P{P3,P4}   P{P5,P6}   P{P7,P8}
  4        12312           1348       1211       1394        -          -
  6        18771           2851       1378       1631       1299        -
  8        24432           5495       1812       1919       1058       1403

Table 2: Skyline points computed by Hexagon and SSA (COR).

  m   # Skyline points   σ[P](Sk)   P{P1,P2}   P{P3,P4}   P{P5,P6}   P{P7,P8}
  4         3126            982        706        703        -          -
  6         8931            117        516        621        581        -
  8        11026           1131        643        681        657        597

4. Summary and Outlook

Summary

‣ Too many Skyline points in high-dimensional space
‣ Skyline evaluation on high-dimensional space is time and memory consuming
‣ Some snippets of the full Skyline are often sufficient, e.g. Mobile Internet, Web Services
‣ Skyline Snippets algorithm (SSA) works without any specialized index structure
‣ Very fast computation of some Skyline points

Outlook

‣ Extended performance benchmarks investigating
  • Influence of the different types of preference constructors
  • Performance impact of different k-partitions

‣ Development of heuristics for choosing k-partitions


Thank you for your attention!

Questions ?

{endres,kiessling}@informatik.uni-augsburg.de