59
Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1 , Ada Wai-Chee Fu 2 , Jian Pei 3 , Yip Sing Ho 2 , Tai Wong 2 and Yubao Liu 4 The Hong Kong University of Science and Technology 1 The Chinese University of Hong Kong 2 Simon Fraser University 3 Sun Yat-Sen University 4 Prepared by Raymond Chi-Wing Wong Presented by Raymond Chi-Wing Wong

Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Efficient Skyline Querying with Variable User Preferences on Nominal Attributes

Raymond Chi-Wing Wong1, Ada Wai-Chee Fu2, Jian Pei3,Yip Sing Ho2, Tai Wong2 and Yubao Liu4

The Hong Kong University of Science and Technology1

The Chinese University of Hong Kong2

Simon Fraser University3

Sun Yat-Sen University4

Prepared by Raymond Chi-Wing WongPresented by Raymond Chi-Wing Wong

Page 2: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Outline

1. Introductiona. Skylineb. Contributions

2. Problem Definition3. Adaptive SFS4. IPO-Tree5. Conclusion

Page 3: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

a 1600 4

b 2400 1

c 3000 5

3 packages

Suppose we want to look for a vacation package

Package a “dominates” package b

We want to have a cheaper package. We want to have a higher hotel-

class.

We know that 1. Package a has a cheaper price2. Package a has a higher hotel-class

We want to find a set of packages which are NOT dominatedby any other pacakges All of the “best”

possible choices.i.e., {a, c}

skyline

Page 4: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

6 packages

Suppose we want to look for a vacation package

We want to have a cheaper package. We want to have a higher hotel-

class.

How about this one?

Different customers may have different preferences on Hotel-group.

Suppose a customer has the

following preferences. H < T < MThe skyline points are packages a and c.

Suppose another customerhas the following

preferences. H < M < TThe skyline points are packages a, c and e.

In other words, differentpreferences give differentskyline points.

Page 5: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

6 packages

Suppose we want to look for a vacation package

Suppose a customer has the

following preferences. H < T < MThe skyline points are packages a and c.

Suppose another customerhas the following

preferences. H < M < TThe skyline points are packages a, c and e.

In other words, differentpreferences give differentskyline points.

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Page 6: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

1. Introduction

Page 7: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Page 8: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

It works. However, this solution is not scalable and the results cannot be returned efficiently.

Page 9: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

Full Materialization solution:

Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly for a skyline query

It works when there are limited number of preferences.

However, this solution is not scalable when there are a lot of possible preferences.

e.g. three nominal attributes (like Hotel-Group)

each of which contains 40 possible values

there are 4.1 x 109 possible preferences (in our

problem setting).

Page 10: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

Full Materialization solution:

Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly for a skyline querySemi-Materialization solution:

Pre-computation: For SOME possible preferences, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly OR with simple operations for a skyline query

Good tradeoff between storage consumption and efficiency

Page 11: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Introduction

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given a preference

on Hotel-group, we want tofind the skyline with

respect to this preference efficiently

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

Full Materialization solution:

Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly for a skyline querySemi-Materialization solution:

Pre-computation: For SOME possible preferences, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly OR with simple operations for a skyline query

Adaptive SFS

IPO-Tree (Implicit Preference Order Tree)

Questions: 1. What preferences should be stored?2. With these preferences, how can we perform a skyline query

efficiently?

Page 12: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

1. Contributions Most Existing Work

Assume that each attribute has a certain ordering (either totally ordered or partially ordered) on the attribute values

Our Work Different users can have different

preferences (i.e., the ordering on attribute values are different with different users)

Propose a semi-materialization method IPO-tree to answer the skyline query efficiently.

Page 13: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Page 14: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

A user prefers M to H.

Page 15: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

A user prefers H to *.

All possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”)

This is the reason why we call an implicit preference.

Problem: Given an implicit preference

on Hotel-group, we want to find theskyline with respect to this

preferenceefficiently

Page 16: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given an implicit preference

on Hotel-group, we want to find theskyline with respect to this

preferenceefficiently

Binary orders ={ }

All possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”)

M<H

Page 17: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given an implicit preference

on Hotel-group, we want to find theskyline with respect to this

preferenceefficiently

Binary orders ={ }

All possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”)

M<H , M<T

Page 18: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given an implicit preference

on Hotel-group, we want to find theskyline with respect to this

preferenceefficiently

Binary orders ={ }

All possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”)

M<H , M<T , H<T

Page 19: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

2. Problem Definition Usually, a user should NOT specify an

ordering on all possible values on attribute Hotel-Group

Only list a few of the most favorite choicese.g. M < H < *Implicit preference

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Problem: Given an implicit preference

on Hotel-group, we want to find theskyline with respect to this

preferenceefficiently

Since the user gives only TWO choices, we define the order ofhis preference to be TWO.

We also call this preferencethe second-order implicit preference.

All possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”)

Idea of our proposed semi-materialization IPO-tree

1. Store the skyline wrt the first-order implicit preference ONLY

2. Find the skyline wrt the implicit preference of any ordering from the skyline wrt the first-order implicit preference

Questions: 1. What preferences should be stored?2. With these preferences, how can we perform a skyline query

efficiently?

Page 20: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

Full Materialization solution:

Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly for a skyline querySemi-Materialization solution:

Pre-computation: For SOME possible preferences, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly OR with simple operations for a skyline query

Adaptive SFS

IPO-Tree (Implicit Preference Order Tree)

Page 21: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS Original SFS

Idea: Suppose we have a function f Each tuple is assigned with a score obtained by f Sort the tuples in ascending order of the scores Process the tuples with this ordering

Adaptive SFS Similar idea However, the original score function is based on

Numeric attributes NOT nominal attributes

What we change is the score function

Idea:1. Pre-Computation:

first pre-sort the tuples according to this new score function

2. Skyline Query:re-sort the tuples for a skyline query

Page 22: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Straightforward solution:

Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query

Full Materialization solution:

Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly for a skyline querySemi-Materialization solution:

Pre-computation: For SOME possible preferences, (1) pre-compute the skyline and (2) store it in a storageSkyline Query: return the stored skyline directly OR with simple operations for a skyline query

Adaptive SFS

IPO-Tree (Implicit Preference Order Tree)

Page 23: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Idea of our proposed semi-materialization IPO-tree

1. Store the skyline with respect to the first-order implicit preference ONLY

2. Find the skyline with respect the implicit preference of any ordering from the skyline with respect to the first-order implicit preference

Questions: 1. What preferences should be stored?2. With these preferences, how can we perform a skyline query

efficiently?

Page 24: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Binary Orders:{M < T, M <

H} Some values other than

“M” (i.e., “H” and “T”)

Page 25: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

Binary Orders:{M < T, M <

H} Some values other than

“H” (i.e., “T” and “M”)

Binary Orders:{H < T, H <

M}

f is NOT a skyline point. Why?

Page 26: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

Binary Orders:{H < T, H <

M}

f is NOT a skyline point. Why?

With the binary order H<M, c dominates f

We say that “H<M” disqualifiesf as a skyline point.

Page 27: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H

Page 28: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Some values other than

“M” and “H” (i.e., “T”)

Binary Orders:{H < T, H <

M}

M<H, M<T

Page 29: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Some values other than

“M” and “H” (i.e., “T”)

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

Page 30: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

PSKY1 = a set of data points in SKY1 with value “M” = {e, f}

Page 31: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

PSKY1

= {e, f}

Page 32: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

PSKY1 = {e, f}

Page 33: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

PSKY1 = {e, f}

SKY3={ }

SKY3 = (SKY1 SKY2) U PSKY1

U

= {a, c, e} U {e, f}= {a, c, e, f}

a, c, e, f

Additional binary order!

This binary order may

disqualify some datapoints in SKY3 like “f”

Observation: These points must be in

PSKY1

Page 34: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

H < *SKY2 = {a, c, e}

M < H < *

Binary Orders:{M < T, M <

H}

Binary Orders:{ }

Binary Orders:{H < T, H <

M}

M<H, M<T , H<T

PSKY1 = {e, f}

SKY3={ }

SKY3 = (SKY1 SKY2) U PSKY1

U

= {a, c, e} U {e, f}= {a, c, e, f}

a, c, e, f

Skyline wrt thefirst-order preference

Skyline wrt thesecond-order preference

Skyline wrt thefirst-order preference

Page 35: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

H < *SKY2 = {a, c, e}

M < H < * SKY3={ }a, c, e, f

Skyline wrt thefirst-order preference

Skyline wrt thesecond-order preference

Skyline wrt thefirst-order preference

v1 < v2 <*

v1 < *

v2 < *Merging Property

Page 36: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Second-order Preference

Skyline wrt thefirst-order preference

Skyline wrt thesecond-order preference

Skyline wrt thefirst-order preference

Third-order Preference

Skyline wrt thefirst-order preference

Skyline wrt thethird-order preference

Skyline wrt thesecond-order preference

Fourth-order Preference

Skyline wrt thefirst-order preference

Skyline wrt thefourth-order preference

Skyline wrt thethird-order preference

v1 < v2 <*

v1 < *

v2 < *

v1 < v2 < v3 < *

v1 < v2 < *

v3 < *

v1 < v2 < v3 < v4 < *

v1 < v2 < v3 < *

v4 < *

Page 37: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

5. Empirical Study Datasets

Synthetic Dataset Anti-correlated dataset

Real Dataset (from UCI) Nursery Dataset

Default Values (Synthetic) No. of tuples = 500K No. of numeric dimensions = 3 No. of nominal dimensions = 2 No. of values in a nominal dimension = 20 Order of implicit preference = 3

Page 38: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

5. Empirical Study Variation

No. of data points No. of numeric dimensions No. of nominal dimensions Cardinality of nominal dimensions Order of implicit preference

Comparison SFS-D SFS-A IPO Tree IPO Tree-10

Original SFS

Adaptive SFS

IPO Tree which stores 10most frequent values for eachnominal attribute (for

comparison)

Page 39: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

5. Empirical StudySynthetic Data Set

Page 40: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

5. Empirical StudyReal Data Set

Page 41: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

6. Conclusion

Different customers have different preferences different skylines

Skyline Query on Nominal Attributes

Adaptive SFS algorithm IPO-Tree algorithm Experiments

Page 42: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

Q&A

Page 43: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Page 44: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Using some existing algorithms, we can first removesome data points which must not be in skylinewith respect to any implicit preference

Page 45: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Package ID

Score

a

c

e

f

Step 1 (Pre-computation): pre-sort the tuples according to the new score function

Each value in attribute Hotel-Group

is assigned with a SPECIAL value

This special value is set tothe total number of possible

valuesin Hotel-Group (i.e., 3)

Page 46: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Package ID

Score

a

c

e

f

Step 1 (Pre-computation): pre-sort the tuples according to the new score function

Score of point a is1600 + 1 + 3

Each value in attribute Hotel-Group

is assigned with a SPECIAL value

This special value is set tothe total number of possible

valuesin Hotel-Group (i.e., 3)

= 1604

1604

Page 47: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Package ID

Score

a

c

e

f

Step 1 (Pre-computation): pre-sort the tuples according to the new score function

Score of point c is3000 + 0 + 3

Each value in attribute Hotel-Group

is assigned with a SPECIAL value

This special value is set tothe total number of possible

valuesin Hotel-Group (i.e., 3)

= 3003

1604

3003

Page 48: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Package ID

Score

a

c

e

f

Step 1 (Pre-computation): pre-sort the tuples according to the new score function

Each value in attribute Hotel-Group

is assigned with a SPECIAL value

This special value is set tothe total number of possible

valuesin Hotel-Group (i.e., 3)

1604

3003

2406

3005

Package ID

Score

a 1604

e 2406

c 3003

f 3005

Page 49: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Package ID

Score

a 1604

e 2406

c 3003

f 3005

Page 50: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Step 2 (Skyline Query): re-sort the tuples for a skyline query (e.g., H<T<*)

Package ID

Score

a 1604

e 2406

c 3003

f 3005

Value “H” is assigned with value 1.

Value “T” is assigned with value 2.

All values other than “H” and “T”

(i.e.,“M”) are still equal to value 3.

Pre-computation:

Package ID

Score

a

e

c

f

Skyline Query:

Score of point a is1600 + 1 + 2 = 1603

1603

2406

3005

Page 51: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Step 2 (Skyline Query): re-sort the tuples for a skyline query (e.g., H<T<*)

Package ID

Score

a 1604

e 2406

c 3003

f 3005

Value “H” is assigned with value 1.

Value “T” is assigned with value 2.

All values other than “H” and “T”

(i.e.,“M”) are still equal to value 3.

Pre-computation:

Package ID

Score

a

e

c

f

Skyline Query:

Score of point c is3000 + 0 + 1 =3001

1603

3001

2406

3005

Since the score of a and c areupdated, we need to re-sorta and c.

Note that the ordering of all OTHER

points not containing “H” nor “T”

remains unchanged.

Page 52: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

3. Adaptive SFS

Package ID

Price ReverseHotel-class

Hotel-group

a 1600 1 T (Tulips)

b 2400 4 T (Tulips)

c 3000 0 H (Horizon)

d 3600 1 H (Horizon)

e 2400 3 M (Mozilla)

f 3000 2 M (Mozilla)

Step 2 (Skyline Query): re-sort the tuples for a skyline query (e.g., H<T<*)

Package ID

Score

a 1604

e 2406

c 3003

f 3005

Pre-computation:

Package ID

Score

a

e

c

f

Skyline Query:

1603

3001

2406

3005

We just use the original SFS.

With this sorted list, we find the skyline = {a, c}

Page 53: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Idea Pre-computation

Store the skyline wrt the first-order preference

Skyline Query Find the skyline wrt the preference of

any order according to the stored skylines wrt the first-order preference

e.g.1 Hotel-Group: M<* Airline : G<*e.g.2 Hotel-Group: M<* Airline : e.g.3 Hotel-Group: Airline : G<*

How can we do it efficiently?

We propose an indexing structure called IPO-tree

Page 54: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Package ID

Price ReverseHotel-class

Hotel-group

Airline

a 1600 1 T (Tulips) G (Gonna)

b 2400 4 T (Tulips) G (Gonna)

c 3000 0 H (Horizon) G (Gonna)

d 3600 1 H (Horizon) R (Redish)

e 2400 3 M (Mozilla) R (Redish)

f 3000 2 M (Mozilla) W (Wings)root

T<* H<* M<*

G<* R<* W<* G<* R<* W<* G<* R<* W<* G<* R<* W<*

Hotel-group: T<*Airline : G<*

Hotel-group: T<*Airline :

Hotel-group: Airline : G<*

Hotel-Group

Airline

e.g. three nominal attributes (like Hotel-Group) each of which contains 40 possible values

Full Materialization

there are 4.1 x 109 possible preferences (in our problem setting).

Semi-Materialization IPO-tree

there are 70,644 nodes (which is significantly smaller than4.1 x 109).

Page 55: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

One nominal attribute Merging Property

Multiple nominal attributes Consider ONE nominal attribute at a time

with Merging Property Fix the ordering of OTHER nominal

attributes Then, consider each of other nominal

attributes with Merging Property

Page 56: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Package ID

Price Hotel-class

Hotel-group

a 1600 4 T (Tulips)

b 2400 1 T (Tulips)

c 3000 5 H (Horizon)

d 3600 4 H (Horizon)

e 2400 2 M (Mozilla)

f 3000 3 M (Mozilla)

Page 57: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Package ID

Price Hotel-class

Hotel-group

Airline

a 1600 4 T (Tulips) G (Gonna)

b 2400 1 T (Tulips) G (Gonna)

c 3000 5 H (Horizon) G (Gonna)

d 3600 4 H (Horizon) R (Redish)

e 2400 2 M (Mozilla) R (Redish)

f 3000 3 M (Mozilla) W (Wings)

Hotel-Group: M<H<*Airline : G<R<*

Hotel-Group: M<*Airline : G<R<*

Hotel-Group: H<*Airline : G<R<*

Hotel-Group: M<*Airline : G<*

Hotel-Group: H<*Airline : R<*

Hotel-Group: H<*Airline : G<*

Hotel-Group: H<*Airline : R<*

Page 58: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

M < *SKY1 = {a, c, e, f}

H < *SKY2 = {a, c, e}

M < H < *

PSKY1 = {e, f}

SKY3={ }

SKY3 = (SKY1 SKY2) U PSKY1

U

= {a, c, e} U {e, f}= {a, c, e, f}

a, c, e, f

Page 59: Efficient Skyline Querying with Variable User Preferences on Nominal Attributes Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jian Pei 3, Yip Sing Ho 2,

4. IPO-Tree

Theorem: Given a user query with x-th order implicit preference on m’’ nominal attributes, the number of set operations required for an x-th order implicit preference is O(xm’’).

m’’ = 2x = 2

No. of set operations = O(22)

Hotel-Group: M<H<*Airline : G<R<*

e.g.