Author: Hua Lu, et al. Aalborg University, Denmark Reported by: Tzu-Li Tai National Cheng Kung University, Taiwan High Performance Parallel and Distributed Systems Lab Elsevier: Information Systems, Volume 38, 2013

Efficient and Continuous Skyline Monitoring in Two Tier Streaming Settings


[Paper Study] Hua Lu, et al., Aalborg University, Denmark 2013 Elsevier Volume 38.



HPDS Lab, Institute of Computer and Communication Engineering, Electrical Engineering - NCKU

A. Background Knowledge

B. The Problem: Efficient Continuous Skyline Monitoring

C. The Approach: Two-Phase Monitoring

D. Personal Feedback


Background Knowledge

Before anything else...

What is a skyline?


Definition of “tuple A dominates tuple B”:

A is not worse than B for all attributes, and A is better than B for at least one attribute.

Notation:

$tp_A \succ tp_B$

$tp_A = (p_1, p_2, \dots, p_n)$, $tp_B = (p_1, p_2, \dots, p_n)$
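The definition can also be written as a formula. This is a sketch under assumed notation that does not appear on the slides: $p_i^A$ and $p_i^B$ denote the $i$-th attribute of $tp_A$ and $tp_B$, while $\preceq$ and $\prec$ mean "not worse than" and "better than" for that attribute.

```latex
tp_A \succ tp_B \iff
\left( \forall i \in \{1, \dots, n\}: p_i^A \preceq p_i^B \right)
\land
\left( \exists j \in \{1, \dots, n\}: p_j^A \prec p_j^B \right)
```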

[Figure: scatter plot "Price and Rating of Hotels"; x-axis: Price (0 to 7000), y-axis: Rating (0 to 6)]

$tp = (\text{price}, \text{rating})$

[Figure: "Price and Rating of Hotels" scatter plot with hotels A and B marked]

$tp_A = (\text{price} = 4000, \text{rating} = 5)$

$tp_B = (\text{price} = 5000, \text{rating} = 2.5)$

A is cheaper and better rated than B.

⇒ $tp_A \succ tp_B$

[Figure: "Price and Rating of Hotels" scatter plot with hotels A and B marked]

$tp_A = (\text{price} = 1500, \text{rating} = 4)$

$tp_B = (\text{price} = 4500, \text{rating} = 4)$

A has the same rating as B but is cheaper.

⇒ $tp_A \succ tp_B$

[Figure: "Price and Rating of Hotels" scatter plot with hotels A and B marked]

$tp_A = (\text{price} = 2000, \text{rating} = 2)$

$tp_B = (\text{price} = 4500, \text{rating} = 4)$

A is cheaper, but B is better rated, so neither dominates the other.

⇒ $tp_A \nsucc tp_B$

⇒ $tp_B \nsucc tp_A$
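The three hotel pairs can be checked mechanically. The following is a minimal sketch, assuming each hotel is a `(price, rating)` pair in which a lower price and a higher rating are better; the function name `dominates` is illustrative and not taken from the paper.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if hotel a dominates hotel b.

    Assumption: tuples are (price, rating); lower price is better,
    higher rating is better.
    """
    price_a, rating_a = a
    price_b, rating_b = b
    # a must not be worse than b in any attribute ...
    not_worse = price_a <= price_b and rating_a >= rating_b
    # ... and must be strictly better in at least one attribute.
    strictly_better = price_a < price_b or rating_a > rating_b
    return not_worse and strictly_better


# The three pairs from the slides, written as (price, rating).
print(dominates((4000, 5), (5000, 2.5)))  # True: cheaper and better rated
print(dominates((1500, 4), (4500, 4)))    # True: same rating, cheaper
print(dominates((2000, 2), (4500, 4)))    # False: cheaper but worse rated
print(dominates((4500, 4), (2000, 2)))    # False: better rated but pricier
```

The last two calls show the incomparable case: neither hotel dominates the other.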


Definition of Skyline:

The subset of all tuples that are not dominated by any other tuple.

[Figure: scatter plot "Price and Rating of Hotels"; x-axis: Price, y-axis: Rating]
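The skyline definition translates directly into a quadratic-time filter. This is a naive sketch for illustration, not the algorithm of the paper; it assumes `(price, rating)` tuples with lower price and higher rating preferred. The first five hotels reuse values from the dominance examples, and `(1000, 1)` is a made-up extra point.

```python
from typing import List, Tuple

Hotel = Tuple[float, float]  # (price, rating): an assumed layout


def dominates(a: Hotel, b: Hotel) -> bool:
    """a dominates b: not worse in both attributes, better in at least one."""
    not_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return not_worse and strictly_better


def skyline(hotels: List[Hotel]) -> List[Hotel]:
    """Keep every hotel that no other hotel dominates (O(n^2) comparisons)."""
    return [h for h in hotels if not any(dominates(o, h) for o in hotels)]


hotels = [(4000, 5), (5000, 2.5), (1500, 4), (4500, 4), (2000, 2), (1000, 1)]
print(skyline(hotels))  # [(4000, 5), (1500, 4), (1000, 1)]
```

A hotel never dominates itself because the strict-improvement condition fails, so the filter does not need to skip the hotel being tested.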


Now that we know what a skyline is...

What is a two-tier streaming setting for continuous skyline monitoring?

[Figure: two-tier setting with a Central Server (Query Interface) and multiple Data Sites]


The Problem: Efficient Continuous Skyline Monitoring


Problem Statement:

In a geographically distributed computing environment characterized by a central server and multiple data sites, there is a demand for a more efficient method for continuous skyline monitoring.


The Approach: Two-Phase Monitoring


Initialization phase

• Obtain the initial query result by merging all local skylines

• Categorize all tuples based on their membership in the local skyline and the global skyline

Maintenance phase

• Continuously monitor the global skyline by referring to formalized cases of possible skyline changes
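The initialization phase can be sketched as follows. This is an illustrative reading, not the paper's implementation: it assumes smaller values are better in every attribute, and it reads the labels GS, FS and NS as "in the global skyline", "in a local skyline only (false positive)" and "in no skyline". The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better in every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def initialize(sites: Dict[int, List[Point]]):
    # Each data site computes its local skyline.
    local = {sid: skyline(pts) for sid, pts in sites.items()}
    # The server merges the local skylines, tagging tuples with their site id.
    candidates = [(sid, p) for sid, sk in local.items() for p in sk]
    global_sk = [(sid, p) for sid, p in candidates
                 if not any(dominates(q, p) for _, q in candidates)]
    # Categorize every tuple by its local and global skyline membership.
    categories = {}
    for sid, pts in sites.items():
        for p in pts:
            if (sid, p) in global_sk:
                categories[(sid, p)] = "GS"
            elif p in local[sid]:
                categories[(sid, p)] = "FS"
            else:
                categories[(sid, p)] = "NS"
    return local, global_sk, categories


# Made-up data for three sites.
sites = {1: [(1, 9), (5, 5), (6, 6)], 2: [(2, 8), (4, 7), (9, 1)], 3: [(3, 3)]}
local, global_sk, categories = initialize(sites)
print(global_sk)  # [(1, (1, 9)), (2, (2, 8)), (2, (9, 1)), (3, (3, 3))]
print(categories[(1, (5, 5))], categories[(1, (6, 6))])  # FS NS
```

Merging only the local skylines is sufficient because a tuple dominated within its own site cannot appear in the global skyline.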


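Initialization Phase

Site 1: $SK_l = \{tp_1, tp_3\}$, $SK_{lg} = \{tp_1, tp_3\}$, $SK_{fp} = \emptyset$

Site 2: $SK_l = \{tp_1, tp_2, tp_3\}$, $SK_{lg} = \{tp_2\}$, $SK_{fp} = \{tp_1, tp_3\}$

Site 3: $SK_l = \{tp_1\}$, $SK_{lg} = \{tp_1\}$, $SK_{fp} = \emptyset$

Global skyline: $SK_g = \{(1, tp_1), (1, tp_3), (2, tp_2), (3, tp_3)\}$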


Maintenance Phase

[Figure: Site 1, Site 2, Site 3; a tuple $tp$ at one site is updated]

$tp(t) \to tp(t')$

⟹ $tp(t) \in \{NS, FS, GS\}$

⟹ Dominance relationship between $tp(t)$ and $tp(t')$
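The two inputs that select a maintenance case can be sketched as a small classifier. This is a hypothetical illustration: it assumes smaller values are better in every attribute, and it returns a descriptive pair rather than the paper's numbered cases, which the slides do not list in full.

```python
from typing import Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumption: smaller is better in every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominance_relation(old: Sequence[float], new: Sequence[float]) -> str:
    """Relationship between the old value tp(t) and the new value tp(t')."""
    if dominates(new, old):
        return "new dominates old"
    if dominates(old, new):
        return "old dominates new"
    return "incomparable"  # also covers an unchanged value


def classify_update(category: str, old: Sequence[float],
                    new: Sequence[float]) -> Tuple[str, str]:
    """Combine the category of tp(t) with the dominance relationship."""
    if category not in {"NS", "FS", "GS"}:
        raise ValueError(f"unknown category: {category}")
    return category, dominance_relation(old, new)


print(classify_update("NS", (4, 7), (2, 5)))  # ('NS', 'new dominates old')
print(classify_update("GS", (3, 3), (3, 6)))  # ('GS', 'old dominates new')
print(classify_update("FS", (5, 5), (4, 6)))  # ('FS', 'incomparable')
```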

Question 1. Is $tp(t')$ not dominated by any global skyline point? If yes, $tp(t')$ is in the global skyline.

Question 2. Does $tp(t')$ dominate any global skyline point? If yes, the dominated skyline point is eliminated from the set of skyline points.

Question 3. $tp(t)$ was a global skyline point. If $tp(t)$ was the sole dominator of some non-skyline points, does $tp(t')$ stop dominating them? If yes, those previously non-skyline points enter the set of skyline points.

Question 4. Does $tp(t')$ stop being a false-positive global skyline point because it is now dominated by some other point? If yes, remove $tp$ from the false-positive set on the data site side.
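Questions 1 and 2 can be sketched as two checks against the current global skyline. This is a simplified illustration under stated assumptions: smaller values are better in every attribute, `global_sk` holds the global skyline without the old value of the updated tuple, and the function names are hypothetical. Questions 3 and 4 need the non-skyline and false-positive sets and are not covered here.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better in every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def q1_enters_global(new: Point, global_sk: List[Point]) -> bool:
    """Question 1: is the new value dominated by no global skyline point?"""
    return not any(dominates(g, new) for g in global_sk)


def q2_evicted(new: Point, global_sk: List[Point]) -> List[Point]:
    """Question 2: which global skyline points does the new value dominate?"""
    return [g for g in global_sk if dominates(new, g)]


def apply_update(new: Point, global_sk: List[Point]) -> List[Point]:
    """Return the global skyline after applying Questions 1 and 2."""
    if not q1_enters_global(new, global_sk):
        return list(global_sk)
    evicted = q2_evicted(new, global_sk)
    return [g for g in global_sk if g not in evicted] + [new]


# Made-up data: the new value (1, 5) enters and evicts (2, 6).
global_sk = [(2, 6), (5, 3), (7, 1)]
print(q1_enters_global((1, 5), global_sk))  # True
print(q2_evicted((1, 5), global_sk))        # [(2, 6)]
print(apply_update((1, 5), global_sk))      # [(5, 3), (7, 1), (1, 5)]
```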

Example: $tp_2$ at Site 1 is updated at $t = t'$

Site 1 before the update: $SK_l = \{tp_1, tp_3\}$, $SK_{lg} = \{tp_1, tp_3\}$, $SK_{fp} = \emptyset$

Global skyline: $SK_g = \{(1, tp_1), (1, tp_3), (2, tp_2), (3, tp_3)\}$

$tp_2 \in$ ? $tp_2 \in NS$

Dominance? $tp_2(t) \sim tp_2(t')$

⇒ Case 1

⇒ Case 1: Consider Q1 and Q2

Q1: $tp_2 \succ tp_1$ and $tp_2 \sim tp_3$: YES!

Site 1: $SK_l = \{tp_1, tp_2, tp_3\}$, $SK_{lg} = \{tp_1, tp_2, tp_3\}$, $SK_{fp} = \emptyset$

Q2: $tp_2 \succ tp_1$: YES!

Site 1 (updated): $SK_l = \{tp_2, tp_3\}$, $SK_{lg} = \{tp_2, tp_3\}$, $SK_{fp} = \emptyset$

Global skyline with $tp_1$ of Site 1 eliminated: $SK_g = \{(1, tp_3), (2, tp_2), (3, tp_3)\}$

Global skyline with $tp_2$ of Site 1 added: $SK_g = \{(1, tp_2), (1, tp_3), (2, tp_2), (3, tp_3)\}$


Personal Feedback

I/O rate is increased dramatically

The performance of the proposed approach remains debatable because of the massive increase in I/O rates (compared with the traditional two-tier streaming setting).

Keeping all skyline datasets in main memory throughout the maintenance phase is an option worth considering, but it raises fault-tolerance issues.


Critical Path


Further enhancing real-time response for two-tier streaming settings

Remote distributed shared-memory datasets across data sites (clouds)? Is it possible?
