Efficient Skyline Computation over Low-Cardinality Domains

Michael Morse (1), Jignesh M. Patel (2), H.V. Jagadish (2)

(1) MITRE Corporation
(2) University of Michigan

Overview

• Skyline Example and Definition.
• Discuss Low-Cardinality Attributes.
• Present the Lattice Skyline (LS) Algorithm.
• Discuss Experimental Results.
• Conclusions.

Traveling to VLDB

[Figure: scatter plot "Hotels in Vienna"; x-axis: Number of Stars (ticks 4, 3, 2, 1); y-axis: Price in Euros (50 to 200). Hotels plotted: Hotel Academie, Ibis Wein Schön., Arcotel Boltzmann, Ibis Wein Mariahilf, Hotel Astoria, Austria Trend Ananas.]

• Hotels that are more expensive than others and no higher rated are uninteresting.
  • E.g., the H. Astoria is more expensive than the Boltzmann, with the same rating.
• Such data points are said to be 'dominated.'
• Remove dominated hotels from consideration.
• We obtain the skyline for this dataset: Hotel Academie, Ibis Wein Schön., Arcotel Boltzmann.

Skyline Definition

• Skylines are an elegant summarization method for multidimensional datasets.
• Def: The skyline is the set of all points p in a dataset that are not dominated by any other point in that dataset.
• Equivalent to the Pareto Set or Maximal Vectors.
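The definition can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the `Hotel` tuple layout, the function names `dominates` and `naive_skyline`, and the five data rows are hypothetical and are not taken from the slides' plot. It checks every pair of hotels, which is the quadratic baseline that the algorithms discussed later try to avoid.

```python
from typing import List, Tuple

# A hotel is (name, stars, price); more stars is better, lower price is better.
Hotel = Tuple[str, int, float]

def dominates(a: Hotel, b: Hotel) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def naive_skyline(hotels: List[Hotel]) -> List[Hotel]:
    """Quadratic-time skyline: keep every hotel that no other hotel dominates."""
    return [h for h in hotels if not any(dominates(o, h) for o in hotels)]

# Hypothetical data, not the values from the slide's plot.
hotels = [
    ("A", 4, 180.0),
    ("B", 3, 120.0),
    ("C", 3, 150.0),   # dominated by B: same stars, higher price
    ("D", 2, 60.0),
    ("E", 2, 90.0),    # dominated by D: same stars, higher price
]
print([h[0] for h in naive_skyline(hotels)])  # ['A', 'B', 'D']
```

A hotel never dominates itself here, because `strictly_better` is false when both attributes are equal.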


Attribute Domains

• Low-Cardinality Domain: the set of possible values for attribute a_i is small.
• We will consider datasets with d low-cardinality domains and optionally 1 unrestricted domain.
• Example: We are interested in finding an inexpensive hotel that is highly rated according to two different rating measures.

Hotel            Stars   Survey   Price
Slumber Well     ⋆       Medium   120
Soporific Inn    ⋆⋆      Low      65
Drowsy Hotel     ⋆⋆      High     110
Celestial Sleep  ⋆⋆⋆     Medium   101
Nap Motel        ⋆⋆      Low      101

Related Algorithms

• Methods requiring indexing/preprocessing.
  • Nearest Neighbor [Kossman et al., VLDB 2002].
  • BBS [Papadias et al., SIGMOD 2003].
  • Bitmap, Index [Tan et al., VLDB 2001].
• Methods that require no preprocessing.
  • BNL [Borzsonyi et al., ICDE 2001].
  • SFS [Chomicki et al., ICDE 2003].
  • LESS [Godfrey et al., VLDB 2005].
• Many other related problems cited in the paper.
  • Probabilistic Skylines [Pei et al., VLDB 2007].
  • ZBtree [Lee et al., VLDB 2007].
  • Reverse Skylines [Dellis et al., VLDB 2007].

Related Algorithms

• Best Alternative: LESS [Godfrey et al. "Maximal Vector Computation in Large Datasets" VLDB 05]
  1. Preprocessing.
  2. Sorts data.
  3. Pairwise comparison of remaining tuples.
• Cost: between O(n) and O(n^2).
• One downside: it can be sensitive to the dataset distribution and the tuple ordering.
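To illustrate the sort-then-compare pattern, here is a minimal sketch of a generic sort-and-filter skyline. It is not the LESS algorithm itself: it omits LESS's preprocessing step and external-memory handling. The function names, the "larger is better" encoding and the sample points are assumptions made for this example. Sorting by a monotone score (here the attribute sum) guarantees that a later tuple can never dominate an earlier one, so each tuple is compared only against the skyline found so far.

```python
from typing import List, Tuple

# Each point lists attributes where larger is better (negate "lower is better" ones).
Point = Tuple[float, ...]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every attribute, strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def sort_filter_skyline(points: List[Point]) -> List[Point]:
    """Sort by a monotone score, then compare each tuple to the skyline so far."""
    # If a dominates b then sum(a) > sum(b), so a is always visited before b
    # and every accepted tuple is final.
    ordered = sorted(points, key=sum, reverse=True)
    skyline: List[Point] = []
    for p in ordered:
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline

# Hypothetical (stars, -price) pairs.
points = [(2, -60.0), (3, -150.0), (4, -180.0), (3, -120.0), (2, -90.0)]
print(sort_filter_skyline(points))  # [(2, -60.0), (3, -120.0), (4, -180.0)]
```

The number of pairwise comparisons depends on how large the intermediate skyline grows, which is one way to see why the running time of this family of methods varies with the data distribution.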

Our Contribution

• We develop a new algorithm, the Lattice Skyline (LS) algorithm, for skyline evaluation over datasets with low-cardinality domains.
• Our experiments show that while LESS is more general, it is less efficient than LS on these datasets.


Partial Order Imposed by Skyline

• The skyline operator imposes a partial order on a dataset through the 'dominance' relationship '>'.

Boltzmann > H. Astoria
Boltzmann > Mariahilf
Mariahilf > Boltzmann

[Figure: the "Hotels in Vienna" scatter plot (Price in Euros vs. Number of Stars) showing all six hotels.]

• This dataset and the skyline operator are not a lattice since there isn't an upper or lower bound.
• Dataspaces with attributes drawn from low-cardinality domains and the skyline operator are a lattice.

Lattice Structure

• If we consider the low-cardinality attribute space (Stars, Survey), we obtain a lattice:

                 (⋆⋆⋆,High)
          (⋆⋆⋆,Med)    (⋆⋆,High)
    (⋆⋆⋆,Low)    (⋆⋆,Med)    (⋆,High)
          (⋆⋆,Low)     (⋆,Med)
                 (⋆,Low)

Hotel            Position
Slumber Well     (⋆,Med)
Soporific Inn    (⋆⋆,Low)
Drowsy Hotel     (⋆⋆,High)
Celestial Sleep  (⋆⋆⋆,Med)
Nap Motel        (⋆⋆,Low)
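The lattice over (Stars, Survey) can be enumerated mechanically. The sketch below assumes an integer rank encoding that the slides do not spell out (0 = worst value, so one to three stars map to 0..2 and Low, Med, High map to 0..2). The helper names `lattice_positions`, `immediate_dominators` and `level` are hypothetical. It groups the nine positions by level and lists the immediate dominators of one position.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed encoding: each low-cardinality value is an integer rank, 0 = worst.
# Stars: 1..3 stars -> 0..2; Survey: Low, Med, High -> 0..2.
CARDINALITIES = (3, 3)

Position = Tuple[int, ...]

def lattice_positions(cards: Tuple[int, ...]) -> List[Position]:
    """All lattice positions, one per combination of attribute ranks."""
    return list(product(*(range(c) for c in cards)))

def immediate_dominators(pos: Position, cards: Tuple[int, ...]) -> List[Position]:
    """Positions exactly one rank better in a single attribute."""
    result = []
    for i, c in enumerate(cards):
        if pos[i] + 1 < c:
            result.append(pos[:i] + (pos[i] + 1,) + pos[i + 1:])
    return result

def level(pos: Position, cards: Tuple[int, ...]) -> int:
    """Distance from the top element; the top (all best ranks) is level 0."""
    return sum(c - 1 - p for p, c in zip(pos, cards))

by_level: Dict[int, List[Position]] = {}
for pos in lattice_positions(CARDINALITIES):
    by_level.setdefault(level(pos, CARDINALITIES), []).append(pos)

for lvl in sorted(by_level):
    print(lvl, by_level[lvl])

# (2 stars, Low) is dominated directly by (3 stars, Low) and (2 stars, Med).
print(immediate_dominators((1, 0), CARDINALITIES))  # [(2, 0), (1, 1)]
```

Each position has at most d immediate dominators, one per attribute.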

Determining Dominance

• Elements that are reachable from others in the lattice-graph structure are dominated.
• Ex: (⋆⋆,Low), Soporific Inn, is reachable from (⋆⋆⋆,Med), Celestial Sleep.

[Figure: the (Stars, Survey) lattice with each hotel placed at its position: Celestial Sleep at (⋆⋆⋆,Med), Drowsy Hotel at (⋆⋆,High), Soporific Inn and Nap Motel at (⋆⋆,Low), Slumber Well at (⋆,Med).]

Lattice Skyline (LS) Algorithm

• For each lattice entry, maintain 2 pieces of information:
  1. Whether an element is present or not present in the data.
  2. The best value of the unrestricted attribute.
• Initially, every entry is [np,-]: not present, with no value yet.

                 (⋆⋆⋆,High) [np,-]
       (⋆⋆⋆,Med) [np,-]    (⋆⋆,High) [np,-]
(⋆⋆⋆,Low) [np,-]    (⋆⋆,Med) [np,-]    (⋆,High) [np,-]
       (⋆⋆,Low) [np,-]     (⋆,Med) [np,-]
                 (⋆,Low) [np,-]

Lattice Skyline (LS) Algorithm

• Iterate through the dataset.
• Modify the lattice position corresponding to the data point: mark it present (p) and keep the lowest price seen at that position.

Hotel            Position     Price
Slumber Well     (⋆,Med)      120
Soporific Inn    (⋆⋆,Low)     65
Drowsy Hotel     (⋆⋆,High)    110
Celestial Sleep  (⋆⋆⋆,Med)    101
Nap Motel        (⋆⋆,Low)     101

Lattice after the pass over the data:

                 (⋆⋆⋆,High) [np,-]
       (⋆⋆⋆,Med) [p,101]   (⋆⋆,High) [p,110]
(⋆⋆⋆,Low) [np,-]    (⋆⋆,Med) [np,-]    (⋆,High) [np,-]
       (⋆⋆,Low) [p,65]     (⋆,Med) [p,120]
                 (⋆,Low) [np,-]

Lattice Skyline (LS) Algorithm

• Compare each lattice element with its immediate dominators in the previous level.
• In the example, an entry that is absent, or whose price is beaten by an immediate dominator, becomes [d,v], where v is the best price inherited from above. (⋆⋆,Low) keeps [p,65] because 65 beats the inherited 101.
• At this point, we know the skyline values present in the dataset.

                 (⋆⋆⋆,High) [np,-]
       (⋆⋆⋆,Med) [p,101]   (⋆⋆,High) [p,110]
(⋆⋆⋆,Low) [d,101]   (⋆⋆,Med) [d,101]   (⋆,High) [d,110]
       (⋆⋆,Low) [p,65]     (⋆,Med) [d,101]
                 (⋆,Low) [d,65]

Lattice Skyline (LS) Algorithm

• Iterate through the data.
• Output hotels matching the skyline values.

Skyline values: [(⋆⋆,Low),65]  [(⋆⋆,High),110]  [(⋆⋆⋆,Med),101]

Output: Soporific Inn, Drowsy Hotel, Celestial Sleep.
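The three passes walked through above can be sketched end to end. This is an illustrative reading of the slides and not the authors' implementation. The integer rank encoding (0 = worst), the dictionary-based lattice, the function name `lattice_skyline`, and the tie rule (an equal price at a strictly better lattice position counts as dominating) are all assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, ...]          # integer ranks, 0 = worst value
Row = Tuple[str, Position, float]   # (name, lattice position, price)

def lattice_skyline(rows: List[Row], cards: Tuple[int, ...]) -> List[str]:
    # Pass 1: record the best (lowest) price seen at each lattice position.
    own: Dict[Position, float] = {}
    for _, pos, price in rows:
        if pos not in own or price < own[pos]:
            own[pos] = price

    # Pass 2: visit positions from the top level down, so every immediate
    # dominator is finished before the positions it dominates.
    positions = sorted(product(*(range(c) for c in cards)),
                       key=sum, reverse=True)
    best: Dict[Position, Optional[float]] = {}   # best price at or above pos
    skyline_price: Dict[Position, float] = {}
    for pos in positions:
        inherited: Optional[float] = None
        for i, c in enumerate(cards):
            if pos[i] + 1 < c:
                up = best[pos[:i] + (pos[i] + 1,) + pos[i + 1:]]
                if up is not None and (inherited is None or up < inherited):
                    inherited = up
        mine = own.get(pos)
        if mine is not None and (inherited is None or mine < inherited):
            skyline_price[pos] = mine      # stays "present"
            best[pos] = mine
        else:
            best[pos] = inherited          # "dominated" or still not present

    # Pass 3: output the tuples that match a surviving (position, price) pair.
    return [name for name, pos, price in rows
            if skyline_price.get(pos) == price]

# The hotel example: (stars rank, survey rank) with 1 star = 0 and Low = 0.
rows = [
    ("Slumber Well",    (0, 1), 120.0),
    ("Soporific Inn",   (1, 0), 65.0),
    ("Drowsy Hotel",    (1, 2), 110.0),
    ("Celestial Sleep", (2, 1), 101.0),
    ("Nap Motel",       (1, 0), 101.0),
]
print(lattice_skyline(rows, (3, 3)))
# ['Soporific Inn', 'Drowsy Hotel', 'Celestial Sleep']
```

On the hotel data this reproduces the skyline values shown on the slide. A production version would more likely use a flat array indexed by position rather than dictionaries, since the lattice size is known in advance.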

Complexity Analysis

• LS has 2 stages:
  • Iterating through the data and marking elements of the lattice [O(dn) cost].
    • d is the number of low-cardinality dimensions.
    • n is the number of tuples.
  • Finding skyline values in the lattice by examining the immediate dominators of each lattice position [O(dV) cost].
    • V is the product of the domain cardinalities, i.e. the number of lattice positions.
• This produces O(dn + dV) complexity.

Additional advantages

• The operation of LS does not vary with the input's:
  1. Data ordering.
  2. Data distribution.
• Additional advantage: Estimating running time is easy for an optimizer.
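Because the cost depends only on n, d and the domain cardinalities, an optimizer could estimate it with one formula. The sketch below simply evaluates d·n + d·V. The function name `ls_cost_estimate` and the workload numbers are hypothetical, constant factors are ignored, and the result is an abstract step count, not a time in seconds.

```python
from math import prod
from typing import Sequence

def ls_cost_estimate(n: int, cardinalities: Sequence[int]) -> int:
    """Rough operation count d*n + d*V for the two LS stages."""
    d = len(cardinalities)
    v = prod(cardinalities)      # number of lattice positions
    return d * n + d * v

# Hypothetical workload: 500,000 tuples, 5 attributes with 8 values each.
# V = 8**5 = 32,768, so the estimate is 2,500,000 + 163,840.
print(ls_cost_estimate(500_000, [8] * 5))  # 2663840
```

In this hypothetical setting the lattice term is small next to the data scan, but V grows multiplicatively with each added attribute, which is why the approach targets low-cardinality domains.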


Experiments

• We tested LS against the best alternative technique, LESS (1).
• We implemented LS and LESS with a 4KB page size and 500 buffer pool pages.
• 1.7 GHz Intel Xeon processor running Linux.
• Each tuple is a constant 100 bytes (includes some padding which models selection attributes such as a text attribute).
• We have run experiments on both synthetic and real datasets. I will highlight several of these results here.

(1) [Godfrey et al. "Maximal Vector Computation in Large Datasets" VLDB 05]

Synthetic Datasets

• Three synthetic datasets are commonly used in the evaluation of skyline techniques:
  • Correlated
  • Independent
  • Anti-correlated
• The anti-correlated dataset usually requires the most processing of the three.
• We vary the:
  1. Number of data tuples.
  2. Number of dimensions.
  3. Size of the low-cardinality domains.

[Figure: example distributions in three panels: Correlated, Independent, Anti-Correlated.]

Real Dataset

• Zillow Housing Dataset: zillow.com lists information about real estate.
• We obtained a regional dataset with more than 160K entries with the below attributes.
• Low-cardinality attributes include # of bedrooms, bathrooms, floors, and total rooms, and the garage capacity, with the estimated price as the unrestricted attribute.

Results: Varying Domain Cardinality

[Figure: three panels (Correlated, Independent, Anti-Correlated), each plotting Execution Time (secs) for LESS and LS against Domain Cardinality (6, 8, 10).]

Note: Graph scales are not uniform.

Results: Varying Dimensionality

[Figure: three panels (Correlated, Independent, Anti-Correlated), each plotting Execution Time (secs) for LESS and LS against Dimensionality (5, 6, 7).]

Note: Graph scales are not uniform.

Results: Varying Tuple Cardinality

[Figure: three panels (Correlated, Independent, Anti-Correlated), each plotting Execution Time (secs) for LESS and LS against Tuple Cardinality (250K, 500K, 750K).]

Note: Graph scales are not uniform.


Results for Zillow Housing Dataset

[Figure: Execution Time (secs) for LESS and LS on the Zillow dataset against # dimensions (5, 6, 7).]


Conclusions

• We have proposed the Lattice Skyline Algorithm for skyline evaluation in the presence of datasets with low-cardinality attribute domains.
• The performance of the algorithm has been shown to be independent of dataset distribution and tuple ordering, both highly desirable properties for skyline evaluation.
• LS was shown to perform better than its nearest competitor, the LESS algorithm, in a number of synthetic and real dataset experiments.

Thank You!

Questions?

Back Up Slides

Real Dataset

• Zillow Housing Dataset: zillow.com lists information about real estate.
• We obtained a regional dataset with more than 160K entries with the below attributes.

Description       Values                Domain Cardinality
# of Bedrooms     Integer               7
# of Bathrooms    ½ Increments          4
# of Floors       Integer               3
Garage Capacity   Integer no. of cars   7
Asphalt Roof      Yes or No             2
Estimated Price   Dollar Value          Nearly 80K values
Colonial Arch     Yes or No             2
# of Rooms        Integer               10