Reaching the Top-k of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries
Marlene Goncalves and María-Esther Vidal
Universidad Simón Bolívar, Caracas, Venezuela
{mgoncalves,mvidal}@usb.ve
Page 2
Motivating Example
«There are two Open Faculty Positions»
«Candidates will be evaluated in terms of:
Degree, Publications, Experience»
«Criteria to select the best Candidates: higher academic degree,
maximum number of publications and maximum years of experience»
«Ties will be broken by using the GPA»
Solutions: Skyline and Top-k
Page 3
Motivation
Query: Candidates with the best academic degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             2           3.5
7   BEng     5             1           4
Answer: None of the candidates is better in all criteria simultaneously.
Page 4
Skyline
Query: Select the candidates with better degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4
User Criteria (Equally Important!), i.e., the Multicriteria Function:
• Degree: Maximum
• Publications: Maximum
• Experience: Maximum
Skyline selects candidates 1, 2, 3 and 4.
The multi-criteria function induces only a partial order, so ties need to be broken.
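The Skyline on this slide can be checked with a minimal Python sketch. It is only an illustration, not the authors' code: the numeric ranking of degrees (BEng < MsC < PhD < Post Dr) and the helper names `criteria`, `dominates` and `skyline` are assumptions introduced here.

```python
# Assumed ordering of degrees (not stated explicitly on the slide).
DEGREE = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}

# id -> (degree, publications, experience); every criterion is maximized.
CANDIDATES = {
    1: ("Post Dr", 9, 2),
    2: ("Post Dr", 10, 1),
    3: ("PhD", 12, 2),
    4: ("MsC", 13, 4),
    5: ("BEng", 7, 3),
    6: ("BEng", 6, 3),
    7: ("BEng", 5, 1),
}


def criteria(candidate):
    """Map a candidate to a numeric vector so criteria can be compared."""
    degree, pubs, exp = candidate
    return (DEGREE[degree], pubs, exp)


def dominates(a, b):
    """a dominates b: at least as good in every criterion, better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates):
    """Ids of the candidates that no other candidate dominates."""
    vecs = {i: criteria(c) for i, c in candidates.items()}
    return sorted(
        i for i, v in vecs.items()
        if not any(dominates(w, v) for j, w in vecs.items() if j != i)
    )


print(skyline(CANDIDATES))  # [1, 2, 3, 4]
```

Candidates 5, 6 and 7 drop out because candidate 4 is at least as good as each of them in every criterion. This pairwise check is quadratic in the number of candidates, which is fine for seven rows but not for large inputs.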
Page 5
Top-k
Query: Select the two candidates with the best GPA
Id  Degree   Publications  Experience  GPA
5   BEng     7             3           4.5
2   Post Dr  10            1           4
7   BEng     5             1           4
1   Post Dr  9             2           3.75
3   PhD      12            2           3.75
4   MsC      13            4           3.6
6   BEng     6             3           3.5
User Criteria (Score Function!)
• GPA: Maximum
Top-k identifies candidates 5 and 2, but these candidates do not necessarily have the best academic merit.
Page 6
Preference based Queries
Select the two candidates with the highest GPA among the candidates with the best degree, number of publications and experience.
– Cases:
• Skyline produces the candidates with the best degree, number of publications and experience
– The Skyline may be very large, and post-processing over the Skyline is required to select k.
• Top-k identifies the two candidates with the best GPA
– False answers
– Loss of results
Top-k selects two candidates with a good GPA
Skyline selects four candidates on equal terms
So…
A combined approach is required!!
Page 7
Top-k Skyline
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4
Answer: The two candidates with the highest value of the score function among the candidates preselected by the multicriteria function
Top-k Skyline selects candidates 1 and 2, which have the highest GPAs among the candidates with similar academic records
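The three query types in this example can be contrasted with a short Python sketch. It takes the naive route (build the whole Skyline, then rank it), which is exactly the cost the talk later sets out to avoid. The degree ranking, the tie-break by smaller id, and the helper names are assumptions made for illustration.

```python
# Assumed ordering of degrees (not stated explicitly on the slide).
DEGREE = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}

# id -> (degree, publications, experience, GPA), copied from the slide.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75),
    2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75),
    4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5),
    6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}


def m(row):
    """Multi-criteria vector: degree, publications, experience."""
    degree, pubs, exp, _gpa = row
    return (DEGREE[degree], pubs, exp)


def f(row):
    """Score function: the GPA."""
    return row[3]


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(rows):
    return [i for i in rows
            if not any(dominates(m(rows[j]), m(rows[i])) for j in rows if j != i)]


def top_k(rows, ids, k):
    """Best k ids by score; equal scores are ordered by id (an assumption)."""
    return sorted(ids, key=lambda i: (-f(rows[i]), i))[:k]


def naive_top_k_skyline(rows, k):
    """Build the full Skyline first, then keep its k best-scored members."""
    return top_k(rows, skyline(rows), k)


print(top_k(ROWS, ROWS, 2))          # [5, 2]
print(skyline(ROWS))                 # [1, 2, 3, 4]
print(naive_top_k_skyline(ROWS, 2))  # [2, 1]
```

Candidates 1 and 3 share a GPA of 3.75, so the second slot depends on the tie-break; ordering by id reproduces the answer shown on the slide.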
Page 8
Outline
Related Work
Our Approach
Top-k Skyline Evaluation
Experimental Study
Conclusions and Future Work
Page 9
Related Work
Multi-criteria-based approaches (Skyline): BNL, SFS, LESS
– Poor ranking capabilities
– Answers can be huge!
Score-based approaches (Top-k): MPro, Upper, TA, FA, NRA
– High ranking capabilities
– Answers may be incomplete
Combined approaches (Top-k Skyline): BMORTKS, BDTKS
– Metrics: Skyline Frequency
Neither Skyline nor Top-k provides both high expressivity and high ranking capabilities.
Existing Top-k Skyline techniques build the Skyline completely.
Techniques to efficiently evaluate ranking approaches are required.
Page 10
Our Challenge
• Efficient implementation of the Top-k Skyline operator: build the Top-k Skyline set while minimizing the non-necessary probes.
A probe p of the functions m (multi-criteria) or f (score) is necessary if and only if p is evaluated on an object o that belongs to the Top-k Skyline.
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
5   BEng     7             3           4.5
6   BEng     6             3           3.5
7   BEng     5             1           4
Non-Necessary Probes (evaluations of the multi-criteria or score function)!
Goal: Identify only the elements of the Skyline that belong to the answer
Page 11
Top-k Skyline Evaluation
Indexed Solutions
– BDTKS (Basic Distributed Top-k Skyline)
– BMORTKS (Basic Multi-Objective Retrieval for Top-k Skyline)
– TKSI (Top-K SkyIndex)
Page 12
BDTKS
Top-k Skyline Evaluation
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications and experience.
Index 1: Degree    Index 2: Publications    Index 3: Experience
Id  Degree         Id  Publications         Id  Experience
1   Post Dr        4   13                   4   4
2   Post Dr        3   12                   5   3
3   PhD            2   10                   6   3
4   MsC            1   9                    1   2
5   BEng           5   7                    3   2
6   BEng           6   6                    2   1
7   BEng           7   5                    7   1
Final Object!
Page 13
BDTKS
Top-k Skyline Evaluation
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
Partial scanning of the database (until the final object is found).
But BDTKS completely builds the Skyline.
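The stopping rule suggested by the "final object" can be sketched in Python as follows. This is a reading of the slides, not the authors' implementation: it assumes round-robin sorted access over one index per criterion, stops as soon as some object has been seen in every index, and then computes the Skyline over the seen objects only. The degree ranking, the tie-break by id inside each index, and all helper names are assumptions.

```python
# Assumed ordering of degrees (not stated explicitly on the slides).
DEGREE = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}

# id -> (degree, publications, experience), copied from the slides.
ROWS = {
    1: ("Post Dr", 9, 2), 2: ("Post Dr", 10, 1), 3: ("PhD", 12, 2),
    4: ("MsC", 13, 4), 5: ("BEng", 7, 3), 6: ("BEng", 6, 3),
    7: ("BEng", 5, 1),
}
VECS = {i: (DEGREE[d], p, e) for i, (d, p, e) in ROWS.items()}


def build_indexes(vecs):
    """One list of ids per criterion, ordered from best to worst value."""
    dims = len(next(iter(vecs.values())))
    return [sorted(vecs, key=lambda i, d=d: (-vecs[i][d], i)) for d in range(dims)]


def scan_until_final_object(indexes):
    """Round-robin sorted access; stop once one id was seen in all indexes."""
    seen = {}  # id -> set of index numbers in which the id has appeared
    for depth in range(len(indexes[0])):
        for d, index in enumerate(indexes):
            obj = index[depth]
            seen.setdefault(obj, set()).add(d)
            if len(seen[obj]) == len(indexes):
                return obj, sorted(seen)
    return None, sorted(seen)


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_of(ids, vecs):
    """Skyline restricted to the ids that the scan actually reached."""
    return [i for i in ids
            if not any(dominates(vecs[j], vecs[i]) for j in ids if j != i)]


final, seen = scan_until_final_object(build_indexes(VECS))
print(final, seen)             # 4 [1, 2, 3, 4, 5, 6]
print(skyline_of(seen, VECS))  # [1, 2, 3, 4]
```

On this data the scan never reaches candidate 7. That is safe because the final object is at least as good as every unseen object in each criterion. The whole Skyline is still built before any ranking by GPA happens, which is the drawback the slide points out.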
Page 14
BMORTKS
Top-k Skyline Evaluation
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications and experience.
Index 1: Degree    Index 2: Publications    Index 3: Experience
Id  Degree         Id  Publications         Id  Experience
1   Post Dr        4   13                   4   4
2   Post Dr        3   12                   5   3
3   PhD            2   10                   6   3
4   MsC            1   9                    1   2
5   BEng           5   7                    3   2
6   BEng           6   6                    2   1
7   BEng           7   5                    7   1
Virtual (Last score seen):
(PostDr,?,?) -> (PostDr,13,?) -> (PostDr,13,4) -> (PostDr,13,4) -> (PostDr,12,4) -> (PostDr,12,3) -> (PhD,12,3) -> (PhD,10,3) -> (PhD,10,3) -> (MsC,10,3) -> (MsC,9,3)
Page 15
BMORTKS
Top-k Skyline Evaluation
Query: Select the two candidates with the highest GPA among the candidates that have the best degree, number of publications and experience
Id  Degree   Publications  Experience  GPA
1   Post Dr  9             2           3.75
2   Post Dr  10            1           4
3   PhD      12            2           3.75
4   MsC      13            4           3.6
Partial scanning of the database (until a seen object dominates the final object).
But BMORTKS also completely builds the Skyline.
Page 16
TKSI (Top-K SkyIndex)
Top-k Skyline Evaluation
Index 1: Degree    Index 2: Publications    Index 3: Experience    Index 4: GPA
Id  Degree         Id  Publications         Id  Experience         Id  GPA
1   Post Dr        4   13                   4   4                  5   4.5
2   Post Dr        3   12                   5   3                  2   4
3   PhD            2   10                   6   3                  7   4
4   MsC            1   9                    1   2                  1   3.75
5   BEng           5   7                    3   2                  3   3.75
6   BEng           6   6                    2   1                  4   3.6
7   BEng           7   5                    7   1                  6   3.5
Partial scanning of the database (until k incomparable objects are found).
TKSI partially builds the Skyline and minimizes the non-necessary probes.
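The idea of driving the search from the score index can be illustrated with the Python sketch below. It is a simplified reading of the slide, not the TKSI algorithm as published: it walks the GPA index from best to worst, tests each candidate only against objects whose degree is at least as good (using the degree index), and stops after k non-dominated objects. The degree ranking, the choice of the degree index for the check, the tie-break by id, and counting each dominance test as one probe are all assumptions.

```python
# Assumed ordering of degrees (not stated explicitly on the slides).
DEGREE = {"BEng": 1, "MsC": 2, "PhD": 3, "Post Dr": 4}

# id -> (degree, publications, experience, GPA), copied from the slides.
ROWS = {
    1: ("Post Dr", 9, 2, 3.75), 2: ("Post Dr", 10, 1, 4.0),
    3: ("PhD", 12, 2, 3.75), 4: ("MsC", 13, 4, 3.6),
    5: ("BEng", 7, 3, 4.5), 6: ("BEng", 6, 3, 3.5),
    7: ("BEng", 5, 1, 4.0),
}


def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_skyline_sketch(rows, k):
    """Return (answer ids, number of dominance tests performed)."""
    vecs = {i: (DEGREE[d], p, e) for i, (d, p, e, _gpa) in rows.items()}
    score_index = sorted(rows, key=lambda i: (-rows[i][3], i))    # best GPA first
    degree_index = sorted(vecs, key=lambda i: (-vecs[i][0], i))   # best degree first
    answer, probes = [], 0
    for cand in score_index:
        dominated = False
        for other in degree_index:
            if vecs[other][0] < vecs[cand][0]:
                break  # objects further down have a worse degree: cannot dominate
            if other == cand:
                continue
            probes += 1
            if dominates(vecs[other], vecs[cand]):
                dominated = True
                break
        if not dominated:
            answer.append(cand)
            if len(answer) == k:
                break  # k Skyline members found; the rest is never examined
    return answer, probes


print(top_k_skyline_sketch(ROWS, 2))  # ([2, 1], 7)
```

With k = 2 the loop examines candidates 5, 2, 7 and 1 and then stops, so candidates 3, 4 and 6 are never tested for membership in the Skyline. Because objects are visited in decreasing GPA order, the first k non-dominated objects are already the k best-scored Skyline members.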
Page 17
Experimental Study
Dataset and Queries
– 100,000 random data items:
• Value domain: float between 0 and 1
• Data distribution: Uniform, Gaussian and Mixed
– Sixty random queries. The number of multi-criteria dimensions ranges from 2 to 6.
Platform
– SunFire V440, SunOS 5.10, two Sparcv9 processors at 1,281 MHz, 16 GB of RAM and four 73 GB Ultra320 SCSI disks.
– Java 1.5 and Oracle 9i.
Page 18
Experimental Study
Average Skyline Size & Probes
Data Distribution  Average Skyline Size (60 queries)
Uniform            2405
Gaussian           2477
Mixed              2539
Skyline size can be up to 2.6% of the input data!
Algorithm  Probes
BDTKS      23,749,796
BMORTKS    27,201,877
Probes on the virtual object increase the number of probes of the multi-criteria function!
Page 19
Experimental Study
BDTKS and TKSI
[Figure: four bar charts comparing BDTKS with TKSI for k = 1, 10, 50, 100, 500 and 1000. Y-axes: Log(#Probes), Log(#Access), Log(#Seen Objects) and Log(Time (sec)).]
BDTKS executes fewer probes and requires less evaluation time than BMORTKS.
For small k, TKSI outperforms BDTKS!
Page 20
Conclusions and Future Work
TKSI builds the Skyline only until it has computed the k objects.
Our experimental results show that TKSI executed fewer probes and consumed less evaluation time.
In the future, we plan to extend TKSI to Web data sources and to incorporate TKSI into an existing DBMS.