Selecting Representative Objects Considering Coverage and Diversity
Shenlu Wang¹, Muhammad Aamir Cheema², Ying Zhang³, Xuemin Lin¹
¹The University of New South Wales, Australia
²Monash University, Australia
³The University of Technology, Australia
Faculty of Information Technology
Outline
• Influence Sets
  – Reverse k Nearest Neighbors Queries
  – Reverse Top-k Queries
  – Reverse Skyline Queries
• Representative Objects using Influence Sets
  – Techniques
  – Experimental Results
• Summary
Influence Set
In a data set consisting of facilities and users, a facility f influences a user u if u considers f one of her most “important” facilities.
The set of users influenced by f is called the influence set of f.
[Figure: influence set of a facility (Coles), with users u1, u2 and facilities f1, f2]
Influence Set
A facility f is important to a user u if it is one of u's top-k facilities according to her preferences, e.g., distance, rating, and price.
[Figure: “Important facility?” / “Who are my potential customers?”]
Influence Set
Significance
• Identifies potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems
Types
• Reverse Nearest Neighbors
• Reverse Top-k
• Reverse Skyline
Reverse k Nearest Neighbors (RkNN)
• Definition of importance
  – A facility f is important to a user u if f is one of u's k closest facilities.
• Reverse k Nearest Neighbors
  – Find every user u for which the query facility q is important, i.e., q is one of u's k closest facilities.
Example (k = 1): the influence set of f1 is {u1, u2}; the influence set of f2 is {u3}.
[Figure: users u1, u2, u3 and facilities f1, f2]
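A brute-force sketch of the RkNN definition, which is a useful reference point for the pruning techniques that follow. It assumes 2-D Euclidean points and resolves ties in q's favour. The coordinates are hypothetical and were chosen so that the k = 1 example above is reproduced.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def rknn(q: str, facilities: Dict[str, Point], users: Dict[str, Point], k: int) -> List[str]:
    """Users for which facility q is one of their k closest facilities (brute force)."""
    result = []
    for name, u in users.items():
        d_q = math.dist(u, facilities[q])
        # Count facilities strictly closer to u than q; ties are resolved in q's favour
        closer = sum(1 for f, p in facilities.items() if f != q and math.dist(u, p) < d_q)
        if closer < k:
            result.append(name)
    return result

# Hypothetical coordinates reproducing the k = 1 example
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
users = {"u1": (1.0, 1.0), "u2": (-2.0, 0.0), "u3": (9.0, 1.0)}
print(rknn("f1", facilities, users, k=1))  # ['u1', 'u2']
print(rknn("f2", facilities, users, k=1))  # ['u3']
```

This is O(|users| × |facilities|) per query, which is the cost the pruning algorithms below try to avoid.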
RkNN Algorithms
Phases: pruning and verification
Pruning approaches:
• Half-space based: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
• Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)
Algorithms in chronological order:
• Six-regions (Stanoi et al., SIGMOD 2000)
• TPL (Tao et al., VLDB 2004)
• FINCH (Wu et al., VLDB 2008)
• Boost (Emrich et al., SIGMOD 2010)
• InfZone (Cheema et al., ICDE 2011)
• SLICE (Yang et al., ICDE 2014)
RkNN Algorithms
• Region-based pruning: Six-regions [Stanoi et al., SIGMOD 2000]
  1. Divide the whole space centred at the query q into six equal regions.
  2. Find the k-th nearest facility of q in each region.
  3. In each region, the k-th nearest facility of q defines the area that can be pruned: a user in that region who is farther from q than this facility cannot be an RkNN of q.
  Users that cannot be pruned are verified by a range query.
[Figure: k = 2; facilities a, b, c, d and users u1, u2 around q]
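Here is a minimal sketch of Six-regions pruning over plain point lists. The original technique works over an index, and this version does not. The function names and sample coordinates are illustrative assumptions. Users that survive are only candidates and still need verification.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def sector(q: Point, p: Point) -> int:
    """Index (0..5) of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def six_regions_candidates(q: Point, facilities: List[Point],
                           users: Dict[str, Point], k: int) -> List[str]:
    """Users that survive Six-regions pruning (they still need verification)."""
    per_region: List[List[float]] = [[] for _ in range(6)]
    for f in facilities:
        per_region[sector(q, f)].append(math.dist(q, f))
    # Pruning radius of a region: distance from q to its k-th nearest facility there
    radius = [sorted(ds)[k - 1] if len(ds) >= k else math.inf for ds in per_region]
    # A user farther from q than its region's radius has at least k facilities closer than q
    return [name for name, u in users.items() if math.dist(q, u) <= radius[sector(q, u)]]

q = (0.0, 0.0)
facilities = [(1.0, 0.2), (2.0, 0.5), (-1.0, 0.1), (0.0, 3.0)]
users = {"u1": (3.0, 0.4), "u2": (0.5, 0.1), "u3": (-0.5, 2.0)}
print(six_regions_candidates(q, facilities, users, k=2))  # ['u2', 'u3']
```

The 60° width is what makes this sound. For a user u and a facility f in the same region, with f closer to q than u, the law of cosines gives d(u, f) < d(u, q).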
RkNN Algorithms
• Half-space pruning: the bisector between q and a facility f defines a half-space (the side closer to f); any area contained in at least k such half-spaces can be pruned.
  – TPL [Tao et al., VLDB 2004]
    1. Find the nearest facility f in the unpruned area.
    2. Draw the bisector between q and f, and prune using its half-space.
    3. Go to step 1 unless all facilities in the unpruned area have been accessed.
[Figure: k = 2; facilities a, b, c, d, query q, and user u]
Checking which k half-spaces prune a point or node is expensive; see TPL++ [Yang et al., PVLDB 2015].
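Each bisector half-space can be written as a linear inequality, because d(u, f) < d(u, q) is equivalent to 2u·(f − q) > |f|² − |q|². The following sketch shows the pruning test for a single point. TPL also applies this test to index nodes, which this sketch does not cover. The helper names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
HalfSpace = Tuple[float, float, float]  # (a, b, c): points with a*x + b*y > c

def half_space(q: Point, f: Point) -> HalfSpace:
    """Side of the perpendicular bisector of q and f that is closer to f."""
    a = 2 * (f[0] - q[0])
    b = 2 * (f[1] - q[1])
    c = (f[0] ** 2 + f[1] ** 2) - (q[0] ** 2 + q[1] ** 2)
    return a, b, c

def is_pruned(u: Point, half_spaces: List[HalfSpace], k: int) -> bool:
    """u can be pruned if it lies in at least k half-spaces (k facilities closer than q)."""
    inside = sum(1 for a, b, c in half_spaces if a * u[0] + b * u[1] > c)
    return inside >= k

q = (0.0, 0.0)
facilities = [(1.0, 0.2), (2.0, 0.5), (-1.0, 0.1)]
hs = [half_space(q, f) for f in facilities]
print(is_pruned((3.0, 0.4), hs, k=2))  # True: inside the half-spaces of (1.0, 0.2) and (2.0, 0.5)
print(is_pruned((0.1, 0.0), hs, k=2))  # False
```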
RkNN Algorithms
• FINCH [Wu et al., VLDB 2008]
  – Approximates the unpruned area by a convex polygon.
[Figure: k = 2; facilities a, b, c, d and query q]
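Once the unpruned area is approximated by a convex polygon, pruning a user becomes a point-in-convex-polygon test: any user outside the polygon is pruned. The sketch below uses the standard cross-product test. The polygon and users are made-up values, and the sketch does not show how FINCH actually builds its polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def inside_convex_polygon(p: Point, polygon: List[Point]) -> bool:
    """True if p lies inside or on a convex polygon whose vertices are in counter-clockwise order."""
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # A negative cross product means p is to the right of this edge, i.e. outside
        if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) < 0:
            return False
    return True

# Hypothetical polygon approximating the unpruned area around q = (0, 0)
polygon = [(-1.0, -1.0), (1.5, -1.0), (1.5, 1.0), (-1.0, 1.5)]
users = {"u1": (3.0, 0.4), "u2": (0.5, 0.1)}
candidates = [name for name, u in users.items() if inside_convex_polygon(u, polygon)]
print(candidates)  # ['u2']: u1 is pruned, u2 still needs verification
```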
RkNN Algorithms
• InfZone [Cheema et al., ICDE 2011]
  1. The influence zone is the unpruned area that remains after the bisectors of all the facilities have been considered for pruning.
  2. A user u is an RkNN of q if and only if u lies inside the influence zone.
  3. Hence, no verification phase is needed.
[Figure: k = 2; facilities a, b, c, d and query q]
RkNN Algorithms
• SLICE [Yang et al., ICDE 2014]
  1. Divide the whole space centred at the query q into t equal regions.
  2. Draw arcs for each facility.
  3. The k-th arc in each partition defines the pruning region.
  Pruning a user requires checking only one distance.
[Figure: k = 2; query q and facilities f1, f2]
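The following sketches how the arcs could be computed. It assumes each arc is the radius beyond which every point of a partition is closer to the facility than to q. For a point at distance r and angle θ from q, d(u, f) < d(u, q) holds exactly when r > |qf| / (2 cos(θ − φ)), where φ is the angle of f. This bound is largest at the partition boundary angularly farthest from φ. The names `upper_arc` and `slice_candidates` are hypothetical. The code also assumes t ≥ 3, so that every partition is narrower than 180°.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def upper_arc(q: Point, f: Point, lo: float, hi: float) -> float:
    """Radius beyond which every point of the partition [lo, hi) is closer to f than to q.

    Returns math.inf when f cannot prune the whole partition (a boundary is
    90 degrees or more away from f's direction).
    """
    d_f = math.dist(q, f)
    phi = math.atan2(f[1] - q[1], f[0] - q[0])
    worst = 0.0
    for theta in (lo, hi):
        diff = abs((theta - phi + math.pi) % (2 * math.pi) - math.pi)  # wrapped into [0, pi]
        if diff >= math.pi / 2:
            return math.inf
        worst = max(worst, d_f / (2 * math.cos(diff)))
    return worst

def slice_candidates(q: Point, facilities: List[Point], users: Dict[str, Point],
                     t: int, k: int) -> List[str]:
    """Users that survive pruning with t equal partitions (they still need verification)."""
    assert t >= 3, "each partition must be narrower than 180 degrees"
    width = 2 * math.pi / t
    radii = []
    for i in range(t):
        arcs = sorted(upper_arc(q, f, i * width, (i + 1) * width) for f in facilities)
        radii.append(arcs[k - 1] if len(arcs) >= k else math.inf)  # the k-th arc
    out = []
    for name, u in users.items():
        angle = math.atan2(u[1] - q[1], u[0] - q[0]) % (2 * math.pi)
        i = min(int(angle // width), t - 1)
        if math.dist(q, u) <= radii[i]:  # the single distance check per user
            out.append(name)
    return out

q = (0.0, 0.0)
facilities = [(1.0, 0.2), (2.0, 0.5), (-1.0, 0.1), (0.0, 3.0)]
users = {"u1": (3.0, 0.4), "u2": (0.5, 0.1), "u3": (-0.5, 2.0)}
print(slice_candidates(q, facilities, users, t=12, k=2))  # ['u2', 'u3']
```

In this sketch, u3 survives pruning even though it is not an R2NN of q. That is expected: the arcs only give a sufficient condition for pruning, and the verification step handles the rest.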
Influence Set based on Reverse Top-k
• Definition of importance
  – Each user u has a preference function.
  – A facility f is important to a user u if f is one of the top-k facilities for u.
• Reverse Top-k Query (RTk)
  – Find every user u for which the query facility q is one of her top-k facilities.
Example (k = 1): the influence set of f1 is {u2}; the influence set of f2 is {u1, u3}.
[Figure: users u1, u2, u3 and facilities f1, f2 (labels Price = 1 and Price = 2); the users' preference functions are 0.9*price + 0.1*distance, 0.5*price + 0.5*distance, and 1*distance]
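Below is a brute-force sketch of a reverse top-k query with linear preference functions like those in the figure. Lower scores are better, and distance is measured from each user's location. The coordinates and the assignment of functions to users are hypothetical, chosen so that the result matches the influence sets listed above. Ties are resolved in q's favour.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Facility = Tuple[float, Point]     # (price, location)
User = Tuple[float, float, Point]  # (price weight, distance weight, location)

def score(user: User, facility: Facility) -> float:
    """Linear preference score; lower is better."""
    w_price, w_dist, loc = user
    price, f_loc = facility
    return w_price * price + w_dist * math.dist(loc, f_loc)

def reverse_top_k(q: str, facilities: Dict[str, Facility],
                  users: Dict[str, User], k: int) -> List[str]:
    result = []
    for name, user in users.items():
        s_q = score(user, facilities[q])
        # q is in the user's top-k if fewer than k facilities score strictly better
        better = sum(1 for f, fac in facilities.items() if f != q and score(user, fac) < s_q)
        if better < k:
            result.append(name)
    return result

facilities = {"f1": (1.0, (0.0, 0.0)), "f2": (2.0, (10.0, 0.0))}
users = {"u1": (0.5, 0.5, (9.0, 0.0)), "u2": (0.9, 0.1, (5.0, 0.0)), "u3": (0.0, 1.0, (8.0, 0.0))}
print(reverse_top_k("f1", facilities, users, k=1))  # ['u2']
print(reverse_top_k("f2", facilities, users, k=1))  # ['u1', 'u3']
```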
Existing work on Reverse Top-k
• Vlachou et al., “Reverse top-k queries”, ICDE 2010
• Chester et al., “Indexing reverse top-k queries in two dimensions”, DASFAA 2013
• Cheema et al., “A Unified Framework for Efficiently Processing Ranking Related Queries”, EDBT 2014
• Vlachou et al., “Branch-and-bound algorithm for reverse top-k queries”, SIGMOD 2013
• Ge et al., “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries”, TKDE 2013
• Vlachou et al., “Monitoring reverse top-k queries over mobile devices”, MobiDE 2011
• Yu et al., “Processing a large number of continuous preference top-k queries”, SIGMOD 2012
Influence Set based on Reverse Skyline
• Dominance
  – A facility x dominates another facility y w.r.t. a user u if u prefers x over y on every attribute.
• Definition of importance
  – A facility f is important to a user u if f is not dominated by any other facility w.r.t. u.
• Reverse Skyline
  – Find every user u for which the query facility q is not dominated by any other facility.
Example: the influence set of f1 is {u1, u2}; the influence set of f2 is {u1, u2, u3}.
[Figure: users u1, u2, u3 and facilities f1, f2 (labels Price = 1 and Price = 2)]
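Here is a brute-force reverse skyline sketch with two attributes: price and distance to the user. It assumes the common dominance convention, where x dominates y if it is no worse on every attribute and strictly better on at least one. The prices and positions are hypothetical and were chosen to reproduce the influence sets above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Facility = Tuple[float, Point]  # (price, location)

def attributes(f: Facility, u: Point) -> Tuple[float, float]:
    """Attributes of f as seen by user u: (price, distance); lower is better for both."""
    price, loc = f
    return price, math.dist(loc, u)

def dominates(x: Facility, y: Facility, u: Point) -> bool:
    """x dominates y w.r.t. u: no worse on every attribute, strictly better on one (assumed)."""
    ax, ay = attributes(x, u), attributes(y, u)
    return all(a <= b for a, b in zip(ax, ay)) and any(a < b for a, b in zip(ax, ay))

def reverse_skyline(q: str, facilities: Dict[str, Facility], users: Dict[str, Point]) -> List[str]:
    return [name for name, u in users.items()
            if not any(dominates(f, facilities[q], u) for n, f in facilities.items() if n != q)]

facilities = {"f1": (2.0, (0.0, 0.0)), "f2": (1.0, (10.0, 0.0))}
users = {"u1": (1.0, 0.0), "u2": (3.0, 0.0), "u3": (8.0, 0.0)}
print(reverse_skyline("f1", facilities, users))  # ['u1', 'u2']
print(reverse_skyline("f2", facilities, users))  # ['u1', 'u2', 'u3']
```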
Existing work on Reverse Skylines
• Dellis et al., “Efficient computation of reverse skyline queries”, VLDB 2007
• Lian et al., “Reverse skyline search in uncertain databases”, TODS 2010
• Prasad et al., “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures”, EDBT 2011
• Wang et al., “Energy-efficient reverse skyline queries processing over wireless sensor networks”, TKDE 2012
• Wu et al., “Finding the influence set through skylines”, EDBT 2009
Representative Objects
Given a set of facilities and a set of users, choose t representative facilities considering coverage and diversity
Coverage
• Let I(f) denote the influence set of a facility f. Given a set of facilities F, its coverage is a measure of the total number of distinct users that are influenced by the facilities in F.
• Koh et al., “Finding k most favorite products based on reverse top-t queries”, VLDB J. 2014
• Gkorgkas et al., “Finding the most diverse products using preference queries”, EDBT 2015
Representative Objects
Diversity
• Let I(f) denote the influence set of a facility f. The dissimilarity between two facilities is defined based on the Jaccard similarity of their influence sets.
• The diversity of a set of facilities F is the minimum of the pairwise dissimilarities between the facilities in F.
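The following sketch computes coverage and diversity on explicit influence sets. The slide says only that dissimilarity is "based on" Jaccard similarity, so the code makes two assumptions. First, it takes dissimilarity = 1 − Jaccard similarity. Second, it treats a set with fewer than two facilities as maximally diverse (diversity 1.0).

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set

def coverage(F: Iterable[str], I: Dict[str, Set[str]]) -> int:
    """Number of distinct users influenced by at least one facility in F."""
    return len(set().union(*(I[f] for f in F)))

def dissimilarity(a: Set[str], b: Set[str]) -> float:
    """Assumed definition: 1 - Jaccard similarity (0.0 if both sets are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def diversity(F: List[str], I: Dict[str, Set[str]]) -> float:
    """Minimum pairwise dissimilarity; assumed to be 1.0 for fewer than two facilities."""
    return min((dissimilarity(I[f], I[g]) for f, g in combinations(F, 2)), default=1.0)

I = {"f1": {"u1", "u2"}, "f2": {"u2", "u3"}, "f3": {"u1", "u2"}}
print(coverage(["f1", "f2"], I))             # 3
print(round(diversity(["f1", "f2"], I), 3))  # 0.667
print(diversity(["f1", "f2", "f3"], I))      # 0.0 (f1 and f3 have identical influence sets)
```

Both measures reduce to set unions and intersections, which is why the techniques below focus on making those operations cheap.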
Representative Objects
Problem Definition
• Score of a set of facilities F is
Given a set of facilities and a set of users, return a set of t facilities with maximum score.
Techniques
Challenges
• The problem is NP-hard.
• It requires computing influence sets for many facilities.
• It requires set intersection and union operations to compute diversity.
Techniques
Phase 1: Compute influence sets
  – Prune the facilities that cannot be among the representative facilities.
  – Compute the influence sets of the remaining facilities.
Phase 2: Greedy algorithm
  – Iteratively select the facility f that maximizes the score of the current set.
  – Stop when t facilities have been selected.
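The greedy loop can be written without fixing a particular score function. The combined coverage/diversity score is not reproduced in these slides, so the function is passed in as a parameter. `greedy_select` and the coverage-only score in the example are hypothetical stand-ins, and this sketch makes no claim about the approximation guarantee.

```python
from typing import Callable, Dict, List, Set

Score = Callable[[List[str]], float]

def greedy_select(candidates: List[str], t: int, score: Score) -> List[str]:
    """Pick t facilities, each time adding the one that maximizes score(selected + [f])."""
    selected: List[str] = []
    remaining = list(candidates)
    while len(selected) < t and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

I: Dict[str, Set[str]] = {"f1": {"u1", "u2", "u3"}, "f2": {"u1", "u2"}, "f3": {"u4", "u5"}}

def coverage_score(F: List[str]) -> float:
    """Hypothetical score for illustration: coverage only."""
    return len(set().union(*(I[f] for f in F)))

print(greedy_select(list(I), 2, coverage_score))  # ['f1', 'f3']
```

Each step evaluates the score once per remaining candidate. This is where the cost of set unions and intersections accumulates.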
Techniques
Phase 1: Compute influence sets
  – Prune the facilities that cannot be among the representative facilities.
  – Compute the influence sets of the remaining facilities:
    1. RTK: apply an existing reverse top-k algorithm for each remaining facility.
    2. Compute the top-k facilities for each user and populate the influence set of each facility:
       a) TK: use a branch-and-bound top-k algorithm for each user.
       b) NBF: use a brute-force algorithm to compute the top-k for each user.
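The following sketches the brute-force option (NBF as described above): rank all facilities for every user, then add the user to the influence sets of her top-k facilities. It assumes a linear price/distance preference model like the one in the reverse top-k figure. The data and the function name are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Facility = Tuple[float, Point]     # (price, location)
User = Tuple[float, float, Point]  # (price weight, distance weight, location)

def influence_sets_brute_force(facilities: Dict[str, Facility],
                               users: Dict[str, User], k: int) -> Dict[str, Set[str]]:
    """Rank every facility for every user; add the user to her top-k facilities' sets."""
    I: Dict[str, Set[str]] = {f: set() for f in facilities}
    for name, (w_price, w_dist, loc) in users.items():
        # Linear score, lower is better; sorted() is stable, so ties keep insertion order
        ranked = sorted(facilities, key=lambda f: w_price * facilities[f][0]
                        + w_dist * math.dist(loc, facilities[f][1]))
        for f in ranked[:k]:
            I[f].add(name)
    return I

facilities = {"f1": (1.0, (0.0, 0.0)), "f2": (2.0, (10.0, 0.0))}
users = {"u1": (0.5, 0.5, (9.0, 0.0)), "u2": (0.9, 0.1, (5.0, 0.0)), "u3": (0.0, 1.0, (8.0, 0.0))}
I = influence_sets_brute_force(facilities, users, k=1)
print({f: sorted(us) for f, us in I.items()})  # {'f1': ['u2'], 'f2': ['u1', 'u3']}
```

One pass over the users yields every facility's influence set at once, instead of one reverse top-k query per facility.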
Techniques
Phase 2: Greedy algorithm
  – Iteratively select the facility f that maximizes the score of the current set.
  – Stop when t facilities have been selected.
Selecting f requires computing set intersection and union operations:
  1. ESO: compute exact set operations.
  2. MK: compute approximate set intersections and unions.
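The slides do not describe how MK approximates the set operations. The sketch below uses one standard approach, bottom-k (k minimum values) hash sketches, which is not necessarily the authors' method. The k smallest hash values of A ∪ B can be read off the two individual sketches. The fraction of those values that appear in both sketches estimates the Jaccard similarity, and (k − 1) divided by the k-th smallest hash estimates |A ∪ B|. Here the sketch size is called `size`, to avoid confusion with the k of top-k queries.

```python
import hashlib
from typing import Hashable, Iterable, List, Tuple

def unit_hash(x: Hashable) -> float:
    """Deterministic hash of x into [0, 1) using 53 bits of a BLAKE2b digest."""
    digest = hashlib.blake2b(repr(x).encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") >> 11) / 2.0 ** 53

def bottom_k_sketch(items: Iterable[Hashable], size: int) -> List[float]:
    """The `size` smallest distinct hash values of a set."""
    return sorted({unit_hash(x) for x in items})[:size]

def estimate(sk_a: List[float], sk_b: List[float], size: int) -> Tuple[float, float, float]:
    """Estimate (Jaccard similarity, |A ∪ B|, |A ∩ B|) from two bottom-k sketches."""
    union_sketch = sorted(set(sk_a) | set(sk_b))[:size]  # bottom-k sketch of A ∪ B
    if not union_sketch:
        return 0.0, 0.0, 0.0
    in_both = set(sk_a) & set(sk_b)
    jaccard = sum(1 for v in union_sketch if v in in_both) / len(union_sketch)
    if len(union_sketch) < size:
        union_size = float(len(union_sketch))  # fewer than `size` elements: exact
    else:
        union_size = (size - 1) / union_sketch[-1]  # k-minimum-values estimator
    return jaccard, union_size, jaccard * union_size

A, B = set(range(0, 6000)), set(range(3000, 9000))  # exact: J = 1/3, |A ∪ B| = 9000, |A ∩ B| = 3000
print(estimate(bottom_k_sketch(A, 256), bottom_k_sketch(B, 256), 256))
```

Each sketch is computed once per facility. After that, every pairwise estimate costs O(size) regardless of how large the influence sets are, which is the trade-off against exact set operations (ESO).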
Experimental Results
Summary
We studied the problem of computing representative objects using influence sets based on reverse top-k queries
Proposed a two-phase greedy algorithm with an approximation guarantee
Experimental results demonstrate that the greedy algorithms produce high-quality results
Thanks