presentation of "Top-k Bounded Diversification" research paper
+Top-k bounded diversification
Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
Politecnico di Milano, Italy
Scottsdale, AZ, USA - May 24, 2012
+Motivation
Diversification is useful in application domains where objects can be described by a score and a 2- or 3-dimensional feature vector
Many examples from search (real estate, image search, …)
  Apartments distributed over a map
    Score (e.g., price) + 2D feature vector (geo-localization)
  Evolution in time of the price of apartments over a map
    Score (e.g., price) + 3D feature vector (geo-localization + time)
  Properties of images (e.g., HSI color features)
    Score (e.g., relevance to a given keyword) + 3D feature vector (e.g., average HSI components in the image)
+Diversified result set
Looking for good restaurants in Milan
[Figure: map of restaurants in Milan, comparing the top 15 with the top 15 diversified over the region]
+Diversification
We are given a set O of N objects
  Each object o has a vector-space representation
  Each object o has a relevance score
Diversification problem
  Find the best diversified set of K objects: the set of objects that maximizes an objective function
  The objective function combines relevance to the query (as score) and diversity (as distance)
+Greedy approach to diversification
MMR (Maximum Marginal Relevance)
Diversification problems are NP-hard
  Approximate greedy algorithms are needed
MMR is a well-known greedy algorithm with good quality of result (i.e., value of the objective function)
  Find K objects that are both relevant and diverse
  At each step, pick the object with the largest diversity-weighted score
  K steps in total
The diversity-weighted score combines relevance and diversity, with a parameter that sets the balance between the two
MMR has a corresponding objective function over the selected set
Main disadvantage: all objects must be available from the beginning
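The greedy step described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the linear combination weighted by `lam`, the Euclidean distance, the choice of treating diversity as 0 while nothing is selected, and the hypothetical (id, score, point) tuples are all assumptions made for illustration.

```python
import math

def mmr(objects, k, lam=0.5):
    """Greedy MMR over a fully available list of (id, score, point) tuples.

    lam is an assumed balance parameter: 0 = relevance only, 1 = diversity only.
    """
    selected = []
    remaining = list(objects)
    while remaining and len(selected) < k:
        def weighted(obj):
            _, score, point = obj
            # Diversity term: distance to the closest already selected object
            # (taken as 0 while nothing is selected, which is an assumption).
            div = min((math.dist(point, p) for _, _, p in selected), default=0.0)
            return (1 - lam) * score + lam * div
        best = max(remaining, key=weighted)   # largest diversity-weighted score
        selected.append(best)
        remaining.remove(best)
    return [obj_id for obj_id, _, _ in selected]

# Hypothetical data: "a" and "b" are both relevant but almost co-located.
objects = [
    ("a", 1.0, (0.0, 0.0)),
    ("b", 0.9, (0.1, 0.0)),
    ("c", 0.5, (1.0, 1.0)),
    ("d", 0.4, (0.0, 1.0)),
]
print(mmr(objects, 2, lam=0.0))   # relevance only
print(mmr(objects, 2, lam=0.5))   # relevance balanced with diversity
```

Note that the function scans every remaining object at every step, which is exactly the disadvantage named on the slide: the whole set must be available up front.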
+Bounded diversification
Objects are embedded in a bounded region of space
  E.g., a bounding rectangle
Accessing objects is costly
  Objects are progressively accessed (not available at time 0)
  The number of accessed objects (sumDepths) should be minimized
Indexes for sorted access to objects are available
  Access by score (in descending order)
  Access by distance from a given point (in ascending order)
  Both are very common in services on the Web (e.g., apartments search)
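The two sorted access modes and the sumDepths cost can be mimicked locally as follows. This is only an illustrative stand-in for a remote service: the class name `SortedAccess`, its methods, and the (id, score, point) tuples are hypothetical, and a real service would not sort the full data set on the client side.

```python
import math

class SortedAccess:
    """Simulates a service that exposes objects only through sorted access."""

    def __init__(self, objects):
        self._objects = list(objects)   # (id, score, point) tuples
        self.sum_depths = 0             # total number of accessed objects

    def by_score(self):
        """Yield objects one at a time in descending score order."""
        for obj in sorted(self._objects, key=lambda o: -o[1]):
            self.sum_depths += 1
            yield obj

    def by_distance(self, q):
        """Yield objects one at a time in ascending distance from point q."""
        for obj in sorted(self._objects, key=lambda o: math.dist(o[2], q)):
            self.sum_depths += 1
            yield obj

source = SortedAccess([
    ("a", 1.0, (0.0, 0.0)),
    ("b", 0.9, (0.1, 0.0)),
    ("c", 0.5, (1.0, 1.0)),
])
top = next(source.by_score())                    # one access by score
nearest = next(source.by_distance((1.0, 0.9)))   # one access by distance
print(top[0], nearest[0], source.sum_depths)
```

Every `next` call counts as one access, so `sum_depths` is the quantity a bounded diversification algorithm tries to keep small.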
+Distance-based access
Restaurants by distance from a given point q
[Figure: map of restaurants; size of icon proportional to score]
+Score-based access
Restaurants by score
[Figure: map of restaurants; size of icon proportional to score]
+Attacking bounded diversification
The Pull-Bound MMR (PBMMR) template
Goal: achieve the same quality of result as MMR
  But minimizing the number of accessed objects
K iterations: within each of them, repeat the following as long as needed
  Pulling strategy: choose an access method (by score or by distance)
    If by distance, choose from which point (probing location)
  Bounding scheme: compute an upper bound on the diversity-weighted score that can be achieved by unseen objects
  If a seen object exceeds the bound, select it and move to the next iteration
Credits to [Schnaitter&Polyzotis 2008] for their Pull-Bound Rank Join template
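The pull/bound loop can be illustrated with a deliberately simplified sketch. It is not the PBMMR algorithm of the talk: it pulls by score only (no probing locations), and it uses a loose bound in which the diversity of any unseen object is capped by `max_dist`, an assumed upper bound on distances inside the bounding region. The weighting by `lam`, the function name, and the tuple layout are also assumptions.

```python
import math

def pull_bound_mmr(by_score, k, lam, max_dist):
    """Pull-bound loop using score-based sorted access only.

    by_score: iterable of (id, score, point) in descending score order.
    max_dist: upper bound on the distance between two points of the region.
    Returns (selected ids, number of accessed objects).
    """
    stream = iter(by_score)
    seen, selected = [], []
    last_score = math.inf    # only read after the first access
    accessed = 0
    exhausted = False

    def weighted(obj):
        _, score, point = obj
        div = min((math.dist(point, p) for _, _, p in selected), default=0.0)
        return (1 - lam) * score + lam * div

    while len(selected) < k:
        best = max(seen, key=weighted, default=None)
        if best is not None:
            # Bounding scheme (loose): unseen objects score at most last_score
            # and lie at most max_dist away from the current selection.
            bound = (1 - lam) * last_score + lam * (max_dist if selected else 0.0)
            if exhausted or weighted(best) >= bound:
                selected.append(best)
                seen.remove(best)
                continue
        if exhausted:
            break
        # Pulling strategy (trivial): one more sorted access by score.
        nxt = next(stream, None)
        if nxt is None:
            exhausted = True
        else:
            seen.append(nxt)
            last_score = nxt[1]
            accessed += 1
    return [obj_id for obj_id, _, _ in selected], accessed

# Hypothetical objects in the unit square, already sorted by score.
objects = [
    ("a", 10.0, (0.1, 0.1)), ("b", 9.0, (0.2, 0.1)), ("c", 8.0, (0.9, 0.9)),
    ("d", 7.0, (0.5, 0.5)), ("e", 6.0, (0.0, 1.0)), ("f", 5.0, (1.0, 0.0)),
]
ids, accessed = pull_bound_mmr(objects, k=3, lam=0.2, max_dist=math.sqrt(2))
print(ids, accessed)
```

Because an object is selected only once no unseen object can beat it, each pick is also a valid greedy MMR pick (up to ties), while objects late in the score order may never be accessed. A tighter bound, as discussed in the following slides, would allow the loop to stop pulling even earlier.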
+Choosing probing locations
Goal of distance-based access:
  Exploring the region of space in which the object with the best diversity-weighted score is most likely to be found
At each of the K iterations, we fix the probing locations at the most promising points of the unexplored space
  Vertices of the bounded Voronoi diagram of the points selected at the previous iterations
  Of these, the most promising ones are as far as possible from all the objects of the current selection
+Example
4 objects x1, …, x4 selected during the first 4 iterations
Bounding region is a square
[Figure: Voronoi diagram of the selected objects, with the probing locations marked]
A new object is selected
[Figure: Voronoi diagram of the selected objects after the new selection]
+Example
Bounded Voronoi diagram of selected objects
Shading: distance from the closest selected point (brightest at the vertices)
Probing locations: v1, …, v4 (vertices of the bounding region)
Probing locations: v1, …, v6 (vertices of the bounded Voronoi diagram)
  The local maxima of the function “distance from the closest point between x1 and x2” are among v1, …, v6
Probing locations: v1, …, v8
  The local maxima of the function “distance from the closest point among x1, …, x3” are among v1, …, v8
Probing locations: v1, …, v10
  The local maxima of the function “distance from the closest point among x1, …, x4” are among v1, …, v10
Probing locations: v1, …, v12 (no other intersection in region)
  The local maxima of the function “distance from the closest point among x1, …, x5” are among v1, …, v12
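The claim that the local maxima of “distance from the closest selected point” sit at vertices of the bounded Voronoi diagram can be checked with a brute-force sketch for a rectangular region. Instead of building the diagram, it enumerates a superset of its vertices: the rectangle corners, the crossings of pairwise bisectors with the rectangle sides, and the circumcenters of triples. The function names are hypothetical, and the enumeration is meant for a handful of selected points only.

```python
import math
from itertools import combinations

def bisector(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c equidistant from p and q."""
    a = 2 * (q[0] - p[0])
    b = 2 * (q[1] - p[1])
    c = q[0] ** 2 + q[1] ** 2 - p[0] ** 2 - p[1] ** 2
    return a, b, c

def probing_candidates(selected, xmin, ymin, xmax, ymax):
    """Superset of the vertices of the bounded Voronoi diagram of `selected`."""
    cands = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    for p, q in combinations(selected, 2):
        a, b, c = bisector(p, q)
        if b != 0:   # crossings with the two vertical sides
            cands += [(x, (c - a * x) / b) for x in (xmin, xmax)]
        if a != 0:   # crossings with the two horizontal sides
            cands += [((c - b * y) / a, y) for y in (ymin, ymax)]
    for p, q, r in combinations(selected, 3):
        a1, b1, c1 = bisector(p, q)
        a2, b2, c2 = bisector(p, r)
        det = a1 * b2 - a2 * b1
        if det != 0:   # collinear triples have no circumcenter
            cands.append(((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det))
    eps = 1e-9   # tolerance for points lying on the boundary
    return [(x, y) for x, y in cands
            if xmin - eps <= x <= xmax + eps and ymin - eps <= y <= ymax + eps]

def best_probing_location(selected, xmin, ymin, xmax, ymax):
    """Candidate that is farthest from its closest selected point."""
    def min_dist(v):
        return min(math.dist(v, p) for p in selected)
    return max(probing_candidates(selected, xmin, ymin, xmax, ymax), key=min_dist)

# One selected point: only the four corners of the region are candidates.
print(best_probing_location([(0.25, 0.25)], 0, 0, 1, 1))
# Four selected points at the corners: the center of the square wins.
print(best_probing_location([(0, 0), (1, 0), (0, 1), (1, 1)], 0, 0, 1, 1))
```

Evaluating a superset of the true vertices is harmless here because every candidate is scored with its actual distance to the closest selected point, so spurious candidates can never beat the real maximum.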
+Example
A running state
Inside red circumferences: explored region
Pink discs: objects retrieved by distance-based access
+Bounding scheme
Computing a tight upper bound
A bound is tight if it can be achieved in some hypothetical continuation of the instance being explored
A tight upper bound can be computed as follows:
  Combine the highest score possible (the last one seen by score-based access) with the maximal minimal distance from the set of selected objects, taken over the unexplored region of space
Theorem: the point x* that maximizes the minimal distance from all the selected objects is a vertex of the convex hull of the unexplored part of a cell of the bounded Voronoi diagram
Theorem: the bound obtained in this way is tight
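One way to write down a bound built from these ingredients is sketched below. The notation is an assumption, since the original formula is not preserved here: $\lambda$ is an assumed balance parameter, $s_{\text{last}}$ the last score seen by score-based access, $S$ the set of selected objects, $U$ the unexplored region of space, $x_o$ the feature vector of object $o$, and $\delta$ the distance function.

```latex
\[
  \tau \;=\; (1-\lambda)\, s_{\text{last}}
  \;+\; \lambda \, \max_{x \in U} \; \min_{o \in S} \, \delta(x, x_o)
\]
```

Under this reading, the first term caps the relevance of any unseen object and the second caps its diversity; the point $x^*$ of the theorem is the maximizer of the inner expression.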
+Selecting the next probing location
In 2D, the point maximizing the minimal distance can only be
  A vertex of the bounded Voronoi diagram
  An intersection between an edge and a circumference
  An intersection between two circumferences
The corresponding vertex is selected as the next probing location
[Figure: the point maximizing the minimal distance and the vertex selected as next probing location]
+Pulling strategy
Round robin: select, in alternation, each probing location
  Some loose form of instance optimality can already be achieved with a tight bounding scheme and round robin
Potential adaptive:
  Choose the probing location that is most likely to reduce the upper bound
  Potential adaptive is never worse than round robin
Choice between access by score or by distance
  Looking at how they reduce the upper bound with respect to the number of accessed objects
+Batched access
In the model so far, objects are accessed one by one
  Not practical for many scenarios
  “Batched access” modes are available in many practical systems:
    Give a point and a radius and receive all objects that fall within
Strategy with batched access:
  Perform exactly one request per probing location with an optimal choice of the radius
  This amounts to solving an optimization problem that
    Minimizes the threshold by appropriately choosing the radii
    Is subject to a budget constraint (how many objects am I willing to retrieve)
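The batched access mode itself (a point and a radius in, all objects within out) is easy to picture with a small sketch. The function name `range_query` and the (id, score, point) tuples are hypothetical, and the sketch does not attempt the radius optimization described above; it only shows what a single request returns.

```python
import math

def range_query(objects, center, radius):
    """Return every (id, score, point) whose point lies within radius of center."""
    return [obj for obj in objects if math.dist(obj[2], center) <= radius]

objects = [
    ("a", 1.0, (0.0, 0.0)),
    ("b", 0.9, (0.1, 0.0)),
    ("c", 0.5, (1.0, 1.0)),
]
# One request at a probing location retrieves a whole batch of objects.
batch = range_query(objects, center=(0.0, 0.0), radius=0.5)
print([obj_id for obj_id, _, _ in batch])
```

The size of the returned batch is what the budget constraint limits: a larger radius explores more of the region in one request but retrieves more objects.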
+Experiments
Synthetic data, uniform distribution
+Experiments
Synthetic data, exponential distribution
+Experiments
Real data
+Conclusion
Diversification revisited
  Sorted access modes to avoid accessing all objects
  Same quality as MMR
  A structured template with bounding scheme and pulling strategy
Optimality guarantees with one-by-one access to objects
  Tight bound
  Instance optimality (in a loose sense)
Extreme practical efficiency with batched access mode
Future work:
  Adaptation to other diversification algorithms
+Acknowledgments: CUbRIK Project
CUbRIK is a research project financed by the European Union
Goals:
  Advance the architecture of multimedia search
  Exploit the human contribution in multimedia search
  Use open-source components provided by the community
  Start up a search business ecosystem
http://www.cubrikproject.eu/