Speaker: Sattam Alsubaiee Supporting Location-Based Approximate-Keyword Queries
Sattam Alsubaiee, Alexander Behm, and Chen Li
University of California, Irvine
ACM SIGSPATIAL GIS 2010
Speaker: Sattam Alsubaiee Just One Typo
Slide 5
Speaker: Sattam Alsubaiee Problem Formulation
Object Collection: chaochi restaurant, starbucks, apple store, sams club
Find objects in San Jose with keywords similar to chochi & resturant
Slide 6
Speaker: Sattam Alsubaiee Preliminaries: Location-Based Keyword Search
Find objects within a given spatial region that have a given set of keywords
Augment a hierarchical spatial index with textual information
Slide 7
Speaker: Sattam Alsubaiee Preliminaries: Approximate String Search
Collection of strings s: chaochi, chucho, church
Query q: chochi
Search output: strings s that satisfy Sim(q,s)
Sim functions: edit distance, Jaccard, cosine, etc.
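As a minimal sketch of the search on this slide, assuming edit distance as the Sim function and a threshold of 1 (the threshold used in the talk's later query example), the following scans the three example strings. The function names `edit_distance` and `approximate_search` are illustrative and not taken from the talk.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def approximate_search(query, collection, max_dist):
    """Return the strings within max_dist edits of the query (linear scan)."""
    return [s for s in collection if edit_distance(query, s) <= max_dist]


strings = ["chaochi", "chucho", "church"]
print(approximate_search("chochi", strings, 1))  # ['chaochi']
```

A linear scan like this is only a baseline for checking answers; it does not use any index.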
Slide 8
Speaker: Sattam Alsubaiee Preliminaries: Approximate String Search
Gram-based inverted index
chaochi 2-grams (sliding window): {ch, ha, ao, oc, ch, hi}
Intuition: similar strings share a certain number of grams
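The gram intuition can be sketched as a count filter over a small inverted index. This is a simplified illustration, not the talk's implementation: it assumes unpadded grams and uses the standard lower bound that a string within k edits of a query of length m shares at least m - n + 1 - k*n grams of length n with it. The helper names are hypothetical, and surviving candidates would still need an exact edit-distance check.

```python
from collections import Counter, defaultdict


def grams(s, n=2):
    """All length-n substrings of s, produced by a sliding window."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_index(strings, n=2):
    """Inverted index: gram -> set of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in grams(s, n):
            index[g].add(sid)
    return index


def candidates(query, strings, index, k, n=2):
    """Strings sharing enough grams with the query to possibly be within k edits."""
    query_grams = Counter(grams(query, n))
    threshold = len(query) - n + 1 - k * n
    if threshold <= 0:
        return list(strings)  # the filter cannot prune anything
    ids = set()
    for g in query_grams:
        ids |= index.get(g, set())
    result = []
    for sid in sorted(ids):
        # Multiset intersection counts repeated grams such as "ch" correctly.
        shared = sum((query_grams & Counter(grams(strings[sid], n))).values())
        if shared >= threshold:
            result.append(strings[sid])
    return result


strings = ["chaochi", "chucho", "church"]
index = build_index(strings)
print(grams("chaochi"))                         # ['ch', 'ha', 'ao', 'oc', 'ch', 'hi']
print(candidates("chochi", strings, index, 1))  # ['chaochi', 'chucho']
```

Here chucho passes the filter even though its edit distance to chochi is 2, which shows why the filter step is followed by verification.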
Speaker: Sattam Alsubaiee Contributions
How to combine those indexes
Three Algorithms
1) Simple fixed-level solution
2) Utilizing local spatial distribution of objects
3) Exploiting frequency distribution of keywords
Speaker: Sattam Alsubaiee Query Example
Query: objects in San Jose with keywords similar to chochi & resturant
Based on edit distance of 1
Slide 16
Speaker: Sattam Alsubaiee How to Choose Level L?
Trade-off between space and time up to some level; beyond it, both increase
Slide 17
Speaker: Sattam Alsubaiee Observations
Query time & index size are sensitive to approximate-index locations
Fixed-level solution ignores local spatial distribution of objects
Prefer to build approximate index at parent
Prefer to build approximate indexes at children
Speaker: Sattam Alsubaiee Selecting Nodes for Approximate Indexes
Goal: find optimal set of nodes that should have approximate indexes
Optimization problem: given an R*-tree and a space budget, choose nodes to store approximate indexes, to minimize query time
NP-hard (Knapsack problem)
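Because the selection problem is knapsack-like, one generic way to approximate it is a greedy pass ordered by benefit per unit of space. The sketch below shows that generic heuristic only; it is not claimed to be the procedure used in the talk. The candidate names, space costs, and time benefits are hypothetical numbers, and real candidates would also have to respect the parent/child structure of the R*-tree.

```python
def greedy_select(candidates, budget):
    """Pick candidates by descending benefit/cost ratio while they fit the budget.

    candidates: list of (name, space_cost, time_benefit), with space_cost > 0.
    Returns (chosen_names, space_used).
    """
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen, used = [], 0
    for name, cost, benefit in ranked:
        if used + cost <= budget:  # skip anything that would exceed the budget
            chosen.append(name)
            used += cost
    return chosen, used


# Hypothetical candidates: (node, extra space, estimated query-time saving)
moves = [("A", 4, 8.0), ("B", 3, 3.0), ("C", 2, 5.0)]
print(greedy_select(moves, budget=6))  # (['C', 'A'], 6)
```

A greedy ratio ordering gives no optimality guarantee for 0/1 knapsack, but it is cheap to run and easy to reason about.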
Speaker: Sattam Alsubaiee Cost/Benefit Estimation
Effects of pushing index down
  Increase space cost
  Increase or decrease average query time
Typically
  Higher levels: good to push index down
  Intermediate levels: unclear whether to push it down
Slide 22
Speaker: Sattam Alsubaiee Algorithm 3: Exploiting Frequency Distribution of Keywords
(Spatial-Approximate Nodes) (Spatial-Keyword Nodes)
Slide 23
Speaker: Sattam Alsubaiee Experiments
Settings
  Four-core Intel Xeon E5520 2.26 GHz
  12 GB of RAM
  Ubuntu OS
  C++ implementation
  LBAK-tree in main memory
  Keyword-frequency threshold = 1
  R*-tree fanout = 40
Slide 24
Speaker: Sattam Alsubaiee Experiments
Dataset
  CoPhIR Test Collection (CoPhIR)
    3.75 million objects
    Raw data size: 500 MB
  Business listings (Business)
    20.4 million business listings in the U.S.
    Raw data size: 4 GB
Queries
  10,000 queries for each dataset
  30km-by-30km query window around randomly selected object
  Randomly chose two keywords of the randomly chosen object
  Normalized edit-distance of 0.8
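The slides do not define the normalized edit-distance setting. Assuming the common similarity form sim(s1, s2) = 1 - ed(s1, s2) / max(|s1|, |s2|), a threshold of 0.8 translates into a maximum number of edits that depends on the longer string's length. The helper `max_edits` below is hypothetical and only illustrates that arithmetic; exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction
from math import floor


def max_edits(longer_len, threshold="0.8"):
    """Largest edit distance d with 1 - d / longer_len >= threshold."""
    t = Fraction(threshold)  # exact value, e.g. 4/5 for "0.8"
    return floor((1 - t) * longer_len)


for n in (4, 5, 10):
    print(n, max_edits(n))  # 4 -> 0, 5 -> 1, 10 -> 2
```

Under this assumed definition, keywords shorter than five characters must match exactly at a threshold of 0.8.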
Slide 25
Speaker: Sattam Alsubaiee Terminology
FL: fixed-level approach
  e.g., FL-0: approximate indexes are at the root level
VL: variable-level approach
VLF: variable-level approach exploiting keyword frequencies
Slide 26
Speaker: Sattam Alsubaiee Comparison with MHR-Tree*
Maximum recall for MHR-Tree that we achieved is around 50%
LBAK-Tree recall is 100%
* B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string search in spatial databases. In ICDE, 2010.
Slide 27
Speaker: Sattam Alsubaiee Index Size & Query Time
Business Listings
Slide 28
Speaker: Sattam Alsubaiee Scalability: Query Time vs. VLF
Used space budget: minimum index size for VLF to achieve best query time
Business Listings
Slide 29
Speaker: Sattam Alsubaiee Conclusion
Spatial index + Approximate index = LBAK-tree
1) Simple fixed-level solution
2) Utilizing local spatial distribution of objects
3) Exploiting frequency distribution of keywords
Slide 30
Speaker: Sattam Alsubaiee Thank You!
This work is part of The Flamingo Project
Source Code: http://flamingo.ics.uci.edu
Live Demo: http://flamingo.ics.uci.edu/localsearch/fuzzysearch/