Supporting Location-Based Approximate-Keyword Queries (Sattam Alsubaiee, Alexander Behm, and Chen Li; University of California, Irvine; speaker: Sattam Alsubaiee)

  • Slide 1
  • Supporting Location-Based Approximate-Keyword Queries. Sattam Alsubaiee, Alexander Behm, and Chen Li, University of California, Irvine. Speaker: Sattam Alsubaiee. ACM SIGSPATIAL GIS 2010.
  • Slide 2
  • Lunch Time! I want Chinese food! Remembering the restaurant name?! Ch-o-chi?!
  • Slide 3
  • Let's Find It!
  • Slide 4
  • Just One Typo
  • Slide 5
  • Problem Formulation. Object collection: chaochi restaurant, starbucks, apple store, sams club. Query: find objects in San Jose with keywords similar to "chochi" and "resturant".
  • Slide 6
  • Preliminaries: Location-Based Keyword Search. Find objects within a given spatial region that have a given set of keywords. Approach: augment a hierarchical spatial index with textual information.
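One way to picture "augmenting a hierarchical spatial index with textual information" is a tree whose nodes carry both a bounding rectangle and the union of the keywords below them, so a search can prune on either. The following is a minimal sketch under that assumption; the `Node` class, the coordinates, and the `search` function are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field

# Rectangle as (min_x, min_y, max_x, max_y); a hypothetical representation.
Rect = tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect                      # bounding rectangle of the subtree
    keywords: set[str]             # union of all keywords in the subtree
    children: list["Node"] = field(default_factory=list)
    # Objects stored at this node: (name, x, y, keyword set).
    objects: list[tuple[str, float, float, set[str]]] = field(default_factory=list)


def search(node: Node, region: Rect, query_keywords: set[str]) -> list[str]:
    """Return names of objects inside region that contain every query keyword."""
    # Prune the subtree if it misses the region or lacks a query keyword.
    if not intersects(node.mbr, region) or not query_keywords <= node.keywords:
        return []
    hits = [name for name, x, y, kws in node.objects
            if region[0] <= x <= region[2] and region[1] <= y <= region[3]
            and query_keywords <= kws]
    for child in node.children:
        hits.extend(search(child, region, query_keywords))
    return hits
```

This sketch handles exact keywords only; tolerating typos is the part the rest of the talk adds.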
  • Slide 7
  • Preliminaries: Approximate String Search. Input: a query string q (e.g., chochi) and a collection of strings s (e.g., chaochi, chucho, church). Output: strings s that satisfy the similarity condition Sim(q, s). Similarity functions: edit distance, Jaccard, cosine, etc.
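As an illustration of the edit-distance case, the sketch below uses the textbook dynamic program and assumes the similarity condition takes the form "edit distance at most k". The function names and the threshold form are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_search(query: str, strings: list[str], k: int) -> list[str]:
    """Return the strings whose edit distance to query is at most k (linear scan)."""
    return [s for s in strings if edit_distance(query, s) <= k]
```

With the slide's strings, `approximate_search("chochi", ["chaochi", "chucho", "church"], 1)` keeps only `chaochi`, which is one insertion away from the query.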
  • Slide 8
  • Preliminaries: Approximate String Search. A sliding window over chaochi yields the 2-grams {ch, ha, ao, oc, ch, hi}. Intuition: similar strings share a certain number of grams. Index structure: gram-based inverted index.
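The gram intuition can be sketched with a small inverted index and the standard count-filter bound: two strings within edit distance k share at least max(|q|, |s|) - n + 1 - k*n grams of length n. The code below is an illustrative sketch, not the Flamingo implementation; all names are hypothetical, and candidates would still need to be verified with a real edit-distance computation.

```python
from collections import Counter, defaultdict


def qgrams(s: str, n: int = 2) -> list[str]:
    """All length-n substrings of s, produced by a sliding window."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_gram_index(strings: list[str], n: int = 2) -> dict:
    """Map each gram to a list of (string id, occurrences of the gram in that string)."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for gram, count in Counter(qgrams(s, n)).items():
            index[gram].append((sid, count))
    return index


def candidates(query: str, strings: list[str], index: dict, k: int, n: int = 2) -> list[str]:
    """Strings that pass the count filter for edit distance k (may include false positives)."""
    shared = Counter()
    for gram, qcount in Counter(qgrams(query, n)).items():
        for sid, scount in index.get(gram, []):
            shared[sid] += min(qcount, scount)
    result = []
    for sid, s in enumerate(strings):
        # Lower bound on shared grams if the edit distance is at most k.
        threshold = max(len(query), len(s)) - n + 1 - k * n
        if shared[sid] >= threshold:
            result.append(s)
    return result
```

For the query `chochi` with k = 1, both `chaochi` and `chucho` pass the filter while `church` is pruned. `chucho` is a false positive (its edit distance is 2), which is why a verification step follows the filter.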
  • Slide 9
  • Our Solution: the LBAK-Tree, a tree-based spatial index with keyword search capability and approximate string search capability.
  • Slide 10
  • Contributions: how to combine those indexes. Three algorithms: 1) simple fixed-level solution; 2) utilizing the local spatial distribution of objects; 3) exploiting the frequency distribution of keywords.
  • Slide 11
  • Algorithm 1: Fixed-Level Solution. Node types in the figure: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes.
  • Slide 12
  • Query Example. Query: objects in San Jose with keywords similar to "chochi" and "resturant", based on an edit distance of 1.
  • Slide 13
  • Query Example (continued, same query as Slide 12).
  • Slide 14
  • Query Example (continued, same query as Slide 12).
  • Slide 15
  • Query Example (continued, same query as Slide 12).
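The example query can be replayed on the four objects from the problem-formulation slide. This is a brute-force sketch rather than the tree traversal shown in the figures: a city label stands in for the spatial test, every object is assumed to be in San Jose, and an object matches when each query keyword is within the edit-distance threshold of at least one of the object's keywords. The `OBJECTS` list and the `answer` function are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical placement: the slides do not give locations for the objects.
OBJECTS = [
    ("chaochi restaurant", "San Jose"),
    ("starbucks", "San Jose"),
    ("apple store", "San Jose"),
    ("sams club", "San Jose"),
]


def answer(query_keywords: list[str], city: str, k: int = 1) -> list[str]:
    """Objects in city where every query keyword has a keyword within k edits."""
    results = []
    for name, obj_city in OBJECTS:
        if obj_city != city:
            continue  # spatial condition fails
        words = name.split()
        if all(any(edit_distance(q, w) <= k for w in words) for q in query_keywords):
            results.append(name)
    return results
```

Calling `answer(["chochi", "resturant"], "San Jose")` returns only `chaochi restaurant`: each misspelled keyword is one insertion away from a keyword of that object. With `k=0` nothing matches.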
  • Slide 16
  • How to Choose Level L? There is a trade-off between space and time until some level, after which both increase.
  • Slide 17
  • Observations: query time and index size are sensitive to where the approximate indexes are placed. The fixed-level solution ignores the local spatial distribution of objects. Figure labels: in one case it is preferable to build the approximate index at the parent; in another, at the children.
  • Slide 18
  • Algorithm 2: Placing Approximate Indexes at Variable Levels. Node types in the figure: Spatial Nodes, Spatial-Approximate Nodes, Spatial-Keyword Nodes.
  • Slide 19
  • Selecting Nodes for Approximate Indexes. Goal: find the optimal set of nodes that should have approximate indexes. Optimization problem: given an R*-tree and a space budget, choose nodes to store approximate indexes so as to minimize query time. The problem is NP-hard (Knapsack problem).
  • Slide 20
  • Greedy Algorithm: Selecting Nodes for Approximate Indexes. Example tree in the figure: nodes N1 to N15.
  • Slide 21
  • Cost/Benefit Estimation. Effects of pushing an index down: increases space cost; may increase or decrease average query time. Typically: at higher levels it is good to push the index down; at intermediate levels it is unclear whether to push it down.
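The slides do not spell out the greedy procedure, so the following is only one plausible sketch of a budgeted "push down" heuristic, not the authors' algorithm. It assumes each node comes with an estimated index size and an estimated query-time benefit of replacing its index with indexes at its children. The `Node` fields, the benefit numbers, and the benefit-per-extra-space ordering are all assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    index_size: float           # estimated size of an approximate index at this node
    push_benefit: float = 0.0   # estimated query-time saving from pushing the index down
    children: list["Node"] = field(default_factory=list)


def choose_index_nodes(root: Node, budget: float):
    """Start with one index at the root; greedily push indexes down while budget allows."""
    chosen = {root.name: root}
    used = root.index_size      # the root index is assumed to fit in the budget
    heap, counter = [], 0       # counter breaks ties so nodes are never compared

    def consider(node: Node) -> None:
        nonlocal counter
        if not node.children or node.push_benefit <= 0:
            return              # nothing to push down, or no estimated benefit
        extra = sum(c.index_size for c in node.children) - node.index_size
        ratio = node.push_benefit / max(extra, 1e-9)
        heapq.heappush(heap, (-ratio, counter, extra, node))
        counter += 1

    consider(root)
    while heap:
        _, _, extra, node = heapq.heappop(heap)
        if used + extra > budget:
            continue            # this push-down does not fit; try the next candidate
        del chosen[node.name]
        used += extra
        for child in node.children:
            chosen[child.name] = child
            consider(child)
    return sorted(chosen), used
```

For a root `N1` (size 10, benefit 5) with children `N2` (size 8, benefit 3, two children of size 6) and `N3` (size 7, no benefit), a budget of 20 ends with indexes at `N3`, `N4`, and `N5`, while a budget of 16 stops at `N2` and `N3`.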
  • Slide 22
  • Algorithm 3: Exploiting Frequency Distribution of Keywords. Node types in the figure: Spatial-Approximate Nodes, Spatial-Keyword Nodes.
  • Slide 23
  • Experiments: Settings. Four-core Intel Xeon E5520, 2.26 GHz; 12 GB of RAM; Ubuntu OS; C++ implementation; LBAK-tree in main memory; keyword-frequency threshold = 1; R*-tree fanout = 40.
  • Slide 24
  • Experiments: Datasets and Queries. Datasets: CoPhIR Test Collection (CoPhIR), 3.75 million objects, raw data size 500 MB; business listings (Business), 20.4 million business listings in the U.S., raw data size 4 GB. Queries: 10,000 queries for each dataset; 30km-by-30km query window around a randomly selected object; two randomly chosen keywords of that object; normalized edit distance of 0.8.
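The slide does not define the normalized measure. One common convention, assumed here rather than taken from the talk, reads the 0.8 as a similarity threshold that scales the edit distance by the length of the longer string. Under that assumption, the running example passes the threshold:

```latex
\mathrm{sim}(q, s) = 1 - \frac{\mathrm{ed}(q, s)}{\max(|q|, |s|)},
\qquad
\mathrm{sim}(\texttt{chochi}, \texttt{chaochi}) = 1 - \frac{1}{7} \approx 0.857 \ge 0.8
```

Under the same assumption, a pair at edit distance 2 with a longer length of 6, such as chochi and chucho, scores 1 - 2/6 ≈ 0.667 and would be rejected.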
  • Slide 25
  • Terminology. FL: fixed-level approach (e.g., FL-0 means the approximate indexes are at the root level). VL: variable-level approach. VLF: variable-level approach exploiting keyword frequencies.
  • Slide 26
  • Comparison with MHR-Tree*. The maximum recall we achieved for the MHR-Tree is around 50%; LBAK-Tree recall is 100%. * B. Yao, F. Li, M. Hadjieleftheriou, and K. Hou. Approximate string search in spatial databases. In ICDE, 2010.
  • Slide 27
  • Index Size & Query Time (Business Listings).
  • Slide 28
  • Scalability: Query Time vs. VLF (Business Listings). Space budget used: the minimum index size for VLF to achieve its best query time.
  • Slide 29
  • Conclusion. Spatial index + approximate index = LBAK-tree. 1) Simple fixed-level solution; 2) utilizing the local spatial distribution of objects; 3) exploiting the frequency distribution of keywords.
  • Slide 30
  • Thank You! This work is part of The Flamingo Project. Source code: http://flamingo.ics.uci.edu (live demo: http://flamingo.ics.uci.edu/localsearch/fuzzysearch/)