Upload
lamhanh
View
212
Download
0
Embed Size (px)
Citation preview
TASM: Top-k Approximate Subtree Matching
Nikolaus Augsten1 Denilson Barbosa2
Michael Bohlen3 Themis Palpanas4
1Free University of Bozen-Bolzano, [email protected]
2University of Alberta, [email protected]
3University of Zurich, [email protected]
4University of Trento, [email protected]
ICDE 2010, March 3Long Beach, CA, USA
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 1 / 28
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 2 / 28
Motivation and Problem Definition
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 3 / 28
Motivation and Problem Definition
Motivation
Query (XML fragment) Document (very large XML)
article
authors
author
Tim
author
John
booktitle
ICDEDBLP
28M nodes, 531MB
top-k matches?
Rank the top-k matches for the article query in the DBLP document!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 4 / 28
Motivation and Problem Definition
Motivation
Query (XML fragment) Document (very large XML)
article
authors
author
Tim
author
John
booktitle
ICDEDBLP
28M nodes, 531MB
top-k matches?
Rank the top-k matches for the article query in the DBLP document!
Example Answer: k = 3inproceedings
authors
author
Tim
author
John
booktitle
ICDE
(1 error)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 4 / 28
Motivation and Problem Definition
Motivation
Query (XML fragment) Document (very large XML)
article
authors
author
Tim
author
John
booktitle
ICDEDBLP
28M nodes, 531MB
top-k matches?
Rank the top-k matches for the article query in the DBLP document!
Example Answer: k = 3inproceedings
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
authorsauthor
John
booktitle
TKDE
(1 error) (2 errors)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 4 / 28
Motivation and Problem Definition
Motivation
Query (XML fragment) Document (very large XML)
article
authors
author
Tim
author
John
booktitle
ICDEDBLP
28M nodes, 531MB
top-k matches?
Rank the top-k matches for the article query in the DBLP document!
Example Answer: k = 3inproceedings
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
authorsauthor
John
booktitle
TKDE
inproceedings
authors
author
Tim
author
John
author
Peter
booktitle
ICDE
(1 error) (2 errors) (3 errors)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 4 / 28
Motivation and Problem Definition
TASM: Top-k Approximate Subtree Matching
Definition (TASM: Top-k Approximate Subtree Matching)
Given: query tree Q, document tree T , size k of rankingGoal: Compute a
top-k ranking R = (T1, T2, . . . ,Tk)
of all subtrees Ti of document T
with respect to query Q
using the tree edit distance for the ranking.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 5 / 28
Motivation and Problem Definition
TASM: Top-k Approximate Subtree Matching
Definition (TASM: Top-k Approximate Subtree Matching)
Given: query tree Q, document tree T , size k of rankingGoal: Compute a
top-k ranking R = (T1, T2, . . . ,Tk)
of all subtrees Ti of document T
with respect to query Q
using the tree edit distance for the ranking.
Subtree Ti :
a node and all its descendantslargest subtree is document itself
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 5 / 28
Motivation and Problem Definition
TASM: Top-k Approximate Subtree Matching
Definition (TASM: Top-k Approximate Subtree Matching)
Given: query tree Q, document tree T , size k of rankingGoal: Compute a
top-k ranking R = (T1, T2, . . . ,Tk)
of all subtrees Ti of document T
with respect to query Q
using the tree edit distance for the ranking.
Subtree Ti :
a node and all its descendantslargest subtree is document itself
top-k ranking R = (T1, Ti , . . . , Tk )
subtrees sorted by distance to querybest k subtrees: Ti /∈ R ⇒ ted(Q, Tk ) ≤ ted(Q, Ti )
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 5 / 28
Motivation and Problem Definition
Ranking Function: Tree Edit Distance (TED)
article
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
TKDE
Tree Edit Distance: Minimum number of node edit operations(insert, rename, delete) that transform one tree into the other.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 6 / 28
Motivation and Problem Definition
Ranking Function: Tree Edit Distance (TED)
article
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
TKDE
del(authors)
Tree Edit Distance: Minimum number of node edit operations(insert, rename, delete) that transform one tree into the other.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 6 / 28
Motivation and Problem Definition
Ranking Function: Tree Edit Distance (TED)
article
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
TKDE
del(authors) ren(ICDE)
Tree Edit Distance: Minimum number of node edit operations(insert, rename, delete) that transform one tree into the other.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 6 / 28
Motivation and Problem Definition
Ranking Function: Tree Edit Distance (TED)
article
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
TKDE
del(authors) ren(ICDE)
Tree Edit Distance: Minimum number of node edit operations(insert, rename, delete) that transform one tree into the other.
TASM computes TED between query and document subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 6 / 28
Motivation and Problem Definition
Ranking Function: Tree Edit Distance (TED)
article
authors
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
ICDE
article
author
Tim
author
John
booktitle
TKDE
del(authors) ren(ICDE)
Tree Edit Distance: Minimum number of node edit operations(insert, rename, delete) that transform one tree into the other.
TASM computes TED between query and document subtrees
Size and number of computed subtrees define TASM complexity
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 6 / 28
Motivation and Problem Definition
State of the Art
TASM-Dynamic: dynamic programming solution1
computes distance to every subtree of the documentuse smaller subtrees to compute larger onesrank subtrees by visiting memoization tableSpace complexity: O(mn), m: query size, n: document size
1Zhang and Shasha 1989, Demaine et al. 2007Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 7 / 28
Motivation and Problem Definition
State of the Art
TASM-Dynamic: dynamic programming solution1
computes distance to every subtree of the documentuse smaller subtrees to compute larger onesrank subtrees by visiting memoization tableSpace complexity: O(mn), m: query size, n: document size
Space complexity limits application to databases
in database applications n is huge (database size!)TASM-Dynamic maintains two m × n matrixes in RAM> 6GB RAM for our tiny query (m = 8) on DBLP (n = 28 × 106)
1Zhang and Shasha 1989, Demaine et al. 2007Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 7 / 28
Motivation and Problem Definition
State of the Art
TASM-Dynamic: dynamic programming solution1
computes distance to every subtree of the documentuse smaller subtrees to compute larger onesrank subtrees by visiting memoization tableSpace complexity: O(mn), m: query size, n: document size
Space complexity limits application to databases
in database applications n is huge (database size!)TASM-Dynamic maintains two m × n matrixes in RAM> 6GB RAM for our tiny query (m = 8) on DBLP (n = 28 × 106)
For database size solutions dynamic programming is too expensive.
1Zhang and Shasha 1989, Demaine et al. 2007Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 7 / 28
Motivation and Problem Definition
State of the Art
TASM-Dynamic: dynamic programming solution1
computes distance to every subtree of the documentuse smaller subtrees to compute larger onesrank subtrees by visiting memoization tableSpace complexity: O(mn), m: query size, n: document size
Space complexity limits application to databases
in database applications n is huge (database size!)TASM-Dynamic maintains two m × n matrixes in RAM> 6GB RAM for our tiny query (m = 8) on DBLP (n = 28 × 106)
For database size solutions dynamic programming is too expensive.
State-of-the-art algorithms do not scale!
1Zhang and Shasha 1989, Demaine et al. 2007Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 7 / 28
Motivation and Problem Definition
Problem Definition
Find a solution for TASM (Top-k Approximate Subtree Matching) that
scales to very large documents
runs in small memory
ranks subtrees correctly (no heuristics!)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 8 / 28
TASM-Postorder
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 9 / 28
TASM-Postorder Upper Bound on Subtree Size
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 10 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
worst match
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
3. Size upper bound for subtree Ti
|Ti | − |Q| ≤ ted(Q, Ti ) Q Ti
at least:insert missing nodes
|Ti | − |Q|
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
3. Size upper bound for subtree Ti
|Ti | − |Q| ≤ ted(Q, Ti ) Q Ti
at least:insert missing nodes
|Ti | − |Q|
|Ti | ≤ ted(Q, Ti ) + |Q|
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
3. Size upper bound for subtree Ti
|Ti | − |Q| ≤ ted(Q, Ti ) Q Ti
at least:insert missing nodes
|Ti | − |Q|
|Ti | ≤ ted(Q, Ti ) + |Q| ≤ 2|Q| + |T ′
k |
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
|T ′
k | ≤ k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
3. Size upper bound for subtree Ti
|Ti | − |Q| ≤ ted(Q, Ti ) Q Ti
at least:insert missing nodes
|Ti | − |Q|
|Ti | ≤ ted(Q, Ti ) + |Q| ≤ 2|Q| + |T ′
k |
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Subtree Size Upper Bound in Three Steps
1. Rank first k subtrees of T in postorder: R ′ = (T ′
1, T′
2, . . . ,T′
k)
Q∅
T ′
k
|T ′
k | ≤ k
delete Q insert T ′
k
worst match
(i) ted(Q, T ′
k) ≤ |Q| + |T ′
k |
2. Final ranking R = (T1, T2, . . . ,Tk) (=TASM result)
Ti ’s in R are better than worst match T ′
k of R ′
(ii) ted(Q, Ti ) ≤ ted(Q, T ′
k) ≤ |Q| + |T ′
k |
3. Size upper bound for subtree Ti
|Ti | − |Q| ≤ ted(Q, Ti ) Q Ti
at least:insert missing nodes
|Ti | − |Q|
|Ti | ≤ ted(Q, Ti ) + |Q| ≤ 2|Q| + |T ′
k | ≤ 2|Q| + k
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 11 / 28
TASM-Postorder Upper Bound on Subtree Size
Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size)
TASM needs to consider only small document subtrees of size τ or less:
τ = 2|Q| + k
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 12 / 28
TASM-Postorder Upper Bound on Subtree Size
Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size)
TASM needs to consider only small document subtrees of size τ or less:
τ = 2|Q| + k
Upper bound is very powerful:
independent of document size and structure!
linear in query size and k
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 12 / 28
TASM-Postorder Upper Bound on Subtree Size
Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size)
TASM needs to consider only small document subtrees of size τ or less:
τ = 2|Q| + k
Upper bound is very powerful:
independent of document size and structure!
linear in query size and k
Example: top-10 with example query |Q| = 8 on DBLP (28M nodes)
with bound: max subtree size τ = 2 ∗ 8 + 10 = 26
without bound: maximum subtree size is 28M (whole document)!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 12 / 28
TASM-Postorder Upper Bound on Subtree Size
Upper Bound on Subtree Size
Theorem (Upper Bound on Subtree Size)
TASM needs to consider only small document subtrees of size τ or less:
τ = 2|Q| + k
Upper bound is very powerful:
independent of document size and structure!
linear in query size and k
Example: top-10 with example query |Q| = 8 on DBLP (28M nodes)
with bound: max subtree size τ = 2 ∗ 8 + 10 = 26
without bound: maximum subtree size is 28M (whole document)!
Document-independent upper bound on subtree size!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 12 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 13 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1 title,2
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1 title,2
book,3
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1 title,2
book,3 dblp,22
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1 title,2
book,3 dblp,22
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Relevant and state-of-the-art for XML Parsing
full subtree known only at closing tagclosing tags appear in postorder
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Document Format: Postorder Queue
dblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
John,1 auth,2 X1,1 title,2 article,5
VLDB,1 conf,2 Peter,1 auth,2 X3,1
title,2 article,5 Mike,1 auth,2 X4,1
title,2 article,5 proc,13 X2,1 title,2
book,3 dblp,22
Postorder queue: queue of (label,size)-pairs
dequeue removes leftmost element, e.g., (John, 1)no random access!
Relevant and state-of-the-art for XML Parsing
full subtree known only at closing tagclosing tags appear in postorder
Implementation is efficient and heavily used for
XML streamsplain XML files (e.g., SAX)XML in database (Dewey, interval encoding, ...)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 14 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Candidate Subtrees
Candidate subtrees are all subtrees Ti of the document with
|Ti | ≤ τ ANDTi is not contained in a larger subtree |Tj | ≤ τ
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 15 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Candidate Subtrees
Candidate subtrees are all subtrees Ti of the document with
|Ti | ≤ τ ANDTi is not contained in a larger subtree |Tj | ≤ τ
Pruning: find candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 15 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Problem: memory buffer can grow very large!must keep subtrees in memory until non-candidate ancestor is readworst case: memory buffer stores O(n) nodes(frequent in data-centric XML!)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Problem: memory buffer can grow very large!must keep subtrees in memory until non-candidate ancestor is readworst case: memory buffer stores O(n) nodes(frequent in data-centric XML!)
Example: DBLP, τ = 50
99% of nodes are still in buffer when root node is read!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Simple Pruning Approach
dblp22
article5
auth2
John1
title4
X13
proceedings18
conf7
VLDB6
article12
auth9
Peter8
title11
X310
article17
auth14
Mike13
title16
X415
book21
title20
X219
Simple pruning approach: (τ = 6 in example above)add nodes to memory buffer until non-candidate (|Ti | > τ) is addedsubtrees of non-candidate with |Ti | ≤ τ are candidate subtrees
Problem: memory buffer can grow very large!must keep subtrees in memory until non-candidate ancestor is readworst case: memory buffer stores O(n) nodes(frequent in data-centric XML!)
Example: DBLP, τ = 50
99% of nodes are still in buffer when root node is read!
Simple pruning not feasible for large documents!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 16 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Efficient Pruning is Tricky!
Problem: when can we remove a node from the buffer?
when we see |Ti | ≤ τ , we don’t yet know about parent (postorder!)subtree of parent might be smaller than τ !
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 17 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Efficient Pruning is Tricky!
Problem: when can we remove a node from the buffer?
when we see |Ti | ≤ τ , we don’t yet know about parent (postorder!)subtree of parent might be smaller than τ !
Our Solution does not wait for parent
prefix ring buffer: fixed size bufferpruning rule: prune based on following nodes
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 17 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)
e↑ s↑John,1 auth,2 X1,1 title,4 article,5
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)
e↑ s↑John,1 auth,2 X1,1 title,4 article,5
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)
e↑ s↑John,1 auth,2 X1,1 title,4 article,5
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
append new node
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)VLDB,1
e↑ s↑John,1 auth,2 X1,1 title,4 article,5
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
append new node
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)VLDB,1
e↑ s↑John,1 auth,2 X1,1 title,4 article,5
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
append new noderemove leftmost subtree/node
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)
s↑VLDB,1
e↑
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
append new noderemove leftmost subtree/node
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning in Small Memory
prefix ring buffer (τ = 6)
s↑VLDB,1
e↑
Prefix ring buffer of size τ + 1 (main memory)
stores prefix (τ nodes in postorder) of the document
two operations
append new noderemove leftmost subtree/node
Pruning rule: If leftmost node in full ring buffer is
leaf: leftmost subtree is candidate subtree
non-leaf: leftmost node is non-candidate node
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 18 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning Rule – Intuition
Candidate subtree: leftmost node is a leaf
Ti : leftmost subtree, starts with leftmost nodeTj : smallest subtree that contains Ti
due to postorder: Tj contains all nodes in buffersince |Ti | ≤ τ and |Tj | > τ : Ti is a candidate
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 19 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Pruning Rule – Intuition
Candidate subtree: leftmost node is a leaf
Ti : leftmost subtree, starts with leftmost nodeTj : smallest subtree that contains Ti
due to postorder: Tj contains all nodes in buffersince |Ti | ≤ τ and |Tj | > τ : Ti is a candidate
Non-candidate node: leftmost node is a non-leaf
leftmost non-leaf is parent of previously removed nodeswe remove either candidate subtrees and non-candidate nodesin both cases: parent is a non-candidate
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 19 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)John,1 auth,2 X1,1 · · ·
prefix ring buffer (main memory)
s↑ e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)auth,2 X1,1 title,2 · · ·
prefix ring buffer (main memory)
s↑John,1
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X1,1 title,2 article,5 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)title,2 article,5 VLDB,1 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2 X1,1
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)article,5 VLDB,1 conf,2 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2 X1,1 title,2
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)VLDB,1 conf,2 Peter,1 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2 X1,1 title,2 article,5
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)conf,2 Peter,1 auth,2 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2 X1,1 title,2 article,5 VLDB,1
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)conf,2 Peter,1 auth,2 · · ·
prefix ring buffer (main memory)
s↑John,1 auth,2 X1,1 title,2 article,5 VLDB,1
e↑
append
candidate subtrees:(output)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)conf,2 Peter,1 auth,2 · · ·
prefix ring buffer (main memory)
s↑VLDB,1
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)Peter,1 auth,2 X3,1 · · ·
prefix ring buffer (main memory)
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)auth,2 X3,1 title,2 · · ·
prefix ring buffer (main memory)Peter,1
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X3,1 title,2 article,5 · · ·
prefix ring buffer (main memory)Peter,1 auth,2
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)title,2 article,5 Mike,1 · · ·
prefix ring buffer (main memory)Peter,1 auth,2 X3,1
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)article,5 Mike,1 auth,2 · · ·
prefix ring buffer (main memory)Peter,1 auth,2 X3,1 title,2
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)article,5 Mike,1 auth,2 · · ·
prefix ring buffer (main memory)Peter,1 auth,2 X3,1 title,2
e↑ s↑VLDB,1 conf,2
append
candidate subtrees:(output)
article
auth
John
title
X1
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)article,5 Mike,1 auth,2 · · ·
prefix ring buffer (main memory)
s↑Peter,1 auth,2 X3,1 title,2
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)Mike,1 auth,2 X4,1 · · ·
prefix ring buffer (main memory)
s↑Peter,1 auth,2 X3,1 title,2 article,5
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)auth,2 X4,1 title,2 · · ·
prefix ring buffer (main memory)
s↑Peter,1 auth,2 X3,1 title,2 article,5 Mike,1
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)auth,2 X4,1 title,2 · · ·
prefix ring buffer (main memory)
s↑Peter,1 auth,2 X3,1 title,2 article,5 Mike,1
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)auth,2 X4,1 title,2 · · ·
prefix ring buffer (main memory)
s↑Mike,1
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X4,1 title,2 article,5 · · ·
prefix ring buffer (main memory)
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)title,2 article,5 proc,13 · · ·
prefix ring buffer (main memory)X4,1
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)article,5 proc,13 X2,1 · · ·
prefix ring buffer (main memory)X4,1 title,2
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)proc,13 X2,1 title,2 · · ·
prefix ring buffer (main memory)X4,1 title,2 article,5
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X2,1 title,2 book,3 · · ·
prefix ring buffer (main memory)X4,1 title,2 article,5 proc,13
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X2,1 title,2 book,3 · · ·
prefix ring buffer (main memory)X4,1 title,2 article,5 proc,13
e↑ s↑Mike,1 auth,2
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)X2,1 title,2 book,3 · · ·
prefix ring buffer (main memory)
s↑proc,13
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)title,2 book,3 dblp,22 · · ·
prefix ring buffer (main memory)
s↑proc,13 X2,1
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)book,3 dblp,22 · · ·
prefix ring buffer (main memory)
s↑proc,13 X2,1 title,2
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)dblp,22 · · ·
prefix ring buffer (main memory)
e↑ s↑proc,13 X2,1 title,2 book,3
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)dblp,22
e↑ s↑proc,13 X2,1 title,2 book,3
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)dblp,22
e↑ s↑proc,13 X2,1 title,2 book,3
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)dblp,22
e↑ s↑X2,1 title,2 book,3
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)dblp,22
e↑ s↑X2,1 title,2 book,3
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)
s↑dblp,22
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)
s↑dblp,22
e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
Prefix Ring Buffer Pruning – Exampledblp
article
auth
John
title
X1
proceedings
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
1. fill ring buffer
2. check leftmost node
leaf: candidate subtree – to resultnon-leaf: non-candidate – remove
3. until queue and buffer empty
τ = 6 postorder queue (input)(empty) · · ·
prefix ring buffer (main memory)
s↑ e↑
append
candidate subtrees:(output)
article
auth
John
title
X1
conf
VLDB
article
auth
Peter
title
X3
article
auth
Mike
title
X4
book
title
X2
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 20 / 28
TASM-Postorder Prefix Ring Buffer Pruning
TASM-Postorder
TASM-postorder
1. empty ranking R, tightening upper bound τ ′= τ
2. for each candidate subtree Ti
a. if |R| = k: update τ ′ = min(τ,max(R) + |Q|)b. compute tree edit distance for all subtrees of Ti within τ ′
c. update ranking R
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 21 / 28
TASM-Postorder Prefix Ring Buffer Pruning
TASM-Postorder
TASM-postorder
1. empty ranking R, tightening upper bound τ ′= τ
2. for each candidate subtree Ti
a. if |R| = k: update τ ′ = min(τ,max(R) + |Q|)b. compute tree edit distance for all subtrees of Ti within τ ′
c. update ranking R
Theorem (TASM-Postorder)
The space complexity of TASM-postorder is independent of thedocument size:
O(m2 + mk)
(m: query size, k: result size)
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 21 / 28
TASM-Postorder Prefix Ring Buffer Pruning
TASM-Postorder
TASM-postorder
1. empty ranking R, tightening upper bound τ ′= τ
2. for each candidate subtree Ti
a. if |R| = k: update τ ′ = min(τ,max(R) + |Q|)b. compute tree edit distance for all subtrees of Ti within τ ′
c. update ranking R
Theorem (TASM-Postorder)
The space complexity of TASM-postorder is independent of thedocument size:
O(m2 + mk)
(m: query size, k: result size)
TASM-postorder scales to very large documents!
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 21 / 28
Experiments
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 22 / 28
Experiments
Pruning Effectiveness
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 23 / 28
Experiments
Pruning Effectiveness
Prefix ring buffer pruning is very effective!Maximum subtree reduced from 37M to 18 nodes.
Dataset: PSD protein sequences, 37M nodes, 683MB
Compute TASM (|Q| = 4, k = 1)
TASM-dynamic (state of the art)TASM-postorder (our solution)
Histogram of computed subtrees
1e0
1e1
1e2
1e3
1e4
1e5
1e6
1e7
1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7
num
ber
of s
ubtr
ees
subtree size (nodes)
largest subtree: 37Mentire document
TASM-Dynamic
1e0
1e1
1e2
1e3
1e4
1e5
1e6
1e7
1e0 1e1 1e2 1e3 1e4 1e5 1e6 1e7
num
ber
of s
ubtr
ees
subtree size (nodes)
largest subtree: 18
TASM-Postorder
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 23 / 28
Experiments
Scalability: TASM-Postorder vs. TASM-Dynamic
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 24 / 28
Experiments
Scalability: TASM-Postorder vs. TASM-Dynamic
TASM-postorder much faster than TASM-dynamic.
Dataset: XMark (synthetic XML for benchmark)
Vary query size and document size
Compute TASM (k = 5)
TASM-dynamic (state of the art)TASM-postorder (our solution)
Measure wall clock time
1e0
1e1
1e2
1e3
4 8 16 32 64
time
(sec
onds
)
query size (nodes)
dyn, T:224MBdyn, T:112MBpos, T:224MBpos, T:112MB
1e0
1e1
1e2
1e3
112 224 448 896 1792
time
(sec
onds
)
document size (MB)
dyn, |Q|=8dyn, |Q|=4pos, |Q|=8pos, |Q|=4
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 24 / 28
Experiments
Scalability with Result Size k
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 25 / 28
Experiments
Scalability with Result Size k
TASM-postorder scales well with k .Increasing k by 4 orders of magnitude only doubles runtime.
0
50
100
150
200
250
300
1e0 1e1 1e2 1e3 1e4
time
(sec
onds
)
k
dyn, T:224MBdyn, T:112MBpos, T:224MBpos, T:112MB
Dataset: XMark (synthetic XML forbenchmark)
Vary k (size of ranking)
Compute TASM (|Q| = 16)
TASM-dynamic (state of the art)TASM-postorder (our solution)
Measure wall clock time
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 25 / 28
Experiments
Space complexity: TASM-Postorder vs. TASM-Dynamic
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 26 / 28
Experiments
Space complexity: TASM-Postorder vs. TASM-Dynamic
TASM-postorder: space independent of document!
1e0
1e1
1e2
1e3
4e3
112 224 448 896 1792
mem
ory
(MB
)
document size (MB)
3GB
8MB
dyn, |Q|=16dyn, |Q|=4
pos, |Q|=16pos, |Q|=4
Dataset: XMark (synthetic XML forbenchmark)
Vary document size
Compute TASM (k = 5)
TASM-dynamic (state of the art)TASM-postorder (our solution)
Measure main memory usage
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 26 / 28
Conclusion and Future Work
Outline
1 Motivation and Problem Definition
2 TASM-PostorderUpper Bound on Subtree SizePrefix Ring Buffer Pruning
3 Experiments
4 Conclusion and Future Work
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 27 / 28
Conclusion and Future Work
Conclusion
Conclusion
Prefix Ring Buffer for space efficient pruning
Dynamic programming does not scale for database size solutions.
Upper bound τττ : limit maximum subtree size for TASM
TASM-postorder: highly scalable TASM algorithm
TASM-postorder makes TASM feasible.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 28 / 28
Conclusion and Future Work
Conclusion
Conclusion
Prefix Ring Buffer for space efficient pruning
Dynamic programming does not scale for database size solutions.
Upper bound τττ : limit maximum subtree size for TASM
TASM-postorder: highly scalable TASM algorithm
TASM-postorder makes TASM feasible.
Future Work – New research opportunities:
tune tree edit distance to different applications
index the document: can we avoid a document scan?
parallel TASM algorithm: where to split document?
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 28 / 28
Erik D. Demaine, Shay Mozes, Benjamin Rossman, and OrenWeimann.An optimal decomposition algorithm for tree edit distance.In ICALP, volume 4596 of LNCS, pages 146–157, Wroclaw, Poland,July 2007. Springer.
K. Zhang and D. Shasha.Simple fast algorithms for the editing distance between trees andrelated problems.SIAM J. on Computing, 18(6):1245–1262, 1989.
Nikolaus Augsten (Bolzano, Italy) TASM: Top-k Approx. Subtree Matching ICDE 2010 28 / 28