Parallel Execution Approaches on Data and Index Structures in the Context of Semantic Web Database Management Systems
Dennis Heinrich
Institute of Information Systems
September 19th, 2018
D. Heinrich Parallel Execution Approaches on Data and Index Structures in the Context of SW DBMS 1/18
Motivation
Research areas: Query Processing, Cloud Computing, Hardware Acceleration, Index Structures, Semantic Web
Marked in the diagram: Focus of thesis, Focus of presentation
Related publications: [1,2], [3-5], [6-8], [9-13], [14,15]
[1] OJDB, 2015
[2] OJDB, 2016
[3] ReCoSoC, Bremen (GER), 2015
[4] EDB, Busan (SKR), 2017
[5] SUPE, 2018
[6] OJSW, 2014
[7] OJCC, 2014
[8] OJSW, 2015
[9] CIT, Xi’an (CN), 2014
[10] CONCURR COMP-PRACT E, 2015
[11] ReCoSoC, Bremen (GER), 2015
[12] OJDB, 2016
[13] MICPRO, 2017
[14] ReConFig, Cancun (MEX), 2015
[15] ARCS, Vienna (AUT), 2017
Motivation
What’s the problem?
- Continuous data growth requires ever faster data processing
- Further speeding up sequential execution is becoming increasingly difficult
... and the solution?
- Parallel execution that exploits multi-core systems and specialized hardware accelerators (like FPGAs)
Semantic Web Database: Data
Triples (String):
Dog type Animal
Cat type Animal
Semantic Web Database: Dictionary
(1) Dictionary (integer ID and string):
1 Animal
4 Cat
…
7 Dog
9 type
…
Semantic Web Database: Data revisited
(2) Triples (Integer), obtained by replacing every string with its dictionary ID:
7 9 1 (Dog type Animal)
4 9 1 (Cat type Animal)
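A minimal Python sketch of steps (1) and (2), assuming an in-memory dict as the dictionary. The IDs are hard-coded from the slide (1 Animal, 4 Cat, 7 Dog, 9 type); the names `STRING_TO_ID`, `encode` and `decode` are illustrative and not taken from LUPOSDATE.

```python
# Dictionary from the slide: string -> integer ID (IDs copied from the example).
STRING_TO_ID = {"Animal": 1, "Cat": 4, "Dog": 7, "type": 9}
# Reverse direction, needed to turn integer results back into strings.
ID_TO_STRING = {i: s for s, i in STRING_TO_ID.items()}


def encode(triple):
    """Map a (subject, predicate, object) string triple to integer IDs."""
    return tuple(STRING_TO_ID[term] for term in triple)


def decode(triple):
    """Map an integer triple back to its strings."""
    return tuple(ID_TO_STRING[i] for i in triple)


string_triples = [("Dog", "type", "Animal"), ("Cat", "type", "Animal")]
integer_triples = [encode(t) for t in string_triples]
print(integer_triples)  # [(7, 9, 1), (4, 9, 1)]
```

Storing integer triples instead of strings gives every index key a fixed width; the reverse map is only needed when results are returned to the user.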
Semantic Web Database: B+-tree as Index Structure
(3) B+-tree over the integer triples (figure); triples shown in the tree: 5 9 3, 2 9 1, 4 8 7, 4 9 1, 6 9 3, 7 8 4, 7 9 1, 8 9 3
Semantic Web Database: Example Query
(4) Query Dog type ?: the dictionary translates it into 7 9 ?, which is looked up in the B+-tree
Semantic Web Database: Example Query
(4) Result: the B+-tree returns 7 9 1 for 7 9 ?; the dictionary maps ID 1 back to Animal, so the answer is Dog type Animal.
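The lookup in step (4) can be sketched with a sorted list and Python's `bisect` module standing in for the B+-tree. This is an assumption for illustration: both deliver matching keys in sorted order, but the list has none of the B+-tree's node structure. The pattern Dog type ? becomes the ID prefix (7, 9), and all triples with that prefix form one contiguous range.

```python
import bisect

STRING_TO_ID = {"Animal": 1, "Cat": 4, "Dog": 7, "type": 9}
ID_TO_STRING = {i: s for s, i in STRING_TO_ID.items()}

# Sorted S-P-O index; a sorted list stands in for the B+-tree leaves.
spo_index = sorted([(7, 9, 1), (4, 9, 1)])


def match_prefix(index, prefix):
    """Return all triples whose first len(prefix) components equal prefix."""
    lo = bisect.bisect_left(index, prefix)  # first triple >= prefix
    results = []
    for triple in index[lo:]:
        if triple[:len(prefix)] != prefix:
            break  # left the contiguous range of matching triples
        results.append(triple)
    return results


# Query "Dog type ?": encode the bound terms, leave the object open.
prefix = (STRING_TO_ID["Dog"], STRING_TO_ID["type"])
matches = match_prefix(spo_index, prefix)
answers = [ID_TO_STRING[o] for (_, _, o) in matches]
print(matches, answers)  # [(7, 9, 1)] ['Animal']
```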
A short recapitulation on B+-Trees
Example tree:
- Root: 43
- Inner nodes: 15 25 and 46 128; inner node order k → from k to 2 · k keys
- Leaves: 5 | 20 24 | 37 38 42 | 44 45 | 47 48 | 135 166; leaf order k′ → from k′ to 2 · k′ keys
- Values: stored with the leaf keys, either the data itself or references to it
- Edges: root 43 → 15 25 and 46 128; 15 25 → 5, 20 24, 37 38 42; 46 128 → 44 45, 47 48, 135 166
A short recapitulation on B+-Trees - example search
Search for the value of key 48:
1. Root 43: 48 ≥ 43, follow the right edge
2. Inner node 46 128: 46 ≤ 48 < 128, follow the middle edge
3. Leaf 47 48: key 48 found, return its value
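The search path above can be reproduced with a small in-memory B+-tree. This sketch hard-codes the example tree; the class names and the `value-<key>` placeholders are hypothetical (the slides only show empty value slots), and insertion and node splitting are left out.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list
    values: list  # data or references stored with the keys


@dataclass
class Inner:
    keys: list
    children: list  # len(children) == len(keys) + 1


def search(node, key):
    """Descend from the root to a leaf; return the value stored for key or None."""
    while isinstance(node, Inner):
        # child i holds the keys k with keys[i-1] <= k < keys[i]
        node = node.children[bisect_right(node.keys, key)]
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None


def leaf(*keys):
    """Build a leaf with hypothetical placeholder values."""
    return Leaf(list(keys), [f"value-{k}" for k in keys])


root = Inner([43], [
    Inner([15, 25], [leaf(5), leaf(20, 24), leaf(37, 38, 42)]),
    Inner([46, 128], [leaf(44, 45), leaf(47, 48), leaf(135, 166)]),
])
print(search(root, 48))  # value-48
```

The descent for 48 visits exactly the nodes from the slide: 43, then 46 128, then the leaf 47 48.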
Searching inside a B+-Tree node
Example node: 13 24 32 36 44 48 52 55 57 63 69 71 76 82 88 93
- Linear search: compare the searched key with one key after another
- Binary search: compare with the middle key and continue in the half that can contain the searched key
- Approach: parallel search: compare the searched key with all keys of the node at the same time
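To make the three strategies concrete, this sketch returns, for each strategy, the child position for a searched key in the example node together with the number of sequential steps it needs. The parallel variant only models the idea: Python still evaluates the 16 comparisons one after another, whereas on the FPGA they would happen in a single step. The function names are hypothetical.

```python
NODE = [13, 24, 32, 36, 44, 48, 52, 55, 57, 63, 69, 71, 76, 82, 88, 93]


def linear_position(keys, key):
    """Scan left to right until the first key greater than the searched key."""
    steps = 0
    for i, k in enumerate(keys):
        steps += 1
        if k > key:
            return i, steps
    return len(keys), steps


def binary_position(keys, key):
    """Halve the candidate interval in every step."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if keys[mid] > key:
            hi = mid
        else:
            lo = mid + 1
    return lo, steps


def parallel_position(keys, key):
    """Model of the parallel approach: one comparator per key, all evaluated
    in the same step; the position is the number of keys <= the searched key."""
    return sum(key >= k for k in keys), 1


for strategy in (linear_position, binary_position, parallel_position):
    print(strategy.__name__, strategy(NODE, 70))
```

For key 70 all three return position 11; linear search needs 12 steps, binary search 4, and the comparator model counts as a single step.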
Requirements & Challenges
Parallel search over all keys of a node (13 24 32 36 44 48 52 55 57 63 69 71 76 82 88 93)
What we need to consider:
- Parallel memory access
  FPGA: independent Block RAM (distributed over the chip area)
  CPU/GPU: sequential data access
- 96 Bit Integer triples
  FPGA: individual data width
  CPU/GPU: word width of 32 or 64 Bits
Further we want:
- A fully functional Semantic Web system (insert, update, delete)
⇒ use a hybrid system
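The 96-bit width fits three 32-bit IDs for subject, predicate and object; that split is an assumption here, not stated on the slide. Packing a triple into one 96-bit integer preserves the lexicographic S-P-O order, so one wide comparator could replace three separate compares. A CPU with 64-bit words has to split such a value, as `split_words` shows; Python's arbitrary-precision integers hide this, so the sketch only demonstrates the encoding.

```python
ID_BITS = 32                  # assumption: 32 bits per triple component
MASK = (1 << ID_BITS) - 1


def pack(s, p, o):
    """Pack an (s, p, o) ID triple into one 96-bit integer, s most significant."""
    for part in (s, p, o):
        if not 0 <= part <= MASK:
            raise ValueError("triple component does not fit into 32 bits")
    return (s << 2 * ID_BITS) | (p << ID_BITS) | o


def unpack(value):
    """Inverse of pack()."""
    return value >> 2 * ID_BITS, (value >> ID_BITS) & MASK, value & MASK


def split_words(value, word_bits=64):
    """Split a 96-bit value into machine words, least significant word first."""
    count = -(-3 * ID_BITS // word_bits)  # ceil(96 / word_bits)
    return [(value >> (i * word_bits)) & ((1 << word_bits) - 1) for i in range(count)]


a, b = pack(4, 9, 1), pack(7, 9, 1)
print(a < b, unpack(b), len(split_words(b)), len(split_words(b, 32)))  # True (7, 9, 1) 2 3
```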
Our approach
Figure: the client sends SEARCH/UPDATE requests to the database server and receives the RESULT. The server consists of main memory/HD, a CPU running the software and an FPGA with internal memory.
The B+-tree (root to leaves; figure labels S1, SF, S2) is split: the upper part SF resides on the FPGA, the lower part S2 with the leaves stays in software.
Experimental setup
Hardware:
- Xilinx Virtex-6 XC6VHX380T
- Hosted on a board
- Connected via PCIe
Software:
- Dell Precision T3610 workstation
- 40 GB of RAM
- Intel Xeon E5-1600 v2 processor at 3.0 GHz
- LUPOSDATE
Measured tree groups
L_FPGA:
- Variable order from 3 to 9
- Software + FPGA
L_B+:
- Typical parameters for disk use
- Order 500; only the number of inserted triples differs
- Software only
What was achieved?
Search:
- Speedup of over 2
- Time per search operation halved
What about update operations (insert/delete)?
- Every update requires a search
Problem: Is it worth transferring the tree to the FPGA if its structure changes often?
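Worst Case and Best Case for insert and delete operations
Figure: the tree is split into an FPGA part SF (from the root) and a software part S2 (down to the leaves); the FPGA returns an entry point into S2 to the host.
- Completely filled nodes (e.g. 43 156 256 512 above the FPGA entry point, 15 25 28 33 below it): worst case for insert, best case for delete, since the next insert already splits a node, while many deletes are possible before a node underflows
- Half-filled nodes (e.g. 43 156 ∅ ∅ above the FPGA entry point, 15 25 ∅ ∅ and 56 64 ∅ ∅ below it): best case for insert, worst case for delete, since many inserts fit before a split, while the next delete already causes an underflow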
Worst Case and Best Case for insert and delete operations
Best Case (e.g. inner node 43 156 256 512 above the FPGA entry point, 15 25 28 33 below it):
$$\text{possible Number Of Updates} = \text{Updates In Leaves} \cdot \text{Updates In Inner Nodes Software} \cdot \text{Entry Points FPGA to Host} \qquad (1)$$
Worst Case:
- only one operation
Idea: Scheduler
Let a scheduler decide:
$$t_{gain} = \text{possible Number Of Updates} \cdot t_{Op} \cdot r_{Op} - t_{setup}(x) \qquad (2)$$
- t_Op: time saved per operation
- r_Op: ratio of searches to updates,
$$r_{Op} = \frac{\text{number of searches}}{\text{number of updates}} \qquad (3)$$
- t_setup(x): time for setting up the tree with x triples
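Setting t_gain ≥ 0 in (2) yields the minimum ratio r_Op from which moving the tree to the FPGA pays off. The rearrangement below is ours and assumes t_Op > 0 and a positive possible number of updates:

```latex
t_{gain} \ge 0
\iff \text{possible Number Of Updates} \cdot t_{Op} \cdot r_{Op} \ge t_{setup}(x)
\iff r_{Op} \ge \frac{t_{setup}(x)}{\text{possible Number Of Updates} \cdot t_{Op}}
```

With only one operation (the worst case), the bound reduces to t_setup(x)/t_Op.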
Setup time t_setup(x) for a new tree
Figure: time in seconds (0 to 35) over the number of triples (0 to 1.2·10^8) for the series BFS, Convert and Transfer, with the linear fit
$$t(x) = 0.00000027501701520384\,x$$
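A minimal sketch of the scheduler decision, combining (2) with the linear setup-time fit from the plot (assumed to be in seconds, matching the axis). The function names and the example values for t_Op, x and the possible number of updates are hypothetical; the slides do not give a concrete t_Op.

```python
SETUP_SLOPE = 0.00000027501701520384  # linear fit t(x) from the plot (seconds per triple)


def t_setup(x):
    """Setup time of a new tree with x triples, from the linear fit."""
    return SETUP_SLOPE * x


def t_gain(possible_updates, t_op, r_op, x):
    """Equation (2): saved search time minus the setup time of the tree."""
    return possible_updates * t_op * r_op - t_setup(x)


def min_ratio(possible_updates, t_op, x):
    """Smallest r_Op (searches per update) with t_gain >= 0."""
    return t_setup(x) / (possible_updates * t_op)


def should_use_fpga(possible_updates, t_op, r_op, x):
    """Scheduler decision: transfer the tree only if it pays off."""
    return t_gain(possible_updates, t_op, r_op, x) > 0


# Hypothetical inputs for illustration; t_op is not given on the slides.
x, updates, t_op = 1_000_000, 500_000, 1e-6
r_min = min_ratio(updates, t_op, x)
print(f"t_setup = {t_setup(x):.3f} s, minimum r_Op = {r_min:.3f}")
print(should_use_fpga(updates, t_op, 2 * r_min, x))  # True
```

Doubling the minimum ratio leaves a gain equal to t_setup(x), so the scheduler transfers the tree.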
Minimum ratio r_Op for best and worst case
order  r_Op (worst case)  r_Op (best case)  Possible Number Of Updates
3      30,112.9           0.0148414         1,200,500
4      34,210.9           0.0107322         3,280,500
5      73,021.3           0.0082955         7,320,000
6      125,228.9          0.0083823         14,280,000
7      205,270.3          0.0077481         25,312,000
8      280,292.6          0.0064449         41,760,500
9      486,721.4          0.0066672         65,160,000
Average Case
Figure: r_Op (0 to 80) over the order of the tree (3 to 9) for the best case and for 500, 1,000, 2,000 and 4,000 triples
Summary
- Hybrid use of a B+-Tree: all keys of a node are searched in parallel on the FPGA
- Evaluation: system setup time vs. search acceleration
Questions?
LUPOSDATE
Figure: query processing pipeline
- Preprocessing: RDF-Data → Index-Generation
- SPARQL-Query → SPARQL-Parser → Abstract Syntax Tree → Transformation into CoreSPARQL → CoreSPARQL-Query → CoreSPARQL-Parser → Abstract Syntax Tree → Transformation into Operatorgraph → Operatorgraph
- Optimization: Operatorgraph → Logical Optimization → Logical optimized Operatorgraph → Physical Optimization → Physical optimized Operatorgraph
- Evaluation: Physical optimized Operatorgraph → Result
- Additional label: Mapping on FPGA resources
Appendix
Example tree: node 0 (key0-0 key0-1); on the next level node 1 (key1-0 key1-1), node 2 (key2-0 key2-1 key2-2 key2-3) and node 3 (key3-0 key3-1); node 4 (key4-0 key4-1 key4-2) on the level below; . . .
Memory layout, one row per node (keys interleaved with child addresses, END address last):
key0-0 address0-0 key0-1 address0-1 address0-END
key1-0 address1-0 key1-1 address1-1 address1-END
key2-0 address2-0 key2-1 address2-1 key2-2 address2-2 key2-3 address2-3 address2-END
key3-0 address3-0 key3-1 address3-1 address3-END
key4-0 address4-0 key4-1 address4-1 key4-2 address4-2 address4-END
. . .
Appendix
Same example tree; child addresses in the figure: address0, address0-1, address0-END; address1, address1-1, address1-END; address2, address2-1, address2-2, address2-3, address2-END; address3, address3-1, address3-END
Memory layout, one row per node (address, 2-bit field, keys):
address0 10 key0-0 key0-1
address1 10 key1-0 key1-1
address2 00 key2-0 key2-1 key2-2 key2-3
address3 10 key3-0 key3-1
address4 01 key4-0 key4-1 key4-2
. . .
Appendix
Same example tree with the addresses AddressL0, AddressL1 and AddressL2 (one per tree level); child addresses in the figure: address0-0, address0-1, address0-END; address1-0, address1-1, address1-END; address2-0 to address2-3, address2-END; address3-0, address3-1, address3-END
Memory layout: header row "Addressnumber AddressL0 . . . AddressLn", then one row per node (2-bit field, keys):
10 key0-0 key0-1
10 key1-0 key1-1
00 key2-0 key2-1 key2-2 key2-3
10 key3-0 key3-1
01 key4-0 key4-1 key4-2
. . .
Execution times
Figure: time in ns (0 to 175k) for L_B+, L_MM and L_FPGA over the number of triples (A to E) for orders 3 to 9
Number of triples:
order  3          4          5           6           7           8           9
A      343,001    729,001    1,331,001   2,197,001   3,375,001   4,913,001   6,859,001
B      857,500    2,187,000  4,658,250   8,787,750   15,187,250  24,565,000  37,723,000
C      1,372,000  3,645,000  7,985,500   15,378,500  26,999,500  44,217,000  68,588,000
D      1,886,500  5,103,000  11,312,750  21,969,250  38,811,750  63,869,000  99,453,000
E      2,401,000  6,561,000  14,640,000  28,560,000  50,624,000  83,521,000  130,320,000
Speed up - L_MM against L_FPGA
Figure: speedup (1 to 1.2) over the number of triples (A to E) for orders 3 to 9
Speed up - L_B+ against L_FPGA
Figure: speedup (1 to 2.5) over the number of triples (A to E) for orders 3 to 9
An example B+-Tree ...
Root: 43; inner nodes: 15 25 and 46 128; leaves: 5 | 20 24 | 37 38 42 | 44 45 | 47 48 | 135 166
Every edge is stored as the address of its child node.
... and as a Cache Sensitive B+-Tree (CSB+-Tree¹)
The children of a node are stored contiguously as a node group (e.g. 15 25 and 46 128); a node keeps only the address of its first child, and further children are reached via address + offset, address + 2·offset.
¹ from Jun Rao and Kenneth A. Ross, Making B+-Trees Cache Conscious in Main Memory, ACM 2000
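A minimal sketch of the node-group layout, assuming a Python list as memory, list indices as addresses and one slot per node (offset = 1). The tree is stored breadth-first, so the children of every node are contiguous, each node keeps only the address of its first child, and child i is found at first child + i · offset. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import deque

OFFSET = 1  # one list slot per node; layout() relies on this


def layout(tree):
    """Store a nested (keys, children) tree breadth-first. Children of a node
    are enqueued together, so they occupy consecutive addresses (a node group)
    and each node only needs the address of its first child."""
    memory, queue = [], deque([tree])
    next_free = 1  # address 0 holds the root
    while queue:
        keys, children = queue.popleft()
        first_child = next_free if children else None  # None marks a leaf
        next_free += len(children) * OFFSET
        memory.append((keys, first_child))
        queue.extend(children)
    return memory


def search(memory, key):
    """Return True if key is stored in a leaf; child i lies at first + i*OFFSET."""
    address = 0
    while True:
        keys, first_child = memory[address]
        if first_child is None:
            return key in keys
        address = first_child + bisect_right(keys, key) * OFFSET


tree = ([43], [
    ([15, 25], [([5], []), ([20, 24], []), ([37, 38, 42], [])]),
    ([46, 128], [([44, 45], []), ([47, 48], []), ([135, 166], [])]),
])
memory = layout(tree)
print(memory[0], memory[2])  # ([43], 1) ([46, 128], 6)
print(search(memory, 48), search(memory, 43))  # True False
```

Compared with the pointer-based tree, the node 46 128 stores the single address 6 instead of three child addresses.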
Parallel search for a triple inside the FPGA
1. Load node: read Triple1, Triple2, . . . , Triplen of the node at Current Address
2. Compare with searched key: compare the Searched Triple with all n triples in parallel (>=)
3. Calculate new address: combine the resulting Position with Current Address (+) to obtain Next Address
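One step of this search can be modelled in software. The sketch assumes the triples of the loaded node are sorted and reads the "+" in the figure as Next Address = Current Address + Position; how addresses map onto node groups is left open, and the node contents and address 100 are hypothetical. In hardware all comparators work in the same cycle, while Python evaluates them in a loop.

```python
def comparator_bits(node_triples, searched):
    """One comparator per stored triple: 1 if searched >= triple, else 0.
    Python compares tuples lexicographically (S first, then P, then O)."""
    return [1 if searched >= t else 0 for t in node_triples]


def search_step(current_address, node_triples, searched):
    """Return comparator outputs, position and next address for one node."""
    bits = comparator_bits(node_triples, searched)
    position = sum(bits)  # sorted node: bits are 1...1 0...0
    return bits, position, current_address + position


node = [(2, 9, 1), (4, 8, 7), (5, 9, 3), (7, 8, 4)]  # hypothetical node
bits, position, next_address = search_step(100, node, (4, 9, 1))
print(bits, position, next_address)  # [1, 1, 0, 0] 2 102
```

Because the node is sorted, the comparator outputs form a run of ones followed by zeros, so counting the ones is enough to obtain the position.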
Entered number of triples
- Generated trees have 4 inner levels (except L_B+)
- Leaves with order 500 are always kept in software
- Inner nodes are gradually filled with keys, from A up to E (maximum number of keys)
order A B C D E
3 343,001 857,500 1,372,000 1,886,500 2,401,000
4 729,001 2,187,000 3,645,000 5,103,000 6,561,000
5 1,331,001 4,658,250 7,985,500 11,312,750 14,640,000
6 2,197,001 8,787,750 15,378,500 21,969,250 28,560,000
7 3,375,001 15,187,250 26,999,500 38,811,750 50,624,000
8 4,913,001 24,565,000 44,217,000 63,869,000 83,521,000
9 6,859,001 37,723,000 68,588,000 99,453,000 130,320,000
Patricia Trie Merge - 2 Tries
Figure: two input Patricia tries (roots a0 and b0) are merged in five steps (1. Step to 5. Step) into one merged Patricia trie (root c0).
Legend: node with label kx; edge for the string "c1…cy"
Patricia Trie Merge - Multiple Tries
Figure: several input Patricia tries are merged in nine steps (1. Step to 9. Step) into one merged Patricia trie.
Legend: node with label kx; edge for the string "c1…cy"
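PatTrieSort, as its name and the merge figures suggest, builds on Patricia tries. Below is a minimal sketch of a Patricia (radix) trie over strings, assuming plain Python dicts for the edges: inserting splits an edge at the first differing character, duplicate strings are stored once, and a depth-first walk in edge order rolls out the strings sorted. The `merge` function simply re-inserts every string of the second trie; it is a simplified stand-in, not the step-wise node-by-node merge shown in the figures.

```python
class Node:
    """Patricia trie node; edges are keyed by the first character of their label."""

    def __init__(self):
        self.edges = {}        # first character -> (edge label, child node)
        self.terminal = False  # True if a stored string ends at this node


def insert(root, s):
    """Insert string s; an existing string is stored only once."""
    node = root
    while s:
        edge = node.edges.get(s[0])
        if edge is None:                       # no edge starts with s[0]
            leaf = Node()
            leaf.terminal = True
            node.edges[s[0]] = (s, leaf)
            return
        label, child = edge
        i = 0                                  # length of the common prefix
        while i < len(label) and i < len(s) and label[i] == s[i]:
            i += 1
        if i < len(label):                     # split the edge at position i
            middle = Node()
            middle.edges[label[i]] = (label[i:], child)
            node.edges[s[0]] = (label[:i], middle)
            child = middle
        node, s = child, s[i:]
    node.terminal = True                       # s consumed: string ends here


def roll_out(node, prefix=""):
    """Yield all stored strings in lexicographic order."""
    if node.terminal:
        yield prefix
    for first in sorted(node.edges):
        label, child = node.edges[first]
        yield from roll_out(child, prefix + label)


def merge(a, b):
    """Simplified merge: insert every string of trie b into trie a."""
    for s in list(roll_out(b)):
        insert(a, s)
    return a


t1, t2 = Node(), Node()
for w in ["aa", "ab", "b", "aa"]:
    insert(t1, w)
for w in ["abc", "bb", "b"]:
    insert(t2, w)
merged = merge(t1, t2)
print(list(roll_out(merged)))  # ['aa', 'ab', 'abc', 'b', 'bb']
```

Rolling out the merged trie yields the sorted, duplicate-free strings, which is the property a trie-based sort relies on.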
PatTrieSort - Total Time
Figure (two panels): total time in seconds (left: 0 to 25000, right: 0 to 90000); x-axis groups: three times 500 000 to 32 000 000 and once 11 to 18 (left) / 11 to 19 (right); legend: PatTrieSort, String Merging, External Merge Sort, Replacement Selection
PatTrieSort - Bytes Read
Figure (two panels): bytes read (left: 0 to 350 000 000 000, right: 0 to 100 000 000 000); x-axis groups from 500 000 to 32 000 000 and from 11 to 18 (left) / 11 to 19 (right); legend: PatTrieSort, String Merging, External Merge Sort, Replacement Selection
PatTrieSort - Bytes Written
Figure (two panels): bytes written (left: 0 to 300 000 000 000, right: 0 to 100 000 000 000); x-axis groups from 500 000 to 32 000 000 and from 11 to 18 (left) / 11 to 19 (right); legend: PatTrieSort, String Merging, External Merge Sort, Replacement Selection
PatTrieSort - Number of Runs
Figure (two panels: PatTrieSort/String Merging, External Merge Sort): number of runs on a logarithmic scale (1 to 10000) over 500 000, 1 000 000, 2 000 000, 4 000 000, 8 000 000, 16 000 000 and 32 000 000
PatTrieSort - Number of Runs - Replacement Selection
Figure (two panels): number of runs on a logarithmic scale (1 to 1 000 000) for 11 (4095), 12 (8191), 13 (16383), 14 (32767), 15 (65535), 16 (131071), 17 (262143) and 18 (524287); the right panel additionally shows 19 (1048575)
PatTrieSort - Total IO
Figure (two panels): total I/O costs in bytes (left: 0 to 600 000 000 000, right: 0 to 200 000 000 000); x-axis groups from 500 000 to 32 000 000 and from 11 to 18 (left) / 11 to 19 (right); legend: PatTrieSort, String Merging, External Merge Sort, Replacement Selection
Index Construction - Single Times
Figure: index construction for blocks of RDF data (ID types: temporary, local, global)
Example blocks of RDF data:
Block 1:
v:Journal rdfs:subClassOf v:BibEntity .
v:Article rdfs:subClassOf v:BibEntity .
i:OJBD rdf:type v:Journal .
i:OJBD v:title "Open Journal of Big Data"^^xsd:string .
Block 2:
i:Article1 rdf:type v:Article .
i:Article1 v:title "Solving Big Problems"@en .
i:Article1 v:publishedIn i:OJBD .
i:Article1 v:creator i:Author_BigData .
i:Author_BigData v:name "Big Data Expert"^^xsd:string .
1. Build patricia trie and map triples to temporary IDs
   Block 1: 0 1 2 . 3 1 2 . 4 5 0 . 4 6 7 .
   Block 2: 0 1 2 . 0 3 4 . 0 5 6 . 0 7 8 . 8 9 10 .
2. Map to local IDs
   Block 1: 6 1 5 . 4 1 5 . 0 2 6 . 0 3 7 .
   Block 2: 0 3 8 . 0 7 10 . 0 6 2 . 0 4 1 . 1 5 9 .
3. Roll out patricia trie
4. Sort 6 times and store runs (SPO, …, OPS)
   Block 1: SPO 0 2 6 . 0 3 7 . 4 1 5 . 6 1 5 . | OPS 4 1 5 . 6 1 5 . 0 2 6 . 0 3 7 .
   Block 2: SPO 0 3 8 . 0 4 1 . 0 6 2 . 0 7 10 . 1 5 9 . | OPS 0 4 1 . 0 6 2 . 0 3 8 . 1 5 9 . 0 7 10 .
5. Merge patricia tries
6. Generate dictionary: String → ID (B+-tree), ID → String (disk-based array)
7. Determine mapping from local to global IDs
8. Map runs from local to global IDs
   Block 1: SPO 2 4 11 . 2 8 13 . 9 3 10 . 11 3 10 . | OPS 9 3 10 . 11 3 10 . 2 4 11 . 2 8 13 .
   Block 2: SPO 0 4 9 . 0 5 1 . 0 7 2 . 0 8 14 . 1 6 12 . | OPS 0 5 1 . 0 7 2 . 0 4 9 . 1 6 12 . 0 8 14 .
9. Merge runs and generate evaluation indices: SPO (B+-tree), …, OPS (B+-tree)
Figure: time in seconds (0 to 25000) per phase over the size of the RDF blocks (1 M, 5 M, 10 M, 25 M, 50 M, 100 M triples); phases: Building patricia tries, Mapping to local IDs, Local sorting, Merging tries, Generating global dictionary, Mapping initial runs to global IDs, Merging initial runs, Generating evaluation indices
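A simplified Python sketch of steps 2, 4 and 6 to 9 for two small blocks taken from the example (temporary IDs and the Patricia tries are skipped). It assumes that local and global IDs are both assigned in string order, which matches the example, where the remapped runs stay sorted; Python sorts the prefixed names here, so the concrete IDs differ from the figure. Only two of the six collation orders are built, and all names are hypothetical.

```python
import heapq


def order_key(order):
    """Sort key for a collation order such as 'SPO' or 'OPS'."""
    idx = ["SPO".index(c) for c in order]
    return lambda t: tuple(t[i] for i in idx)


def local_encode(block):
    """Step 2 (simplified): the sorted distinct terms of a block get local IDs."""
    terms = sorted({term for triple in block for term in triple})
    local_id = {t: i for i, t in enumerate(terms)}
    return terms, [tuple(local_id[t] for t in triple) for triple in block]


blocks = [
    [("v:Journal", "rdfs:subClassOf", "v:BibEntity"),
     ("i:OJBD", "rdf:type", "v:Journal")],
    [("i:Article1", "rdf:type", "v:Article"),
     ("i:Article1", "v:publishedIn", "i:OJBD")],
]
encoded = [local_encode(block) for block in blocks]

# Step 6 (simplified): global dictionary over all terms, same string order.
global_terms = sorted({t for terms, _ in encoded for t in terms})
global_id = {t: i for i, t in enumerate(global_terms)}

runs = {"SPO": [], "OPS": []}  # two of the six collation orders
for terms, local_triples in encoded:
    to_global = [global_id[t] for t in terms]  # step 7: local ID -> global ID
    for order, order_runs in runs.items():
        run = sorted(local_triples, key=order_key(order))  # step 4
        order_runs.append([tuple(to_global[i] for i in t) for t in run])  # step 8

# Step 9 (simplified): both ID assignments follow the string order, so the
# remapped runs are still sorted and a k-way merge builds each index.
indices = {order: list(heapq.merge(*order_runs, key=order_key(order)))
           for order, order_runs in runs.items()}
print(indices["SPO"])
```

Because the mapping from local to global IDs preserves the order, no run has to be re-sorted after step 8; one k-way merge per collation order suffices.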
Index Construction - Total Time
Figure: construction time in seconds (0 to 45000) over the size of the RDF blocks (1 M, 5 M, 10 M, 25 M, 50 M, 100 M triples), broken down into the phases Building patricia tries, Mapping to local IDs, Local sorting, Merging tries, Generating global dictionary, Mapping initial runs to global IDs, Merging initial runs, Generating evaluation indices