VLDB 2005 1
Revisiting Pipelined Parallelism in Multi-Join Query Processing
Bin Liu and Elke A. Rundensteiner
Worcester Polytechnic Institute
(binliu|rundenst)@cs.wpi.edu
http://www.davis.wpi.edu/dsrg
Multi-Join Queries
Data integration over distributed data sources, e.g., Extract-Transform-Load (ETL) services
[Figure: data sources feeding data warehouses through persistent storage]
Drawbacks of staging through persistent storage:
(1) High I/O costs given large intermediate results
(2) Disk access is undesirable for a one-time process
Applying Parallelism
Process in the main memory of a PC cluster
Make use of aggregated resources (main memory, CPU)
[Figure: clusters of machines connected by a network]
Three Types of Parallelism
Pipelined: operators are composed into producer-consumer relationships
Independent: independent operators run simultaneously on distinct machines
Partitioned: a single operator is replicated and run on multiple machines
Basics of Hash Join
Two-phase hash join [SD89, LTS90]
Demonstrated high performance potential
High degree of parallelism
Example relations:
Orders
ID | Date | …
0011 | 5/13 | …
0012 | 5/14 | …
… | … | …
LineItems
OID | Item | …
0011 | IPC | …
0012 | HPS | …
… | … | …
(1) Build hash tables (key → value) of Orders based on ID
(2) Probe the hash tables with LineItems and output results
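A minimal Python sketch of the two phases above, assuming relations are lists of dicts and reusing the Orders/LineItems tuples from the slide; `hash_join` is a hypothetical helper, not the authors' implementation.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Two-phase equi-join: build a hash table, then stream the probe input."""
    # Phase (1): build a hash table on the build relation's join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Phase (2): probe the table with each tuple and output the matches.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


orders = [{"ID": "0011", "Date": "5/13"}, {"ID": "0012", "Date": "5/14"}]
line_items = [{"OID": "0011", "Item": "IPC"}, {"OID": "0012", "Item": "HPS"}]
result = list(hash_join(orders, line_items, build_key="ID", probe_key="OID"))
```

Because the probe phase is a generator, joined tuples are produced one at a time, which is what lets a hash join feed a downstream operator as a pipeline.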
Partitioned Hash Join
(1) Split Orders on ID and build one hash table (key → value) per processor
(2) Probe the hash tables with LineItems and output results
[Figure: Orders and LineItems split across three processors, each holding one hash table]
Partition the inputs (and hash tables) across processors
Have each processing node run in parallel
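A sequential sketch of partitioned hash join, assuming hypothetical helpers `partition`, `local_hash_join`, and `partitioned_hash_join`. Both inputs are split with the same hash function on the join key, so matching tuples always land in the same partition; the loop over partitions stands in for the k processing nodes that would run in parallel.

```python
from collections import defaultdict


def partition(rows, key, k):
    """Split rows into k partitions by hashing the join key (one per node)."""
    parts = [[] for _ in range(k)]
    for row in rows:
        parts[hash(row[key]) % k].append(row)
    return parts


def local_hash_join(build_rows, probe_rows, build_key, probe_key):
    """The work one node does on its own partition: build, then probe."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    return [{**m, **row} for row in probe_rows for m in table.get(row[probe_key], [])]


def partitioned_hash_join(build_rows, probe_rows, build_key, probe_key, k):
    build_parts = partition(build_rows, build_key, k)
    probe_parts = partition(probe_rows, probe_key, k)
    out = []
    # Sequential stand-in for k nodes; partitions are independent of each other.
    for b, p in zip(build_parts, probe_parts):
        out.extend(local_hash_join(b, p, build_key, probe_key))
    return out


orders = [{"ID": i, "Date": f"day{i}"} for i in range(100)]
line_items = [{"OID": i % 50, "Item": f"item{i}"} for i in range(200)]
joined = partitioned_hash_join(orders, line_items, "ID", "OID", k=4)
```

Using one hash function for both inputs is the design choice that makes the partitions independent: no node ever needs a tuple held by another node.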
Left-Deep Tree [SD90]
[Figure: example join graph over R1, …, R9 and the corresponding left-deep query tree with build phases B1, …, B8 and probe phases P1, …, P8]
Left-Deep Query Tree Steps:
(1) Scan R1 – Build B1
(2) Scan R2 – Probe P1 – Build B2
(3) Scan R3 – Probe P2 – Build B3
…
(8) Scan R8 – Probe P7 – Build B8
(9) Scan R9 – Probe P8 – Output
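The left-deep steps can be sketched as follows, assuming (for simplicity, unlike the chain-shaped join graph on the slide) that every relation joins on one shared attribute `key`; `build` and `left_deep_join` are hypothetical names. Each step probes the table built in the previous step and builds the next table from the result, so only one or two hash tables are needed at a time.

```python
from collections import defaultdict


def build(rows, key):
    """Build phase: hash rows on the join attribute."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def left_deep_join(relations, key):
    """Left-deep plan over R1..Rn that all join on attribute `key`."""
    if len(relations) == 1:
        return list(relations[0])
    table = build(relations[0], key)  # step (1): scan R1, build B1
    for i, rel in enumerate(relations[1:], start=2):
        # step (i): scan Ri and probe the table built in the previous step
        result = [{**m, **row} for row in rel for m in table.get(row[key], [])]
        if i == len(relations):
            return result  # last step: output
        # build B_i from the result; the previous table is no longer needed
        table = build(result, key)


R1 = [{"k": 1, "a": 1}, {"k": 2, "a": 2}]
R2 = [{"k": 1, "b": 1}, {"k": 2, "b": 2}]
R3 = [{"k": 1, "c": 5}, {"k": 1, "c": 6}]
result = left_deep_join([R1, R2, R3], "k")
```

Note that each intermediate result is materialized before the next build, which is why left-deep execution offers less pipelined parallelism.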
Right-Deep Tree [SD90]
[Figure: example join graph over R1, …, R9 and the corresponding right-deep query tree with build phases B1, …, B8 and probe phases P1, …, P8]
Right-Deep Query Tree Steps:
(1) Scan R2 – Build B1, Scan R3 – Build B2, …, Scan R9 – Build B8
(2) Scan R1, Probe P1, Probe P2, …, Probe P8
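A sketch of the right-deep steps under the same simplifying assumption (one shared join attribute `key`; hypothetical helpers `build`, `probe`, `right_deep_join`). All hash tables are built first; then R1 is streamed through a chain of generator stages, so intermediate results exist only as a stream.

```python
from collections import defaultdict


def build(rows, key):
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(stream, table, key):
    """One pipelined probe stage: consumes a stream and yields joined tuples lazily."""
    for row in stream:
        for match in table.get(row[key], []):
            yield {**match, **row}


def right_deep_join(probe_rel, build_rels, key):
    # Phase (1): build every hash table B1..Bn up front (all resident at once).
    tables = [build(rel, key) for rel in build_rels]
    # Phase (2): stream R1 through P1, ..., Pn without materializing intermediates.
    stream = iter(probe_rel)
    for table in tables:
        stream = probe(stream, table, key)
    return stream


R1 = [{"k": 1, "a": 1}, {"k": 2, "a": 2}]
R2 = [{"k": 1, "b": 1}, {"k": 2, "b": 2}]
R3 = [{"k": 1, "c": 5}, {"k": 1, "c": 6}]
result = list(right_deep_join(R1, [R2, R3], "k"))
```

In a cluster, each generator stage would correspond to a probe operator on its own machines, with tuples flowing between them as they are produced.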
Tradeoffs Between Left-Deep and Right-Deep Trees
Right-Deep:
Good potential for pipelined parallelism
Intermediate results exist only as a stream
Sizes of building relations can be predicted accurately
Large memory consumption
Left-Deep:
Less memory consumption
Less pipelined parallelism
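A back-of-the-envelope sketch of the memory tradeoff, assuming a simplified model (hypothetical functions `right_deep_peak` and `left_deep_peak`) where memory is counted in tuples, a right-deep plan keeps all building hash tables resident, and a left-deep plan holds at most the table being probed plus the table being built.

```python
def right_deep_peak(build_sizes):
    # All building hash tables must be resident before R1 is streamed through them.
    return sum(build_sizes)


def left_deep_peak(first_build, intermediate_sizes):
    # While a step runs, the table being probed and the table being built coexist.
    sizes = [first_build, *intermediate_sizes]
    if len(sizes) == 1:
        return sizes[0]
    return max(a + b for a, b in zip(sizes, sizes[1:]))


# Hypothetical 9-relation query, 100K tuples per relation, intermediates also 100K.
rd = right_deep_peak([100_000] * 8)
ld = left_deep_peak(100_000, [100_000] * 7)
```

Under these assumptions the right-deep plan needs 800K resident tuples and the left-deep plan 200K; the left-deep figure also depends on intermediate sizes, which are harder to predict than base relation sizes.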
State-of-the-Art Solutions
Implicit assumption: prefer maximal pipelined parallelism!
[Figure: a single right-deep pipeline over R1, …, R9 with build phases B1, …, B8 and probe phases P1, …, P8]
State-of-the-Art Solutions
What if the environment is memory constrained?
Strategy: break the tree into several pieces and process one piece at a time (each as a pipeline)
[Figure: the right-deep tree over R1, …, R9 cut into pipelined pieces]
E.g., Static Right-Deep [SD90], ZigZag [ZZBS93], Segmented Right-Deep [CLYY92]
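A minimal sketch of "break the tree into pieces", assuming building relations are listed in their right-deep order, sizes share units with a hypothetical `memory_budget`, and a simple greedy cut is good enough; `segment_right_deep` is a hypothetical helper, not the specific algorithm of [SD90], [ZZBS93], or [CLYY92].

```python
def segment_right_deep(build_sizes, memory_budget):
    """Greedily cut the building relations into segments whose tables fit in memory."""
    segments, current, used = [], [], 0
    for idx, size in enumerate(build_sizes):
        if size > memory_budget:
            raise ValueError(f"building relation {idx} alone exceeds the memory budget")
        if current and used + size > memory_budget:
            # Close the current pipeline; its output feeds the next segment.
            segments.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        segments.append(current)
    return segments


pieces = segment_right_deep([40, 30, 50, 20, 10], memory_budget=80)
```

This ignores memory for the intermediate result passed between segments; a real planner would account for it.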
Pipelined Execution
Optimal degree of parallelism? E.g., it may not be necessary to partition R2 over a large number of machines if it has only 1,000 tuples.
Redirection cost: the generated intermediate results may need to be repartitioned to a different machine.
[Figure: R1, R2, R3, and R4 partitioned across the computation machines, with building and probing partitions shown over time]
Pipelined Cost Model
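Compute an n-way join over k machines
Probing relation R0; building relations R1, R2, …, Rn
Ii represents the intermediate results after joining with Ri
Total work (Wb + Wp) and total processing time (Tb + Tp):
$$W_b = \sum_{i=1}^{n} (t_{\text{read}} + t_{\text{partition}} + t_{\text{network}} + t_{\text{build}}) \cdot |R_i|$$
$$W_p = (t_{\text{read}} + t_{\text{partition}} + t_{\text{network}} + t_{\text{probe}}) \cdot |R_0| + \sum_{i=1}^{n-1} \frac{k-1}{k} \cdot |I_i| \cdot t_{\text{network}} + \sum_{i=1}^{n-1} |I_i| \cdot t_{\text{probe}}$$
Tb: a maximum over the building relations Ri, in terms of (t_read + t_partition + t_network + t_build), |Ri|, k, and a factor f(k) [formula garbled in source]
Tp: in terms of Wp, k, the intermediate results I, and the costs t_setup and t_delete [formula garbled in source]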
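A sketch that evaluates the two total-work terms, assuming the reading W_b = Σ_{i=1..n} (t_read + t_partition + t_network + t_build)·|R_i| and W_p = (t_read + t_partition + t_network + t_probe)·|R_0| + Σ_{i=1..n−1} ((k−1)/k·|I_i|·t_network + |I_i|·t_probe). The (k−1)/k factor is interpreted here as the expected fraction of intermediate tuples redirected to another machine under uniform hash partitioning over k machines. The time terms Tb and Tp are not modeled; `UnitCosts`, `build_work`, and `probe_work` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    """Per-tuple costs of each processing step (hypothetical units)."""
    read: float
    partition: float
    network: float
    build: float
    probe: float


def build_work(c, build_sizes):
    # W_b: every building tuple is read, partitioned, shipped, and inserted.
    return (c.read + c.partition + c.network + c.build) * sum(build_sizes)


def probe_work(c, probe_size, intermediate_sizes, k):
    # intermediate_sizes = [|I_1|, ..., |I_{n-1}|]; the final output I_n is not re-probed.
    base = (c.read + c.partition + c.network + c.probe) * probe_size
    redirected = sum(intermediate_sizes) * (k - 1) / k
    return base + redirected * c.network + sum(intermediate_sizes) * c.probe


costs = UnitCosts(read=1, partition=1, network=1, build=1, probe=1)
wb = build_work(costs, [10, 20])
wp = probe_work(costs, probe_size=100, intermediate_sizes=[50], k=2)
```

With k = 1 the redirection term vanishes, which matches the intuition that a single machine never ships intermediate tuples elsewhere.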
Break Pipelined Parallelism
Large number of small pipelines
High interdependence between pipelined segments, e.g., P1 > P2, P3 > P4, P2 > P4
[Figure: a join over R0, …, R9 processed as dependent pipelined segments P1, P2, P3, and P4]
To break long pipelines and introduce independent parallelism
Segmented Bushy Tree
Basic idea:
Compose large pipelined segments
Run pipelined segments independently
Compose a bushy tree with minimal interdependency
[Figure: example join graph over R0, …, R9 and its segmented bushy tree with pipelines P1, P2, P3 and intermediate results I1, I2]
To balance pipelined and independent parallelism
Composing Segmented Tree
Cost-based heuristics
Input: a connected join graph G with n nodes; m specifies the maximum number of nodes in each subgraph.
Output: a segmented bushy tree with at least n/m subtrees.
completed = false;
WHILE (!completed) {
    Choose, as the probing relation, the node V with the largest cardinality that has not yet been grouped;
    Enumerate all subgraphs starting from V with at most m nodes;
    Choose the best subgraph; mark the nodes in this group as selected in the original join graph;
    IF (!(EXISTS K: K is a connected subgraph of G of unselected nodes AND K.size() >= 2)) {
        completed = true;
    }
}
Compose segmented bushy tree from all groups;
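A runnable sketch of the grouping loop above, assuming the join graph is an adjacency dict and candidate subgraphs may use only unselected nodes. The "best subgraph" cost function is not given on the slide, so `prefer_long_small_builds` is a hypothetical stand-in (longer pipelines first, then less build-side cardinality); the final tree composition step is omitted. All function names are hypothetical.

```python
def connected_subgraphs(graph, start, allowed, m):
    """All connected node sets containing `start`, drawn from `allowed`, with at most m nodes."""
    seen = {frozenset([start])}
    frontier = list(seen)
    while frontier:
        grown = []
        for nodes in frontier:
            if len(nodes) == m:
                continue
            for u in nodes:
                for w in graph[u]:
                    if w in allowed and w not in nodes:
                        bigger = nodes | {w}
                        if bigger not in seen:
                            seen.add(bigger)
                            grown.append(bigger)
        frontier = grown
    return seen


def has_unselected_pair(graph, selected):
    # A connected K of unselected nodes with size >= 2 exists exactly when
    # some edge joins two unselected nodes.
    return any(u not in selected and w not in selected for u in graph for w in graph[u])


def group_join_graph(graph, card, m, score):
    """Grouping loop of the segmented-tree heuristic (tree composition omitted)."""
    selected, groups = set(), []
    completed = False
    while not completed:
        # Heuristic: the largest ungrouped relation becomes the probing relation.
        v = max((x for x in graph if x not in selected), key=lambda x: card[x])
        candidates = connected_subgraphs(graph, v, set(graph) - selected, m)
        # Sorted node names only break ties deterministically.
        best = max(candidates, key=lambda s: (score(s, v, card), tuple(sorted(s))))
        groups.append((v, best))
        selected |= best
        completed = not has_unselected_pair(graph, selected)
    return groups, [x for x in graph if x not in selected]


def prefer_long_small_builds(nodes, probe, card):
    # Hypothetical score: more nodes first, then smaller total building cardinality.
    return (len(nodes), -sum(card[x] for x in nodes if x != probe))


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
card = {"A": 10, "B": 5, "C": 100, "D": 7, "E": 3}
groups, leftovers = group_join_graph(graph, card, 3, prefer_long_small_builds)
```

On this small chain, C (largest cardinality) is chosen first and grouped with D and E; A and B then form the second group. Any nodes left in `leftovers` would still need to be attached when composing the bushy tree.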
Example
[Figure: example join graph over R0, …, R9, with groups G1 and G2 marked]
Candidate subgraphs starting from R7:
(1) R7, R8, R9, R6
(2) R7, R9, R6, R8
(3) R7, R4, R8, R5
...
Candidate subgraphs starting from R1:
(1) R1, R0, R2, R3
(2) R1, R2, R0, R3
(3) R1, R2, R3, R4
...
Example: Segmented Bushy Tree
[Figure: segmented bushy tree composed from groups G1, G2, and G3 of the example join graph, with intermediate results I1 and I2]
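Machine Allocation
Based on the building relation sizes of each segment
Nb: total amount of building work
ki: number of machines allocated to pipeline i
[Figure: segmented bushy tree whose three pipelines are allocated k1, k2, and k3 machines]
$$N_b = \sum_{i \in \{2,3,4,5,6,8,9\}} |R_i| + \sum_{i=1}^{2} |I_i|$$
$$k_1 = \frac{|R_6| + |R_8| + |R_9|}{N_b} \cdot k$$
$$k_2 = \frac{|R_2| + |R_3| + |R_4|}{N_b} \cdot k$$
$$k_3 = k - (k_1 + k_2)$$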
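A sketch of proportional machine allocation, assuming each segment's building work is the summed size of its building inputs (base relations and intermediate results) and that fractional machine counts are rounded down, with the last segment taking the remainder as in k3 = k − (k1 + k2). The rounding rule and the name `allocate_machines` are assumptions, not from the slide.

```python
def allocate_machines(k, segment_build_work):
    """Split k machines across segments in proportion to their building work."""
    n_b = sum(segment_build_work)  # N_b: total building work
    # k_i = k * work_i / N_b, rounded down, for every segment except the last ...
    alloc = [k * work // n_b for work in segment_build_work[:-1]]
    # ... and the last segment takes whatever remains.
    alloc.append(k - sum(alloc))
    return alloc


# Hypothetical sizes: G1 builds 60K tuples, G2 30K, G3 (R5, I1, I2) 10K.
plan = allocate_machines(10, [60_000, 30_000, 10_000])
```

Rounding down guarantees the remainder is never negative, but a small segment can still receive zero machines; a practical allocator would likely enforce a minimum of one.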
Insufficient Main Memory
Break the query based on main memory availability
Compose a segmented bushy tree for each part
[Figure: join graph over R0, …, R19 broken into parts]
Experimental Setup
10-machine cluster: each machine has two 2.4 GHz Xeon CPUs and 2 GB of memory; machines are connected by a gigabit Ethernet switch
[Figure: experimental setup with Oracle 8i, a controller, an application machine, and the 10-machine cluster]
PIII 800 MHz PC, 256 MB memory
2 PIII 1 GHz CPUs, 1 GB memory
Application: PIII 800 MHz PC, 256 MB memory
Experimental Setup (cont.)
Generated data set with integer join values, around 40 bytes per tuple
Randomly generated join queries:
Acyclic join graphs with 8, 12, or 16 nodes
Each node represents one join relation
Each edge represents one join condition
Average join ratio is 1
Cardinality of each relation ranges from 1K to 100K tuples
Up to 600 MB per query
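A hypothetical generator for queries shaped like those described above: attaching each new relation to one earlier relation yields a random tree, i.e., a connected, acyclic join graph with n − 1 join conditions. Cardinalities are drawn uniformly from 1K to 100K as an assumption; the join ratio and the actual tuple values are not modeled. `random_join_query` and `base_data_bytes` are illustrative names.

```python
import random

TUPLE_BYTES = 40  # "around 40 bytes per tuple" (from the setup slide)


def random_join_query(n, rng, min_card=1_000, max_card=100_000):
    """Random acyclic join graph over n relations (hypothetical generator)."""
    # One node per relation, each with a uniformly drawn cardinality.
    relations = {f"R{i}": rng.randint(min_card, max_card) for i in range(n)}
    # Each new relation joins exactly one earlier relation: a random tree.
    edges = [(f"R{rng.randrange(i)}", f"R{i}") for i in range(1, n)]
    return relations, edges


def base_data_bytes(relations):
    # Size of the base relations only; intermediate results are not counted.
    return TUPLE_BYTES * sum(relations.values())


rng = random.Random(2005)
queries = {n: random_join_query(n, rng) for n in (8, 12, 16)}
```

Seeding a dedicated `random.Random` instance keeps the generated workload reproducible across runs.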
Pipelined vs. Segmented (I)
[Figure: processing time (ms, axis 0 to 700,000) of sample queries, Right-Deep Tree vs. Segmented Bushy Tree (3)]
Pipelined vs. Segmented (II)
[Figure: processing time (ms, axis 0 to 800,000) vs. number of relations in a query (8, 12, 16), Right-Deep vs. Segmented Bushy]
Insufficient Main Memory
[Figure: processing time (ms, axis 0 to 2,000,000) of example queries 1 to 10, Segmented Right-Deep vs. Segmented Bushy Tree]
Related Work
[SD90] Tradeoffs in processing complex join queries via hashing in multiprocessor database machines. VLDB 1990.
[CLYY92] Using segmented right deep trees for execution of pipelined hash joins. VLDB 1992.
[MLD94] Parallel hash based join algorithms for a shared everything environment. TKDE 1994.
[MD97] Data placement in shared nothing parallel database systems. VLDB 1997.
[WFA95] Parallel evaluation of multi-join queries. SIGMOD 1995.
[HCY94] On parallel execution of multiple pipelined hash joins. SIGMOD 1994.
[DNSS92] Practical skew handling in parallel joins. VLDB 1992.
[SHCF03] Flux: an adaptive partitioning operator for continuous query systems. ICDE 2003.
Conclusions
Observation: maximal pipelined hash join processing leaves open questions: redirection costs? optimal degree of parallelism?
Hypothesis: it is worthwhile to incorporate independent parallelism into processing, combining both kinds so that several shorter pipelines run in parallel
Solution: segmented bushy tree processing; heuristics and a cost-driven algorithm were developed
Validation: extensive experimental studies show around 50% improvement over pure pipelined processing