Experimental Evidence on Partitioning in Parallel Data Warehouses


Pedro Furtado
Prof. at Univ. of Coimbra & Researcher at CISUC
DEI/CISUC-Universidade de Coimbra, Portugal

Pedro Furtado, DOLAP 2004

Context

• Parallelism is used to obtain major performance improvements in large data warehouses

• Using a simple, low-cost shared-nothing architecture
  – Without any efficiency requirements on the network or the nodes

NODE PARTITIONED DATA WAREHOUSE

• Minimize inter-node data exchange requirements
  – Horizontally fully partition the facts (the largest relations); the remaining relations are replicated

• Hope to obtain near-to-linear speedup

“Divide to conquer”: to run it n times faster …

- Horizontally partition large facts (randomly) into n nodes

- Replicate the other relations (small dimensions?)

[Figure: n nodes (Node 1, Node 2, Node 3, …), each holding one fragment of Sales plus replicas of dimensions D1, D2, D3, and D4]
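A minimal in-memory sketch of this layout, assuming Python lists of dicts stand in for relations; the function name `build_node_partitioned_dw` and the row fields are hypothetical. Each fact row goes to a randomly chosen node, as on the slide, and every dimension is copied to all nodes.

```python
import random
from typing import Dict, List


def build_node_partitioned_dw(facts: List[dict], dimensions: Dict[str, List[dict]],
                              n: int, seed: int = 0) -> List[dict]:
    """Place each fact row on a random node; copy every dimension to all n nodes."""
    rng = random.Random(seed)
    nodes = [{"facts": [], "dims": {name: list(rows) for name, rows in dimensions.items()}}
             for _ in range(n)]
    for row in facts:
        nodes[rng.randrange(n)]["facts"].append(row)
    return nodes


# Hypothetical Sales rows and two tiny dimensions.
sales = [{"sale_id": i, "d1": i % 3, "amount": 10} for i in range(1000)]
dims = {"D1": [{"d1": k} for k in range(3)], "D2": [{"d2": 0}]}
nodes = build_node_partitioned_dw(sales, dims, n=4)
print([len(node["facts"]) for node in nodes])  # roughly 250 rows per node
```

Random placement keeps the fragments roughly equal in size without needing any partitioning attribute; the price is that every node must hold a full copy of each dimension.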

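Why Replicate Dimensions?

• We replicated because we would not need to repartition: with replicated dimensions, joining each fact fragment locally and taking the union gives the full join

$$\text{Fact} \bowtie_{A_1} R_1 \bowtie_{A_2} R_2 \cdots \bowtie_{A_n} R_n = \bigcup_{j \in \text{all nodes}} \left( \text{Fact}_j \bowtie_{A_1} R_1 \bowtie_{A_2} R_2 \cdots \bowtie_{A_n} R_n \right)$$

Wouldn't work with partitioned dimensions, because the dimension rows that match Fact_j may be stored on other nodes:

$$\text{Fact} \bowtie_{A_1} R_1 \bowtie_{A_2} R_2 \cdots \bowtie_{A_n} R_n \neq \bigcup_{j \in \text{all nodes}} \left( \text{Fact}_j \bowtie_{A_1} R_{1j} \bowtie_{A_2} R_{2j} \cdots \bowtie_{A_n} R_{nj} \right) \quad \text{(in general)}$$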

…and you can do other ops independently as well
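A small sketch of why replication is safe, using toy rows with hypothetical attribute names (`sale_id`, `d1`, `label`): with D1 replicated, the union of the per-node joins equals the global join; with D1 hash-partitioned independently of the facts, rows are lost.

```python
def join(facts, dim, key):
    """Nested-loop equi-join of fact rows with dimension rows on `key`."""
    return [{**f, **d} for f in facts for d in dim if f[key] == d[key]]


def hash_fragments(rows, key, n):
    """Hash-partition rows on `key` into n fragments."""
    frags = [[] for _ in range(n)]
    for r in rows:
        frags[hash(r[key]) % n].append(r)
    return frags


def as_bag(rows):
    """Order-insensitive view of a list of row dicts, for comparing results."""
    return sorted(tuple(sorted(r.items())) for r in rows)


N = 4
facts = [{"sale_id": i, "d1": i % 10} for i in range(40)]
d1 = [{"d1": k, "label": f"k{k}"} for k in range(10)]

# Facts are spread round-robin: row i goes to node i % N.
fact_frags = [facts[j::N] for j in range(N)]
full = join(facts, d1, "d1")

# Replicated D1: every node joins its fact fragment with the whole of D1.
replicated = [r for j in range(N) for r in join(fact_frags[j], d1, "d1")]

# Partitioned D1: node j only holds the D1 rows hashed to j.
d1_frags = hash_fragments(d1, "d1", N)
partitioned = [r for j in range(N) for r in join(fact_frags[j], d1_frags[j], "d1")]

print(as_bag(replicated) == as_bag(full))  # True
print(len(full), len(partitioned))         # 40 20
```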

Query Processing

• Original query: SUM(X) over FACT, dims GROUP BY dims

• Each of the n nodes: SUM(X) over 1/n of FACT, Ds GROUP BY dims

• Merge: SUM(SUMs) GROUP BY dims
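A minimal sketch of this decomposition, assuming rows are Python dicts and using hypothetical names (`local_sum`, `merge_sums`, `region`, `x`): each node aggregates its fragment, and the merge step sums the partial sums per group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def local_sum(rows: Iterable[dict], group_by: str, measure: str) -> Dict:
    """Node step: SUM(measure) GROUP BY group_by over this node's fragment."""
    partial = defaultdict(int)
    for r in rows:
        partial[r[group_by]] += r[measure]
    return dict(partial)


def merge_sums(partials: List[Dict]) -> Dict:
    """Merge step: SUM(SUMs) GROUP BY the same attribute."""
    total = defaultdict(int)
    for p in partials:
        for key, value in p.items():
            total[key] += value
    return dict(total)


fact = [{"region": i % 3, "x": i} for i in range(30)]
fragments = [fact[j::4] for j in range(4)]  # 4 nodes; groups span several nodes
partials = [local_sum(frag, "region", "x") for frag in fragments]
print(merge_sums(partials) == local_sum(fact, "region", "x"))  # True
```

SUM merges cleanly because it is distributive: the sum of partial sums equals the global sum, regardless of how rows were spread across nodes.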

Query Processing Steps

[Figure: interaction between the Submitter Node and the Computing Nodes, in steps numbered 1 to 7: Rewrite Query, Send Query, Compute Partial Result, Repartition/Redistribute (when needed), Send Partial Results, Apply Merge Query]
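The slides name these steps but not the rewrite rules. A common case, used here as an illustrative assumption, is AVG: per-node averages cannot simply be averaged, so the rewritten node query returns SUM and COUNT, and the merge query divides their totals. All function names are hypothetical, and the network is simulated by plain function calls.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def node_partial(rows: List[dict], group_by: str,
                 measure: str) -> Dict[object, Tuple[int, int]]:
    """Rewritten node query: SUM(measure), COUNT(measure) GROUP BY group_by."""
    acc = defaultdict(lambda: [0, 0])
    for r in rows:
        acc[r[group_by]][0] += r[measure]
        acc[r[group_by]][1] += 1
    return {k: (s, c) for k, (s, c) in acc.items()}


def merge_avg(partials: List[Dict[object, Tuple[int, int]]]) -> Dict[object, float]:
    """Merge query at the submitter: SUM(SUMs) / SUM(COUNTs) per group."""
    sums, counts = defaultdict(int), defaultdict(int)
    for p in partials:
        for k, (s, c) in p.items():
            sums[k] += s
            counts[k] += c
    return {k: sums[k] / counts[k] for k in sums}


def run_avg_query(fragments: List[List[dict]], group_by: str,
                  measure: str) -> Dict[object, float]:
    """Simulated steps: each node computes a partial result, the submitter merges."""
    partials = [node_partial(frag, group_by, measure) for frag in fragments]
    return merge_avg(partials)


fragments = [
    [{"g": "a", "x": 10}],                       # node 1
    [{"g": "a", "x": 20}, {"g": "a", "x": 30}],  # node 2
]
print(run_avg_query(fragments, "g", "x"))  # {'a': 20.0}; averaging averages gives 17.5
```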

Problem (TPC-H case study)

[Figure: TPC-H schema (Part, PartSupp, Supplier, Customer, Orders, Lineitem) with relations marked as very large, large, or medium]

• Many typical schemas are “complex”: many large relations may exist

Problem Statement

• Divide by N … we would expect N times faster: Linear Speedup (LS)

• However, we don't get LS

Our Major Contributions

• Show these problems experimentally using the TPC-H performance evaluation benchmark: we EXPLAIN AND ILLUSTRATE the LARGE RELATIONS problem

• Identify simple modifications to improve results

• Analyze the modifications experimentally

Partitioning Facts (Largest)

• LI + PS partitioned

[Figure: Node 1 … Node N; each node holds a fragment of Li and of PS, plus replicas of S, C, O, and P]

• Generated a 50 GB TPC-H database, loaded on 1 node and on 25 nodes

• Used PCs with a Pentium III 866 MHz CPU and 512 MB RAM

• Oracle 9i, with a tuned initial configuration

• The TPC-H set of 22 queries

• Measured response time: 1 node vs. 25 nodes

• We show that the speedup underachievement is explained mostly by the size of the replicated dimensions

Experimental Results

[Charts: speedup per query (1 node vs. 25 nodes), one panel per speedup group]

• LS speedup (25-30): Q6, Q1, Q15

• Medium speedup (6-15): Q19, Q11, Q14

• Low speedup (2-6): Q7, Q5, Q9, Q3, Q16, Q12, Q10

• Very low speedup (0.4-1.9): Q8, Q22, Q4, Q13, Q21, Q2

• Only a few queries exhibited near-to-LS!
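The grouping above can be expressed as a small helper. A sketch, assuming speedup is the 1-node response time divided by the 25-node response time; the band boundaries come from the slide's ranges, while the handling of values between ranges (e.g. 15 to 25) is an assumption. The times in the example are hypothetical, not measurements from the study.

```python
def speedup(t_one_node: float, t_n_nodes: float) -> float:
    """Speedup = response time on 1 node / response time on N nodes."""
    return t_one_node / t_n_nodes


def speedup_band(s: float) -> str:
    """Label a speedup with the bands used for the 25-node results.

    The slides give ranges 25-30, 6-15, 2-6 and 0.4-1.9; only the lower bounds
    are used here, so a value between two ranges goes to the lower band."""
    if s >= 25:
        return "LS"
    if s >= 6:
        return "medium"
    if s >= 2:
        return "low"
    return "very low"


# Hypothetical response times in seconds.
print(speedup_band(speedup(1000.0, 40.0)))   # LS (speedup 25.0)
print(speedup_band(speedup(1000.0, 100.0)))  # medium (speedup 10.0)
print(speedup_band(speedup(1000.0, 800.0)))  # very low (speedup 1.25)
```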

Some had Linear Speedup…

[Chart: LS speedup (25-30) for Q6, Q1, Q15, with schema diagrams (S, C, O, Li, P, PS) of the relations accessed]

• Q1, Q6: access only fragments (Li/N)

• Q15: S is reasonably small relative to Li/N

Others had smaller speedup…

[Chart: medium speedup (6-15) for Q19, Q11, Q14, with schema diagrams (S, C, O, Li, P, PS) of the relations accessed]

• Q14, Q19: P is not small relative to the fragment (Li/N)

• Q11: S is not small relative to PS/N

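What Happened…

• With N nodes we would like to process 1/N of the data and obtain about N times speedup

• However, we have replicated relations…

• The amount of speedup degradation depends on the size of R2 relative to R1/N:

$$R_1 \bowtie R_2: \quad \text{const} \cdot f\left(\frac{R_1}{N}, \frac{R_2}{N}\right) \ \text{(ideal)} \qquad \text{vs.} \qquad \text{const} \cdot f\left(\frac{R_1}{N}, R_2\right) \ (R_2 \text{ replicated})$$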
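To see how strongly the replicated R2 drives the result, assume the simplest cost function: cost proportional to the data each node scans, i.e. R1/N plus either the full R2 (replicated) or R2/N (partitioned). This additive form is an assumption, not the paper's model; the constant factor cancels in the speedup ratio, and the sizes below are hypothetical.

```python
def estimated_speedup(r1: float, r2: float, n: int, replicate_r2: bool = True) -> float:
    """Speedup of R1 join R2 on n nodes vs. 1 node, assuming cost = const * (R1 + R2 scanned).

    R1 is always divided among the n nodes; R2 is either replicated (full copy
    scanned on every node) or divided as well. const cancels in the ratio."""
    one_node = r1 + r2
    per_node = r1 / n + (r2 if replicate_r2 else r2 / n)
    return one_node / per_node


for r2 in (0, 10, 100, 1000):
    print(r2, round(estimated_speedup(1000, r2, 25), 2))
# 0 25.0
# 10 20.2
# 100 7.86
# 1000 1.92
```

Under this model the speedup falls from 25 toward 1 as R2 grows relative to R1/N (40 here); if R2 is divided among the nodes as well, the estimate returns to 25.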

Low Speedup Queries: Speedup 2-5.5

[Chart: low speedup for Q7, Q5, Q9, Q3, Q16, Q12, Q10, with schema diagrams (S, C, O, Li, P, PS) of the relations accessed by Q3, Q5, Q7, Q10, Q12; by Q16; and by Q9]

• Q3, Q5, Q7, Q10, Q12: O is large relative to Li/N

• Q16: P is large relative to PS/N

Very Low or No Speedup Queries: Speedup 0.4-2

[Chart: very low speedup for Q8, Q22, Q4, Q13, Q21, Q2, with schema diagrams (S, C, O, Li, P, PS) of the relations accessed by Q13, Q22 and by Q8]

• Q13, Q22: process only replicated relations

• Q8: includes all replicated relations

• Q4, Q21, Q2: scenarios similar to the “slow queries”

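What Happened…

• Not only are replicated relations included…

• …but the replicated relations included are very large in comparison to the fragments!

$$\text{const} \cdot f\left(\frac{R_1}{N}, R_2\right) \ \text{instead of} \ \text{const} \cdot f\left(\frac{R_1}{N}, \frac{R_2}{N}\right), \qquad \text{with } R_2 \gg \frac{R_1}{N}$$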

The same in pictures…

[Figure: schema diagrams (S, C, O, Li, P, PS), one per case: large speedup, medium speedup, low speedup, no speedup at all]


Back to Partitioning Alternatives…

• Placement alternatives: a relation can be placed on a single node, replicated (on all nodes), or partitioned

• Partitioning function (round-robin/random, range, hash)

• Choice of partitioning attributes

[Figure: ProductSupplyHistory (PS), Orders (O), Lineitem (LI), and Customer (C), with candidate partitioning attributes PS_key, O_key, and C_key]

• Repartitioning = re-hashing a relation by exchanging rows between nodes

• When more than one relation is partitioned, some joins will probably need repartitioning

• e.g. if LI and O are both partitioned by O_KEY, they are “equi-partitioned”, but:

  … LI join PS needs repartitioning of LI

  … O join C needs repartitioning of O
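A sketch of hash partitioning and repartitioning over in-memory fragments, assuming Python's built-in `hash` as the partitioning function and hypothetical row fields `l_id`, `o_key`, and `ps_key`: LI is first partitioned by O_KEY, then re-hashed on the PS key, as LI join PS would require, counting how many rows change node.

```python
from typing import List, Tuple


def node_of(value, n: int) -> int:
    """Target node of a partitioning-attribute value."""
    return hash(value) % n


def hash_partition(rows: List[dict], attr: str, n: int) -> List[List[dict]]:
    """Hash-partition rows on attr into n fragments."""
    frags = [[] for _ in range(n)]
    for r in rows:
        frags[node_of(r[attr], n)].append(r)
    return frags


def repartition(frags: List[List[dict]], attr: str,
                n: int) -> Tuple[List[List[dict]], int]:
    """Re-hash every fragment on a new attribute; return new fragments and rows shipped."""
    new_frags = [[] for _ in range(n)]
    shipped = 0
    for src, frag in enumerate(frags):
        for r in frag:
            dst = node_of(r[attr], n)
            if dst != src:
                shipped += 1  # row must cross the network
            new_frags[dst].append(r)
    return new_frags, shipped


N = 4
# Hypothetical LI rows: several line items per order, each referencing a PS key.
li = [{"l_id": i, "o_key": i // 4, "ps_key": (i * 7) % 50} for i in range(1000)]

li_frags = hash_partition(li, "o_key", N)                # LI partitioned by O_KEY
new_frags, shipped = repartition(li_frags, "ps_key", N)  # re-hashed for LI join PS
print(shipped, "of", len(li), "rows moved")
```

With uniform hashing, about (N - 1)/N of the rows move, which is why repartitioning is avoided whenever a replicated or equi-partitioned join is possible.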

Let's Review Related Work…

• Replicate all but one relation: PRS [Yu et al., TKDE89]
  – Similar to what we did: replicated all except LI
  [Yu et al., TKDE89]: “Partition strategy for distributed query processing in fast local networks”

• Partition using dependencies: PLACEMENT DEPENDENCY [Liu et al., ICDE96]
  – e.g. partition ORDERS and co-locate its LINEITEM rows (LI is the dependent relation)
  [Liu et al., ICDE96]: “A Distributed Query Processing Strategy Using Placement Dependency”
  [Chen et al., ICPADS 2000]: “An Efficient Algorithm for Distributed Queries Using Partition Dependency”

• Parallel Hash Join and Optimization: PHJ
  – Relations are hash-partitioned; repartitioning is required to re-hash in order to JOIN
  [DeWitt et al., VLDB11]: “Multiprocessor Hash-Based Join Algorithms”
  [Liu et al., EDBT96]: “A Hash Partition Strategy for Distributed Query Processing”
  [Kitsuregawa et al., 1983]: “Application of hash to database machine and its architecture”
  [Shasha et al., TODS91]: “Optimizing Equijoin Queries In Distributed Databases … Hash Partitioned”

• Workload-based Partitioning and Placement
  – Determine the best partitioning attributes automatically, based on the workload
  [Daniel Zilio et al., 1994]: “Partitioning Key Selection for a Shared-Nothing Parallel Database System”
  [Rao et al., SIGMOD 2000]: “Automating physical database design in a parallel database”

Local Replicated Join:

• Join a fragment to a replicated relation locally; no data is exchanged

• One relation must be replicated
  – E.g. LI(O_KEY), O()

$$\text{Cost}_{\text{local replicated join}} = \alpha \cdot f\left(\frac{R_1}{N}, R_2\right)$$

N: number of nodes; R_1, R_2: relations (sizes); α: constant

Local Partitioned Join

• Join fragments locally; no data is exchanged

• Relations must be equi-partitioned
  – E.g. LI(O_KEY), O(O_KEY)

$$\text{Cost}_{\text{local join}} = \alpha \cdot f\left(\frac{R_1}{N}, \frac{R_2}{N}\right)$$

N: number of nodes; R_1, R_2: relations (sizes); α: constant
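A sketch of why equi-partitioning allows purely local joins, with hypothetical fields `o_key`, `c_key`, and `line_no` and Python's `hash` as the partitioning function: when LI and O are both hashed on O_KEY, every LI row finds its order on the same node, so the union of the local joins equals the global join.

```python
def hash_partition(rows, attr, n):
    """Hash-partition rows on attr into n fragments."""
    frags = [[] for _ in range(n)]
    for row in rows:
        frags[hash(row[attr]) % n].append(row)
    return frags


def hash_join(left, right, attr):
    """Equi-join on attr: build a hash table on `right`, probe it with `left`."""
    table = {}
    for row in right:
        table.setdefault(row[attr], []).append(row)
    return [{**lrow, **rrow} for lrow in left for rrow in table.get(lrow[attr], [])]


def as_bag(rows):
    """Order-insensitive view of a list of row dicts."""
    return sorted(tuple(sorted(row.items())) for row in rows)


N = 4
orders = [{"o_key": k, "c_key": k % 7} for k in range(100)]
lineitem = [{"o_key": i // 3, "line_no": i % 3} for i in range(300)]

# Equi-partitioned: both relations hashed on O_KEY, so matching rows share a node.
o_frags = hash_partition(orders, "o_key", N)
li_frags = hash_partition(lineitem, "o_key", N)
local = [row for j in range(N) for row in hash_join(li_frags[j], o_frags[j], "o_key")]

print(as_bag(local) == as_bag(hash_join(lineitem, orders, "o_key")))  # True
```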

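Repartition Join

• Re-hash with data exchange, then join locally

• Relation partitions are not co-located
  – E.g. O(O_KEY), C(C_KEY)

$$\text{Cost}_{\text{repartition join}} = \alpha \cdot f\left(\frac{R_1}{N}, \frac{R_2}{N}\right) + \beta \cdot \left(\frac{R_1}{N} - \frac{R_1}{N^2}\right)$$

α, β: constant weight factors; the repartitioning weight β depends on the network configuration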
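The three join strategies can be compared numerically. A minimal sketch under stated assumptions: the join cost is a plain sum of the sizes each node processes, weighted by a constant `alpha`; for the repartition join, R1 is re-hashed uniformly, so each node keeps 1/N of its R1 fragment and ships R1/N - R1/N² rows, weighted by a network factor `beta`. The sizes and weights are hypothetical.

```python
def cost_local_replicated(r1: float, r2: float, n: int, alpha: float = 1.0) -> float:
    """Fragment of R1 joined with the full replicated R2 on each node."""
    return alpha * (r1 / n + r2)


def cost_local_partitioned(r1: float, r2: float, n: int, alpha: float = 1.0) -> float:
    """R1 and R2 equi-partitioned: each node joins two fragments."""
    return alpha * (r1 / n + r2 / n)


def cost_repartition(r1: float, r2: float, n: int,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    """Re-hash R1, then join locally; beta weights the network transfer.

    With uniform hashing a node keeps 1/n of its R1 fragment and ships the
    rest, i.e. R1/n - R1/n**2 rows."""
    return alpha * (r1 / n + r2 / n) + beta * (r1 / n - r1 / n ** 2)


# Hypothetical sizes: R2 is large relative to R1/N (= 40 here).
r1, r2, n = 1000, 200, 25
print(cost_local_replicated(r1, r2, n))       # 240.0
print(cost_local_partitioned(r1, r2, n))      # 48.0
print(cost_repartition(r1, r2, n, beta=2.0))  # about 124.8
```

With these numbers, re-hashing R1 is cheaper than scanning a full copy of R2 on every node unless β exceeds 5; the equi-partitioned local join is cheapest of all, since it exchanges no data.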

Proposed Solution

• “Very small” dimensions: replicate
  – “Very small” depends on the relation sizes and the number of nodes

• Non-small dimensions: hash-partition by PRIMARY KEY
  – because they “always” join with facts on their PK
  – as with placement dependency, we take advantage of this invariant

• Facts: find the hash-partitioning attribute that minimizes repartitioning costs
  – Reasonable approximation: the most frequent equi-join attribute
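The three rules can be written as a small placement advisor. A sketch only: the `small_ratio` threshold for “very small” is a hypothetical choice (the slide only says it depends on relation sizes and the number of nodes), the relation sizes are approximate TPC-H row counts at scale factor 1, the key names are hypothetical, and the workload list of equi-joins is made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Relation:
    name: str
    rows: int          # relation size (here: row count)
    is_fact: bool
    primary_key: str


def choose_placement(relations: List[Relation], equi_joins: List[Tuple[str, str]],
                     n_nodes: int, small_ratio: float = 0.05
                     ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {relation: ("replicate", None) or ("hash", attribute)}.

    small_ratio is a hypothetical threshold: a dimension is "very small" if it
    has at most small_ratio * (largest fact / n_nodes) rows."""
    fragment = max(r.rows for r in relations if r.is_fact) / n_nodes
    plan = {}
    for r in relations:
        if r.is_fact:
            # Facts: most frequent equi-join attribute in the workload.
            counts = Counter(attr for fact, attr in equi_joins if fact == r.name)
            plan[r.name] = ("hash", counts.most_common(1)[0][0] if counts else r.primary_key)
        elif r.rows <= small_ratio * fragment:
            plan[r.name] = ("replicate", None)      # very small dimension
        else:
            plan[r.name] = ("hash", r.primary_key)  # non-small dimension: hash by PK
    return plan


schema = [
    Relation("LI", 6_000_000, True, "l_key"),   # "l_key": hypothetical fallback key
    Relation("PS", 800_000, True, "ps_key"),
    Relation("O", 1_500_000, False, "o_key"),
    Relation("P", 200_000, False, "p_key"),
    Relation("C", 150_000, False, "c_key"),
    Relation("S", 10_000, False, "s_key"),
]
workload = [("LI", "o_key"), ("LI", "o_key"), ("LI", "p_key"), ("PS", "p_key")]
print(choose_placement(schema, workload, n_nodes=25))
```

With these inputs the advisor hashes LI and O on `o_key`, P and PS on `p_key`, and C on `c_key`, and replicates only S.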

Result of Partitioning (TPC-H)

[Figure: O and Li hash-partitioned on O_KEY; P and PS hash-partitioned on P_KEY; S and C also shown. Join types: local join (equi-partitioned), replicated join (with small dimension), repartitioned join]

Experimental Results

[Charts: speedup with the proposed partitioning, for the same query groups as before: Q6, Q1, Q15; Q19, Q11, Q14; Q7, Q5, Q9, Q3, Q16, Q12, Q10; Q8, Q4, Q13, Q2 (speedup axes up to 25-30)]

• LI join P: ship only selected rows from LI …

Repartition vs. Total Runtime

• TC = total runtime

• RC = repartition time

• Repartition time is reasonably small…

• It depends on the number of nodes and on selectivities
  – (it can be very dependent on the selection conditions of the specific query)

[Charts: runtime (secs), TC vs. RC, and overhead RC/TC (%), for the queries requiring repartitioning: Q8, Q9, Q14, Q19]
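The dependence on selectivity and on the number of nodes can be made concrete with a back-of-the-envelope estimate. A sketch under stated assumptions: uniform hashing, and only the rows that pass the query's selection conditions are shipped (as noted for LI join P). Function names and the 6M-row LI size are hypothetical.

```python
def rows_shipped_per_node(r1_rows: float, n_nodes: int, selectivity: float) -> float:
    """Expected rows one node ships when re-hashing only the selected rows of R1.

    Assumes uniform hashing: of the selected rows in a node's fragment, 1/n are
    already on their target node and (n - 1)/n must be sent elsewhere."""
    selected = selectivity * r1_rows / n_nodes
    return selected * (n_nodes - 1) / n_nodes


def repartition_overhead(rc: float, tc: float) -> float:
    """Repartition time as a fraction of total runtime (RC/TC)."""
    return rc / tc


# Hypothetical 6M-row LI on 25 nodes: shipping only selected rows shrinks the exchange.
for sel in (1.0, 0.1, 0.01):
    print(sel, rows_shipped_per_node(6_000_000, 25, sel))
```

Per-node traffic shrinks linearly with selectivity and roughly as 1/N with more nodes, which is consistent with repartitioning being a small share of total runtime when selection conditions are restrictive.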

Conclusions

• We have analyzed a basic partitioning strategy (PRS-like)
  – The largest relation is partitioned; the others are replicated
  – The speedup is totally unsatisfactory for many queries

• We analyzed why this happens: it is explained by the access patterns to replicated relations

• We tried a very simple partitioning alternative
  – Only very small relations are replicated
  – Dimensions are partitioned by primary key
  – Facts are hash-partitioned; partitioning key = most frequent join attribute

• We have shown that it works well
  – It prevents very low speedup
  – It provides near-to-linear speedup for most queries



•Thank You!

•Questions?

• www.eden.dei.uc.pt/~pnf
