
Ensemble model and mpp


DESCRIPTION

A 5-minute presentation on how to choose the distribution key for a Data Vault modelled data warehouse on an MPP database.


Page 1: Ensemble model and mpp

Presenter: Juan-José van der Linden

Date: June 5, 2014

Note: DV, MPP

Company: QOSQO

eMail: [email protected]

Twitter: @delostilos

Page 2: Ensemble model and mpp

SMP => MPP => AMPP

SMP: Symmetric Multiprocessing

MPP: Massively Parallel Processing

AMPP: Asymmetric MPP (SMP + MPP)

Page 3: Ensemble model and mpp

Primary key => distribution key

hub -< satellite join

- data redistribution

- join local in parallel

Hub

BK           SID
Ensemble     1
Dimensional  2

Satellite

SID  LDTS        INFO
1    2001-01-01  My first DV
1    2014-06-05  DV Masters
2    1997-08-02  DM manifesto

Nodes: Node 1, Node 2

Page 4: Ensemble model and mpp

Hub SID => distribution key

hub -< satellite join

- join local in parallel

Hub

BK           SID
Ensemble     1
Dimensional  2

Satellite

SID  LDTS        INFO
1    2001-01-01  First DV
1    2014-06-05  DV Masters
2    1997-08-02  DM manifesto

Nodes: Node 1, Node 2
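The difference between the two slides can be sketched in a few lines of Python. This is only an illustration, not how any particular MPP database assigns rows: the `node_for` helper and its CRC32-based hash are assumptions, as is the premise that the hub is distributed on SID in both cases. The sketch counts satellite rows that would sit on a different node than their hub row.

```python
import zlib

N_NODES = 2  # the slides show two nodes


def node_for(key, n_nodes=N_NODES):
    """Map a distribution-key value to a node with a stable hash (hypothetical helper)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes


# Rows taken from the slides.
hub = [("Ensemble", 1), ("Dimensional", 2)]  # (BK, SID)
satellite = [                                # (SID, LDTS, INFO)
    (1, "2001-01-01", "First DV"),
    (1, "2014-06-05", "DV Masters"),
    (2, "1997-08-02", "DM manifesto"),
]


def rows_to_move(sat_key):
    """Count satellite rows that are not on the same node as their hub row.

    The hub is assumed to be distributed on SID; sat_key extracts the
    satellite's distribution key from a row.
    """
    hub_node = {sid: node_for(sid) for _, sid in hub}
    return sum(1 for row in satellite if node_for(sat_key(row)) != hub_node[row[0]])


moved_pk = rows_to_move(lambda r: (r[0], r[1]))  # primary key (SID, LDTS)
moved_sid = rows_to_move(lambda r: r[0])         # hub SID only

print("rows to redistribute, primary key as distribution key:", moved_pk)
print("rows to redistribute, hub SID as distribution key:", moved_sid)
```

With the hub SID as distribution key the count is always zero, because a satellite row hashes to the same node as its hub row. With the full primary key the count depends on the hash and may be non-zero, which is the redistribution the first slide warns about.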

Page 5: Ensemble model and mpp

Link SID => distribution key

Default L_SID, 1:N & N:M

- data redistribution

- join local in parallel

Link

H_MID  H_SID  L_SID
1      A      1
1      B      2

Link satellite

L_SID  LDTS        LDTS_END    CURRENT
1      2001-01-01  2006-01-01  N
1      2014-06-05  9999-12-31  Y
2      2006-01-01  2014-06-05  N

1:N => H_MID on link satellite

- join local in parallel

H_MID is the ensemble identifier!

Link

H_MID  H_SID  L_SID
1      A      1
1      B      2

Link satellite

L_SID  H_MID  H_SID  LDTS        LDTS_END
1      1      A      2001-01-01  2006-01-01
1      1      B      2014-06-05  9999-12-31
2      1      A      2006-01-01  2014-06-05

Nodes: Node 1, Node 2
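As a rough sketch of why carrying H_MID on the link satellite helps, the Python below places the link and link satellite rows from the slide on nodes under both choices of distribution key. The `node_for` helper and its CRC32-based hash are assumptions made for illustration, and so is the premise that the hub identified by H_MID is itself distributed on that value.

```python
import zlib


def node_for(key, n_nodes=2):
    """Map a distribution-key value to a node with a stable hash (hypothetical helper)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes


# Rows taken from the slide.
link = [(1, "A", 1), (1, "B", 2)]  # (H_MID, H_SID, L_SID)
link_sat = [                       # (L_SID, H_MID, H_SID, LDTS, LDTS_END)
    (1, 1, "A", "2001-01-01", "2006-01-01"),
    (1, 1, "B", "2014-06-05", "9999-12-31"),
    (2, 1, "A", "2006-01-01", "2014-06-05"),
]


def nodes_used(rows, key):
    """Return the set of nodes that hold the given rows for a distribution key."""
    return {node_for(key(row)) for row in rows}


# Default: link and link satellite both distributed on L_SID.
nodes_by_l_sid = nodes_used(link, lambda r: r[2]) | nodes_used(link_sat, lambda r: r[0])

# 1:N case: link and link satellite both distributed on H_MID.
nodes_by_h_mid = nodes_used(link, lambda r: r[0]) | nodes_used(link_sat, lambda r: r[1])

print("nodes holding the ensemble, distributed on L_SID:", sorted(nodes_by_l_sid))
print("nodes holding the ensemble, distributed on H_MID:", sorted(nodes_by_h_mid))
```

Because every row of the ensemble shares H_MID = 1, distribution on H_MID keeps the link, the link satellite, and the matching hub row on one node. Distribution on L_SID keeps link and link satellite together per L_SID, but the rows of one ensemble may end up on different nodes.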

Page 6: Ensemble model and mpp

Use the ensemble identifier if possible!

Hub: H_SID

Satellite: H_SID, LDTS, INFO

Link: L_SID?, H_SID, H_MID

Link satellite: H_SID?, L_SID?, LDTS, INFO

Distribute data evenly to ensure good performance in an MPP database.

- With uneven distribution, one node may become a bottleneck for the whole execution

Minimize data movement between nodes.

- Data redistribution may occur when joining tables
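One way to make the bottleneck risk concrete is to measure skew for a candidate distribution key. The sketch below is an illustration only: `node_for` and `skew` are hypothetical helpers, the CRC32-based hash is an assumption rather than any database's actual algorithm, and both key lists are made-up sample data.

```python
import zlib
from collections import Counter


def node_for(key, n_nodes):
    """Map a distribution-key value to a node with a stable hash (hypothetical helper)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes


def skew(keys, n_nodes):
    """Rows on the busiest node divided by the mean rows per node.

    1.0 means a perfectly even spread; n_nodes means every row is on one node.
    """
    counts = Counter(node_for(key, n_nodes) for key in keys)
    mean_rows = len(keys) / n_nodes
    return max(counts.values()) / mean_rows


surrogate_keys = list(range(1, 1001))  # made-up high-cardinality key
constant_keys = ["Y"] * 1000           # made-up key with a single value

skew_surrogate = skew(surrogate_keys, 4)
skew_constant = skew(constant_keys, 4)  # all rows hash to the same node

print("skew with a high-cardinality key:", round(skew_surrogate, 2))
print("skew with a single-valued key:", skew_constant)
```

A key with few distinct values pushes the ratio towards the number of nodes, so the busiest node does most of the work while the others wait.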

Ensemble