CSE544 Data Management

Lectures 12: Advanced Query Processing

CSE 544 - Winter 2020


Announcements

• Project Milestone due on Friday

• Homework 4 posted; due next Friday

• There will be a short Homework 5, on transactions


Quick Recap

• Name 3 join processing algorithms


Outline

Algorithms for multi-joins

• AGM formula for maximum output size

• Generic-join algorithm matching that formula


Multi-join

• select * from R, S, T, … where …

• Standard approach:

– Compute one join at a time

– Optimizer chooses an “optimal” join order

• Issues:

– Cardinality estimation is hard

– Even “optimal” plan may be suboptimal


Plans Are Suboptimal

Because intermediate results are much larger than the final query answer

Example

select * -- natural join
from R, S, T
where R.Y = S.Y and S.Z = T.Z and T.X = R.X

Q(X,Y,Z) = R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)

[Figure: query plan, a tree of binary joins over R(X,Y), S(Y,Z), T(Z,X). Each input has N tuples; the first join produces Θ(N^2) tuples; the final result has 0 tuples.]

R:              S: (same as R)    T: (same as R)

X     Y         Y     Z           Z     X
0     1         0     1           0     1
0     2         0     2           0     2
0     3         0     3           0     3
...   ...       ...   ...         ...   ...
0     N/2       0     N/2         0     N/2
1     0         1     0           1     0
2     0         2     0           2     0
...   ...       ...   ...         ...   ...
N/2   0         N/2   0           N/2   0

[Ngo’2013]
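The blow-up in this example can be checked on a small instance. The following is a minimal sketch, assuming Python sets of pairs as relations; the helper names `build_relation` and `binary_join` are illustrative and not part of the lecture.

```python
def build_relation(n):
    """The instance from the slide: (0, 1..n/2) together with (1..n/2, 0)."""
    half = n // 2
    return ({(0, j) for j in range(1, half + 1)}
            | {(i, 0) for i in range(1, half + 1)})


def binary_join(left, right):
    """Join pairs (a, b) with pairs (b, c) on the shared middle value b."""
    by_first = {}
    for b, c in right:
        by_first.setdefault(b, []).append(c)
    return [(a, b, c) for a, b in left for c in by_first.get(b, [])]


n = 1000
R = S = T = build_relation(n)   # R(X,Y), S(Y,Z), T(Z,X) hold the same pairs
rs = binary_join(R, S)          # intermediate result R ⨝ S as (x, y, z)
answer = [(x, y, z) for x, y, z in rs if (z, x) in T]
print(len(R), len(rs), len(answer))
```

By the symmetry of the instance, joining any other pair of relations first gives an intermediate result of the same size.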

Optimal Algorithm

To define “optimal” we need to answer two questions:

Q1: How large is the output of a query?

Q2: How can we compute it in time no larger than the largest output?

Worst-Case Optimality

Fix input statistics for D

• Runtime = O(max_{D satisfies stats} |Q(D)|)

E.g. R(X,Y) ∧ S(Y,Z), |R|, |S| ≤ N

• No other info: |Q(D)| ≤ N^2

• S.Y is a key: |Q(D)| ≤ N

• S.Y has degree ≤ d: |Q(D)| ≤ d×N

E.g. R(X,Y) ∧ S(Y,Z) ∧ T(Z,X)

• No other info: |Q(D)| ≤ N^{3/2}

The Two Questions

Q1: Given statistics, what is max(|Q(D)|)?

Q2: How can we compute Q in time O(max(|Q(D)|))?

Simple Fact #1

• Consider any query:

Q(X_1, ..., X_k) = R_1(Vars_1) ∧ ... ∧ R_m(Vars_m)

• Its output size is no larger than the product of all cardinalities:

|Q| ≤ |R_1| × … × |R_m|   Why?

Graphs and Hypergraphs

• An undirected graph G = (V, E) where each edge e ∈ E is a set of two nodes

• A hypergraph is G = (V, E) where each edge is some set (of 1 or 2 or >2 nodes)

[Figures: an example graph and an example hypergraph over nodes a, b, c, d, e]

Conjunctive Queries are Hypergraphs

Q(x,y,z) = R(x,y)∧S(y,z)∧T(z,x)

Q(x,y,z,u) = A(x,y,z)∧B(x,y,u)∧C(x,z,u)∧D(y,z,u)

[Figures: the hypergraph of each query, with one node per variable (x, y, z for the first; x, y, z, u for the second) and one hyperedge per atom]

Edge Cover

• An edge cover of a (hyper)graph is a subset of edges that together contain all the vertices

[Figures: example edge covers of the hypergraph on x, y, z and of the hypergraph on x, y, z, u]

Simple Fact #2

• Consider any query:

Q(X_1, ..., X_k) = R_1(Vars_1) ∧ ... ∧ R_m(Vars_m)

• Let R_{i_1}, R_{i_2}, …, R_{i_ℓ} be an edge cover. Then the output size is no larger than the product of their cardinalities:

|Q| ≤ |R_{i_1}| × ⋯ × |R_{i_ℓ}|   Why?

Examples

Q(x,y,z) = R(x,y)∧S(y,z)∧T(z,x)

• Edge covers: R(x,y)∧S(y,z) or R(x,y)∧T(z,x) or S(y,z)∧T(z,x)

|Q| ≤ min(|R|×|S|, |R|×|T|, |S|×|T|)

Q(x,y,z,u) = A(x,y,z)∧B(x,y,u)∧C(x,z,u)∧D(y,z,u)

• Edge covers: A(x,y,z)∧B(x,y,u) or A(x,y,z)∧C(x,z,u) or …

|Q| ≤ min(|A|×|B|, |A|×|C|, …)

Fractional Edge Cover

• A fractional edge cover of a (hyper)graph is a set of non-negative numbers w_e, one for each edge e, such that, for every vertex v: ∑_{e: v∈e} w_e ≥ 1

• Fact: every edge cover is also a fractional edge cover. Why?

[Figures: the hypergraph on x, y, z with weight ½ on each of its three edges; the hypergraph on x, y, z, u with weight 1/3 on each of its four hyperedges]
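The definition of a fractional edge cover translates directly into a check. The sketch below assumes a hypergraph given as a dictionary from edge name to vertex set; the function name `is_fractional_edge_cover` is illustrative only.

```python
from fractions import Fraction


def is_fractional_edge_cover(edges, weights):
    """edges: name -> set of vertices; weights: name -> non-negative number."""
    if any(w < 0 for w in weights.values()):
        return False
    vertices = set().union(*edges.values())
    # Every vertex must receive total weight at least 1 from the edges containing it.
    return all(
        sum(weights[name] for name, vs in edges.items() if v in vs) >= 1
        for v in vertices
    )


triangle = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
half = Fraction(1, 2)
print(is_fractional_edge_cover(triangle, {"R": half, "S": half, "T": half}))
print(is_fractional_edge_cover(triangle, {"R": 1, "S": 1, "T": 0}))
print(is_fractional_edge_cover(triangle, {"R": 1, "S": 0, "T": 0}))
```

An ordinary edge cover is the special case where every weight is 0 or 1, which is why the second call succeeds; the third fails because z is left uncovered.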

Not so Simple Fact #3

• Consider any query:

Q(X_1, ..., X_k) = R_1(Vars_1) ∧ ... ∧ R_m(Vars_m)

• Let w_1, w_2, …, w_m be a fractional edge cover. Then the output size is no larger than:

|Q| ≤ |R_1|^{w_1} × ⋯ × |R_m|^{w_m}

Examples

Query                                  w_1, …, w_m          |R_1|^{w_1} × … × |R_m|^{w_m}   |Q| ≤ …

R(x,y)∧S(y,z)                          1, 1                 |R|×|S|                         |R|×|S|

R(x,y)∧S(y,z)∧T(z,x)                   ½, ½, ½              (|R|×|S|×|T|)^{½}               min((|R|×|S|×|T|)^{½}, |R|×|S|, |R|×|T|, |S|×|T|)
                                       1, 1, 0              |R|×|S|
                                       1, 0, 1              |R|×|T|
                                       0, 1, 1              |S|×|T|

A(x,y,z)∧B(x,y,u)∧C(x,z,u)∧D(y,z,u)    1/3, 1/3, 1/3, 1/3   (|A|×|B|×|C|×|D|)^{1/3}         min( … )
                                       1, 1, 0, 0           |A|×|B|
                                       1, 0, 1, 0           |A|×|C|
                                       …                    …

R(x,y)∧S(y,z)∧T(z,u)∧K(u,v)            1, 0, 1, 1           |R|×|T|×|K|                     min(|R|×|T|×|K|, |R|×|S|×|K|)
                                       1, 1, 0, 1           |R|×|S|×|K|
                                       1, ½, ½, 1           (no need; why?)
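The rows for the triangle query can be evaluated numerically. This is a small sketch with made-up relation sizes; `cover_bound` is an illustrative helper, and the list of covers is the one shown for R(x,y)∧S(y,z)∧T(z,x).

```python
import math


def cover_bound(sizes, weights):
    """Evaluate |R_1|^{w_1} × … × |R_m|^{w_m} for one fractional edge cover."""
    return math.prod(size ** w for size, w in zip(sizes, weights))


# Covers of the triangle query, in the order (w_R, w_S, w_T).
covers = [(0.5, 0.5, 0.5), (1, 1, 0), (1, 0, 1), (0, 1, 1)]


def triangle_bound(size_r, size_s, size_t):
    """Smallest bound among the listed covers."""
    return min(cover_bound((size_r, size_s, size_t), w) for w in covers)


print(triangle_bound(100, 100, 100))    # equal sizes: the ½, ½, ½ cover wins
print(triangle_bound(10, 10, 10000))    # skewed sizes: the 1, 1, 0 cover wins
```

With equal sizes the fractional cover gives the smallest bound, while with one very large relation an integral cover that skips it is better, which is why the last column takes a minimum over all of them.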

Upper Bound of a Query

Theorem |Q| ≤ min_{w_1,…,w_m} |R_1|^{w_1} × ⋯ × |R_m|^{w_m}

This is called the AGM bound* of Q. It is tight.

Note: it suffices to consider only those fractional edge covers w_1, …, w_m that are not convex combinations of others

We will prove tightness on a special case.

But first, let’s discuss an algorithm for computing Q with this runtime

*Atserias, Grohe, Marx introduced this bound

Generic Join – Overview

• Choose a variable order

• Sort every relation R_i according to this order: time is O(|R_i| log |R_i|) = Õ(|R_i|)

• Generic join assumes relations are sorted; it computes Q in time Õ(AGM(Q))

• “Worst case optimal”

AGM(Q) = min_{w_1,…,w_m} |R_1|^{w_1} × ⋯ × |R_m|^{w_m}

Generic Join – The Intersection

Intersection is the main building block of G.J.

Q(x) = R(x)∧S(x)

• Discuss merge-join in class – what is runtime?

• Edge covers of Q: 1,0 and 0,1; |Q| ≤ min(|R|, |S|)

• Discuss improved merge-join in class

Runtime: Õ(min(|R|, |S|))
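The slides leave the improved merge-join to the in-class discussion. One possible way to reach Õ(min(|R|, |S|)) is sketched below, as an assumption rather than the lecture's own method: walk the shorter sorted list and binary-search each value in the longer one. The function name `intersect_sorted` is illustrative.

```python
from bisect import bisect_left


def intersect_sorted(r, s):
    """Intersect two sorted, duplicate-free lists by probing the longer one."""
    small, large = (r, s) if len(r) <= len(s) else (s, r)
    out = []
    lo = 0
    for value in small:
        # Binary search in the longer list, never moving backwards.
        lo = bisect_left(large, value, lo)
        if lo == len(large):
            break
        if large[lo] == value:
            out.append(value)
    return out


print(intersect_sorted([3, 500, 999], list(range(0, 1000, 2))))
```

Each probe costs O(log max(|R|, |S|)), so the total is O(min(|R|, |S|) · log max(|R|, |S|)), whereas a plain merge scans both lists and costs O(|R| + |S|).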


Generic Join Algorithm

Let x be the first variable
Let R_{i1}, R_{i2}, … be all relations containing x
Compute D = Π_x(R_{i1}) ∩ Π_x(R_{i2}) ∩ …     (needs to be done in time Õ(min_j |Π_x(R_{ij})|))
for every value v ∈ D do:
    Compute Q, where R_{i1}, R_{i2}, … are restricted to x = v
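The recursive structure of the algorithm can be sketched for arbitrary conjunctive queries. This is an illustration of the control flow only: it rescans each relation instead of using sorted or indexed storage, so it does not meet the Õ(AGM(Q)) runtime. The names `generic_join`, `relations`, and `order` are assumptions, and every variable in `order` is assumed to occur in at least one relation.

```python
def generic_join(relations, order, binding=None):
    """relations: list of (variables, set of tuples); order: list of variables."""
    binding = binding or {}
    if len(binding) == len(order):
        return [tuple(binding[v] for v in order)]
    x = order[len(binding)]                      # next variable in the order
    candidate_sets = []
    for variables, tuples in relations:
        if x in variables:
            pos = variables.index(x)
            # Project onto x, restricted to the values already bound.
            candidate_sets.append({
                t[pos] for t in tuples
                if all(t[i] == binding[v]
                       for i, v in enumerate(variables) if v in binding)
            })
    domain = set.intersection(*candidate_sets)   # D = intersection of projections
    out = []
    for value in domain:
        out.extend(generic_join(relations, order, {**binding, x: value}))
    return out


R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
query = [(("x", "y"), R), (("y", "z"), S), (("z", "x"), T)]
print(sorted(generic_join(query, ["x", "y", "z"])))
```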

Generic Join Example

Q(x,y,z) = R(x,y)∧S(y,z)∧T(z,x)

Assume |R|=|S|=|T|=N, then: |Q| ≤ N^{3/2}

A = Π_x(R(x,y)) ∩ Π_x(T(z,x))
for a in A do
    /* compute Q(a,y,z) = R(a,y)∧S(y,z)∧T(z,a) */
    B = Π_y(R(a,y)) ∩ Π_y(S(y,z))
    for b in B do
        /* compute Q(a,b,z) = R(a,b)∧S(b,z)∧T(z,a) */
        C = Π_z(S(b,z)) ∩ Π_z(T(z,a))
        for c in C do
            output (a,b,c)

Runtime: Õ(N^{3/2})
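The three nested loops for the triangle query can be written out in Python. This sketch uses hash indexes in place of sorted relations, which is an assumption of the example and not something the slides prescribe; `index` and `triangle_generic_join` are illustrative names.

```python
from collections import defaultdict


def index(pairs):
    """Map each first component to the set of second components paired with it."""
    idx = defaultdict(set)
    for a, b in pairs:
        idx[a].add(b)
    return idx


def triangle_generic_join(R, S, T):
    """R holds (x, y), S holds (y, z), T holds (z, x); returns all (x, y, z)."""
    r_by_x, s_by_y = index(R), index(S)
    t_by_x = index((x, z) for z, x in T)
    ys_in_s = set(s_by_y)
    out = []
    for a in set(r_by_x) & set(t_by_x):        # A = Π_x(R) ∩ Π_x(T)
        for b in r_by_x[a] & ys_in_s:          # B = Π_y(R(a,y)) ∩ Π_y(S)
            for c in s_by_y[b] & t_by_x[a]:    # C = Π_z(S(b,z)) ∩ Π_z(T(z,a))
                out.append((a, b, c))
    return out


R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
print(sorted(triangle_generic_join(R, S, T)))
```

Unlike a plan of binary joins, no intermediate relation is materialized: a value is bound only if every relation mentioning that variable agrees with it.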


Discussion

• All relations need to be presorted, or indexed

• Runtime is guaranteed to be worst-case optimal, no matter what variable order we choose

• In practice, the variable order does matter; in class: discuss R(x,y)∧S(y,z)

Comparison to Naïve Nested Loop

Naïve nested loop:

// tuple at a time:
for t1 in R1 do
    for t2 in R2 do
        …

// value at a time:
for x in Domain do
    for y in Domain do
        for z in Domain do
            ...

Generic-join:

A = ∩ domains for x
for x in A do
    B = ∩ domains for y
    for y in B do
        C = ∩ domains for z
        for z in C do
            ...

Tightness

• There exist instances R_1, R_2, … such that the size of the query’s output is AGM(Q)

• Proof is simple and instructive; we will show it for the special case |R_1| = … = |R_m| = N

• In this case AGM(Q) = N^{min(w_1+…+w_m)}

Fractional Edge Covering Number

• The fractional edge covering number of a hypergraph is ρ* = min ∑_e w_e, where the minimum is over all fractional edge covers of the hypergraph.

Fact Assume |R_1| = … = |R_m| = N. Then AGM(Q) = N^{ρ*}

Why?
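One way to answer the "Why?" is to substitute |R_i| = N into the AGM formula. The derivation below assumes N > 1, so that N^s is increasing in s and the minimum can be moved into the exponent; the minimum ranges over all fractional edge covers w.

```latex
\begin{align*}
\mathrm{AGM}(Q)
  &= \min_{w} \prod_{i=1}^{m} |R_i|^{w_i}
   = \min_{w} \prod_{i=1}^{m} N^{w_i} \\
  &= \min_{w} N^{\,w_1 + \cdots + w_m}
   = N^{\min_{w} (w_1 + \cdots + w_m)}
   = N^{\rho^*}
\end{align*}
```

For the triangle query the minimum of w_1 + w_2 + w_3 is 3/2, attained at ½, ½, ½, which gives the N^{3/2} bound used earlier.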

Fractional Vertex Packing

• A fractional vertex packing of a (hyper)graph is a set of non-negative numbers v_x, one for each node x, such that, for every edge e: ∑_{x: x∈e} v_x ≤ 1

Fact For any v, w: ∑_x v_x ≤ ∑_e w_e

Theorem max_v ∑_x v_x = ρ* = min_w ∑_e w_e

[Figures: example hypergraphs annotated with fractional vertex packings and fractional edge covers, with ρ* = 1, ρ* = 3/2 (weight ½ on every node and every edge), and ρ* = 3]

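Fractional Vertex Packing

• A fractional vertex packing of a (hyper)graph is a set of non-negative numbers $v_x$, one for each node $x$, such that, for every edge $e$: $\sum_{x \in e} v_x \le 1$

[Figure: three example graphs annotated with node and edge weights, with $\rho^* = 1$, $\rho^* = 3/2$ (weights of ½), and $\rho^* = 3$]

Theorem: $\max_v \sum_x v_x = \rho^* = \min_w \sum_e w_e$

Fact: For any fractional vertex packing $v$ and fractional edge cover $w$: $\sum_x v_x \le \sum_e w_e$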
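The Fact and the Theorem can be checked by hand on the triangle hypergraph. The following is a minimal sketch, assuming the hypergraph is given as a dictionary from (hypothetical) edge names to node sets; the helper names `is_vertex_packing` and `is_edge_cover` are illustrative and not from the lecture. Because the packing and the cover below reach the same value 3/2, the Fact implies that both are optimal.

```python
from fractions import Fraction

def is_vertex_packing(edges, v):
    """v >= 0 and, for every edge, the weights of its nodes sum to at most 1."""
    return all(val >= 0 for val in v.values()) and all(
        sum(v[x] for x in e) <= 1 for e in edges.values()
    )

def is_edge_cover(edges, w, nodes):
    """w >= 0 and, for every node, the weights of the edges containing it sum to at least 1."""
    return all(val >= 0 for val in w.values()) and all(
        sum(w[name] for name, e in edges.items() if x in e) >= 1 for x in nodes
    )

# Triangle hypergraph for R(x,y), S(y,z), T(z,x).
edges = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
nodes = {"x", "y", "z"}

half = Fraction(1, 2)
v = {x: half for x in nodes}      # candidate vertex packing
w = {name: half for name in edges}  # candidate edge cover

assert is_vertex_packing(edges, v)
assert is_edge_cover(edges, w, nodes)

packing_value = sum(v.values())
cover_value = sum(w.values())
assert packing_value <= cover_value   # the Fact (weak duality)
print(packing_value, cover_value)     # 3/2 3/2
```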

The Bound is Tight

Fact: Fix a fractional vertex packing $v = (v_x)_{x \in \text{Nodes}}$. Then there exists a database such that $|R_1| \le N, \ldots, |R_m| \le N$ and $|Q| = N^{\sum_x v_x}$

Proof. For every relation $R_j$ with variables $x_{i_1}, x_{i_2}, \ldots$ define the instance $R_j = [N^{v_{i_1}}] \times [N^{v_{i_2}}] \times \cdots$ where $[k] = \{1, 2, \ldots, k\}$. Then:
(a) $|R_j| = N^{v_{i_1} + v_{i_2} + \cdots} \le N$ (why?)
(b) $|Q| = N^{\sum_x v_x}$ (why?)
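One possible way to answer the two "why?" questions is sketched below. It assumes that each $N^{v_x}$ is an integer (the slides do not discuss rounding) and writes $e_j$ for the set of variables of $R_j$, a notation introduced here only for this derivation.

```latex
\begin{align*}
% (a) The variables of R_j form one edge e_j, and v is a vertex packing,
%     so the exponents along that edge sum to at most 1.
|R_j| &= \prod_{x \in e_j} N^{v_x}
       = N^{\sum_{x \in e_j} v_x}
       \le N^{1} = N \\
% (b) Each R_j is the full Cartesian product of its variables' domains,
%     so every assignment x -> a_x in [N^{v_x}] satisfies every atom.
Q &= \prod_{x \in \mathrm{Nodes}} \bigl[N^{v_x}\bigr]
   \quad\Longrightarrow\quad
   |Q| = \prod_{x \in \mathrm{Nodes}} N^{v_x} = N^{\sum_x v_x}
\end{align*}
```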

Example

$Q(x,y,z) = R(x,y) \wedge S(y,z) \wedge T(z,x)$
• Assume $|R| = |S| = |T| = N$.
• Optimal vertex packing: $v_x = v_y = v_z = 1/2$
• Define: $D_x = [N^{1/2}] = \{1, 2, \ldots, N^{1/2}\}$, $D_y = [N^{1/2}]$, $D_z = [N^{1/2}]$
• Define $R = D_x \times D_y$, $S = D_y \times D_z$, $T = D_z \times D_x$.
• Then $|R| = |S| = |T| = N$, $Q = D_x \times D_y \times D_z$ and $|Q| = N^{3/2}$
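The construction above can be reproduced for a small N. The following is a minimal sketch, assuming N is a perfect square so that $N^{1/2}$ is an integer; the function names are illustrative, and the join is a naive one used only to count the output.

```python
from itertools import product
from math import isqrt

def tight_triangle_instance(n):
    """Build R = Dx x Dy, S = Dy x Dz, T = Dz x Dx with all domains equal to {1..sqrt(n)}."""
    r = isqrt(n)
    assert r * r == n, "this sketch assumes n is a perfect square"
    d = range(1, r + 1)
    R = set(product(d, d))  # tuples (x, y)
    S = set(product(d, d))  # tuples (y, z)
    T = set(product(d, d))  # tuples (z, x)
    return R, S, T

def triangle_join(R, S, T):
    """Naive evaluation of Q(x,y,z) = R(x,y), S(y,z), T(z,x)."""
    return {
        (x, y, z)
        for (x, y) in R
        for (y2, z) in S
        if y2 == y and (z, x) in T
    }

R, S, T = tight_triangle_instance(16)
Q = triangle_join(R, S, T)
print(len(R), len(S), len(T), len(Q))  # 16 16 16 64
```

With N = 16 each relation has 16 tuples and the output has 64 = 16^{3/2} tuples, matching the bound.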

Keys

$R(X,Y) \wedge S(Y,Z)$, with $|R|, |S| \le N$
• No other info: $|Q(D)| \le N^2$
• S.Y is a key: $|Q(D)| \le N$

The Query Expansion method:
• If Y is a key in some relation S, then add all attributes of S to all relations containing Y
• Compute $\mathrm{AGM}(Q_{\mathrm{expanded}})$
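The expansion step can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the query is assumed to be a dictionary from relation names to attribute sets, keys are given as hypothetical `(relation, attribute)` pairs, and repeating the rule until nothing changes is an assumption made here to handle several keys.

```python
def expand_query(atoms, keys):
    """Return a copy of atoms in which, for every key (S, Y), all attributes
    of S have been added to every relation that contains Y."""
    expanded = {name: set(attrs) for name, attrs in atoms.items()}
    changed = True
    while changed:  # repeat until no relation grows any further
        changed = False
        for rel, key_attr in keys:
            determined = expanded[rel]  # attributes determined by the key
            for attrs in expanded.values():
                if key_attr in attrs and not determined <= attrs:
                    attrs |= determined
                    changed = True
    return expanded

# R(X,Y), S(Y,Z) where S.Y is a key: R gains attribute Z.
atoms = {"R": {"X", "Y"}, "S": {"Y", "Z"}}
expanded = expand_query(atoms, [("S", "Y")])
print({name: sorted(attrs) for name, attrs in expanded.items()})
# {'R': ['X', 'Y', 'Z'], 'S': ['Y', 'Z']}
```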

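Examples

$Q(X,Y,Z) = R(X,Y) \wedge S(Y,Z)$
• $Q_{\mathrm{exp}}(X,Y,Z) = R(X,Y,Z) \wedge S(Y,Z)$
• Edge cover: 1,0
• $\mathrm{AGM}(Q_{\mathrm{exp}}) = |R|$

$Q(X,Y,Z) = R(X,Y) \wedge S(Y,Z) \wedge T(Z,X)$
• $Q_{\mathrm{exp}}(X,Y,Z) = R(X,Y,Z) \wedge S(Y,Z) \wedge T(Z,X)$
• Edge covers: 1,0,0 or 0,1,1
• $\mathrm{AGM}(Q_{\mathrm{exp}}) = \min(|R|, |S| \times |T|)$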
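The second example can be evaluated numerically. The sketch below only compares the two integral edge covers listed above rather than optimizing over all fractional covers, and the cardinalities are made-up values chosen for illustration.

```python
from math import prod

def is_cover(edges, cover):
    """Every attribute is covered with total weight at least 1."""
    nodes = set().union(*edges.values())
    return all(
        sum(cover[name] for name, attrs in edges.items() if x in attrs) >= 1
        for x in nodes
    )

def agm_over_covers(sizes, edges, covers):
    """Smallest value of prod |R_e|^{w_e} among the supplied edge covers."""
    assert all(is_cover(edges, c) for c in covers)
    return min(prod(sizes[name] ** c[name] for name in edges) for c in covers)

# Expanded triangle query: R(X,Y,Z), S(Y,Z), T(Z,X).
edges = {"R": {"X", "Y", "Z"}, "S": {"Y", "Z"}, "T": {"Z", "X"}}
sizes = {"R": 1000, "S": 10, "T": 20}  # hypothetical cardinalities
covers = [
    {"R": 1, "S": 0, "T": 0},
    {"R": 0, "S": 1, "T": 1},
]
bound = agm_over_covers(sizes, edges, covers)
print(bound)  # 200, i.e. min(|R|, |S| * |T|) = min(1000, 200)
```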

Summary

Given cardinalities of all input tables:
• The AGM bound gives a tight upper bound on the query output size
• Generic Join (GJ) computes the query in time matching this bound

Generic Join:
• A nested loop algorithm
• No longer one-join-at-a-time
• Theoretical optimality means it will be efficient for very expensive queries; less so for cheaper queries
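The nested-loop, variable-at-a-time shape of Generic Join can be sketched for the triangle query $Q(x,y,z) = R(x,y) \wedge S(y,z) \wedge T(z,x)$. This is an illustrative sketch with an assumed variable order x, y, z and hash-set intersections; it is not the lecture's pseudocode and makes no attempt to guarantee the AGM running time.

```python
from collections import defaultdict

def index(pairs):
    """Map each first component to the set of second components paired with it."""
    idx = defaultdict(set)
    for a, b in pairs:
        idx[a].add(b)
    return idx

def generic_join_triangle(R, S, T):
    """Evaluate Q(x,y,z) = R(x,y), S(y,z), T(z,x) one variable at a time."""
    r_by_x = index(R)                      # x -> {y : R(x,y)}
    s_by_y = index(S)                      # y -> {z : S(y,z)}
    t_by_x = index((x, z) for z, x in T)   # x -> {z : T(z,x)}
    out = []
    for x in r_by_x.keys() & t_by_x.keys():      # x must occur in R and in T
        for y in s_by_y.keys() & r_by_x[x]:      # y must match R(x,.) and occur in S
            for z in s_by_y[y] & t_by_x[x]:      # z must match S(y,.) and T(.,x)
                out.append((x, y, z))
    return out

R = {(1, 2), (1, 3), (2, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 2)}
result = sorted(generic_join_triangle(R, S, T))
print(result)  # [(1, 2, 3), (2, 3, 1)]
```

Each loop binds one variable by intersecting the relations that mention it, rather than materializing the result of one binary join before starting the next.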
