Databases 2012 The Relational Algebra Christian S. Jensen Computer Science, Aarhus University

Databases 2012 The Relational Algebra - Computer …csj/files/dDB/slides/algebra-2012-3.pdf · Databases 2012 The Relational Algebra Christian S. Jensen Computer Science, Aarhus University


Databases 2012

The Relational Algebra

Christian S. Jensen

Computer Science, Aarhus University


What is an Algebra?

An algebra consists of

• values

• operators

• rules

Closure: operations yield values

Examples

• integers with arithmetic operators such as +, −, ×

• sets with ∪, ∩, \

• matrices with +, ×

• functions with composition (∘) and inverse (⁻¹)

• relations with query operators


Mathematical Relations

An n-ary relation on a set S is a subset of S^n

Examples

• ≠ is a binary relation on ℝ, a subset of ℝ × ℝ

{ (1.2, 3.4), (34, 117.363), (53, 0.1234), ... }

• divides is a binary relation on ℕ, a subset of ℕ × ℕ

{ (2, 4), (3, 9), (3, 12), (17, 34), (1237, 21029), ... }

• negative is a binary relation on ℤ, a subset of ℤ × ℤ

{ (3,-3), (-17,17), (0,0), (2, -2), (-2,2), (87, -87), ...}

• sum is a ternary relation on ℕ, a subset of ℕ × ℕ × ℕ

{ (3,5,8), (23,14,37), (0,123,123), (42,87,129), ... }

• married to is a binary relation on people

{ (Hillary, Bill), (Bill, Hillary), (Angelina, Brad), ... }
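
The idea that a relation is just a set of tuples can be sketched in Python. This is a minimal illustration, assuming we restrict the infinite relations above to the finite set 1..12; the names `S`, `divides`, and `sum_rel` are chosen here and do not come from the slides.

```python
# A finite slice of the natural numbers; the real relations are infinite.
S = range(1, 13)

# "divides" as a binary relation: a set of pairs, i.e. a subset of S x S.
divides = {(a, b) for a in S for b in S if b % a == 0}

# "sum" as a ternary relation: a set of triples, i.e. a subset of S x S x S.
sum_rel = {(a, b, a + b) for a in S for b in S if a + b in S}

# Asking whether the relation holds is just a membership test.
print((3, 9) in divides)     # the pair (3, 9) is listed on the slide
print((3, 5, 8) in sum_rel)  # the triple (3, 5, 8) is listed on the slide
```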


Tables as Relations

A database relation on a data set D consists of

• a schema of attribute names (a1, a2, ..., an)

• a finite n-ary relation on D, a subset of D^n

A relation is like a table where

• all columns have the same generic type

• no duplicates are allowed

• no other constraints are imposed

We implicitly allow permutations of the attributes

Relational Operators

Database relations form an algebra with the operators

• union: ∪

• intersection: ∩

• difference: \

• projection: π

• renaming: ρ

• selection: σ

• Cartesian product: ×

• natural join: ⋈

These provide an abstract model of database queries


Union, Intersection, Difference

The arguments must have the same schema

The result again has that schema

R ∪ S

R ∩ S

R \ S

They compute the set operations on the relations
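
A minimal Python sketch of these three operators, assuming a relation is modelled as a schema tuple plus a frozenset of value tuples. The `Relation` type and function names are illustrative only, and the schema check is stricter than the slides: it requires identical attribute order, whereas the slides implicitly allow permutations.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple      # attribute names, in a fixed order
    rows: frozenset    # set of value tuples, one value per attribute


def _check_same_schema(r: Relation, s: Relation) -> None:
    # Simplification: attribute order must match exactly.
    if r.schema != s.schema:
        raise ValueError("arguments must have the same schema")


def union(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows | s.rows)


def intersection(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows & s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    _check_same_schema(r, s)
    return Relation(r.schema, r.rows - s.rows)


# Hypothetical example data.
r = Relation(("student",), frozenset({("Fred",), ("Sara",)}))
s = Relation(("student",), frozenset({("Sara",), ("John",)}))
print(union(r, s).rows, intersection(r, s).rows, difference(r, s).rows)
```

Because the result carries the same schema as the arguments, the output of one operator can be fed directly into another, which is the closure property mentioned earlier.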

Projection

π_{a1,...,an}(R)

Assume the schema of R is (a1,...,an,b1,...,bm)

The schema of the result is (a1,...,an)

The result relation is

{ (d1, ..., dn) | (d1, ..., dn+m) ∈ R }
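
Projection can be sketched as follows, assuming a relation is passed around as a schema tuple and a set of value tuples. The function name `project` is an assumption of this sketch; unlike the slide, it accepts the kept attributes in any position, not only as a prefix of the schema.

```python
def project(schema, rows, attrs):
    """Keep only the attributes in attrs; duplicates vanish because rows is a set."""
    positions = [schema.index(a) for a in attrs]
    new_rows = {tuple(row[i] for i in positions) for row in rows}
    return tuple(attrs), new_rows


# Three rows taken from the division example later in the slides.
schema = ("student", "task")
rows = {("Fred", "Database1"), ("Fred", "Database2"), ("Sara", "Database1")}

new_schema, new_rows = project(schema, rows, ["student"])
print(new_schema, new_rows)
```

Note that the two rows for Fred collapse into one, since the result is again a set.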


Renaming

ρ_{a→b}(R)

The name a must occur as ai in the schema of R

The name b must not occur in the schema of R

Schema of the result: (a1, ..., ai-1, b, ai+1, ..., an)

The result relation is unchanged

ρ_{a→b,c→d,e→f}(R) = ρ_{a→b}(ρ_{c→d}(ρ_{e→f}(R)))
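
A small Python sketch of renaming under the two side conditions above. The function name `rename` and the example schema and row are hypothetical; only the attribute name `owner` and its new name `userid` are borrowed from the query tree example in the slides.

```python
def rename(schema, rows, old, new):
    """Rename attribute old to new; the rows themselves are returned unchanged."""
    if old not in schema:
        raise ValueError(f"{old!r} must occur in the schema")
    if new in schema:
        raise ValueError(f"{new!r} must not occur in the schema")
    return tuple(new if a == old else a for a in schema), rows


# Hypothetical schema and data.
schema = ("owner", "what", "meetid")
rows = {("anna", "retreat", 7)}

s1, r1 = rename(schema, rows, "owner", "userid")

# A multiple renaming is a composition of single renamings.
s2, r2 = rename(*rename(schema, rows, "what", "topic"), "owner", "userid")
print(s1, s2)
```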


Selection

σ_C(R)

C is a condition on the attributes of R

The resulting schema is unchanged

The relation part is: { r | r ∈ R ∧ C(r) }
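
Selection can be sketched by modelling the condition C as a Python predicate over a row viewed as a dictionary. The function name `select` and the example schema and rows are assumptions made for illustration.

```python
def select(schema, rows, condition):
    """Keep the rows for which condition holds; the schema is unchanged."""
    kept = {row for row in rows if condition(dict(zip(schema, row)))}
    return schema, kept


# Hypothetical schema and data.
schema = ("pid", "meetid", "status")
rows = {("anna", 7, "a"), ("bob", 7, "d"), ("carl", 8, "a")}

_, accepted = select(schema, rows, lambda r: r["status"] == "a")
print(accepted)
```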


Cartesian Product

R × S

Assume

• R has schema (a1, ..., am)

• S has schema (b1, ..., bn)

The new schema is (a1, ..., am, b1, ..., bn)

The relation part is

{ (c1, ..., cm+n) | (c1, ..., cm) ∈ R ∧ (cm+1, ..., cm+n) ∈ S }
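
The Cartesian product concatenates every row of R with every row of S. The sketch below uses an illustrative function name, `product`, and additionally rejects overlapping attribute names; that check is an assumption of this sketch and is not stated on the slide.

```python
def product(r_schema, r_rows, s_schema, s_rows):
    """Cartesian product: concatenate every row of R with every row of S."""
    # Assumption of this sketch: the two schemas share no attribute names.
    if set(r_schema) & set(s_schema):
        raise ValueError("attribute names must be distinct")
    return r_schema + s_schema, {r + s for r in r_rows for s in s_rows}


# Hypothetical example data.
schema, rows = product(("a",), {(1,), (2,)}, ("b",), {("x",), ("y",), ("z",)})
print(schema, len(rows))
```

The result has |R| · |S| rows, which is why query optimizers try to avoid materializing large products.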


Natural Join

R ⋈ S

Assume

• R has schema (a1, ..., ak, c1, ..., cn)

• S has schema (c1, ..., cn, b1, ..., bm)

• {ai} ∩ {bi} = ∅

The new schema is (a1, ..., ak, c1, ..., cn, b1, ..., bm)

The relation part is

{ (d1, ..., dk, e1, ..., en, f1, ..., fm) |
(d1, ..., dk, e1, ..., en) ∈ R ∧ (e1, ..., en, f1, ..., fm) ∈ S }
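
A nested-loop sketch of the natural join in Python, assuming the same schema-plus-rows representation as before. The function name and the attribute names `from`, `via`, `to` are illustrative; the city pairs are taken from the Flights table later in the slides. Real systems use far more efficient join algorithms.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Join rows of R and S that agree on all attributes the schemas share."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in common]
    r_pos = [r_schema.index(a) for a in common]
    s_pos = [s_schema.index(a) for a in common]
    s_rest = [s_schema.index(a) for a in s_only]
    joined = set()
    for r in r_rows:
        for s in s_rows:
            if all(r[i] == s[j] for i, j in zip(r_pos, s_pos)):
                # Shared attributes appear once, taken from the R row.
                joined.add(r + tuple(s[k] for k in s_rest))
    return r_schema + tuple(s_only), joined


first_leg = {("Copenhagen", "Madrid"), ("Athens", "Rome")}
second_leg = {("Madrid", "Athens"), ("Rome", "London")}

schema, rows = natural_join(("from", "via"), first_leg, ("via", "to"), second_leg)
print(schema, rows)
```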

Derived Operators

R ⋈ S = R ∩ S = R \ (R \ S)

• when the schemas are identical

R ⋈ S = R × S

• when the schemas are disjoint

R ⋈_θ S = σ_θ(R × S)

• the theta join

SELECT DISTINCT X1, …, Xk
FROM R1, …, Rn
WHERE C

= π_{X1,…,Xk}(σ_C(R1 × … × Rn))
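
The theta join is a selection applied to a Cartesian product. The sketch below fuses the two steps into one comprehension; the function name `theta_join` and the example relations are assumptions for illustration.

```python
def theta_join(r_schema, r_rows, s_schema, s_rows, theta):
    """Theta join: the rows of R x S that satisfy the condition theta."""
    schema = r_schema + s_schema
    rows = {
        r + s
        for r in r_rows
        for s in s_rows
        if theta(dict(zip(schema, r + s)))
    }
    return schema, rows


# Hypothetical example: join on the condition a < b.
schema, rows = theta_join(
    ("a",), {(1,), (2,), (3,)},
    ("b",), {(2,), (3,)},
    lambda row: row["a"] < row["b"],
)
print(schema, rows)
```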

Query Trees

In which meetings do the owners participate?

π_{what,meetid}(σ_{status='a'}(
ρ_{owner→userid}(Meetings) ⋈ ρ_{pid→userid}(Participants)))

The same expression as a tree:

π_{what,meetid}
  σ_{status='a'}
    ⋈
      ρ_{owner→userid}
        Meetings
      ρ_{pid→userid}
        Participants

Limitations

The relational algebra cannot answer all queries

Which cities can be reached from Copenhagen in one or more flights?

Flights

from        to
Copenhagen  Madrid
Rome        London
Madrid      Athens
Athens      Rome
...         ...


Transitive Closure

The transitive closure of a binary relation R

R⁺ = { (x1, xk) | k ≥ 2 and there exist x2, ..., xk-1 such that (xi, xi+1) ∈ R for all i = 1, ..., k-1 }

No relational algebra expression computes R⁺

No SQL query can handle it either

• unless SQL is extended with recursion

• or a special closure operator is added

• (some DBMSs do support this)
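
Outside the algebra, the transitive closure is easy to compute by iterating to a fixpoint. The sketch below uses only the four Flights rows that are legible on the slide; the function name `transitive_closure` is chosen here. Each pass of the loop corresponds to one join, projection, and union, but the number of passes depends on the data, which is what a single fixed algebra expression cannot express.

```python
def transitive_closure(r):
    """Smallest relation containing r that is closed under chaining pairs."""
    closure = set(r)
    while True:
        # One step: extend every known path (x, y) by one edge (y, z) of r.
        new_pairs = {(x, z) for (x, y) in closure for (y2, z) in r if y == y2}
        if new_pairs <= closure:
            return closure          # fixpoint reached, nothing new was added
        closure |= new_pairs


flights = {
    ("Copenhagen", "Madrid"),
    ("Rome", "London"),
    ("Madrid", "Athens"),
    ("Athens", "Rome"),
}

reachable = {to for (frm, to) in transitive_closure(flights) if frm == "Copenhagen"}
print(sorted(reachable))
```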

Algebraic Laws (1/3)

x ∪ x = x    idempotence

x ∪ y = y ∪ x    commutativity

x ⋈ x = x    idempotence

x ⋈ y = y ⋈ x    commutativity

x ∪ (y ∪ z) = (x ∪ y) ∪ z    associativity

x ⋈ (y ⋈ z) = (x ⋈ y) ⋈ z    associativity

x ⋈ (y ∪ z) = (x ⋈ y) ∪ (x ⋈ z)    distributivity


Algebraic Laws (2/3)

σ_C(x ∪ y) = σ_C(x) ∪ σ_C(y)    distributivity

σ_C(x \ y) = σ_C(x) \ σ_C(y) = σ_C(x) \ y    distributivity

σ_C(x ⋈ y) = σ_C(x) ⋈ σ_C(y)    distributivity

σ_C(x ∩ y) = σ_C(x) ∩ σ_C(y)    distributivity

σ_C(x) = σ_C(σ_C(x))    idempotence

σ_C(σ_D(x)) = σ_D(σ_C(x))    commutativity

σ_{C∧D}(x) = σ_C(σ_D(x)) = σ_C(x) ⋈ σ_D(x)    splitting

σ_{C∨D}(x) = σ_C(x) ∪ σ_D(x)    splitting

σ_{¬C}(x) = x \ σ_C(x)    splitting
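
Several of the selection laws can be spot-checked on a small instance. The sketch below uses unary relations modelled as Python sets of integers and two arbitrary predicates; all names and data are illustrative. Passing on one instance is a sanity check, not a proof. For relations with identical schemas the natural join coincides with intersection, so `&` stands in for ⋈ here.

```python
def select(cond, rel):
    """Selection on a unary relation represented as a set of values."""
    return {v for v in rel if cond(v)}


x = {1, 2, 3, 4, 5, 6}
y = {4, 5, 6, 7, 8}
C = lambda v: v % 2 == 0     # "is even"
D = lambda v: v > 3          # "is greater than 3"

checks = {
    "distributes over union": select(C, x | y) == select(C, x) | select(C, y),
    "distributes over difference": select(C, x - y) == select(C, x) - select(C, y) == select(C, x) - y,
    "idempotence": select(C, x) == select(C, select(C, x)),
    "commutativity": select(C, select(D, x)) == select(D, select(C, x)),
    "splitting a conjunction": select(lambda v: C(v) and D(v), x) == select(C, select(D, x)) == select(C, x) & select(D, x),
    "splitting a disjunction": select(lambda v: C(v) or D(v), x) == select(C, x) | select(D, x),
    "splitting a negation": select(lambda v: not C(v), x) == x - select(C, x),
}
for law, holds in checks.items():
    print(law, holds)
```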

Algebraic Laws (3/3)

π_a(x ∪ y) = π_a(x) ∪ π_a(y)    distributivity

(does not hold for \ and ∩)

ρ_{a→b}(x ∪ y) = ρ_{a→b}(x) ∪ ρ_{a→b}(y)    distributivity

ρ_{a→b}(x \ y) = ρ_{a→b}(x) \ ρ_{a→b}(y)    distributivity

ρ_{b→c}(ρ_{a→b}(x)) = ρ_{a→c}(x)    cancellation

ρ_{a→b}(ρ_{c→d}(x)) = ρ_{c→d}(ρ_{a→b}(x))    commutativity
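
A small counterexample shows why projection does not distribute over difference and intersection. The two relations below are made up for illustration and share their first attribute value but differ in the second; `project_first` is a helper name chosen here.

```python
def project_first(rel):
    """Project a binary relation onto its first attribute."""
    return {(row[0],) for row in rel}


x = {(1, "p"), (2, "q")}
y = {(1, "r")}

# Union: both sides agree.
union_ok = project_first(x | y) == project_first(x) | project_first(y)

# Difference: projecting first loses the information that the rows differed.
diff_lhs = project_first(x - y)
diff_rhs = project_first(x) - project_first(y)

# Intersection: the rows are different, yet their projections overlap.
inter_lhs = project_first(x & y)
inter_rhs = project_first(x) & project_first(y)

print(union_ok, diff_lhs == diff_rhs, inter_lhs == inter_rhs)
```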


Zero and Unit

Define 0 = the empty relation (for each schema)

Define 1 as follows

• the schema is empty

• the relation contains the single empty row

0 ∪ x = x ∪ 0 = x

0 ⋈ x = x ⋈ 0 = 0

1 ⋈ x = x ⋈ 1 = x
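
These identities can be checked with a compact join. The sketch assumes each row is a frozenset of (attribute, value) pairs, so the unit relation is the set containing the single empty row. The helper names `rel` and `natural_join` are illustrative, and the schema is left implicit in the rows, so this representation cannot distinguish empty relations of different schemas.

```python
def rel(attrs, tuples):
    """Build a relation whose rows are frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def natural_join(r, s):
    """Combine rows of r and s that agree on every attribute they share."""
    out = set()
    for row_r in r:
        values = dict(row_r)
        for row_s in s:
            # An attribute missing from row_r imposes no constraint.
            if all(values.get(k, v) == v for k, v in row_s):
                out.add(row_r | row_s)
    return out


x = rel(("a",), [(42,), (7,)])   # hypothetical example relation
zero = set()                     # the empty relation
unit = {frozenset()}             # empty schema, single empty row

print(natural_join(unit, x) == x, natural_join(x, zero) == zero, (zero | x) == x)
```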


Division


Division Example

Completed

student  task
Fred     Database1
Fred     Database2
Fred     Compiler1
Eugene   Database1
Eugene   Compiler1
Eugene   Compiler2
Sara     Database1
Sara     Database2
John     Usability1

dDB

task
Database1
Database2

Completed ÷ dDB

student
Fred
Sara

Those students that have completed all the dDB tasks
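
The division example can be reproduced in Python in two ways: directly from the "for all" reading, and through the standard expression of division in terms of projection, product, and difference. The variable names are chosen here; the data is the table from the slide.

```python
completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"), ("Eugene", "Compiler2"),
    ("Sara", "Database1"), ("Sara", "Database2"),
    ("John", "Usability1"),
}
ddb = {("Database1",), ("Database2",)}

students = {(s,) for (s, _) in completed}        # projection onto student

# Direct reading: students paired with every dDB task in Completed.
direct = {(s,) for (s,) in students if all((s, t) in completed for (t,) in ddb)}

# Algebraic reading: remove every student who is missing some (student, task) pair.
all_pairs = {(s, t) for (s,) in students for (t,) in ddb}   # students x dDB
missing = all_pairs - completed
algebraic = students - {(s,) for (s, _) in missing}

print(sorted(direct), direct == algebraic)
```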

Algebraic Query Optimization

Rewritings may improve efficiency:

(A ⋈ B) ⋈ C → A ⋈ (B ⋈ C)

σ_C(A ∪ B) → σ_C(A) ∪ σ_C(B)

Size annotations on the slide: 10 rows, 10^6 rows, 10^6 rows; 10^12 rows, 10 rows

Whether a rewriting helps depends on the predicates (selectivities) and the specific instances
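
The effect of join order on intermediate result size can be seen on a small made-up instance. The relations A, B, and C below are hypothetical and far smaller than the sizes annotated on the slide; rows are frozensets of (attribute, value) pairs and the helper names are illustrative.

```python
def rel(attrs, tuples):
    """Build a relation whose rows are frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def natural_join(r, s):
    """Combine rows of r and s that agree on every attribute they share."""
    out = set()
    for row_r in r:
        values = dict(row_r)
        for row_s in s:
            if all(values.get(k, v) == v for k, v in row_s):
                out.add(row_r | row_s)
    return out


A = rel(("a", "b"), [(0, 0), (1, 1)])                 # 2 rows
B = rel(("b", "c"), [(b, 0) for b in range(100)])     # 100 rows
C = rel(("c", "d"), [(0, d) for d in range(100)])     # 100 rows

ab = natural_join(A, B)      # small intermediate result
bc = natural_join(B, C)      # large intermediate result

left = natural_join(ab, C)   # (A join B) join C
right = natural_join(A, bc)  # A join (B join C)

print(len(ab), len(bc), len(left), left == right)
```

Both orders give the same final relation, as associativity promises, but one of them builds a much larger intermediate relation along the way.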


Rules of Thumb

Push selections down the expression tree

Push projections down the expression tree

Order joins based on size estimates

In general, search for a good expression tree

• use heuristics

• use statistics: table sizes, distinct values for attributes, histograms, etc.


Bag Algebra

Allows relations to contain duplicate entries

Sets are replaced by bags

The bag versions of ∪, ∩, and \ count copies

The bag versions of π, ×, and ⋈ keep duplicates

A better match with real-life SQL than sets

Still does not account for the ordering of the tuples

• SQL offers some support for ordering

• Tuples in a relation are stored on disk in some order

Algebraic Laws for Bags

Fewer algebraic laws are valid for the bag algebra

Counterexamples: these set laws fail for bags

x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z)

σ_{C∨D}(x) = σ_C(x) ∪ σ_D(x)

Take x = y = z = the bag with schema (a) containing the single row (42), and C = D = true

Beware when optimizing bag queries!
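
The counterexample can be replayed with `collections.Counter` as a bag. This sketch assumes the usual bag conventions, which the slides only summarize as "count copies": bag union adds multiplicities (`+`) and bag intersection takes the minimum (`&`). The helper `select_bag` is a name chosen here.

```python
from collections import Counter


def select_bag(cond, bag):
    """Bag selection: keep qualifying rows together with their multiplicities."""
    return Counter({row: n for row, n in bag.items() if cond(row)})


# x = y = z = the bag with schema (a) holding one copy of the row (42,).
x = y = z = Counter({(42,): 1})
always = lambda row: True        # C = D = true

# x ∩ (y ∪ z) versus (x ∩ y) ∪ (x ∩ z)
lhs1 = x & (y + z)
rhs1 = (x & y) + (x & z)

# σ_{C∨D}(x) versus σ_C(x) ∪ σ_D(x)
lhs2 = select_bag(lambda row: always(row) or always(row), x)
rhs2 = select_bag(always, x) + select_bag(always, x)

print(lhs1[(42,)], rhs1[(42,)], lhs2[(42,)], rhs2[(42,)])
```

In both cases the left-hand side holds one copy of the row and the right-hand side holds two, so the set-algebra equalities break under bag semantics.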