44
CSE 303 Database Ashikur Rahman CSE 303: Database Lecture 11 Chapter 2: The Relational Algebra

Chapter 2: The Relational Algebrateacher.buet.ac.bd/ashikur/CSE303/CSE_303_Lec_11_Relational... · Chapter 2: The Relational Algebra. ... (A1 B1,…,An Bn)(R) ... •Find the model

  • Upload
    lamtu

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

CSE 303: Database

Lecture 11

Chapter 2: The Relational Algebra

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

RDBMS Architecture

How does a SQL engine work ?

SQL Relational

OptimizedSQL Query

Algebra (RA) Plan

OptimizedRA Plan

Execution

Declarative query (from user)

Translate to relational algebra expression

Find logically equivalent- but more efficient-RA expression

Execute each operator of the optimized plan!

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

RDBMS Architecture

How does a SQL engine work ?

SQL Relational

OptimizedSQL Query

Algebra (RA) Plan

OptimizedRA Plan

Execution

Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!

Relational Algebra allows us to translate declarative (SQL) queries into precise and optimizable expressions!

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Five basic operators:

1. Selection: s

2. Projection: P

3. Cartesian Product:

Relational Algebra (RA)

We’ll look at these first!We’ll look at these first!

4. Union:

5. Difference: -

• Derived or auxiliary operators:

• Intersection

• Joins (natural,equi-join, theta join)

• Renaming:

And also at one example of a derived operator (natural join) and a special operator (renaming)

And also at one example of a derived operator (natural join) and a special operator (renaming)

Then we’ll look see these set operationsThen we’ll look see these set operations

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Keep in mind: RA operates on sets!

• RDBMSs use Bags (multisets), however in relational algebra formalism we will consider sets!

• Also: we will consider the named perspective, where every attribute • Also: we will consider the named perspective, where every attribute must have a unique name

• attribute order does not matter…

Now on to the basic RA operators…Now on to the basic RA operators…

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Returns all tuples which satisfy a condition

• Notation: sc(R)

• Examples

SELECT *FROM StudentsWHERE gpa > 3.5;

SQL:

Students(sid,sname,gpa)

• Examples

• sSalary > 40000 (Employee)

• sname = ‘Smith’ (Employee)

• The condition c can use =, <, , >, , <>RA:

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SSN Name Salary

1234545 John 200000

5423341 Smith 600000

4352342 Fred 500000

Another example:

sSalary > 40000 (Employee)

SSN Name Salary

5423341 Smith 600000

4352342 Fred 500000

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Eliminates columns, then removes duplicates

• Notation: P A1,…,An (R)

• Example: project social-security

SELECT DISTINCTsname,gpa

FROM Students;

SQL:

Students(sid,sname,gpa)

• Example: project social-security number and names:

• P SSN, Name (Employee)

• Output schema: Answer(SSN, Name)

FROM Students;

RA:

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SSN Name Salary

1234545 John 200000

5423341 John 600000

4352342 John 200000

Another example:

P Name,Salary (Employee)

Name Salary

John 200000

John 600000

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Note that RA Operators are Compositional!

SELECT DISTINCTsname,

Students(sid,sname,gpa)

sname,gpa

FROM StudentsWHERE gpa > 3.5;

How do we represent this query in RA?How do we represent this query in RA? Are these logically equivalent?Are these logically equivalent?

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Each tuple in R1 with each tuple in R2

• Notation: R1 R2

• Example: • Employee Dependents

SELECT *FROM Students, People;

SQL:

Students(sid,sname,gpa)People(ssn,pname,address)

• Employee Dependents

• Rare in practice; mainly used to express joins RA:

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

ssn pname address

1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid sname gpa

001 John 3.4

002 Bob 1.3

People StudentsAnother example:

ssn pname address sid sname gpa

1234545 John 216 Rosse 001 John 3.4

5423341 Bob 217 Rosse 001 John 3.4

1234545 John 216 Rosse 002 Bob 1.3

5423341 Bob 216 Rosse 002 Bob 1.3

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Changes the schema, not the instance

• A ‘special’ operator- neither basic nor derived

• Notation: (R)

SELECTsid AS studId,sname AS name,gpa AS gradePtAvg

SQL:

Students(sid,sname,gpa)

• Notation: S(B1,…,Bn) (R)

• Note: this is shorthand for the proper form (since names, not order matters!):

• S(A1B1,…,AnBn) (R)

gpa AS gradePtAvgFROM Students;

RA:)(),,( StudentsgradePtAvgnamestudIdStudents

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

sid sname gpa

001 John 3.4

002 Bob 1.3

StudentsAnother example:

)(Students

studId name gradePtAvg

001 John 3.4

002 Bob 1.3

Students

)(),,( StudentsgradePtAvgnamestudIdStudents

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT DISTINCTssid, S.name, gpa,ssn, address

FROM

SQL:

Students(sid,name,gpa)People(ssn,name,address)

FROM Students S,People P

WHERE S.name = P.name;

RA:

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

ssn P.name address

1234545 John 216 Rosse

5423341 Bob 217 Rosse

sid S.name gpa

001 John 3.4

002 Bob 1.3

People PStudents SAnother example:

sid S.name gpa ssn address

001 John 3.4 1234545 216 Rosse

002 Bob 1.3 5423341 216 Rosse

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Natural Join

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Example: Converting SFW Query -> RA

SELECT DISTINCTgpa,

Students(sid,sname,gpa)People(ssn,sname,address)

gpa,address

FROM Students S,People P

WHERE gpa > 3.5 ANDS.sname = P.sname;

How do we represent this query in RA?How do we represent this query in RA?

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Advanced Relational Algebra

1. Set Operations in RA

2. Fancier RA

Lecture 16 > Section 2

2. Fancier RA

3. Extensions & Limitations

19

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

• Five basic operators:

1. Selection: s

2. Projection: P

3. Cartesian Product:

4. Union:

Relational Algebra (RA)

4. Union:

5. Difference: -

• Derived or auxiliary operators:

• Intersection

• Joins (natural,equi-join, theta join, semi-join)

• Renaming:

We’ll look at theseWe’ll look at these

And also at some of these derived operators

And also at some of these derived operators

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

1. Union () and 2. Difference (–)

• R1 R2

• Example: • ActiveEmployees RetiredEmployees

R1 R2

• R1 – R2

• Example:• AllEmployees - RetiredEmployees R1 R2

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

What about Intersection () ?

• It is a derived operator

• R1 R2 = ?

• R1 R2 = R1 – (R1 – R2)

• Example

R1 R2

• Example• UnionizedEmployees RetiredEmployees

R1 R2

R1 – R2

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Fancier RA

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT *FROM Students,People

WHERE q;

SQL:

Students(sid,sname,gpa)People(ssn,pname,address)

WHERE q;

RA:

Note that natural join is a theta join + a projection.Note that natural join is a theta join + a projection.

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT *FROM Students S,People P

SQL:

Students(sid,sname,gpa)People(ssn,pname,address)

People PWHERE sname = pname;

RA:

Most common join in practice!Most common join in practice!

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Optimization

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Logical Equivalece of RA Plans

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

RDBMS Architecture

How does a SQL engine work ?

SQL Relational

OptimizedSQL Query

Algebra (RA) Plan

OptimizedRA Plan

Execution

We’ll get a flavor of how to optimize on these plans now

We’ll get a flavor of how to optimize on these plans now

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Note: We can visualize the plan as a tree

R(A,B) S(B,C)

Bottom-up tree traversal = order of operation execution! Bottom-up tree traversal = order of operation execution!

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

A simple plan

What SQL query does this correspond to?What SQL query does this correspond to?

Are there any logically Are there any logically

R(A,B) S(B,C)

Are there any logically equivalent RA expressions?

Are there any logically equivalent RA expressions?

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

“Pushing down” projection

R(A,B) S(B,C)R(A,B) S(B,C)

Why might we prefer this plan?Why might we prefer this plan?

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Takeaways

• This process is called logical optimization

• Many equivalent plans used to search for “good plans”

• Relational algebra is an important abstraction.

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

RA commutators

• The basic commutators:• Push projection through (1) selection, (2) join

• Push selection through (3) selection, (4) projection, (5) join

• Also: Joins can be re-ordered!Also: Joins can be re-ordered!

This simple set of tools allows us to greatly improve the execution time of queries by optimizing RA plans!This simple set of tools allows us to greatly improve

the execution time of queries by optimizing RA plans!

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Optimizing the SFW RA Plan

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

sA<10

SELECT R.A, T.DFROM R,S,TWHERE R.B = S.BAND S.C = T.C

R(A,B) S(B,C) T(C,D)

Translating to RA

R(A,B) S(B,C)

T(C,D)

AND S.C = T.CAND R.A < 10;

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Logical Optimization

• Heuristically, we want selections and projections to occur as early as possible in the plan

• Terminology: “push down selections” and “pushing down projections.”

• Intuition: We will have fewer tuples in a plan.

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

sA<10

SELECT R.A,T.DFROM R,S,TWHERE R.B = S.BAND S.C = T.C

R(A,B) S(B,C) T(C,D)

Optimizing RA Plan Push down selection on A so it occurs earlier

Push down selection on A so it occurs earlier

R(A,B) S(B,C)

T(C,D)

AND S.C = T.CAND R.A < 10;

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT R.A,T.DFROM R,S,TWHERE R.B = S.BAND S.C = T.C

R(A,B) S(B,C) T(C,D)

Optimizing RA Plan Push down selection on A so it occurs earlier

Push down selection on A so it occurs earlier

R(A,B)

S(B,C)

T(C,D)

AND S.C = T.CAND R.A < 10;

sA<10

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT R.A,T.DFROM R,S,TWHERE R.B = S.BAND S.C = T.C

R(A,B) S(B,C) T(C,D)

Optimizing RA Plan Push down projection so it occurs earlier

Push down projection so it occurs earlier

R(A,B)

S(B,C)

T(C,D)

AND S.C = T.CAND R.A < 10;

sA<10

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

SELECT R.A,T.DFROM R,S,TWHERE R.B = S.BAND S.C = T.C

R(A,B) S(B,C) T(C,D)

Optimizing RA Plan We eliminate B earlier!We eliminate B earlier!

In general, when is an attribute not needed…?

In general, when is an attribute not needed…?

R(A,B)

S(B,C)

T(C,D)

AND S.C = T.CAND R.A < 10;

sA<10

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Example (PPLP)

• What PC models have a speed of at least 3.00?

• Which manufacturers make laptops with a hard disk of at least

Answer(model) := Pmodel(sspeed >= 3.00 (PC))

• Which manufacturers make laptops with a hard disk of at least 100GB.

Answer(maker) := Pmaker(shd >= 100 (Product Laptop))

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Example (PPLP)

• Find the model numbers of all color laser printers?

• Find those manufacturer that sells Laptops but not PC’s.

Pmodel(scolor=‘T’ AND type = ‘laser’ (Printer))

• Find those manufacturer that sells Laptops but not PC’s.

Pmaker(stype=‘laptop’ (Product)) - Pmaker(stype=‘pc’ (Product))

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Example (PPLP)

• Find those hard disk sizes that occur two or more PC’s?

• Find those pairs of PC models that have both the same speed and RAM. A

Phd(spc1.model < > pc2.model AND pc1.hd = pc2.hd (ρpc1 (PC) ρpc2 (PC)))

• Find those pairs of PC models that have both the same speed and RAM. A pair should be listed only once.

Ppc1.model,pc2.model(spc1.model < pc2.model AND pc1.speed = pc2.speed AND pc1.ram = pc2.ram (ρpc1 (PC) ρpc2 (PC)))

CSE 303 Database Ashikur RahmanCSE 303 Database Ashikur Rahman

Example (PPLP)

• Find the manufacturer(s) of the PC or Laptop with highest available speed?

R1:= Pmodel,speed(sspeed >= 2.0 (PC) sspeed >= 2.0 (Laptop))

Pmaker, model (Product) - Ppdpl1.maker, pdpl1.model(spdpl1.speed > pdpl2.speed (ρpdpl1 (Product

R1 ) ρpdpl2 (Product R1)))