
Copyright © C.J. Date 2008. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owner.

How to Write Correct SQL and Know It:

A Relational Approach to SQL

a technical seminar for DBAs, data architects, DBMS implementers, database application programmers,

and other database professionals

by

C. J. Date


Copyright C. J. Date 2008 page 2

THESIS :

1. You’re an SQL professional

2. But SQL is complicated and difficult (much more so than SQL advocates would have you believe)

3. And testing can never be exhaustive

4. So to have a hope of writing correct SQL, you must follow some discipline

5. Q: What discipline? A: The discipline of using SQL relationally

6. So you must know relational theory thoroughly too (as well as SQL itself)


USING SQL RELATIONALLY :

Why is this a good idea? What does it mean? Isn’t SQL relational anyway?

And in any case ... What does "SQL" mean?*

Objectives:

1. Cover relational theory thoroughly /* what it is but not always why */

2. Apply that theory to SQL practice /* and explain esoteric SQL features */

* Ignore (e.g.) OLAP, dynamic SQL, user defined types, and other nonrelational stuff


PREREQUISITES :

This seminar is not for complete beginners ... but it's not just a refresher course, either!

Aimed at database professionals:

• Know SQL reasonably well

• Know that relational theory is A Good Thing

Sadly, if your "relational" knowledge derives from SQL alone, you won't know the relational model as well as you should, and you might know some things that ain't so

SQL ≠ the relational model !!!


FOR EXAMPLE :

• What exactly is first normal form?

• What’s the connection between relations and predicates?

• What’s semantic optimization?

• What’s an image relation?

• What’s semidifference and why is it important?

• Why doesn’t deferred integrity checking make sense?


• What’s a relation variable?

• What’s prenex normal form?

• Can a relation have an attribute whose values are themselves relations?

• Is SQL relationally complete?

• What’s The Information Principle?

• How does XML fit with the relational model?


TERMINOLOGY :

Relational terms when discussing relational theory (relation, tuple, attribute, etc.); SQL terms when discussing SQL (table, row, column, etc.)

Note: The equivalences are not exact!

One term I’ll use in connection with both relational theory and SQL: operator (SQL uses operator, function, procedure, routine, method, but they all mean the same thing, pretty much)

Thus, e.g., "=", ":=", "+", SELECT, DISTINCT, UNION, SUM, GROUP BY, etc., etc., are all operators


WHY DO YOU NEED TO KNOW RELATIONAL THEORY ???

Because it's PRINCIPLES ... FOUNDATIONS ...

Professionals should know the foundations of their field

Technology and products (and SQL) change all the time, but principles ENDURE ... Hence emphasis on:

• Principles, not products

• Foundations, not fads

Compromises and tradeoffs might be necessary in "the real world" but should always be made from a position of conceptual strength


SOME NICE QUOTES :

Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he [sic] is going. Practice should always be based on a sound knowledge of theory.

(Leonardo da Vinci, 1452-1519)

Languages die ... mathematical ideas do not.

(G. H. Hardy, 1877-1947)


THEORY IS PRACTICAL !


UNFORTUNATELY ...

The gap between theory and practice is not as wide in theory as it is in practice

(Anon.)


CODD’S ORIGINAL RELATIONAL MODEL : AN OVERVIEW

STRUCTURE:

• types ("domains")
• n-ary relations
• attributes, tuples
• keys: candidate, primary, foreign

INTEGRITY:

• entity integrity /* but I don't believe in nulls !!! */
• referential integrity

MANIPULATION:

• relational algebra /* see later re relational calculus */: intersection, union, difference, product, restrict, project, join, divide
• relational assignment

[Figure: sample relvars DEPT { DNO, LOC, BUDGET } and EMP { ENO, ENAME, DNO, SAL }]


CODD’S ORIGINAL RELATIONAL ALGEBRA : AN OVERVIEW

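[Figure: the eight operators, each shown pictorially: restrict (aka select), project, product, union, intersect, difference, (natural) join, and divide. The worked examples that survive from the figure are:]

(natural) join: { a1 b1, a2 b1, a3 b2 } join { b1 c1, b2 c2, b3 c3 } = { a1 b1 c1, a2 b1 c1, a3 b2 c2 }

product: { a, b, c } product { x, y } = { a x, a y, b x, b y, c x, c y }

divide: { a x, a y, a z, b x, c y } divided by { x, z } = { a }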
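A minimal Python sketch of the eight operators pictured above, under stated assumptions: a tuple is modeled as a frozenset of (attribute-name, value) pairs and a relation body as a frozenset of such tuples, so duplicate tuples and left-to-right attribute ordering simply cannot arise. The helper names (`tup`, `rel`, `restrict`, and so on) are hypothetical, headings are not type-checked, and `product` assumes its operands share no attribute names.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

# Assumed representation (illustration only): a tuple is a frozenset of
# (attribute-name, value) pairs; a relation body is a frozenset of tuples.
Tup = FrozenSet[Tuple[str, object]]
Rel = FrozenSet[Tup]


def tup(**attrs: object) -> Tup:
    return frozenset(attrs.items())


def rel(*tuples: Tup) -> Rel:
    return frozenset(tuples)


def restrict(r: Rel, pred: Callable[[dict], bool]) -> Rel:
    # restrict (aka select): keep the tuples that satisfy the predicate
    return frozenset(t for t in r if pred(dict(t)))


def project(r: Rel, names: Iterable[str]) -> Rel:
    # project: keep only the named attributes; duplicates collapse automatically
    keep = set(names)
    return frozenset(frozenset(p for p in t if p[0] in keep) for t in r)


def union(r: Rel, s: Rel) -> Rel:
    return r | s


def intersect(r: Rel, s: Rel) -> Rel:
    return r & s


def difference(r: Rel, s: Rel) -> Rel:
    return r - s


def join(r: Rel, s: Rel) -> Rel:
    # natural join: combine tuples that agree on every common attribute
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return frozenset(out)


def product(r: Rel, s: Rel) -> Rel:
    # product is join with no common attributes (assumed here, not checked)
    return join(r, s)


def divide(r: Rel, s: Rel) -> Rel:
    # Codd's divide: values of r's other attributes that r pairs with EVERY tuple of s
    s_names = {a for t in s for a, _ in t}
    r_names = {a for t in r for a, _ in t}
    candidates = project(r, r_names - s_names)
    return frozenset(c for c in candidates if all((c | b) in r for b in s))
```

With the figure's join operands named, say, { A, B } and { B, C }, `join` returns exactly the three-tuple result shown. The nested loop inside it is just the simplest correct evaluation, not a statement about how a DBMS ought to evaluate joins.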


THE SUPPLIERS-AND-PARTS DATABASE :

S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

P
PNO  PNAME  COLOR  WEIGHT  CITY
P1   Nut    Red    12.0    London
P2   Bolt   Green  17.0    Paris
P3   Screw  Blue   17.0    Oslo
P4   Screw  Red    14.0    London
P5   Cam    Blue   12.0    Paris
P6   Cog    Red    19.0    London

SP
SNO  PNO  QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
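To experiment with the SQL examples that follow, the sample database can be loaded into SQLite through Python's standard `sqlite3` module. This is a sketch: the column types, the NOT NULL specifications, and the key and foreign-key declarations follow the usual suppliers-and-parts conventions and are assumptions, since the slide shows only the data; the function name `make_suppliers_and_parts` is hypothetical.

```python
import sqlite3

S_ROWS = [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]
P_ROWS = [
    ("P1", "Nut", "Red", 12.0, "London"), ("P2", "Bolt", "Green", 17.0, "Paris"),
    ("P3", "Screw", "Blue", 17.0, "Oslo"), ("P4", "Screw", "Red", 14.0, "London"),
    ("P5", "Cam", "Blue", 12.0, "Paris"), ("P6", "Cog", "Red", 19.0, "London"),
]
SP_ROWS = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400),
]


def make_suppliers_and_parts() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    con.executescript("""
        CREATE TABLE S  ( SNO TEXT NOT NULL PRIMARY KEY, SNAME TEXT NOT NULL,
                          STATUS INTEGER NOT NULL, CITY TEXT NOT NULL );
        CREATE TABLE P  ( PNO TEXT NOT NULL PRIMARY KEY, PNAME TEXT NOT NULL,
                          COLOR TEXT NOT NULL, WEIGHT REAL NOT NULL,
                          CITY TEXT NOT NULL );
        CREATE TABLE SP ( SNO TEXT NOT NULL REFERENCES S ( SNO ),
                          PNO TEXT NOT NULL REFERENCES P ( PNO ),
                          QTY INTEGER NOT NULL,
                          PRIMARY KEY ( SNO, PNO ) );
    """)
    con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", S_ROWS)
    con.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", P_ROWS)
    con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP_ROWS)
    con.commit()
    return con
```

NOT NULL is spelled out because SQLite, unlike the SQL standard, does not imply it for a TEXT primary key.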


MODEL vs. IMPLEMENTATION :

Unfortunately the term "data model" is used in the IT world with two very different meanings:

Data model (first sense): An abstract, self-contained, logical definition of the objects, operators, and so forth, that together make up the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior.

Implementation: The physical realization on a real machine of the components of the abstract machine that together constitute the data model in question.


MODEL vs. IMPLEMENTATION (cont.) :

Data model (second sense): A model of the persistent data of some particular enterprise (i.e., a logical DB design).

First meaning: Like a programming language, whose constructs can be used to solve many specific problems, but in and of themselves have no direct connection with any such specific problem

Second meaning: Like a specific program written in that language—uses the facilities provided by the model (first meaning) to solve some specific problem


MODEL vs. IMPLEMENTATION (cont.) :

From here on "model" means the first sense (barring explicit statements to the contrary)

Don’t confuse model vs. implementation !!! ... e.g., don’t confuse keys vs. unique indexes

Model vs. implementation implies (physical) data independence ... Hence protection of investment

Everything to do with performance is primarily an implementation, not a model, issue!

/* and recommendations to follow are almost NEVER driven by performance concerns ... */


E.g., "JOINS ARE SLOW" : MAKES NO SENSE !!!

S JOIN SP /* good */

vs. /* bad */

do for all tuples in S ;
   fetch S tuple into TS , TN , TT , TC ;
   do for all tuples in SP with SNO = TS ;
      fetch SP tuple into TS , TP , TQ ;
      emit tuple TS , TN , TT , TC , TP , TQ ;
   end ;
end ;

Recommendation: Don’t do this!
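The two approaches can be compared in a small sketch using Python's `sqlite3` module and a cut-down copy of S and SP (only five of the twelve shipments, to keep it short). The first function states the join declaratively and leaves the access strategy to the DBMS; the second hard-wires a nested-loop strategy into the application, one row at a time. Both return the same set of rows; the function names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY ( SNO, PNO ) );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S2','P1',300),
                          ('S3','P2',200), ('S4','P4',300);
""")


def join_declarative(con: sqlite3.Connection) -> set:
    # "S JOIN SP": say WHAT is wanted; the DBMS decides HOW to compute it
    rows = con.execute("""
        SELECT S.SNO, S.SNAME, S.STATUS, S.CITY, SP.PNO, SP.QTY
        FROM   S JOIN SP ON SP.SNO = S.SNO
    """).fetchall()
    return set(rows)


def join_by_hand(con: sqlite3.Connection) -> set:
    # The "do for all tuples in S ..." loop, executed row by row in the application
    out = set()
    for ts, tn, tt, tc in con.execute("SELECT SNO, SNAME, STATUS, CITY FROM S").fetchall():
        inner = con.execute("SELECT SNO, PNO, QTY FROM SP WHERE SNO = ?", (ts,)).fetchall()
        for _, tp, tq in inner:
            out.add((ts, tn, tt, tc, tp, tq))
    return out
```

Because the declarative form says nothing about HOW, the optimizer is free to choose its own strategy; the hand-written loop can never be better than the one strategy it encodes.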


PROPERTIES OF RELATIONS :

Every relation has a heading (a set of attribute names; more precisely, a set of attribute-name:type-name pairs, but informally we often ignore the types) and a body (a set of tuples)

No. of attributes = degree, no. of tuples = cardinality

Relations never contain duplicate tuples /* SQL fails here */

The tuples of a relation are unordered, top to bottom

The attributes of a relation are unordered, left to right /* SQL fails here */
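Both SQL failures flagged above are easy to observe. A sketch using Python's `sqlite3` module (the table `T` is a made-up example): with no key declared, SQL accepts two identical rows, and column order is visible in the rows a query returns. Converting the result to a Python `set` gives the relational reading, in which the duplicate simply disappears.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# No key is declared, so SQL accepts two identical rows: a table that is not a relation
con.execute("CREATE TABLE T ( SNO TEXT, CITY TEXT )")
con.executemany("INSERT INTO T VALUES (?, ?)", [("S1", "London"), ("S1", "London")])

rows_in_table = con.execute("SELECT COUNT(*) FROM T").fetchone()[0]
rows_after_distinct = con.execute(
    "SELECT COUNT(*) FROM ( SELECT DISTINCT SNO, CITY FROM T )"
).fetchone()[0]

# Columns, unlike attributes, are ordered left to right: same data, differently shaped rows
row_sno_first = con.execute("SELECT SNO, CITY FROM T").fetchone()
row_city_first = con.execute("SELECT CITY, SNO FROM T").fetchone()

# Reading the result as a mathematical set makes the duplicate disappear
as_set = set(con.execute("SELECT SNO, CITY FROM T"))
```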


NOTE THAT :

Every subset of a tuple is a tuple ... Every subset of a heading is a heading ... Every subset of a body is a body

Tuple equality: Two tuples are EQUAL iff (if and only if):

• They have the same attributes (i.e., the same attribute-name:type-name pairs)

• Attributes with the same name have the same attribute value

I.e., iff they're the same tuple !!!

Two tuples are duplicates iff they're equal

MANY features of the relational model rely on the above
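The definition of tuple equality can be sketched in Python by modeling a tuple as a frozenset of (attribute-name, value) pairs, an assumption made purely for illustration (`tup`, `heading`, and `project_tuple` are hypothetical helpers). Equality then ignores the order in which attributes are written, and any subset of a tuple is again a tuple. The sketch ignores attribute types (Python treats 20 and 20.0 as equal), so it does not enforce the "same attribute-name:type-name pairs" half of the definition.

```python
def tup(**attrs):
    """A tuple as a set of (attribute-name, value) pairs: no left-to-right order."""
    return frozenset(attrs.items())


def heading(t):
    """The attribute names of a tuple (types ignored in this sketch)."""
    return frozenset(name for name, _ in t)


def project_tuple(t, names):
    """Every subset of a tuple is a tuple: keep only the named attributes."""
    return frozenset(p for p in t if p[0] in names)


t1 = tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London")
t2 = tup(CITY="London", STATUS=20, SNAME="Smith", SNO="S1")  # same tuple, written differently
t3 = tup(SNO="S1", SNAME="Smith", STATUS=30, CITY="London")  # same heading, different STATUS

# A set of tuples can hold t1 and t2 only once: they are duplicates because they are equal
body = {t1, t2, t3}
```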


MORE ON RELATIONS :

Relations are always normalized (i.e., in first normal form, 1NF)

A relation and a table aren’t the same thing!

A table can be regarded as a CONCRETE picture of an ABSTRACT idea (but it’s a significant advantage of the relational model that its fundamental data objects have such a simple and easily understood concrete representation)

Base vs. derived relations /* see next page */


BASE vs. DERIVED RELATIONS :

Rel ops let us start with given rels and derive further rels (e.g., by doing queries) ... Given rels are base ones, others are derived

Must be able to define base ones (CREATE TABLE in SQL) and base ones must be named

Certain derived rels (in particular, views, aka virtual rels) are named too: e.g.,

CREATE VIEW SST_PARIS AS
       SELECT SNO , STATUS
       FROM   S
       WHERE  CITY = 'Paris' ;


Value of view at time t = result of evaluating defining expression at time t

Can operate on views as if they were base rels ... Can think of view as being conceptually materialized at time of reference

• But it isn’t really materialized! /* at least, we hope not */

• And materialization wouldn’t work for updates anyway
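The "value at time t" behavior can be seen with the SST_PARIS view defined earlier. A sketch using Python's `sqlite3` module and the S data (moving S5 to Paris is a made-up update): the view is queried, the base table is updated, and the same query against the view then reflects the change, because the defining expression is simply evaluated again. SQLite views are read-only unless INSTEAD OF triggers are defined, so the sketch does not attempt updates through the view.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S ( SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    CREATE VIEW SST_PARIS AS
        SELECT SNO, STATUS
        FROM   S
        WHERE  CITY = 'Paris';
""")


def sst_paris() -> set:
    # Each reference to the view evaluates its defining expression afresh
    return set(con.execute("SELECT SNO, STATUS FROM SST_PARIS"))


before = sst_paris()
con.execute("UPDATE S SET CITY = 'Paris' WHERE SNO = 'S5'")  # update the BASE table
after = sst_paris()
```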


POPULAR MISCONCEPTIONS :

What you often hear:

• Base rels "physically exist"

• Views don’t "physically exist"

Wrong!

RM deliberately has nothing to say about physical storage matters!

Also ... it’s all relations !!!


FROM A RECENT TEXTBOOK :

"[It] is important to make a distinction between stored relations, which are tables, and virtual relations, which are views ... [We] shall use relation only where a table or a view could be used. When we want to emphasize that a relation is stored, rather than a view, we shall sometimes use the term base relation or base table."

• How many confusions here?

• No wonder there's so much confusion out there, if this is typical of the quality of the teaching (which it probably is)


ONE FURTHER (important) PRELIMINARY : RELATIONS vs. RELVARS

Historically there has been much confusion between relations as such (i.e., relation values) and relation variables

Consider: DECLARE N INTEGER ... /* programming language */

N is an integer variable whose values are integers per se

Likewise: CREATE TABLE T ... /* SQL */

T is a relation variable whose values are relations per se /* ignoring SQL quirks */

For example:


Relation variable S, with its current relation value:

S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris

DELETE S WHERE CITY = 'Paris' ;

Shorthand for:

S := S WHERE NOT ( CITY = 'Paris' ) ;

Relation variable S, with its new current relation value:

S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
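The assignment reading of DELETE can be mimicked in Python, assuming (for illustration only) that a relation value is an immutable frozenset of tuples and a relvar is an ordinary variable holding one. `tup` and `where` are hypothetical helpers. The DELETE replaces the value of the variable `S` wholesale; the old relation value itself is never modified.

```python
def tup(**attrs):
    return frozenset(attrs.items())


def where(r, pred):
    """r WHERE pred: the restriction of relation r (a new relation value)."""
    return frozenset(t for t in r if pred(dict(t)))


# The relvar S is an ordinary variable; its current value is an immutable relation
S = frozenset({
    tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London"),
    tup(SNO="S2", SNAME="Jones", STATUS=10, CITY="Paris"),
    tup(SNO="S3", SNAME="Blake", STATUS=30, CITY="Paris"),
})
old_value = S

# DELETE S WHERE CITY = 'Paris' ;   is shorthand for:
# S := S WHERE NOT ( CITY = 'Paris' ) ;
S = where(S, lambda t: not (t["CITY"] == "Paris"))
```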


HENCE :

• INSERT / DELETE / UPDATE are all shorthand for some relational assignment, and (by definition) they all assign some relation value to some relation variable

• A relation variable or relvar is a variable whose permitted values are relations

Base (or real) relvar: One that isn’t virtual

Virtual relvar: One that’s defined by means of some specified relational expression in terms of one or more other relvars

• Henceforth: “Relation” means relation / “relvar” means relvar! ... and we ought to start again
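In the same spirit, INSERT and UPDATE can be written as assignments. This sketch keeps the frozenset-of-tuples assumption; `insert` and `update` are hypothetical helpers defined roughly as union, and as "remove the matching tuples, then union in their replacements". It enforces no key constraint, so it illustrates only the assignment semantics, not a complete DBMS.

```python
def tup(**attrs):
    return frozenset(attrs.items())


def where(r, pred):
    return frozenset(t for t in r if pred(dict(t)))


def insert(r, new_tuples):
    """INSERT, read as:  S := S UNION new_tuples"""
    return r | frozenset(new_tuples)


def update(r, pred, changes):
    """UPDATE, read as:  S := ( S MINUS matched ) UNION ( matched, with new values )"""
    matched = where(r, pred)
    replaced = frozenset(frozenset({**dict(t), **changes}.items()) for t in matched)
    return (r - matched) | replaced


S = frozenset({tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London")})
S = insert(S, [tup(SNO="S5", SNAME="Adams", STATUS=30, CITY="Athens")])
S = update(S, lambda t: t["CITY"] == "Athens", {"STATUS": 40})
```

Every statement here ends in `S = ...`: each operation computes a new relation value and assigns it to the variable.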


BY THE WAY :

SQL doesn’t support relational assignment as such ... So the foregoing example

S := S WHERE NOT ( CITY = 'Paris' ) ;

is expressed in Tutorial D ... Self-explanatory (?) "toy" language used by Date and Darwen to illustrate the ideas of The Third Manifesto

In what follows, I’ll use Tutorial D to illustrate relational concepts (as well as showing SQL analogs where applicable)


ASIDE : THE THIRD MANIFESTO

C. J. Date and Hugh Darwen: Databases, Types, and the Relational Model: The Third Manifesto (3rd edition, Addison-Wesley, 2006)

• Proposal for future direction of data and DBMSs

• D = any language that conforms to Manifesto principles (generic name)

• Tutorial D = language used in Manifesto book as a basis for examples

See www.thethirdmanifesto.com


VALUES vs. VARIABLES IN GENERAL :

VALUE : an "individual constant"

• has no location in time or space
• can’t be changed
• can be represented in memory (by some encoding)

VARIABLE : a holder for (the representation of) a value

• has location in time and space
• can be updated (i.e., its current value can be replaced by another)

Important note: Values and variables (more fundamentally, types) can be arbitrarily complex

Hard to imagine people getting confused over such a basic distinction, but they do ...


VALUE vs. VARIABLE CONFUSION : AN EXAMPLE :

"We distinguish the declared type of a variable from ... the type of the object that is the current value of the variable ...

(so an object is a value)

"... we distinguish objects from values ...

(so an object isn't a value after all) — ???

"... a MUTATOR [is an operation such that it's] possible to observe its effect on some object."

(in fact, an object is a variable) — ?????


A GUIDING PRINCIPLE AND A GREAT AID TO CLEAR THINKING :

All logical differences are big differences—Wittgenstein

Examples:

• Model vs. implementation
• Value vs. variable
• Relation vs. relvar
• Base relvar vs. view
• Data model (1st sense) vs. data model (2nd sense)
• Relation vs. table
• Attribute vs. column
• Tuple vs. row
• SQL vs. relational model
• DB vs. DBMS
• Expression vs. statement


STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design


RELATIONS ARE DEFINED OVER TYPES :

RM implies support for user defined types—hence, user defined operators also—hence, an "object/relational" DBMS done right is just a relational DBMS done right!

RM attributes can be of any type whatsoever, except (a) no pointer valued attributes; (b) relation r cannot have an attribute of the same type as r itself (see later)

But whole point about user defined types is: They look just like system defined types to other users ... So I’ll just assume types are system defined (mostly)

RM prescribes type BOOLEAN ... Assume CHAR, INTEGER, FIXED available too /* see later for SQL */


DOMAINS AND TYPES ARE THE SAME THING :

1. Equality comparisons and "domain check override" (DCO): domains really are types ...

2. Data value atomicity and first normal form: ... of arbitrary complexity

Note: Assume for sake of discussion that SNO attribs in S and SP are of user defined type SNO ... PNO attribs in P and SP are of user defined type PNO

Caveat: Only fair to warn you that I discuss "DCO" only to dismiss it ... as we’ll see


EQUALITY COMPARISONS :

"Everyone knows" that two values can be tested for equality only if they come from the same domain

E.g., with suppliers and parts:

SP.SNO = S.SNO   /* OK */
SP.PNO = S.SNO   /* not OK */

Any relational op (join, union, etc.) that calls for an explicit or implicit equality comparison between values from different domains should fail /* at compile time */

E.g.,

SELECT S.SNO, S.SNAME, S.STATUS, S.CITY
FROM   S
WHERE  NOT EXISTS
     ( SELECT *
       FROM   SP
       WHERE  SP.PNO = S.SNO )   /* not OK */

Probably a typo (for SP.SNO = S.SNO)
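SQL itself has no idea that SP.PNO and S.SNO come from different domains. A sketch using Python's `sqlite3` module and the suppliers-and-parts data shows the typo being accepted without complaint: since no part number ever equals a supplier number, NOT EXISTS is true for every supplier and the query returns all five. The presumably intended query (SP.SNO = S.SNO) returns just S5, the one supplier with no shipments. The `QUERY` template is a hypothetical convenience.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY ( SNO, PNO ) );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

QUERY = """
    SELECT S.SNO
    FROM   S
    WHERE  NOT EXISTS ( SELECT * FROM SP WHERE SP.{col} = S.SNO )
"""

# The typo: SP.PNO compared with S.SNO; both columns are plain TEXT, so no error
typo = {row[0] for row in con.execute(QUERY.format(col="PNO"))}

# Presumably intended: suppliers who supply no parts at all
intended = {row[0] for row in con.execute(QUERY.format(col="SNO"))}
```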


EQUALITY COMPARISONS (cont.) :

Comparison "SP.PNO = S.SNO" is INVALID

... unless user insists (Codd's "domain check override" ops)

BUT, according to Codd:

P.WEIGHT = SP.QTY       /* not OK */
P.WEIGHT - SP.QTY = 0   /* OK ... ?!?!? */

"... DBMS checks that the basic data types are the same" [Codd's book on RM/V2 p.47, italics added]

So there’s something strange about Codd-style domain checks in the first place, let alone "domain check override"

"DOMAIN CHECK OVERRIDE" :

Indeed, "domain check override" (DCO) is not the appropriate concept (in fact, it makes no sense AT ALL*) ...

Consider comparisons:

S.SNO = 'X4'    /* valid */
P.PNO = 'X4'    /* valid */
S.SNO = P.PNO   /* invalid */

What's going on ???

Well ...

* Stems from failure to recognize another logical difference! (see next page)


SNO and PNO are types, represented internally in terms of type CHAR, say; but the representation is (or should be) irrelevant and HIDDEN! (it’s an implementation issue) ... Logical difference between type and representation

Also selector operators SNO and PNO that effectively convert CHAR values to values of types SNO and PNO, invoked implicitly in:

S.SNO = 'X4'
P.PNO = 'X4'

(i.e., strings coerced to type SNO or PNO: see later)

Plus operators for inverse conversions too (in effect)

This mechanism provides domain checking and "DCO" capability in a clean, fully orthogonal, non ad hoc manner
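A rough Python analog of this mechanism, for illustration only: `SNO` and `PNO` are distinct classes whose CHAR representation is kept in a private attribute, the class constructor plays the role of the selector, and "=" is defined only between values of the same type. `ScalarType` and `as_char` (standing in for the inverse conversion) are hypothetical names. Python performs no implicit coercion, so the literal 'X4' must be passed through a selector explicitly. Comparing an SNO with a PNO raises a TypeError, while comparing their representations is an explicit, visible choice rather than a "domain check override".

```python
class ScalarType:
    """Hypothetical base for user-defined scalar types such as SNO and PNO.

    The CHAR representation lives in a private attribute; "=" is defined only
    between two values of the SAME type, which is what domain checking means.
    """

    __slots__ = ("_rep",)

    def __init__(self, rep: str) -> None:
        # Acts as the selector: turns a CHAR value into a value of this type
        if not isinstance(rep, str):
            raise TypeError(f"{type(self).__name__} selector expects a str argument")
        self._rep = rep

    def __eq__(self, other: object) -> bool:
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} with {type(other).__name__}"
            )
        return self._rep == other._rep

    def __hash__(self) -> int:
        return hash((type(self).__name__, self._rep))

    def __repr__(self) -> str:
        return f"{type(self).__name__}('{self._rep}')"


class SNO(ScalarType):
    __slots__ = ()


class PNO(ScalarType):
    __slots__ = ()


def as_char(v: ScalarType) -> str:
    """Inverse conversion (hypothetical name): expose the CHAR representation."""
    return v._rep
```

Usage: `SNO("X4") == SNO("X4")` is true, `SNO("X4") == PNO("X4")` raises TypeError, and `as_char(SNO("X4")) == as_char(PNO("X4"))` compares two strings.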


What we’re really talking about is

STRONG TYPING

Which incidentally would correctly deal with expressions such as

P.WEIGHT * SP.QTY     /* result type WEIGHT */

P.WEIGHT + SP.QTY     /* invalid */

SPX.QTY + SPY.QTY     /* result type QTY (SPX and SPY both shipments) */

etc., etc.
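A minimal Python sketch of the typing rules just listed, with `WEIGHT` and `QTY` as hypothetical classes: WEIGHT * QTY yields a WEIGHT, QTY + QTY yields a QTY, and WEIGHT + QTY is rejected with a TypeError because neither class defines that combination. In a DBMS these rules would come from operator definitions supplied by the type definer; here they are just Python special methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QTY:
    value: int

    def __add__(self, other: object) -> "QTY":
        # QTY + QTY is a QTY; any other operand is refused
        if not isinstance(other, QTY):
            return NotImplemented
        return QTY(self.value + other.value)


@dataclass(frozen=True)
class WEIGHT:
    value: float

    def __add__(self, other: object) -> "WEIGHT":
        # WEIGHT + WEIGHT is a WEIGHT; WEIGHT + QTY is refused
        if not isinstance(other, WEIGHT):
            return NotImplemented
        return WEIGHT(self.value + other.value)

    def __mul__(self, other: object) -> "WEIGHT":
        # WEIGHT * QTY is a WEIGHT (e.g., the total weight of a shipment)
        if not isinstance(other, QTY):
            return NotImplemented
        return WEIGHT(self.value * other.value)
```

When both operands refuse (by returning `NotImplemented`), Python raises TypeError. That is the "invalid" case above, detected when the expression is evaluated rather than at compile time.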


DATA VALUE ATOMICITY :

First normal form (1NF) requires every attribute value in every tuple to be "atomic"

Codd defines atomic as "nondecomposable by the DBMS (excluding certain special functions)"

But this defn is a trifle puzzling, and/or not very precise ...

What about strings (SUBSTR, LIKE, etc.)? numbers (INTEGER, FRACTION, etc.)? dates (YEAR, MONTH, DAY)? times (HOUR, MIN, SEC)?

Not to mention, e.g., view defns in the catalog


NOW WATCH VERY CAREFULLY !!!

R1 (this one is clearly 1NF ...)

SNO  PNO
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4
S4   P5

R2 (this one is clearly NOT 1NF ... PNO is a "repeating group" or "multivalued" (?))

SNO  PNO
S2   P1,P2
S3   P2
S4   P2,P4,P5

R3 (but this one is 1NF again !!!)

SNO  PNO_SET
S2   {P1,P2}
S3   {P2}
S4   {P2,P4,P5}


Values of PNO_SET in R3 are no more and no less "decomposable by the DBMS" than are strings, dates, etc.

(R3 might not be a good DESIGN, but that’s a separate issue)

The real point:

"Atomicity" has no absolute meaning!


A CLOSER LOOK AT R3 :

SNO   PNO_REL   /* note name change */

S2    PNO
      P1
      P2

S3    PNO
      P2

..    ....

Values in PNO_REL position are RELATIONS!

... PNO_REL is a relation-valued attribute (RVA)

/* no “table valued columns” in SQL, though SQL does support columns with values that are “multisets of rows” */
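The correspondence between R1 and R3 can be sketched in Python, in the spirit of Tutorial D's GROUP and UNGROUP operators. Assumptions, for illustration only: tuples are frozensets of (attribute-name, value) pairs, so a relation (a frozenset of tuples) can itself serve as an attribute value; `group` and `ungroup` are hypothetical helpers, not the Tutorial D definitions. Grouping R1 on PNO gives R3 with the relation-valued attribute PNO_REL, and ungrouping gives R1 back.

```python
def tup(**attrs):
    return frozenset(attrs.items())


def group(r, names, new_name):
    """Roughly r GROUP { names } AS new_name: one tuple per value of the other
    attributes, with an RVA holding the relation over the grouped attributes."""
    buckets = {}
    for t in r:
        rest = dict(t)
        inner = frozenset((n, rest.pop(n)) for n in names)  # tuple over grouped attrs
        buckets.setdefault(frozenset(rest.items()), set()).add(inner)
    return frozenset(key | {(new_name, frozenset(inners))} for key, inners in buckets.items())


def ungroup(r, name):
    """Roughly the inverse: flatten the relation-valued attribute again."""
    out = set()
    for t in r:
        rest = dict(t)
        for inner in rest.pop(name):
            out.add(frozenset(rest.items()) | inner)
    return frozenset(out)


R1 = frozenset(tup(SNO=s, PNO=p) for s, p in [
    ("S2", "P1"), ("S2", "P2"), ("S3", "P2"), ("S4", "P2"), ("S4", "P4"), ("S4", "P5"),
])
R3 = group(R1, ["PNO"], "PNO_REL")
```

Each PNO_REL value is an ordinary relation (here, a frozenset of one-attribute tuples), which is why R3 is as "atomic", and as 1NF, as R1.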


A DOMAIN IS A DATA TYPE (summary) :

Domains, and therefore attributes, can contain ABSOLUTELY ANYTHING !!! (any values, that is)

Arrays, lists, relations, XML docs, photos, ... I.e., values of ARBITRARY COMPLEXITY, without violating first normal form!

Recap: RM implies support for user defined types—hence, user defined ops also—hence, an "O/R" DBMS done right is just a relational DBMS done right!

From here on, favor type over domain

DOMAIN TYPE


TO SPELL IT OUT ONE MORE TIME :

THE QUESTION AS TO WHAT TYPES ARE SUPPORTED IS ORTHOGONAL TO THE QUESTION OF SUPPORT FOR THE RELATIONAL MODEL

More succinctly:

TYPES ARE ORTHOGONAL TO TABLES

The relational model has NEVER prescribed data types

(it's never been implemented either—but that's another matter)


SO WHAT’S A TYPE ???

Basically, a named set of values—e.g., all possible integers (INTEGER); all possible character strings (CHAR); all possible supplier numbers (SNO); all XML docs ... all fingerprints ... all X rays ... etc., etc.

Every value (in partic, every relation) is of some type—in fact, exactly one type /* so types disjoint */ unless type inheritance is supported—and carries its type with it

Every variable (in partic, every relvar), every attribute of every relation, every operator that returns a result, and every parameter of every operator is declared to be of some type


To say that variable V is of type T is to say that every value v that can legally be assigned to V is of type T

Aside: To say that V is a variable is to say that V is "assignable to" (i.e., updatable)

Every expression denotes some value and is of some type = type of value in question = type of value returned by outermost operator

E.g., type of

( a + b ) * ( x - y )

is whatever the declared type of "*" is


Associated with type T is a set of ops for operating on values and variables of type T ... ("associated with" means op in question has parameter of declared type T)

E.g., system-defined type INTEGER:

System defines ":=", "=", "<", etc., for assigning and comparing integers

And "+", "*", etc., for arithmetic on integers

Perhaps CAST to convert integers to char strings

But not "||", SUBSTR, etc.


E.g., user-defined type SNO:

Type definer defines ":=", "=", and maybe "<" etc., for assigning and comparing supplier numbers

But not "+", "*", etc.

Subscript ops for arrays

Special arith ops for dates and times

XQuery ops for XML docs ... and so on


DEFINING A NEW TYPE INVOLVES AT LEAST ALL OF THE FOLLOWING :

1. Specifying a name for the type

2. Specifying the values that make up the type /* see later */

3. Specifying the physical representation /* ignore */

4. Specifying a selector op for selecting values of the type /* see later */

5. Specifying ops that apply to values and variables of the type ... Must include "=" and ":=" !!!

6. For those ops that return a result, specifying the type of the result (so DBMS knows which expressions are legal, and type of result of every legal expression)


EXAMPLE (Tutorial D) :

Define type:

TYPE POINT ... /* geometric points in 2D space */ ;

Define op REFLECT that, given point (x,y), returns inverse point (-x,-y):

OPERATOR REFLECT ( P POINT ) RETURNS POINT ;
   RETURN POINT ( - THE_X ( P ) , - THE_Y ( P ) ) ;
          /* POINT selector invocation ... takes two */
          /* arguments (unlike SNO selector earlier) */
END OPERATOR ;
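A rough Python analog of the POINT example, assuming the class constructor stands in for the POINT selector and plain functions stand in for THE_X, THE_Y and REFLECT. The two fields `_x` and `_y` are a hypothetical physical representation that the other operators are meant to hide. In `REFLECT(POINT(1.0, 2.0))`, `p` is the parameter and `POINT(1.0, 2.0)` is the argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class POINT:
    """Hypothetical scalar type: geometric points in 2D space.

    The two fields are one possible physical representation; other code
    is meant to use only the selector (the constructor) and THE_X / THE_Y.
    """
    _x: float
    _y: float


def THE_X(p: POINT) -> float:
    return p._x


def THE_Y(p: POINT) -> float:
    return p._y


def REFLECT(p: POINT) -> POINT:
    # p is the parameter; the result comes from a POINT selector
    # invocation with two arguments, as in the Tutorial D version
    return POINT(-THE_X(p), -THE_Y(p))


print(REFLECT(POINT(1.0, 2.0)))   # POINT(_x=-1.0, _y=-2.0)
```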

POINTS ARISING (sorry) :

Another important logical difference:

argument vs. parameter

And another:

operator vs. invocation

Selector is a generalization of the familiar concept of a literal

NOTE TOO THAT :

The values that make up a given type exist BEFORE the DB exists, WHILE the DB exists, and AFTER the DB exists ... Better: They "have no location in time or space"

Defining type T just means "now we're interested in a certain set of values and we want to call it T"

Similarly for dropping type T

Values and sets of values don't "belong" to any particular DB!

SCALAR vs. NONSCALAR /* informal distinction */ :

Type is scalar if it has no user-visible components, nonscalar otherwise

Values, variables, etc., of type T are scalar if T is scalar, nonscalar otherwise

Nonscalar example (Tutorial D):

VAR S BASE RELATION { SNO CHAR , SNAME CHAR ,
                      STATUS INTEGER , CITY CHAR } KEY { SNO } ;

RELATION {...} is a relation type (nonscalar) /* order in which attribs specified insignificant */

TYPE GENERATORS :

RELATION {...} is also a generated type ... obtained by invoking the RELATION type generator (not defined by a separate TYPE statement)

Example involving TUPLE type generator:

VAR SINGLE_SUPPLIER TUPLE { STATUS INTEGER , SNO CHAR ,
                            CITY CHAR , SNAME CHAR } ;

Code fragment /* illustrating "tuple extraction" */ :

SINGLE_SUPPLIER := TUPLE FROM ( S WHERE SNO = 'S1' ) ;

Note logical difference between tuple t and relation r containing just tuple t !!!
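The logical difference between a tuple and a relation containing just that tuple shows up clearly in a small Python model. Assumptions: a tuple is modeled as a frozenset of (attribute, value) pairs and a relation body as a frozenset of such tuples; headings and types are not modeled, and the helper names `tup`, `where` and `tuple_from` are hypothetical.

```python
def tup(**components):
    """A tuple value: an unordered set of (attribute name, value) pairs."""
    return frozenset(components.items())


def where(r, pred):
    """Restriction: the tuples of relation body r that satisfy pred."""
    return frozenset(t for t in r if pred(dict(t)))


def tuple_from(r):
    """Analog of TUPLE FROM r: defined only if r contains exactly one tuple."""
    if len(r) != 1:
        raise ValueError(f"TUPLE FROM needs exactly one tuple, got {len(r)}")
    (t,) = r
    return t


S = frozenset({
    tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London"),
    tup(SNO="S2", SNAME="Jones", STATUS=10, CITY="Paris"),
})

r = where(S, lambda t: t["SNO"] == "S1")   # a relation containing one tuple
t = tuple_from(r)                           # the tuple itself

print(t == r)               # False: a tuple is not a relation
print(r == frozenset({t}))  # True
print(dict(t)["CITY"])      # London
```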

SCALAR TYPES IN SQL :

BOOLEAN               NUMERIC(p,q)    DATE
CHARACTER(n)          DECIMAL(p,q)    TIME
CHARACTER VARYING(n)  INTEGER         TIMESTAMP
FLOAT(p)              SMALLINT        INTERVAL

1. Various defaults, abbreviations, alternative spellings

2. Literals (more or less conventional)

3. Scalar assignment:

SET <scalar var ref> = <scalar exp> ;

Plus implicit scalar assignments on FETCH etc.

4. Scalar equality comparison:

<scalar exp> = <scalar exp>

Plus implicit comparisons on DISTINCT, UNION, etc.

Unfortunately "=" support is badly flawed!

Can give TRUE even if comparands clearly distinguishable /* discuss in a moment */

Can fail to give TRUE even if comparands not distinguishable /* see nulls, later */

5. BOOLEAN might not be supported ... If it isn’t:

Boolean exps can still appear in WHERE, ON, HAVING

But no table can have a column of type BOOLEAN, and no variable can be declared to be of type BOOLEAN

So workarounds might be needed ...

6. SQL also supports "domains" ... But SQL domains aren’t types at all ... In fact, they’re completely unnecessary, now that SQL does support user-defined types ... Use them if you like, but don’t mistake them for true relational domains

SQL TYPE CHECKING AND COERCIONS :

SQL supports a weak form of strong typing (!) on assignment and equality comparisons (values of each category can be assigned to, or compared with, only values of the same category):

BOOLEAN : BOOLEAN
Character string : Character string
Number : Number

(plus various rules for dates, times, etc.)

In other words, SQL often does coercions

One bizarre consequence: Certain unions (etc.) can yield a result with rows not appearing in either operand!

FOR EXAMPLE :

T1   X : INTEGER   Y : NUMERIC(5,1)
     0             1.0
     0             2.0

T2   X : NUMERIC(5,1)   Y : INTEGER
     0.0                0
     0.0                1
     1.0                2

SELECT X , Y FROM T1
UNION
SELECT X , Y FROM T2 ... Result:

     X      Y
     0.0    1.0
     0.0    2.0
     0.0    0.0
     1.0    2.0
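The anomaly can be simulated in Python. This is only a model under stated assumptions: each value carries its declared type as a tag, so that INTEGER 0 and NUMERIC(5,1) 0.0 stay distinguishable as the example assumes, and the "SQL-style" union first coerces every value to NUMERIC(5,1). The helper names (`integer`, `numeric`, `to_numeric`, `coercing_union`) are hypothetical.

```python
from decimal import Decimal

ONE_PLACE = Decimal("0.1")


def integer(n):
    """An INTEGER value, tagged with its declared type."""
    return ("INTEGER", int(n))


def numeric(v):
    """A NUMERIC(5,1) value, tagged with its declared type."""
    return ("NUMERIC(5,1)", Decimal(v).quantize(ONE_PLACE))


def to_numeric(value):
    """Coerce a tagged value to NUMERIC(5,1), as the union implicitly does."""
    declared_type, payload = value
    return value if declared_type == "NUMERIC(5,1)" else numeric(payload)


# T1 ( X INTEGER , Y NUMERIC(5,1) ) and T2 ( X NUMERIC(5,1) , Y INTEGER )
T1 = {(integer(0), numeric("1.0")), (integer(0), numeric("2.0"))}
T2 = {(numeric("0.0"), integer(0)),
      (numeric("0.0"), integer(1)),
      (numeric("1.0"), integer(2))}


def coercing_union(a, b):
    """Union after coercing every column to NUMERIC(5,1); duplicates removed."""
    return {tuple(to_numeric(v) for v in row) for row in a | b}


result = coercing_union(T1, T2)
# Rows of the result that appear in neither operand
strangers = {row for row in result if row not in T1 and row not in T2}
print(len(result), result == strangers)   # 4 True
```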

RECOMMENDATIONS :

1. Ensure that columns with the same name are always of the same type /* see later */

2. Avoid type conversions where possible

3. When they can’t be avoided, do them explicitly:

SELECT CAST ( X AS NUMERIC(5,1) ) AS X , Y FROM T1
UNION
SELECT X , CAST ( Y AS NUMERIC(5,1) ) AS Y FROM T2

I.e., avoid coercions! /* general good practice */

UNFORTUNATELY :

Certain coercions are built into the definition of SQL and can’t be avoided! Just for the record:

If table exp tx is used as a row subquery, then the table t denoted by tx should have just one row r, and t is coerced to r

If table exp tx is used as a scalar subquery, then the table t denoted by tx should have just one column and just one row and hence contain just one value v, and t is doubly coerced to v

If the "row exp" rx in the ALL or ANY comparison rx theta sq (where theta is, e.g., >ALL or <ANY and sq is a subquery) is in fact a scalar exp, the scalar value v denoted by that exp is coerced to a row that contains just v

SQL COLLATIONS :

Type checking and coercion for character strings are more complex than I’ve been pretending ...

Given string consists of chars from one character set and has one collation

Given collation = rule for specific character set ... Governs comparison of strings of chars from that set

Let C be a collation for character set S, and let a and b be any two characters from S. Then C must be such that exactly one of

a < b a = b a > b

gives TRUE and the other two give FALSE (under C)

COMPLICATIONS :

Either PAD SPACE or NO PAD can apply to collation C

Under PAD SPACE, distinct strings (e.g., ‘AB’ and ‘AB ’) can "compare equal"

Recommendation: Don’t use PAD SPACE!

But distinct strings might still "compare equal" even with NO PAD ... E.g., if C is CASE_INSENSITIVE

Recommendation: Don’t do this ... or if you must, then be very careful!
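SQLite doesn't use the standard's PAD SPACE / NO PAD terminology, but its built-in collations make a handy stand-in for experimenting: its default BINARY collation behaves roughly like NO PAD, RTRIM ignores trailing spaces (roughly the PAD SPACE effect), and NOCASE folds ASCII case. A minimal sketch using Python's standard sqlite3 module; the helper `truth` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")


def truth(expr):
    """Evaluate one boolean SQL expression; SQLite reports 1 or 0."""
    return con.execute("SELECT " + expr).fetchone()[0]


print(truth("'AB' = 'AB '"))                  # 0: BINARY, trailing space counts
print(truth("'AB' = 'AB ' COLLATE RTRIM"))    # 1: trailing spaces ignored
print(truth("'ZZZ' = 'zzz'"))                 # 0: BINARY is case-sensitive
print(truth("'ZZZ' = 'zzz' COLLATE NOCASE"))  # 1: ASCII case ignored
```

In both "equal" cases the two strings are plainly distinct, which is exactly the hazard the recommendations warn about.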

Call v1 and v2 "equal but distinguishable" if they’re distinct but v1 = v2 gives TRUE

In UNION, JOIN, MATCH, LIKE, UNIQUE, etc., implicit equality rule is indeed "equal even if distinguishable"

In UNION, JOIN, GROUP BY, DISTINCT, etc., DBMS might have to choose which "equal but distinguishable" value is to appear in some column in some result row

SQL gives little guidance in such situations!

Hence, certain SQL expressions are indeterminate! ... or "possibly nondeterministic" (SQL term)

For example,

SELECT MAX ( Z )
FROM   T

might return ‘ZZZ’ on one occasion and ‘zzz’ on another, even if T hasn’t changed in the interim!

One important consequence: Many SQL table exps aren’t allowed in constraints !!!

Strong recommendation: Avoid possibly nondeterministic expressions as much as you can!
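SQLite can reproduce the "equal but distinguishable" situation. In this sketch the hypothetical table T declares Z with SQLite's NOCASE collation, so 'ZZZ' and 'zzz' compare equal; DISTINCT and MAX then each have to hand back one representative, and which spelling that is should be treated as the implementation's choice. The code therefore only checks that one of the two spellings comes back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table T; column Z uses SQLite's case-insensitive NOCASE collation
con.execute("CREATE TABLE T ( Z TEXT COLLATE NOCASE )")
con.executemany("INSERT INTO T ( Z ) VALUES ( ? )", [("ZZZ",), ("zzz",)])

distinct_z = con.execute("SELECT DISTINCT Z FROM T").fetchall()
max_z = con.execute("SELECT MAX ( Z ) FROM T").fetchone()[0]

# DISTINCT collapses the two strings into one row, but which spelling that
# row carries (and which value MAX returns) is not something to rely on
print(len(distinct_z), distinct_z[0][0] in {"ZZZ", "zzz"}, max_z in {"ZZZ", "zzz"})
```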

SQL ROW TYPES :

Recall:

VAR SINGLE_SUPPLIER TUPLE { STATUS INTEGER , SNO CHAR ,
                            CITY CHAR , SNAME CHAR } ;

SQL analog of TUPLE type generator = ROW type constructor

DECLARE SINGLE_SUPPLIER /* SQL row variable */
        ROW ( SNO VARCHAR(5) , SNAME VARCHAR(25) ,
              STATUS INTEGER , CITY VARCHAR(20) ) ;

But "field" [sic] order matters! ... 4 fields can be arranged in 4! = 24 orders, giving 24 distinct row types!
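The 24 is just the number of orderings of four fields. A small Python sketch makes the contrast concrete, assuming an SQL row type is modeled as an ordered sequence of (name, type) pairs and a relational heading as an unordered set of the same pairs:

```python
from itertools import permutations

# The four (field name, field type) pairs of the SQL row variable above
fields = [("SNO", "VARCHAR(5)"), ("SNAME", "VARCHAR(25)"),
          ("STATUS", "INTEGER"), ("CITY", "VARCHAR(20)")]

# SQL: field order is part of the row type, so each ordering is a distinct type
row_types = {tuple(order) for order in permutations(fields)}

# Relational model: a heading is a *set* of attributes, so order is irrelevant
headings = {frozenset(order) for order in permutations(fields)}

print(len(row_types), len(headings))   # 24 1
```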

SQL ROW TYPES (cont.) :

Row assignment: e.g.,

SET SINGLE_SUPPLIER = ( SELECT SNO , SNAME , STATUS , CITY
                        FROM   S
                        WHERE  SNO = 'S1' ) ; /* row subquery ... */

Note the coercion here !!!

Row comparison: /* see later */

WHAT ABOUT SQL TABLE TYPES ???

SQL doesn’t really have a TABLE type generator (or constructor) at all !!! Recall:

VAR S BASE RELATION { SNO CHAR , SNAME CHAR ,
                      STATUS INTEGER , CITY CHAR } KEY { SNO } ;

SQL analog:

CREATE TABLE S
     ( SNO    VARCHAR(5)  NOT NULL ,   /* note strange */
       SNAME  VARCHAR(25) NOT NULL ,   /* jumble of    */
       STATUS INTEGER     NOT NULL ,   /* column and   */
       CITY   VARCHAR(20) NOT NULL ,   /* constraint   */
       UNIQUE ( SNO ) ) ;              /* defns        */

There’s no sequence of linguistic tokens in that CREATE TABLE statement that can logically be labeled "an invocation of the TABLE type constructor"

If table S has any type at all, it’s just a bag of rows, where the rows are of type

ROW ( SNO VARCHAR(5) , SNAME VARCHAR(25) , STATUS INTEGER , CITY VARCHAR(20) )

ASIDE : "TYPED TABLES"

Very bad term! ... If "typed table" TT defined to be "of type T," then TT is not of type T, and nor are its rows!

Avoid such tables anyway, because they’re inextricably intertwined with SQL’s support for pointers ...

RM prohibits pointers ... But SQL allows a column in one table to have values that are pointers to rows in some other table ... Pointers are reference values, columns containing them are of some REF type ... Why?

Strong recommendation: Don’t use such tables, nor any features related to them!

STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

A SAMPLE TUPLE VALUE (tuple for short) :

SNO : CHAR | SNAME : CHAR | STATUS : INTEGER | CITY : CHAR    /* attribute name : type name */
S1         | Smith        | 20               | London         /* attribute values */

degree = 4

Attribute : attribute name + type name

Component : attribute + attribute value

Heading : { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

Type : TUPLE { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

By definition, no left to right ordering to components (so ordering arbitrary in written form)

By definition, every tuple contains exactly one value, of approp type, for each attribute

No nulls !!! (nulls aren’t values)

Recommendation: Never say "null value"!

Sample tuple selector invocation (tuple literal):

TUPLE { SNO 'S1' , SNAME 'Smith' , STATUS 20 , CITY 'London' }
/* keyword TUPLE does double duty in Tutorial D */

Two tuples are equal ("duplicates") iff they’re the very same tuple ("=" and "≠" make sense, "<" and ">" don’t)

Every subset of a heading is a heading ... Every subset of a tuple is a tuple: e.g.,

SNO : CHAR | CITY : CHAR        SNO : CHAR
S1         | London             S1

The empty set is a subset of every set ... So the empty tuple (or 0-tuple) is a valid tuple! (and there’s only one) ... Type and value both TUPLE{} in Tutorial D

Tuple assignment and comparisons: Already discussed
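These properties can be mirrored in a tiny Python model, assuming a tuple is a frozenset of (attribute name, value) components; type names are left out for brevity, and `tup` and `project` are hypothetical helpers.

```python
def tup(**components):
    """A tuple value: an unordered set of (attribute name, value) components."""
    return frozenset(components.items())


def project(t, attrs):
    """Keep only the components whose attribute names are in attrs."""
    return frozenset((name, value) for name, value in t if name in attrs)


t1 = tup(SNO="S1", SNAME="Smith", STATUS=20, CITY="London")
t2 = tup(CITY="London", STATUS=20, SNAME="Smith", SNO="S1")  # written in another order

print(t1 == t2)                             # True: no left-to-right ordering
print(len(t1))                              # 4: the degree
sub = project(t1, {"SNO", "CITY"})          # a subset of a tuple is a tuple
print(sub == tup(SNO="S1", CITY="London"))  # True
print(project(t1, set()) == frozenset())    # True: the 0-tuple
```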

ATTRIBUTE EXTRACTION :

Note logical difference between value v and tuple t (of degree one) that contains just v !!!

Let t be a tuple, say the tuple for supplier S1 in the current value of the suppliers-and-parts DB

Tutorial D: CITY FROM t—"extracts" CITY value from t

SQL analog: t.CITY

SQL ROWS :

Tutorial D term:        SQL analog (approx.):
tuple (value)           row
TUPLE type generator    row type constructor
tuple selector          row value constructor
tuple variable          row variable (?)

But SQL rows have left to right ordering to their "fields" ... e.g., ROW(1,2) ≠ ROW(2,1) * ... Fields identified by ordinal position, not by name

No "0-row"

* Keyword ROW optional in row value constructors and usually omitted

ROW ASSIGNMENT :

SET syntax (as for scalars) /* already discussed */

Row assignments also involved (in effect) in UPDATE: e.g.,

UPDATE S
SET    STATUS = 20 , CITY = 'London'
WHERE  CITY = 'Paris' ;

Logically equivalent to:

UPDATE S
SET    ( STATUS , CITY ) = ( 20 , 'London' )
WHERE  CITY = 'Paris' ;
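The two UPDATE forms can be checked against each other in SQLite, which has accepted the parenthesized row-valued SET since version 3.15. This sketch uses Python's standard sqlite3 module and the five sample supplier rows used throughout the seminar; it runs each form against a fresh copy of S and compares the results. The helpers `fresh_s` and `contents` are hypothetical.

```python
import sqlite3

SUPPLIERS = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
             ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
             ("S5", "Adams", 30, "Athens")]


def fresh_s():
    """A new in-memory database holding the sample suppliers table S."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE S ( SNO VARCHAR(5) NOT NULL, SNAME VARCHAR(25) NOT NULL,"
                " STATUS INTEGER NOT NULL, CITY VARCHAR(20) NOT NULL, UNIQUE ( SNO ) )")
    con.executemany("INSERT INTO S ( SNO , SNAME , STATUS , CITY ) VALUES ( ?, ?, ?, ? )",
                    SUPPLIERS)
    return con


def contents(con):
    return con.execute("SELECT SNO , SNAME , STATUS , CITY FROM S ORDER BY SNO").fetchall()


a = fresh_s()
a.execute("UPDATE S SET STATUS = 20 , CITY = 'London' WHERE CITY = 'Paris'")

b = fresh_s()   # the row-valued SET needs SQLite 3.15 or later
b.execute("UPDATE S SET ( STATUS , CITY ) = ( 20 , 'London' ) WHERE CITY = 'Paris'")

print(contents(a) == contents(b))   # True
```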

ROW COMPARISONS :

Believe it or not, most boolean exps in SQL, even simple "scalar" comparisons, are defined in terms of rows, not scalars!

Example involving "genuine" row comparison:

SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) = ( 20 , 'London' )

Logically equivalent to:

SELECT SNO
FROM   S
WHERE  STATUS = 20 AND CITY = 'London'

SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) <> ( 20 , 'London' )

Logically equivalent to:

SELECT SNO
FROM   S
WHERE  STATUS <> 20 OR CITY <> 'London'

Because row components have left to right ordering, SQL can support "<" and ">" on rows:

SELECT SNO
FROM   S
WHERE  ( STATUS , CITY ) > ( 20 , 'London' )

Logically equivalent to:

SELECT SNO
FROM   S
WHERE  STATUS > 20 OR ( STATUS = 20 AND CITY > 'London' )

/* hmmm ... */
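The three row comparisons and their expansions can be tried on the sample suppliers data in SQLite (version 3.15 or later supports row values). A minimal sketch with Python's sqlite3 module; note that it only shows the paired WHERE clauses select the same suppliers for this particular data, which is evidence of the equivalence, not a proof. The helper `snos` and the `PAIRS` list are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S ( SNO VARCHAR(5), SNAME VARCHAR(25), STATUS INTEGER, CITY VARCHAR(20) )")
con.executemany("INSERT INTO S ( SNO , SNAME , STATUS , CITY ) VALUES ( ?, ?, ?, ? )", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])


def snos(condition):
    """The set of supplier numbers selected by a WHERE condition."""
    return {sno for (sno,) in con.execute("SELECT SNO FROM S WHERE " + condition)}


# (row comparison, hand-expanded scalar equivalent)
PAIRS = [
    ("( STATUS , CITY ) = ( 20 , 'London' )",
     "STATUS = 20 AND CITY = 'London'"),
    ("( STATUS , CITY ) <> ( 20 , 'London' )",
     "STATUS <> 20 OR CITY <> 'London'"),
    ("( STATUS , CITY ) > ( 20 , 'London' )",
     "STATUS > 20 OR ( STATUS = 20 AND CITY > 'London' )"),
]

for row_form, scalar_form in PAIRS:
    print(sorted(snos(row_form)), snos(row_form) == snos(scalar_form))
```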

But most row comparisons involve rows of degree one:

SELECT SNO
FROM   S
WHERE  ( STATUS ) = ( 20 )

Syntax rule: Parens can be dropped from row value constructors of degree one ... Thus:

SELECT SNO
FROM   S
WHERE  STATUS = 20

But this "scalar" comparison is still technically a row comparison (scalar comparands coerced to rows)

RECOMMENDATION :

Unless the rows being compared are of degree one (i.e., effectively scalars):

Don’t use "<", "<=", ">", and ">=" comparisons

• They rely on left to right column ordering

• No straightforward relational counterpart

• Error prone

In this connection ... it’s worth noting that the SQL standardizers took several iterations to get the semantics right!

A SAMPLE RELATION VALUE (relation for short) :

SNO : CHAR | SNAME : CHAR | STATUS : INTEGER | CITY : CHAR
S1         | Smith        | 20               | London
S2         | Jones        | 10               | Paris
S3         | Blake        | 30               | Paris
S4         | Clark        | 20               | London
S5         | Adams        | 30               | Athens

Heading : { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

/* tuple heading as previously defined */ /* … same attributes and same degree */

Type : RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

Body : { tuples all with specified heading }

Cardinality : cardinality of body

NOTE THAT : "Relations contain tuples" only indirectly true!

By definition:

No relation contains duplicate tuples—including results of relational operators

No top to bottom ordering to tuples, no left to right ordering to attributes

Every tuple of every relation contains exactly one value, of approp type, for each attribute—i.e., relations are always normalized

No nulls!

Every subset of a body is a body (loosely, every subset of a rel is a rel), empty subset included ("empty relation")

Given rel type RT, there’s exactly one empty rel of type RT

Tuple extraction: Already discussed

t ∈ r : TRUE iff t appears in r ... SQL example:

SELECT SNO , SNAME , STATUS , CITY
FROM   S
WHERE  SNO IN                 /* SNO coerced to ROW(SNO) */
     ( SELECT SNO
       FROM   SP )

ANOTHER POINT :

RELATIONS ARE n-DIMENSIONAL (massive confusion on this simple point!) …

A couple of quotes:

1. "When you’re well trained in relational modeling, you begin to believe the world is two-dimensional … You think you can get anything into the rows and columns of a table" (Douglas Barry, Executive Director, ODMG)

2. "There is simply no way to mask the complexities involved in assembling two-dimensional data into a multi-dimensional form" (Richard Finkelstein)

But a relation with n attributes (i.e., of degree n) represents points in n-dimensional space ... It’s n-dimensional, not 2-dimensional !!!

Of course a relation looks flat when pictured in tabular form on paper … but a picture of a thing isn’t the thing itself !!!

A major logical difference here, in fact!

Let’s all vow never to say "flat relations" ever again

RELATIONAL COMPARISONS :

Must be able to test rels for equality, of course: e.g.,

S { CITY } = P { CITY } /* FALSE */

Other useful comparison ops: ⊆ (subset of), ⊂ (proper subset of), ⊇ (superset of), ⊃ (proper superset of)

Useful shorthands:

IS_EMPTY ( r ) IS_NOT_EMPTY ( r )
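Python's frozensets happen to support exactly these comparisons, which makes for a quick sketch. Assumptions: a relation body is a frozenset of tuples, each tuple a frozenset of (attribute, value) pairs, and headings are not modeled; the supplier cities come from the sample S relation, while the part cities are hypothetical. `city_rel`, `IS_EMPTY` and `IS_NOT_EMPTY` are illustrative helpers, not Tutorial D built-ins.

```python
def city_rel(*cities):
    """A relation body with the single attribute CITY (duplicates collapse)."""
    return frozenset(frozenset({("CITY", c)}) for c in cities)


S_CITY = city_rel("London", "Paris", "Paris", "London", "Athens")  # from sample S
P_CITY = city_rel("London", "Paris", "Oslo")                       # hypothetical parts

print(S_CITY == P_CITY)               # False
print(city_rel("London") <= S_CITY)   # True: subset (⊆)
print(city_rel("London") < S_CITY)    # True: proper subset (⊂)
print(S_CITY >= city_rel("Athens"))   # True: superset (⊇)
print(S_CITY > S_CITY)                # False: no relation is a proper superset (⊃) of itself


def IS_EMPTY(r):
    return len(r) == 0


def IS_NOT_EMPTY(r):
    return len(r) > 0
```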

RELATIONS OF DEGREE ZERO :

Empty heading is a valid heading ... So a relation can be of degree zero! Type is RELATION{} in Tutorial D

(Such rels are a little hard to draw)

Can a relation with no attributes have any tuples?

Yes, it can have AT MOST ONE TUPLE (the 0-tuple)

One tuple: TABLE_DEE /* DEE for short */
No tuples: TABLE_DUM /* DUM for short */

Fundamentally important! (perhaps surprisingly)

But not supported in SQL ...

WHY ARE THEY SO IMPORTANT ?

Because DEE corresponds to YES (or TRUE) and DUM corresponds to NO (or FALSE) !!!

/* see later for further explanation */

Also ... DEE and DUM (especially DEE) play a role in the relational algebra analogous to the role played by 0 in conventional arithmetic

/* again, see later for further explanation */
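A small Python model hints at why DEE and DUM matter. Assumptions: relation bodies are frozensets of tuples, tuples are frozensets of (attribute, value) pairs, and headings are not modeled (so, for instance, all empty relations look alike here). The helpers `tup`, `natural_join` and `project` are hypothetical. The sketch shows r JOIN DEE giving r, r JOIN DUM giving an empty result, and projection onto no attributes turning a question into DEE (yes) or DUM (no).

```python
DEE = frozenset({frozenset()})   # degree 0, one tuple (the 0-tuple)
DUM = frozenset()                # degree 0, no tuples


def tup(**components):
    return frozenset(components.items())


def natural_join(r1, r2):
    """Join two relation bodies on their common attributes (headings not modeled)."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(t1 | t2)
    return frozenset(result)


def project(r, attrs):
    """Project every tuple of r onto the attribute names in attrs."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


S = frozenset({tup(SNO="S1", CITY="London"), tup(SNO="S2", CITY="Paris")})

print(natural_join(S, DEE) == S)     # True: joining with DEE changes nothing
print(natural_join(S, DUM) == DUM)   # True: joining with DUM leaves no tuples

# "Is any supplier in Paris?" answered by a relation of degree zero
in_paris = frozenset(t for t in S if dict(t)["CITY"] == "Paris")
in_athens = frozenset(t for t in S if dict(t)["CITY"] == "Athens")
print(project(in_paris, set()) == DEE, project(in_athens, set()) == DUM)   # True True
```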

SQL TABLES :

I.e., table values, unless context demands otherwise /* see later re table variables */

SQL has no "table type" notion ... An SQL table is just a bag of rows of some row type ... Hence, no "TABLE type generator" (though SQL does support ROW, ARRAY, MULTISET type generators)

But SQL table value constructor is analogous (somewhat) to a relation selector. E.g. /* "table literal" */

VALUES ( 1, 2 ) , ( 2, 1 ) , ( 1, 1 ) , ( 1, 2 )

Denotes table with 2 unnamed columns and 4 (not 3!) rows
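SQLite accepts a bare VALUES clause as a query, so the "4 rows, not 3" point is easy to observe. A minimal sketch with Python's sqlite3 module; `TABLE_LITERAL` is just a local name for the expression above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
TABLE_LITERAL = "VALUES ( 1, 2 ) , ( 2, 1 ) , ( 1, 1 ) , ( 1, 2 )"

rows = con.execute(TABLE_LITERAL).fetchall()
distinct_rows = con.execute(f"SELECT DISTINCT * FROM ( {TABLE_LITERAL} )").fetchall()

print(len(rows), rows.count((1, 2)))   # 4 2: the duplicate row is kept
print(len(distinct_rows))              # 3
```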

ANOTHER EXAMPLE :

VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
       ( 'S2' , 'Jones' , 10 , 'Paris'  ) ,
       ( 'S3' , 'Blake' , 30 , 'Paris'  ) ,
       ( 'S4' , 'Clark' , 20 , 'London' ) ,
       ( 'S5' , 'Adams' , 30 , 'Athens' )

Recommendations:

1. For each column, ensure all values are of the same type

2. Don’t specify same row twice

TABLE COMPARISONS ???

No direct support, but workarounds are available ... E.g., SQL analog of

S { CITY } = P { CITY }

is:

NOT EXISTS ( SELECT CITY FROM S
             EXCEPT
             SELECT CITY FROM P )
AND
NOT EXISTS ( SELECT CITY FROM P
             EXCEPT
             SELECT CITY FROM S )
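The workaround runs as-is in SQLite once it is wrapped in a SELECT. In this sketch the supplier cities come from the sample S relation, while the part rows (including the added 'P9') are hypothetical; SQLite reports the boolean result as 0 or 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S ( SNO VARCHAR(5), CITY VARCHAR(20) )")
con.execute("CREATE TABLE P ( PNO VARCHAR(6), CITY VARCHAR(20) )")
con.executemany("INSERT INTO S ( SNO , CITY ) VALUES ( ?, ? )",
                [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"),
                 ("S4", "London"), ("S5", "Athens")])
# Hypothetical part cities, for illustration only
con.executemany("INSERT INTO P ( PNO , CITY ) VALUES ( ?, ? )",
                [("P1", "London"), ("P2", "Paris"), ("P3", "Oslo")])

SAME_CITIES = """
    SELECT NOT EXISTS ( SELECT CITY FROM S EXCEPT SELECT CITY FROM P )
       AND NOT EXISTS ( SELECT CITY FROM P EXCEPT SELECT CITY FROM S )
"""

before = con.execute(SAME_CITIES).fetchone()[0]   # 0: the city sets differ
con.execute("DELETE FROM P WHERE CITY = 'Oslo'")
con.execute("INSERT INTO P ( PNO , CITY ) VALUES ( 'P9', 'Athens' )")
after = con.execute(SAME_CITIES).fetchone()[0]    # 1: both are now {London, Paris, Athens}
print(before, after)
```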

COLUMN NAMING (very important!) :

RM attribute naming discipline:

No anonymous attributes
No duplicate attribute names

SQL enforces analogous discipline for tables that are current values of table variables (CREATE TABLE or CREATE VIEW) but not for tables resulting from evaluation of some table expression

Very strong recommendation: /* Why? See later */ Use AS to enforce discipline if SQL doesn’t!*

* But you can’t, with VALUES expressions

EXAMPLES :

SELECT DISTINCT SNAME , 'Supplier' AS TAG
FROM   S

SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
FROM   S

CREATE VIEW SDS AS
     SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
     FROM   S ;

SELECT DISTINCT S.CITY AS SCITY , P.CITY AS PCITY
FROM   S , SP , P
WHERE  S.SNO = SP.SNO
AND    SP.PNO = P.PNO
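One way to see what AS buys you is to ask the DBMS what it thinks the result columns are called. A minimal sketch using SQLite through Python's sqlite3 module, whose `Cursor.description` reports result column names; `column_names` is a hypothetical helper. With AS, the names are exactly the ones you chose; without it, the name is whatever the implementation makes up, and other products make up different things.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S ( SNO VARCHAR(5), SNAME VARCHAR(25), STATUS INTEGER, CITY VARCHAR(20) )")
con.execute("INSERT INTO S ( SNO , SNAME , STATUS , CITY ) VALUES ( 'S1', 'Smith', 20, 'London' )")


def column_names(sql):
    """The result column names the DBMS reports for a query."""
    return [d[0] for d in con.execute(sql).description]


print(column_names("SELECT DISTINCT SNAME , 'Supplier' AS TAG FROM S"))
print(column_names("SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS FROM S"))
# Without AS, the second name is implementation-chosen: don't rely on it
print(column_names("SELECT DISTINCT SNAME , 2 * STATUS FROM S"))
```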

SELECT TEMP.*
FROM   ( S JOIN P ON S.CITY > P.CITY ) AS TEMP
       ( SNO , SNAME , STATUS , SCITY ,
         PNO , PNAME , COLOR , WEIGHT , PCITY )

SELECT MAX ( WEIGHT ) AS MBW
FROM   P
WHERE  COLOR = 'Blue'

Note: Can ignore recommendation if no need to reference column subsequently: e.g.,

SELECT ... WHERE WEIGHT < ( SELECT MAX ( WEIGHT ) FROM P WHERE P.COLOR = 'Blue' )

WHY IS COLUMN NAMING IMPORTANT ???

Rel alg ops (e.g., UNION) rely on proper attrib naming

One reason: Avoids complexities caused by relying on ordinal position!

To use SQL relationally, must apply same discipline to SQL analogs ... As a prereq:

Very strong recommendation: If two columns represent "the same kind of information," give them the same name wherever possible!

E.g., SNO and SNO, not (say) SNO and SNUM

If two columns represent different kinds of information, give them different names (usually)

Only situation where foregoing recommendation can’t be followed = when two columns in same table represent same kind of information ... E.g.:

CREATE TABLE EMP ( ENO ... , MNO ... , ... ) ;

So column renaming sometimes necessary: e.g.,

( SELECT ENO , MNO FROM EMP ) AS TEMP1
NATURAL JOIN
( SELECT ENO AS MNO , ... FROM EMP ) AS TEMP2

/* join EMP to itself on MNO in "1st copy" */
/* and ENO in "2nd copy" */
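The renaming trick can be run in SQLite. Everything specific here is an assumption for illustration: the EMP rows, the ENAME column (renamed to MNAME in the second copy so it does not collide with the first), and recording E3 as its own manager to avoid nulls. Because ENO is renamed MNO in the second copy, NATURAL JOIN matches on MNO alone.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical EMP data; E3 is recorded as its own manager to avoid nulls
con.execute("CREATE TABLE EMP ( ENO VARCHAR(5) NOT NULL, MNO VARCHAR(5) NOT NULL,"
            " ENAME VARCHAR(20) NOT NULL )")
con.executemany("INSERT INTO EMP ( ENO , MNO , ENAME ) VALUES ( ?, ?, ? )",
                [("E1", "E3", "Ann"), ("E2", "E3", "Bea"), ("E3", "E3", "Cal")])

# The only common column is MNO, so the join pairs each employee with
# the EMP row of his or her manager
rows = con.execute("""
    SELECT ENO , MNO , MNAME
    FROM   ( SELECT ENO , MNO FROM EMP ) AS TEMP1
           NATURAL JOIN
           ( SELECT ENO AS MNO , ENAME AS MNAME FROM EMP ) AS TEMP2
    ORDER  BY ENO
""").fetchall()
print(rows)
```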

But what if DB already violates naming discipline? Possible strategy:

For each base table T, define view V identical to T except for column renaming

Ensure V abides by column naming discipline

Operate in terms of V instead of T

Referred to subsequently as the "operate via views strategy"
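A minimal sketch of the strategy in SQLite, assuming a hypothetical legacy base table SHIPMENT that calls supplier numbers SNUM. The view SP is identical to SHIPMENT except that SNUM is renamed SNO, so queries written against SP can rely on matching names (here, a NATURAL JOIN with S).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical legacy base tables that break the naming discipline:
# supplier numbers are SNO in one table but SNUM in the other
con.execute("CREATE TABLE S ( SNO VARCHAR(5) NOT NULL, SNAME VARCHAR(25) NOT NULL, UNIQUE ( SNO ) )")
con.execute("CREATE TABLE SHIPMENT ( SNUM VARCHAR(5) NOT NULL, PNO VARCHAR(6) NOT NULL )")
con.executemany("INSERT INTO S ( SNO , SNAME ) VALUES ( ?, ? )",
                [("S1", "Smith"), ("S2", "Jones")])
con.executemany("INSERT INTO SHIPMENT ( SNUM , PNO ) VALUES ( ?, ? )",
                [("S1", "P1"), ("S1", "P2"), ("S2", "P1")])

# View identical to the base table except that SNUM is renamed SNO
con.execute("CREATE VIEW SP AS SELECT SNUM AS SNO , PNO FROM SHIPMENT")

# Queries are written against the view, so NATURAL JOIN matches on SNO
rows = con.execute(
    "SELECT SNO , SNAME , PNO FROM S NATURAL JOIN SP ORDER BY SNO , PNO").fetchall()
print(rows)
```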

Impossible to ignore ordinal position 100 percent ... Columns still have ordinal position even when they don’t need to (in base tables and views in particular)

Strong recommendation: Never write SQL code that relies on ordinal position!

Contexts in which SQL attaches significance to ordinal position:

• SELECT *
• JOIN, UNION, INTERSECT, EXCEPT
• VALUES
• INSERT if column name commalist omitted
• ALL and ANY comparisons
• Column name commalist in CREATE VIEW and range variable definitions

BUT ...
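A sketch of how ordinal position can silently bite, using SQLite. The tables A and B are hypothetical: they hold the same kinds of columns but declare them in different orders. `SELECT *` combined with UNION pairs columns by position, not by name, so a city ends up under SNO without any error; naming the columns explicitly avoids that.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two hypothetical tables: same column names, declared in different orders
con.execute("CREATE TABLE A ( SNO VARCHAR(5), CITY VARCHAR(20) )")
con.execute("CREATE TABLE B ( CITY VARCHAR(20), SNO VARCHAR(5) )")
con.execute("INSERT INTO A ( SNO , CITY ) VALUES ( 'S1', 'London' )")
con.execute("INSERT INTO B ( CITY , SNO ) VALUES ( 'Paris', 'S2' )")

# Relies on ordinal position: B's CITY lands in the SNO column, silently
by_position = con.execute("SELECT * FROM A UNION SELECT * FROM B").fetchall()

# Names every column explicitly, in the same order on both sides
by_name = con.execute(
    "SELECT SNO , CITY FROM A UNION SELECT SNO , CITY FROM B ORDER BY SNO").fetchall()

print(sorted(by_position))
print(by_name)
```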

WHY DUPLICATE ROWS ARE BAD NEWS :

I assume you know:

• Relational DBMSs include an optimizer ...

Purpose is to figure out the best way to implement user queries etc. ("best" = best performing)

• Optimizers transform relational expressions ("query rewrite")* ...

Replace exp1 by exp2, where exp1 and exp2 guaranteed to produce same result when evaluated but exp2 has better performance (we hope)

* But watch out for this term (has other meanings too)

DUPLICATE ROWS (cont.) :

If a table permits duplicates, IT’S NOT A RELATION

RM doesn’t recognize duplicates

Example (with acknowledgments to Nat Goodman):

P   PNO   PNAME          SP   SNO   PNO
    P1    Screw               S1    P1
    P1    Screw               S1    P1
    P1    Screw               S1    P2
    P2    Screw

No CKs !!!

Violate the Information Principle !!!

Meanings hidden !!!

Find part nos. for parts that either are screws or are supplied by supplier S1

DUPLICATE ROWS (cont.) :

SELECT P.PNO FROM P
WHERE  P.PNAME = 'Screw'
OR     P.PNO IN ( SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1' )

SELECT P.PNO FROM P WHERE P.PNAME = 'Screw'
UNION ALL
SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'

SELECT SP.PNO FROM SP
WHERE  SP.SNO = 'S1'
OR     SP.PNO IN ( SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' )

SELECT DISTINCT P.PNO FROM P WHERE P.PNAME = 'Screw'
UNION ALL
SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'

SELECT P.PNO FROM P , SP
WHERE  ( SP.SNO = 'S1' AND P.PNO = SP.PNO )
OR     P.PNAME = 'Screw'

SELECT P.PNO FROM P WHERE P.PNAME = 'Screw'
UNION ALL
SELECT DISTINCT SP.PNO FROM SP WHERE SP.SNO = 'S1'

SELECT SP.PNO FROM P , SP
WHERE  ( SP.SNO = 'S1' AND P.PNO = SP.PNO )
OR     P.PNAME = 'Screw'

SELECT P.PNO FROM P WHERE P.PNAME = 'Screw'
UNION
SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'

DUPLICATE ROWS (cont.) :

Same eight formulations, each with the multiplicity of every part number in its result (e.g., P1*3 = P1 appears three times):

• SELECT P.PNO FROM P WHERE P.PNAME = ‘Screw’ OR P.PNO IN ( SELECT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ ) ... result: P1*3 P2*1

• SELECT P.PNO FROM P WHERE P.PNAME = ‘Screw’ UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ ... result: P1*5 P2*2

• SELECT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ OR SP.PNO IN ( SELECT P.PNO FROM P WHERE P.PNAME = ‘Screw’ ) ... result: P1*2 P2*1

• SELECT DISTINCT P.PNO FROM P WHERE P.PNAME = ‘Screw’ UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ ... result: P1*3 P2*2

• SELECT P.PNO FROM P, SP WHERE ( SP.SNO = ‘S1’ AND P.PNO = SP.PNO ) OR P.PNAME = ‘Screw’ ... result: P1*9 P2*3

• SELECT P.PNO FROM P WHERE P.PNAME = ‘Screw’ UNION ALL SELECT DISTINCT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ ... result: P1*4 P2*2

• SELECT SP.PNO FROM P, SP WHERE ( SP.SNO = ‘S1’ AND P.PNO = SP.PNO ) OR P.PNAME = ‘Screw’ ... result: P1*8 P2*4

• SELECT P.PNO FROM P WHERE P.PNAME = ‘Screw’ UNION SELECT SP.PNO FROM SP WHERE SP.SNO = ‘S1’ ... result: P1*1 P2*1
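The multiplicities above can be reproduced mechanically. The sketch below uses Python's standard sqlite3 module with an in-memory SQLite database as a stand-in SQL engine (an assumption: SQLite is not the DBMS the seminar targets, although it keeps duplicates the same way for these eight queries). It loads the keyless P and SP tables from the example and counts how often each part number appears in each result, taking the formulations in the order they appear above.

```python
import sqlite3
from collections import Counter

# Keyless tables, so SQLite happily stores the duplicate rows from the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  ( PNO TEXT, PNAME TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT );
    INSERT INTO P  VALUES ('P1','Screw'), ('P1','Screw'), ('P1','Screw'), ('P2','Screw');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P1'), ('S1','P2');
""")

FORMULATIONS = [
    "SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' "
    "OR P.PNO IN ( SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1' )",
    "SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' "
    "UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'",
    "SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1' "
    "OR SP.PNO IN ( SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' )",
    "SELECT DISTINCT P.PNO FROM P WHERE P.PNAME = 'Screw' "
    "UNION ALL SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'",
    "SELECT P.PNO FROM P, SP WHERE ( SP.SNO = 'S1' AND P.PNO = SP.PNO ) "
    "OR P.PNAME = 'Screw'",
    "SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' "
    "UNION ALL SELECT DISTINCT SP.PNO FROM SP WHERE SP.SNO = 'S1'",
    "SELECT SP.PNO FROM P, SP WHERE ( SP.SNO = 'S1' AND P.PNO = SP.PNO ) "
    "OR P.PNAME = 'Screw'",
    "SELECT P.PNO FROM P WHERE P.PNAME = 'Screw' "
    "UNION SELECT SP.PNO FROM SP WHERE SP.SNO = 'S1'",
]

def multiplicities(sql):
    """Map each part number to the number of times it appears in the result."""
    return dict(Counter(pno for (pno,) in conn.execute(sql)))

results = [multiplicities(sql) for sql in FORMULATIONS]
for counts in results:
    print(counts)
```

All eight results contain exactly the part numbers P1 and P2; only the degree of duplication differs, which is why an optimizer may not freely replace one formulation by another.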


DUPLICATE ROWS (cont.) :

Either (a) the user cares about the degree of duplication, or (b) the user does not care…

Expression transformation is inhibited!

Performance suffers

DBMS code quality suffers

Law-abiding users suffer

Particularly annoying if the user does NOT care !!!


DUPLICATE ROWS : FURTHER ISSUES

• If a table is a plot of points in some n-dimensional space, duplicates don’t add anything; they just mean plotting the same point twice

• If table T permits duplicates, we can’t distinguish "genuine" duplicates from duplicates arising from data entry errors!

• If something is true, saying it twice doesn’t make it more true

Much more could be said ....

Please write out one googol times: There’s no such thing as a duplicate.

—Anon.


AVOIDING DUPLICATES IN SQL :

RM prohibits duplicates ... So to use SQL relationally, we must prevent them from occurring

Base tables: Specify at least one key /* see later */

Derived tables: SELECT ALL / UNION ALL / VALUES can all produce dup rows ...

VALUES already discussed ... Regarding ALL vs. DISTINCT: Can appear in SELECT / UNION / INTERSECT / EXCEPT / invocation of "set function" such as SUM /* this case is a little special ... see later */

DISTINCT is default for UNION / INTERSECT / EXCEPT ... ALL is default in other cases
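A small check of those defaults, again using Python's sqlite3 module as a stand-in engine. The tables T1 and T2 and their values are hypothetical. SQLite supports neither INTERSECT ALL nor EXCEPT ALL, so the sketch covers only SELECT, UNION, VALUES and the SUM set function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 ( X INTEGER );
    CREATE TABLE T2 ( X INTEGER );
    INSERT INTO T1 VALUES (1), (1), (2);
    INSERT INTO T2 VALUES (2), (3);
""")

def rows(sql):
    """Sorted single-column result, keeping any duplicates visible."""
    return sorted(r[0] for r in conn.execute(sql))

select_default  = rows("SELECT X FROM T1")                           # ALL is the default
select_distinct = rows("SELECT DISTINCT X FROM T1")
union_default   = rows("SELECT X FROM T1 UNION SELECT X FROM T2")    # DISTINCT is the default
union_all       = rows("SELECT X FROM T1 UNION ALL SELECT X FROM T2")
values_dups     = rows("VALUES (7), (7)")                            # VALUES can produce dups too

sum_all      = conn.execute("SELECT SUM(X) FROM T1").fetchone()[0]           # ALL by default
sum_distinct = conn.execute("SELECT SUM(DISTINCT X) FROM T1").fetchone()[0]
```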


SELECT / UNION / etc. :

Obvious recommendations: Always specify DISTINCT ... preferably do so explicitly ... and never specify ALL

Unfortunately ... /* quote ex book */ :

At this point in the original draft, I added that if you find the discipline of always specifying DISTINCT annoying, don’t complain to me—complain to the SQL vendors instead. But my reviewers reacted with almost unanimous horror to my suggestion that you should always specify DISTINCT. One wrote: "Those who really know SQL well will be shocked at the thought of coding SELECT DISTINCT by default." Well, I’d like to suggest, politely, that (a) those who are "shocked at the thought" probably know the implementations well, not SQL, and (b) their shock is probably due to their recognition that those implementations do such a poor job of optimizing away unnecessary DISTINCTs.


If I write SELECT DISTINCT SNO FROM S ..., that DISTINCT can safely be ignored. If I write either EXISTS (SELECT DISTINCT ...) or IN (SELECT DISTINCT ...), those DISTINCTs can safely be ignored. If I write SELECT DISTINCT SNO FROM SP ... GROUP BY SNO, that DISTINCT can safely be ignored. If I write SELECT DISTINCT ... UNION SELECT DISTINCT ..., those DISTINCTs can safely be ignored. And so on. Why should I, as a user, have to devote time and effort to figuring out whether some DISTINCT is going to be a performance hit and whether it’s logically safe to omit it?—and to remembering all of the details of SQL’s inconsistent rules for when duplicates are automatically eliminated and when they’re not?
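The quoted claims can be spot-checked in SQLite via Python's sqlite3 module; the schema and data are hypothetical, with SNO declared unique in S. Agreement on one database value only illustrates the point; the logical guarantee comes from the key on S and from the semantics of IN, EXISTS, GROUP BY and UNION.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  ( SNO TEXT NOT NULL, UNIQUE ( SNO ) );
    CREATE TABLE SP ( SNO TEXT NOT NULL, PNO TEXT NOT NULL, UNIQUE ( SNO, PNO ) );
    INSERT INTO S  VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P1');
""")

PAIRS = [  # (with DISTINCT, same query without it)
    ("SELECT DISTINCT SNO FROM S", "SELECT SNO FROM S"),
    ("SELECT SNO FROM S WHERE SNO IN ( SELECT DISTINCT SNO FROM SP )",
     "SELECT SNO FROM S WHERE SNO IN ( SELECT SNO FROM SP )"),
    ("SELECT SNO FROM S WHERE EXISTS ( SELECT DISTINCT * FROM SP WHERE SP.SNO = S.SNO )",
     "SELECT SNO FROM S WHERE EXISTS ( SELECT * FROM SP WHERE SP.SNO = S.SNO )"),
    ("SELECT DISTINCT SNO FROM SP GROUP BY SNO", "SELECT SNO FROM SP GROUP BY SNO"),
    ("SELECT DISTINCT SNO FROM S UNION SELECT DISTINCT SNO FROM SP",
     "SELECT SNO FROM S UNION SELECT SNO FROM SP"),
]

def bag(sql):
    """Result as a sorted list, so duplicates (if any) would remain visible."""
    return sorted(conn.execute(sql).fetchall())

same = [bag(with_d) == bag(without_d) for with_d, without_d in PAIRS]
```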


Well, I could go on. However, I decided—against my own better judgment, but in the interest of maintaining good relations (with my reviewers, I mean)—not to follow my own advice elsewhere in this book but only to request duplicate elimination explicitly when it seemed to be logically necessary to do so. It wasn’t always easy to decide when that was, either. But at least now I can add my voice to those complaining to the vendors, I suppose.


SADLY, THEREFORE :

Recommendations:

• Make sure you know when SQL eliminates duplicates without you asking it to

• When you do have to ask, make sure you know whether it matters if you don’t

• When it does matter, specify DISTINCT /* but be annoyed about it */

• And never specify ALL!


WHY NULLS ARE BAD NEWS :

I assume you know:

Any comparison in which at least one comparand is null evaluates to UNKNOWN, not TRUE or FALSE

Rationale: Null means "value unknown" …

Hence three-valued logic (3VL)

3VL truth tables for NOT, AND, OR:

NOT: NOT T = F ; NOT U = U ; NOT F = T

AND | T U F
T | T U F
U | U U F
F | F F F

OR | T U F
T | T T T
U | T U U
F | T U F
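One compact way to encode these tables, assumed here rather than taken from the slides, is to order the truth values F < U < T and read AND as minimum and OR as maximum. The Python sketch below does that and cross-checks every entry against SQLite (via the sqlite3 module), which represents TRUE, FALSE and UNKNOWN as 1, 0 and NULL.

```python
import sqlite3
from enum import Enum

class TV(Enum):
    """The three truth values of 3VL."""
    T = "T"
    U = "U"
    F = "F"

ORDER = {TV.F: 0, TV.U: 1, TV.T: 2}   # F < U < T

def not3(a):
    return {TV.T: TV.F, TV.U: TV.U, TV.F: TV.T}[a]

def and3(a, b):
    return min(a, b, key=ORDER.get)   # AND = minimum under F < U < T

def or3(a, b):
    return max(a, b, key=ORDER.get)   # OR = maximum under F < U < T

# Cross-check against SQLite's treatment of NULL in boolean expressions.
TO_SQL = {TV.T: "1", TV.U: "NULL", TV.F: "0"}
FROM_SQL = {1: TV.T, None: TV.U, 0: TV.F}
conn = sqlite3.connect(":memory:")

def sql_eval(expr):
    return FROM_SQL[conn.execute(f"SELECT {expr}").fetchone()[0]]

mismatches = [
    (a, b)
    for a in TV for b in TV
    if and3(a, b) != sql_eval(f"{TO_SQL[a]} AND {TO_SQL[b]}")
    or or3(a, b) != sql_eval(f"{TO_SQL[a]} OR {TO_SQL[b]}")
]
not_mismatches = [a for a in TV if not3(a) != sql_eval(f"NOT {TO_SQL[a]}")]

for a in TV:   # print the AND and OR rows for a, columns in the order T U F
    print(a.value, "| AND:", " ".join(and3(a, b).value for b in TV),
          "| OR:", " ".join(or3(a, b).value for b in TV))
```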


NULLS (cont.) :

S: SNO CITY / S1 London

P: PNO CITY / P1 (empty)

Nothing at all in CITY slot for part P1 !!! (this is the "null")

Get SNO/PNO pairs where either the supplier and part cities are different or the part city isn’t Paris (or both):

SELECT DISTINCT S.SNO, P.PNO
FROM S, P
WHERE S.CITY <> P.CITY
OR P.CITY <> ‘Paris’


NULLS (cont.) :

Boolean expression in the WHERE clause:

( S.CITY <> P.CITY ) OR ( P.CITY <> ‘Paris’ )

For the only data we have, this becomes

( ‘London’ <> null ) OR ( null <> ‘Paris’ )

UNKNOWN OR UNKNOWN = UNKNOWN

Nothing retrieved!


NULLS (cont.) :

But part P1 does have some corresponding city … i.e., the null does stand for some real value, say c

Either c is Paris or it is not

If it is, boolean expression becomes

( ‘London’ <> ‘Paris’ ) OR ( ‘Paris’ <> ‘Paris’ ) : TRUE

If it is not, boolean expression becomes

( ‘London’ <> c ) OR ( c <> ‘Paris’ ) : TRUE, because c is not Paris

So TRUE is the right answer … hence, 3VL DOES NOT MATCH REALITY !!! (Showstopper !!!)


EVEN MORE TRIVIAL EXAMPLE :

SELECT PNO FROM P WHERE CITY = CITY

/* part P1 is not retrieved: for P1, CITY = CITY evaluates to UNKNOWN */

Message:

If you have nulls in your DB ... you’re getting wrong answers !!!

Note: Foregoing arguments apply to nulls and 3VL in general ... But SQL manages to introduce additional flaws of its own!

In particular, SQL represents "the third truth value" by NULL, not UNKNOWN (even though it does support an UNKNOWN keyword) ... Just as bad as representing zero by NULL !!!
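Both null examples can be replayed in SQLite through Python's sqlite3 module. The sketch shows the Paris query returning nothing, checks in plain Python that the same condition is TRUE for a handful of illustrative cities standing in for c (the case analysis above is what makes it hold for every c), and then shows CITY = CITY failing to retrieve P1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S ( SNO TEXT, CITY TEXT );
    CREATE TABLE P ( PNO TEXT, CITY TEXT );
    INSERT INTO S VALUES ('S1', 'London');
    INSERT INTO P VALUES ('P1', NULL);   -- nothing at all in the CITY slot
""")

# The query from the slides: the WHERE clause is UNKNOWN, so no rows come back.
paris_rows = conn.execute("""
    SELECT DISTINCT S.SNO, P.PNO FROM S, P
    WHERE S.CITY <> P.CITY OR P.CITY <> 'Paris'
""").fetchall()

def condition(s_city, c):
    """The same WHERE clause, evaluated with a real value c for P1's city."""
    return s_city != c or c != "Paris"

candidate_cities = ["Paris", "London", "Athens", "Oslo"]   # illustrative values only
always_true = all(condition("London", c) for c in candidate_cities)

# The even more trivial example: P1 is not retrieved, although its city equals itself.
trivial_rows = conn.execute("SELECT PNO FROM P WHERE CITY = CITY").fetchall()
```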


TO SUM UP :

By definition, a null isn’t a value … THEREFORE:

A "type" that contains a null isn’t a type

A "tuple" that contains a null isn’t a tuple

A "relation" that contains a null isn’t a relation

In fact, nulls violate The Information Principle /* see later */

Which means the entire edifice crumbles, and ALL BETS ARE OFF !!!

MUCH more could be said, but not here ...


AVOIDING NULLS IN SQL :

RM prohibits nulls ... So to use SQL relationally, we must prevent them from occurring

Base tables: Specify NOT NULL for every column

Derived tables: Many ops can produce nulls ...

"Set functions" such as SUM all return null if their argument is empty (except for COUNT and COUNT(*), which correctly return zero)

If scalar subquery evaluates to an empty table, that table is coerced to null


If row subquery evaluates to an empty table, that table is coerced to a row of all nulls /* not a null row! */

Outer join, union join

If ELSE omitted from CASE, ELSE NULL assumed

If x = y, NULLIF(x,y) returns null

ON DELETE SET NULL, ON UPDATE SET NULL
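Several of the null sources listed above can be observed directly in SQLite via Python's sqlite3 module. Union join is not shown because SQLite does not offer it. SP.SNO is deliberately left nullable here, against the recommendation, because ON DELETE SET NULL needs somewhere to put the null; the schema and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE S  ( SNO TEXT NOT NULL PRIMARY KEY, CITY TEXT NOT NULL );
    CREATE TABLE SP ( SNO TEXT REFERENCES S ( SNO ) ON DELETE SET NULL,  -- nullable on purpose
                      PNO TEXT NOT NULL, QTY INTEGER NOT NULL );
    INSERT INTO S  VALUES ('S1', 'London'), ('S5', 'Athens');
    INSERT INTO SP VALUES ('S1', 'P1', 300);
""")

def one(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

sum_empty    = one("SELECT SUM(QTY) FROM SP WHERE SNO = 'S5'")      # set function: null
count_empty  = one("SELECT COUNT(*) FROM SP WHERE SNO = 'S5'")      # COUNT: zero
scalar_empty = one("SELECT (SELECT QTY FROM SP WHERE SNO = 'S5')")  # empty scalar subquery: null
case_no_else = one("SELECT CASE WHEN 1 = 0 THEN 'x' END")           # implicit ELSE NULL
nullif_equal = one("SELECT NULLIF(3, 3)")                           # x = y, so null
outer_join = conn.execute(
    "SELECT S.SNO, SP.PNO FROM S LEFT OUTER JOIN SP ON S.SNO = SP.SNO ORDER BY S.SNO"
).fetchall()                                                        # S5 padded with null

conn.execute("DELETE FROM S WHERE SNO = 'S1'")                      # fires ON DELETE SET NULL
orphan_sno = one("SELECT SNO FROM SP WHERE PNO = 'P1'")
```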


STRONG RECOMMENDATIONS :

Base tables: Specify NOT NULL for every column /* is this a duplicate recommendation? */

Don’t use NULL keyword in any other context

Don’t use UNKNOWN keyword anywhere

Don’t omit ELSE from CASE

Don’t use NULLIF

Don’t use outer join except as noted below

Don’t use union join

Don’t specify PARTIAL or FULL on MATCH


STRONG RECOMMENDATIONS (cont.) :

Don’t use MATCH on foreign key constraints

Don’t use IS DISTINCT FROM

Don’t use IS [NOT] TRUE or IS [NOT] FALSE

Do use COALESCE on every exp that might otherwise "evaluate to null" ... e.g.:

SELECT S.SNO , ( SELECT COALESCE ( SUM ( ALL QTY ) , 0 ) /* this ALL is OK! */
FROM SP
WHERE SP.SNO = S.SNO ) AS TOTQ
FROM S
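A quick demonstration of why the COALESCE matters, using SQLite through Python's sqlite3 module with hypothetical data in which supplier S5 supplies nothing. The sketch writes SUM ( QTY ), which means the same as SUM ( ALL QTY ) because ALL is the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  ( SNO TEXT NOT NULL, UNIQUE ( SNO ) );
    CREATE TABLE SP ( SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                      UNIQUE ( SNO, PNO ) );
    INSERT INTO S  VALUES ('S1'), ('S5');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200);
""")

WITHOUT_COALESCE = """
    SELECT S.SNO, ( SELECT SUM ( QTY ) FROM SP WHERE SP.SNO = S.SNO ) AS TOTQ
    FROM S ORDER BY S.SNO
"""
WITH_COALESCE = """
    SELECT S.SNO, ( SELECT COALESCE ( SUM ( QTY ) , 0 )
                    FROM SP WHERE SP.SNO = S.SNO ) AS TOTQ
    FROM S ORDER BY S.SNO
"""
raw = conn.execute(WITHOUT_COALESCE).fetchall()   # S5 gets a null total
fixed = conn.execute(WITH_COALESCE).fetchall()    # S5 gets a proper zero
```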


A REMARK ON OUTER JOIN :

Should generally be avoided (shotgun marriage): Forces tables into a kind of union [sic!] even when they fail to conform to requirements for union /* see later */ by, in effect, padding with nulls before doing the union

But why not pad with proper values?—

SELECT SNO , PNO
FROM SP
UNION
SELECT SNO , ‘nil’ AS PNO
FROM S
WHERE SNO NOT IN ( SELECT SNO FROM SP )

Result:

SNO PNO
S1 P1
S1 P2
S1 P3
.. ..
S5 nil


A REMARK ON OUTER JOIN (cont.) :

Could achieve same result via disciplined (“clean”) use of explicit outer join plus COALESCE:

SELECT SNO , COALESCE ( PNO , ‘nil’ ) AS PNO
FROM ( S NATURAL LEFT OUTER JOIN SP ) AS POINTLESS

/* re that POINTLESS ... don’t even ask (yet?) */
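The two approaches can be compared in SQLite via Python's sqlite3 module (hypothetical data; SP is trimmed to SNO and PNO). For simplicity the correlation name POINTLESS is omitted from the second query, and the supplier number is qualified as S.SNO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  ( SNO TEXT NOT NULL, UNIQUE ( SNO ) );
    CREATE TABLE SP ( SNO TEXT NOT NULL, PNO TEXT NOT NULL, UNIQUE ( SNO, PNO ) );
    INSERT INTO S  VALUES ('S1'), ('S2'), ('S5');
    INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P1');
""")

# Pad with a proper value ('nil') via UNION: no outer join, no nulls at any stage.
UNION_FORM = """
    SELECT SNO, PNO FROM SP
    UNION
    SELECT SNO, 'nil' AS PNO FROM S
    WHERE SNO NOT IN ( SELECT SNO FROM SP )
"""
# Explicit outer join, with COALESCE replacing the padding nulls immediately.
OUTER_FORM = """
    SELECT S.SNO, COALESCE ( SP.PNO, 'nil' ) AS PNO
    FROM S NATURAL LEFT OUTER JOIN SP
"""
union_rows = sorted(conn.execute(UNION_FORM).fetchall())
outer_rows = sorted(conn.execute(OUTER_FORM).fetchall())
```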


STRUCTURE OF PRESENTATION :

1. Setting the scene

2. Types and domains

3. Tuples and relations, rows and tables

4. No duplicates, no nulls

5. Base relvars, base tables

6. SQL and algebra I: The original operators

7. SQL and algebra II: Additional operators

8. SQL and constraints

9. SQL and views

10. SQL and logic I: Relational calculus

11. SQL and logic II: Using logic to write SQL

12. Further SQL topics

13. Appendix: The relational model

14. Appendix: DB design


BASE RELVARS, BASE TABLES :

Assume for simplicity until further notice that:

• All relvars are base relvars

• All table variables are base table variables

Special considerations* that apply to other kinds of relvars / other kinds of table variables—to views in particular—will be covered later

* Such as they are


DATA DEFINITIONS :

Tutorial D:

VAR S BASE RELATION
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

SQL:

CREATE TABLE S
  ( SNO VARCHAR(5) NOT NULL ,
    SNAME VARCHAR(25) NOT NULL ,
    STATUS INTEGER NOT NULL ,
    CITY VARCHAR(20) NOT NULL ,
    UNIQUE ( SNO ) ) ;

Tutorial D:

VAR P BASE RELATION
  { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT FIXED , CITY CHAR }
  KEY { PNO } ;

SQL:

CREATE TABLE P
  ( PNO VARCHAR(6) NOT NULL ,
    PNAME VARCHAR(25) NOT NULL ,
    COLOR CHAR(10) NOT NULL ,
    WEIGHT NUMERIC(5,1) NOT NULL ,
    CITY VARCHAR(20) NOT NULL ,
    UNIQUE ( PNO ) ) ;


Tutorial D:

VAR SP BASE RELATION
  { SNO CHAR , PNO CHAR , QTY INTEGER }
  KEY { SNO , PNO }
  FOREIGN KEY { SNO } REFERENCES S
  FOREIGN KEY { PNO } REFERENCES P ;

SQL:

CREATE TABLE SP
  ( SNO VARCHAR(5) NOT NULL ,
    PNO VARCHAR(6) NOT NULL ,
    QTY INTEGER NOT NULL ,
    UNIQUE ( SNO , PNO ) ,
    FOREIGN KEY ( SNO ) REFERENCES S ( SNO ) ,
    FOREIGN KEY ( PNO ) REFERENCES P ( PNO ) ) ;
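The SQL definitions above run essentially unchanged in SQLite, as this Python sketch (sqlite3 module) shows; the sample rows are hypothetical. Two SQLite specifics, not part of the slides: foreign keys are enforced only after PRAGMA foreign_keys = ON, and declared lengths such as VARCHAR(5) are not enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checking off by default
conn.executescript("""
    CREATE TABLE S ( SNO VARCHAR(5) NOT NULL , SNAME VARCHAR(25) NOT NULL ,
                     STATUS INTEGER NOT NULL , CITY VARCHAR(20) NOT NULL ,
                     UNIQUE ( SNO ) ) ;
    CREATE TABLE P ( PNO VARCHAR(6) NOT NULL , PNAME VARCHAR(25) NOT NULL ,
                     COLOR CHAR(10) NOT NULL , WEIGHT NUMERIC(5,1) NOT NULL ,
                     CITY VARCHAR(20) NOT NULL , UNIQUE ( PNO ) ) ;
    CREATE TABLE SP ( SNO VARCHAR(5) NOT NULL , PNO VARCHAR(6) NOT NULL ,
                      QTY INTEGER NOT NULL , UNIQUE ( SNO , PNO ) ,
                      FOREIGN KEY ( SNO ) REFERENCES S ( SNO ) ,
                      FOREIGN KEY ( PNO ) REFERENCES P ( PNO ) ) ;
    INSERT INTO S ( SNO , SNAME , STATUS , CITY ) VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ;
    INSERT INTO P ( PNO , PNAME , COLOR , WEIGHT , CITY )
        VALUES ( 'P1' , 'Nut' , 'Red' , 12.0 , 'London' ) ;
""")

def rejected(sql):
    """True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_key  = rejected("INSERT INTO S VALUES ( 'S1', 'Jones', 10, 'Paris' )")   # UNIQUE ( SNO )
null_col = rejected("INSERT INTO S VALUES ( 'S2', 'Jones', NULL, 'Paris' )") # NOT NULL
dangling = rejected("INSERT INTO SP VALUES ( 'S9', 'P1', 100 )")             # no supplier S9
valid_rejected = rejected("INSERT INTO SP VALUES ( 'S1', 'P1', 300 )")       # should succeed
```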


UPDATING IS SET LEVEL /* actually ALL rel ops are set level */ :

INSERT inserts a set of tuples / DELETE deletes a set of tuples / UPDATE updates a set of tuples

Thus, e.g., "UPDATE tuple t" really means "update a set of tuples that happens to be of cardinality one" ...

... and isn’t always possible!

Suppose suppliers S1 and S4 must be in the same city (integrity constraint for relvar S)

Then updating, e.g., just the city for S1 must fail

Instead (e.g.):


Tutorial D:

UPDATE S WHERE SNO = ‘S1’ OR SNO = ‘S4’ :
  { CITY := ‘New York’ } ;

SQL:

UPDATE S
SET CITY = ‘New York’
WHERE SNO = ‘S1’ OR SNO = ‘S4’ ;

• Implications: (a) Integrity checking and triggered actions mustn’t be done till all updating has been done (set level op is not a sequence of tuple level ops) /* more on integrity later */ ... (b) UPDATE / DELETE via cursor make no sense!

• Recommendation: Avoid row level ops (cursor updates in particular) unless you know integrity problems won’t occur
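The S1/S4 example can be simulated without any DBMS. In this Python sketch a relation value is a frozenset of (SNO, CITY) tuples, and the constraint from the slides is checked either once, after the whole set-level update, or after every single-tuple step, as row-level (cursor) processing would. The function names and the extra supplier S2 are hypothetical.

```python
# Hypothetical current value of S, as a set of (SNO, CITY) tuples.
S = frozenset({("S1", "London"), ("S4", "London"), ("S2", "Paris")})

def constraint_holds(rel):
    """The constraint from the slides: S1 and S4 are in the same city."""
    city = dict(rel)
    return city["S1"] == city["S4"]

def set_level_update(rel, targets, new_city):
    """Replace all target tuples at once; check the constraint only at the end."""
    result = frozenset((sno, new_city if sno in targets else c) for sno, c in rel)
    if not constraint_holds(result):
        raise ValueError("constraint violated")
    return result

def row_level_update(rel, targets, new_city):
    """Update one tuple at a time, checking after each step (cursor style)."""
    for sno in targets:
        rel = frozenset((s, new_city if s == sno else c) for s, c in rel)
        if not constraint_holds(rel):
            raise ValueError(f"constraint violated after updating {sno}")
    return rel

after_set = set_level_update(S, ["S1", "S4"], "New York")

try:
    row_level_update(S, ["S1", "S4"], "New York")
    row_failed = False
except ValueError:
    row_failed = True   # fails right after S1 is moved and S4 is not yet
```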


WHAT’S MORE :

Tuples are values and CAN'T be updated!

"Updating a set of tuples" really means replacing one set of tuples by another ...

R := ( R MINUS old ) UNION new ;

where old and new are relations (of same type as R) containing the old and new tuples, respectively

Likewise: "Updating attribute A within tuple t" is also sloppy (though useful!) shorthand


RELATIONAL ASSIGNMENT :

• R := rx ; /* generic form */

• "INSERT R rx ;" shorthand for:

R := R D_UNION rx ; /* D_UNION = "disjoint union" */

• "DELETE R WHERE bx ;" shorthand for:

R := R WHERE NOT ( bx ) ;

• "UPDATE R WHERE bx : { ... } ;" (where { ... } is an attribute assignment commalist) shorthand for: /* see later */
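The shorthands can be sketched in Python with relations as frozensets of tuples, so every operator computes a new relation value and the only thing that changes is what the variable R denotes. The function names are hypothetical, and D_UNION is modeled as a union that refuses overlapping operands.

```python
def d_union(r1, r2):
    """Disjoint union: like UNION, but an error if the operands share a tuple."""
    if r1 & r2:
        raise ValueError("D_UNION operands are not disjoint")
    return r1 | r2

def insert(r, rx):
    """INSERT R rx  is shorthand for  R := R D_UNION rx."""
    return d_union(r, rx)

def delete(r, bx):
    """DELETE R WHERE bx  is shorthand for  R := R WHERE NOT ( bx )."""
    return frozenset(t for t in r if not bx(t))

def update(r, bx, change):
    """R := ( R MINUS old ) UNION new, where new holds the replacement tuples."""
    old = frozenset(t for t in r if bx(t))
    new = frozenset(change(t) for t in old)
    return (r - old) | new

# Tuples are immutable (SNO, CITY) pairs; each call returns a new relation value,
# and the assignments below are the only place the variable R changes.
R = frozenset({("S1", "London"), ("S2", "Paris")})
R = insert(R, frozenset({("S3", "Paris")}))
R = delete(R, lambda t: t[1] == "London")
R = update(R, lambda t: t[0] == "S2", lambda t: (t[0], "Athens"))

try:
    insert(R, frozenset({("S3", "Paris")}))   # S3 is already there
    duplicate_insert_rejected = False
except ValueError:
    duplicate_insert_rejected = True
```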


UPDATING IN SQL :

• INSERT / DELETE / UPDATE directly analogous to Tutorial D counterparts ... Two points on INSERT:

INSERT INTO T [ ( column name commalist ) ] tx ;

1. tx often but not always a VALUES exp ... INSERT really does insert a set of rows /* not true historically! */

2. Recommendation: State column names explicitly. E.g.:

INSERT INTO SP ( PNO , SNO , QTY ) /* good */
VALUES ( ‘P6’ , ‘S4’ , 700 ) ,
       ( ‘P6’ , ‘S5’ , 250 ) ;

INSERT INTO SP /* bad: relies on column ordering */
VALUES ( ‘S4’ , ‘P6’ , 700 ) ,
       ( ‘S5’ , ‘P6’ , 250 ) ;


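No SQL counterpart to relational assignment as such ... Best approximation:

R := rx ; /* Tutorial D */

DELETE FROM T ; INSERT INTO T ( ... ) tx ; /* SQL */

SQL could fail where Tutorial D succeeds (e.g., if the DELETE on its own violates some integrity constraint before the INSERT can put things right)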

The Assignment Principle:

After assignment of v to V, v = V must give TRUE

Very simple ... but far reaching consequences!
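Here is one way the DELETE-then-INSERT approximation can fail where a single assignment would not, sketched in SQLite via Python's sqlite3 module (hypothetical schema and data). The intended effect is the do-nothing assignment S := S, but the DELETE on its own leaves SP referencing a supplier that no longer exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE S  ( SNO TEXT NOT NULL, UNIQUE ( SNO ) );
    CREATE TABLE SP ( SNO TEXT NOT NULL REFERENCES S ( SNO ), PNO TEXT NOT NULL );
    INSERT INTO S  ( SNO ) VALUES ('S1'), ('S2');
    INSERT INTO SP ( SNO, PNO ) VALUES ('S1', 'P1');
""")

# Intended effect, in Tutorial D terms:  S := S ;
# SQL approximation: delete everything, then insert the "new" value.
saved = conn.execute("SELECT SNO FROM S").fetchall()
try:
    conn.execute("DELETE FROM S")   # fails: SP still references S1
    conn.executemany("INSERT INTO S ( SNO ) VALUES ( ? )", saved)
    conn.commit()
    approximation_failed = False
except sqlite3.IntegrityError:
    conn.rollback()
    approximation_failed = True

s_after = sorted(conn.execute("SELECT SNO FROM S").fetchall())
```

SQLite, like standard SQL, also allows a foreign key to be declared DEFERRABLE INITIALLY DEFERRED, which postpones the check to COMMIT and would let this particular sequence through; the general point stands that two statements are not one assignment.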


EVERY RELVAR HAS AT LEAST ONE CANDIDATE KEY (why?) :

Let K be a subset of the heading of relvar R. Then K is a candidate key (or just key) for R iff:

1. Uniqueness: No possible value of R has two distinct tuples with the same value for K

2. Irreducibility: No proper subset of K has the uniqueness property

E.g., {SNO}, {PNO}, {SNO,PNO} for relvars S, P, SP, resp.


POINTS ARISING :

Strong recommendation: Every CREATE TABLE should have at least one UNIQUE and/or PRIMARY KEY specification

Note: We don’t insist on primary keys as such, but do usually follow PK discipline ourselves (marked by double underlining)

Key values are tuples! Key uniqueness relies on tuple equality! ... Number of attributes is degree of key

Keys apply to relvars, not relations (why?)

Note: System can enforce uniqueness but can’t enforce irreducibility


Why irreducibility?

Because if system knows only that, e.g., {SNO,CITY} values have uniqueness property, it will be enforcing the WRONG INTEGRITY CONSTRAINT

Recommendation: Never lie to the DBMS!

A subset SK of the heading of R that’s unique but not necessarily irreducible is a superkey

Uniqueness of SK implies that the functional dependence /* see later */ SK → A is satisfied by R for all subsets A of the heading of R

i.e., ALWAYS have "arrows out of superkeys"
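The definitions can be turned into checks, with one important caveat from the slides: keys apply to relvars, so a check against a single relation value can refute a proposed key but never confirm one. The Python sketch below represents a relation value as a list of dicts; the function names and sample data are hypothetical.

```python
from itertools import combinations

def is_unique(rel, attrs):
    """True if no two distinct tuples of this relation value agree on attrs."""
    values = [tuple(t[a] for a in attrs) for t in rel]
    return len(values) == len(set(values))

def is_irreducible(rel, attrs):
    """True if no proper subset of attrs is also unique in this value."""
    return not any(
        is_unique(rel, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

def fd_holds(rel, lhs, rhs):
    """True if the functional dependence lhs -> rhs is satisfied by this value."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical current value of relvar S (a body with no duplicate tuples).
S = [
    {"SNO": "S1", "SNAME": "Smith", "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},
    {"SNO": "S3", "SNAME": "Blake", "CITY": "Paris"},
]

# {SNO, CITY} is a superkey here but not irreducible; {SNO} passes both tests.
# {SNAME} also happens to be unique in this value, which does NOT make it a key:
# keys constrain every possible value of the relvar, not just the current one.
superkey_reducible = is_unique(S, ("SNO", "CITY")) and not is_irreducible(S, ("SNO", "CITY"))
sno_passes = is_unique(S, ("SNO",)) and is_irreducible(S, ("SNO",))
arrows_out = all(fd_holds(S, ("SNO", "CITY"), (a,)) for a in S[0])
```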


RELVARS CAN HAVE N KEYS (N > 1) :

VAR TAX_BRACKET BASE RELATION
  { LOW MONEY , HIGH MONEY , PERCENTAGE INTEGER }
  KEY { LOW }
  KEY { HIGH }
  KEY { PERCENTAGE } ;

VAR ROSTER BASE RELATION
  { DAY DAY , HOUR HOUR , GATE GATE , PILOT NAME }
  KEY { DAY , HOUR , GATE }
  KEY { DAY , HOUR , PILOT } ;

VAR MARRIAGE BASE RELATION
  { SPOUSE_A NAME , SPOUSE_B NAME , DATE_OF_MARRIAGE DATE }
  KEY { SPOUSE_A , DATE_OF_MARRIAGE }
  KEY { DATE_OF_MARRIAGE , SPOUSE_B }
  KEY { SPOUSE_B , SPOUSE_A } ;


SOME RELVARS HAVE FOREIGN KEYS :

• Let R1 and R2 be relvars, not necessarily distinct, and let K be a key for R1

• Let FK be a subset of the heading of R2 such that there exists a possibly empty sequence of attribute renamings on R1 that maps K into K’ (say), where K’ and FK contain exactly the same attributes

• Let R2 and R1 be subject to the constraint that, at all times, every tuple t2 in R2 has an FK value that’s the K’ value for some (necessarily unique) tuple t1 in R1 at the time in question

• Then FK is a foreign key (with the same degree as K); the associated constraint is a referential constraint; and R2 and R1 are the referencing relvar and the corresponding referenced relvar, respectively, for that constraint


E.g., {SNO} and {PNO} in relvar SP

Referential integrity rule: DB must never contain any unmatched FK values

Note reliance on tuple equality again ... Another example:

Tutorial D:

VAR EMP BASE RELATION
  { ENO CHAR , MNO CHAR , ... }
  KEY { ENO }
  FOREIGN KEY { MNO } REFERENCES EMP { ENO } RENAME ( ENO AS MNO ) ;

SQL:

CREATE TABLE EMP
  ( ENO VARCHAR(6) NOT NULL ,
    MNO VARCHAR(6) NOT NULL ,
    ..... ,
    UNIQUE ( ENO ) ,
    FOREIGN KEY ( MNO ) REFERENCES EMP ( ENO ) ) ;
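A sketch of the referential constraint check in Python, following the definition of foreign keys: project the referenced relation onto K, apply the renamings to get K', then require every FK value to appear among the K' values. Relations are lists of dicts, and the function names and EMP values are hypothetical. Projecting before renaming matters here, because EMP already has an attribute called MNO.

```python
def rename(rel, old, new):
    """RENAME ( old AS new ): same values, one attribute name changed."""
    return [{(new if a == old else a): v for a, v in t.items()} for t in rel]

def project(rel, attrs):
    """Projection onto attrs, as a set of value tuples (duplicates vanish)."""
    return {tuple(t[a] for a in attrs) for t in rel}

def satisfies_fk(referencing, fk, referenced, key, renamings=()):
    """True if every FK value of `referencing` is the K' value of some tuple
    of `referenced`, where K' is K after the given (old, new) renamings."""
    target = [{a: t[a] for a in key} for t in referenced]   # EMP { ENO }
    for old, new in renamings:                               # RENAME ( ENO AS MNO )
        target = rename(target, old, new)
        key = tuple(new if a == old else a for a in key)
    if set(key) != set(fk):
        raise ValueError("after renaming, K' and FK must have the same attributes")
    return project(referencing, fk) <= project(target, fk)

# Hypothetical EMP values; MNO is NOT NULL, so the top manager manages themselves.
EMP = [{"ENO": "E1", "MNO": "E1"},
       {"ENO": "E2", "MNO": "E1"},
       {"ENO": "E3", "MNO": "E2"}]
ok_emp = satisfies_fk(EMP, ("MNO",), EMP, ("ENO",), renamings=[("ENO", "MNO")])

EMP_BAD = EMP + [{"ENO": "E4", "MNO": "E9"}]   # E9 does not exist
bad_emp = satisfies_fk(EMP_BAD, ("MNO",), EMP_BAD, ("ENO",), renamings=[("ENO", "MNO")])
```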


Column matching in SQL done by ordinal position, not by name, so renaming not nec ... though corresp columns must be of same type (no coercion)

Recommendation: Nevertheless, ensure that corresp columns do have the same name if possible

Can’t follow this recommendation if either:

Table T has FK matching key of T itself (as in EMP)

Table T2 has two distinct FKs both matching same key in table T1 (as in bill of materials)

So do the best you can ...


REFERENTIAL ACTIONS /* e.g., cascade delete */ :

Not part of RM as such ... Supported by SQL but not by Tutorial D /* yet */

RM = foundation of the DB field, but only the foundation ... Nothing wrong with additional features, so long as they don’t violate RM and are in spirit of RM and are useful:

Type theory

Recovery and concurrency (?)

Triggered procedures ... Referential actions a special case, though specified declaratively ... OK so long as set level not row level (?) ... OK so long as they don’t violate The Assignment Principle (but they usually do)


(Very important!) WAY OF THINKING ABOUT RELVARS :

Heading corresponds to a predicate (truth valued function): e.g.,

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in CITY

Parameters (SNO, SNAME, STATUS, CITY in the example) stand for values of the relevant types

Tuples represent true propositions ("instantiations" of the predicate that evaluate to TRUE), obtained by substituting arguments for the parameters: e.g.,

Supplier S1 is under contract, is named Smith, has status 20, and is located in London


THUS :

• Every relvar has an associated relvar predicate (or meaning or intended interpretation or intension)

• If relvar R has predicate P, then every tuple t in R at time x represents proposition p, derived by invoking (or instantiating) P at time x with t’s attrib values as arguments

Body of R at time x is extension of P at time x

• The Closed World Assumption: Relvar R contains, at any given time, all and only the tuples that represent true propositions (true instantiations of the predicate for R) at the time in question

Loosely: Everything the DB says (or implies) is true, everything else is false
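A toy Python rendering of these ideas: the predicate becomes a template, each tuple instantiates it to a proposition, and under the Closed World Assumption a candidate tuple is true exactly when it appears in the body. The template string and the sample body are assumptions for illustration.

```python
# Relvar predicate for S, written as a template whose parameters are the attributes.
PREDICATE = ("Supplier {SNO} is under contract, is named {SNAME}, "
             "has status {STATUS}, and is located in {CITY}")

# Hypothetical current body of relvar S.
S_BODY = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
]

def propositions(body):
    """Instantiate the predicate once per tuple: these propositions are asserted true."""
    return [PREDICATE.format(**t) for t in body]

def is_true(candidate):
    """Closed World Assumption: true iff the tuple appears in the body, else false."""
    return candidate in S_BODY

for p in propositions(S_BODY):
    print(p)
```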


RELATIONS vs. TYPES :

TYPES are sets of things we can talk about;

RELATIONS are (true) statements about those things!

Note three very important corollaries ...


1. Types and relations are both NECESSARY

2. They're not the same thing (logical difference!)

3. They're SUFFICIENT (as well as necessary)*

A DB (with ops) is a logical system!

This was Codd’s great insight ... and it’s why RM is rock solid, and "right," and will endure ... and why other "data models" are just not in the same ballpark

* Need relvars too for changes over time


A NICE ANALOGY :

TYPES are to RELATIONS

as

NOUNS are to SENTENCES



SOME PRELIMINARIES :

• Reminder re closure and nested exps

• Ops are generic and read-only

• But exps (op invocations) can include relvar refs: e.g.,

R1 UNION R2 /* R1 and R2 are relvar names */

• Relvar ref is itself a rel exp* (op is "return value of")

• INSERT / DELETE / UPDATE / relational assignment are rel ops but not rel algebra ops: Caveat lector!

* Not in SQL, though!—e.g., T1 UNION T2 illegal and so is T (must say, e.g., SELECT * FROM T)


Tutorial D vs. SQL :

Overriding point = when correspondence needs to be established between operand attributes (as in JOIN):

Tutorial D requires corresponding attributes to be, formally, the very same attribute ... E.g.:

P JOIN S /* join P and S "on CITY" */

SQL uses different techniques in different contexts: ordinal position, explicit specification, same name (not always same type) ... E.g.:

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.CITY /* or S.CITY */ ,
       S.SNO , S.SNAME , S.STATUS
FROM P , S
WHERE P.CITY = S.CITY /* explicit specification */


OR :

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.CITY /* or S.CITY */ ,
       S.SNO , S.SNAME , S.STATUS
FROM P JOIN S
ON P.CITY = S.CITY

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , CITY /* not P.CITY or S.CITY! */ ,
       S.SNO , S.SNAME , S.STATUS
FROM P JOIN S
USING ( CITY )

SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , CITY ,
       S.SNO , S.SNAME , S.STATUS
FROM P NATURAL JOIN S
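The four SQL formulations can be checked against one another in SQLite via Python's sqlite3 module. The sample rows are hypothetical; the point is only that all four return the same rows once P.CITY and S.CITY have been matched, whether by explicit comparison, by USING, or by NATURAL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P ( PNO TEXT, PNAME TEXT, COLOR TEXT, WEIGHT REAL, CITY TEXT );
    CREATE TABLE S ( SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    INSERT INTO P VALUES ('P1','Nut','Red',12.0,'London'), ('P2','Bolt','Green',17.0,'Paris');
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S5','Adams',30,'Athens');
""")

QUALIFIED = "P.PNO, P.PNAME, P.COLOR, P.WEIGHT, P.CITY, S.SNO, S.SNAME, S.STATUS"
COMMON    = "P.PNO, P.PNAME, P.COLOR, P.WEIGHT, CITY, S.SNO, S.SNAME, S.STATUS"

QUERIES = {
    "WHERE":        f"SELECT {QUALIFIED} FROM P, S WHERE P.CITY = S.CITY",
    "JOIN ... ON":  f"SELECT {QUALIFIED} FROM P JOIN S ON P.CITY = S.CITY",
    "JOIN USING":   f"SELECT {COMMON} FROM P JOIN S USING ( CITY )",  # unqualified CITY
    "NATURAL JOIN": f"SELECT {COMMON} FROM P NATURAL JOIN S",
}
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
all_agree = len({tuple(rows) for rows in results.values()}) == 1
```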


POINTS ARISING :

• SQL permits, and sometimes requires, dot qualified names; Tutorial D doesn’t

• Tutorial D sometimes needs to rename attributes to avoid naming clashes or mismatches; SQL usually doesn’t (though it does support column renaming for other reasons)

• Tutorial D has no need for "correlation names" /* see later */

• SQL supports features of rel calculus as well as features of rel algebra; Tutorial D doesn’t /* see later */

• SQL requires most queries to conform to SELECT - FROM - WHERE template; Tutorial D has nothing analogous /* see later */


MORE ON CLOSURE :

Result of every rel op is a relation ... Any op that produces a result that’s not a rel isn’t a rel op!*

E.g., in SQL, any op that produces a result with:

• Duplicate rows

• Anonymous columns

• Nulls

• Duplicate column names

• Left to right column ordering

Strong recommendation: Don’t use any op that violates closure if you want the result to be amenable to further relational processing

* Except for relational inclusion (?)
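Each of the listed violations is easy to provoke in SQLite, as this Python sketch shows (sqlite3 module; hypothetical data). Column names reported through cursor.description are chosen by SQLite when no AS clause is given and are not guaranteed by the SQL standard, which is itself part of the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S ( SNO TEXT NOT NULL, CITY TEXT NOT NULL, UNIQUE ( SNO ) );
    INSERT INTO S VALUES ('S1','London'), ('S2','Paris'), ('S3','Paris');
""")

def run(sql):
    """Return (column names as reported by SQLite, result rows)."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return names, cur.fetchall()

dup_rows  = run("SELECT CITY FROM S")                        # Paris appears twice
dup_names = run("SELECT CITY, CITY FROM S")                  # two columns, same name
anon      = run("SELECT COUNT(*) FROM S")                    # no AS: name chosen by the DBMS
nulls     = run("SELECT MAX(CITY) FROM S WHERE SNO = 'S9'")  # empty argument: null
ordered   = run("SELECT CITY, SNO FROM S WHERE SNO = 'S1'")  # columns in left-to-right order

print(anon[0])   # whatever name SQLite invents for the unnamed column
```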


Closure doesn’t mean intermediate results have to be materialized (popular misconception!) ... E.g.:

( P JOIN S ) WHERE PNAME > SNAME /* Tutorial D */

SELECT P.* , SNO , SNAME , STATUS
FROM P , S
WHERE P.CITY = S.CITY
AND P.PNAME > S.SNAME /* SQL */

Can pipeline join result to restriction op

But another important point here:

"PNAME > SNAME" applies to result of P JOIN S ... so names PNAME and SNAME refer to attributes of that result !!!
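Pipelining can be sketched with Python generators: the join yields one joined tuple at a time and the restriction consumes them as they arrive, so the join result is never stored as a whole. The nested-loop join, the function names and the sample data are assumptions made for the sketch.

```python
def join_on_city(parts, suppliers):
    """P JOIN S (common attribute CITY), yielding one joined tuple at a time."""
    for p in parts:
        for s in suppliers:
            if p["CITY"] == s["CITY"]:
                yield {**p, **s}   # CITY appears once in the joined tuple

def restrict(tuples, predicate):
    """Restriction that consumes its input lazily, one tuple at a time."""
    for t in tuples:
        if predicate(t):
            yield t

P = [{"PNO": "P1", "PNAME": "Nut", "CITY": "London"},
     {"PNO": "P2", "PNAME": "Bolt", "CITY": "Paris"}]
S = [{"SNO": "S1", "SNAME": "Smith", "CITY": "London"},
     {"SNO": "S2", "SNAME": "Jones", "CITY": "Paris"},
     {"SNO": "S3", "SNAME": "Blake", "CITY": "Paris"}]

# ( P JOIN S ) WHERE PNAME > SNAME: PNAME and SNAME are attributes of the
# join result flowing through the pipeline, not of P or S individually.
pipeline = restrict(join_on_city(P, S), lambda t: t["PNAME"] > t["SNAME"])
result = list(pipeline)
```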


How do we know that result has such attributes? What is the heading of that result? More generally: What’s the heading for the result of any algebraic operation?

Need relation type inference rules such that, given headings (and hence types) of input rels, we can infer heading (and hence type) of output rel

RM includes such rules ... E.g., P JOIN S is of type:

RELATION { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT FIXED , CITY CHAR , SNO CHAR , SNAME CHAR , STATUS INTEGER }

In fact need for such rules is implied by closure
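A minimal sketch of the JOIN inference rule in Python: headings are modeled as dicts from attribute name to type name (the type names are plain strings matching the slide). The rule is just the joinability test plus set-theory union of the headings, applied without looking at any data.

```python
def joinable(h1, h2):
    """Attributes with the same name must have the same type."""
    return all(h1[a] == h2[a] for a in h1.keys() & h2.keys())

def join_heading(h1, h2):
    """Heading of r1 JOIN r2, inferred from the operand headings alone."""
    if not joinable(h1, h2):
        raise TypeError("operands are not joinable")
    return {**h1, **h2}  # set-theory union of the two headings

P_HEADING = {"PNO": "CHAR", "PNAME": "CHAR", "COLOR": "CHAR",
             "WEIGHT": "FIXED", "CITY": "CHAR"}
S_HEADING = {"SNO": "CHAR", "SNAME": "CHAR", "STATUS": "INTEGER", "CITY": "CHAR"}

PS_HEADING = join_heading(P_HEADING, S_HEADING)
print(sorted(PS_HEADING.items()))
```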


RENAME :

Tutorial D:

S RENAME ( CITY AS SCITY )

SQL:

SELECT SNO , SNAME , STATUS , S.CITY AS SCITY
FROM   S

Result identical to current value of S except for renaming:

SNO  SNAME  STATUS  SCITY
S1   Smith  20      London
S2   Jones  10      Paris
S3   Blake  30      Paris
S4   Clark  20      London
S5   Adams  30      Athens

Note: Relvar S not changed in the DB! ... not like ALTER TABLE in SQL

Needed primarily as a preliminary to performing, e.g., UNION or JOIN /* see later */


HOW DOES SQL HANDLE "TABLE TYPE" INFERENCE ???

Answer: Not very well!

• No proper notion of table type anyway
• Result can have anonymous columns
• Result can have duplicate column names
• Result has left to right column ordering

Strong recommendation: Use the column renaming discipline described earlier (which effectively relied on SQL-style column renaming, i.e., AS specifications) to ensure that SQL conforms as far as possible to relational rules


EXAMPLE REVISITED : ANOTHER POINT

Tutorial D:

( P JOIN S ) WHERE PNAME > SNAME

SQL:

SELECT P.* , SNO , SNAME , STATUS
FROM   P , S
WHERE  P.CITY = S.CITY
AND    P.PNAME > S.SNAME

“P.PNAME > S.SNAME” applies to result of join ... ???

Actually quite difficult to explain this at all ... The standard does explain it, but the machinations involved are much more complicated than RM type inference rules ... Details beyond the scope of this seminar !!!

In any case, you’re supposed to know SQL, so you already know how this works (right?) ... Or had you never thought about this issue before?


THE ORIGINAL OPERATORS :

restriction /* aka selection */

projection

JOIN, TIMES

theta join /* see later */

UNION, INTERSECT, MINUS

DIVIDEBY /* see much later */


RESTRICT :

Tutorial D:

P WHERE WEIGHT < 12.5

SQL:

SELECT P.*
FROM   P
WHERE  WEIGHT < 12.5

WEIGHT < 12.5 is a boolean exp in which every attrib ref identifies an attrib of P and there are no relvar refs ... Note: WHERE in Tutorial D is more general

Result has same heading as P and body = tuples of P for which boolean exp evaluates to TRUE:

PNO  PNAME  COLOR  WEIGHT  CITY
P1   Nut    Red    12.0    London
P5   Cam    Blue   12.0    Paris


PROJECT :

Tutorial D:

P { COLOR , CITY }

SQL:

SELECT DISTINCT COLOR , CITY
FROM   P

Result has heading as specified:

COLOR  CITY
Red    London
Green  Paris
Blue   Oslo
Blue   Paris

Note: Duplicates eliminated!

Tutorial D also supports projection on ALL BUT specified attribs ... Similarly for other ops where it makes sense


(Natural) JOIN :

Rels r1 and r2 joinable iff attribs with same name are of same type (i.e., iff set theory union of headings is a legal heading) /* concept relevant to other ops as well as join */

Tutorial D:

P JOIN S

SQL:

SELECT P.* , SNO , SNAME , STATUS
FROM   P , S
WHERE  P.CITY = S.CITY

Result heading = set theory union of headings of P and S ... Result body = set of all tuples t where t is the set theory union of a tuple from P and a tuple from S (such a union is a legal tuple only if the two agree on their common attribute, here CITY)

PNO  PNAME  COLOR  WEIGHT  CITY    SNO  SNAME  STATUS
P1   Nut    Red    12.0    London  S1   Smith  20
..   ...    ...    ....    ......  ..   .....  ..
P6   Cog    Red    19.0    London  S4   Clark  20


ALTERNATIVE SQL FORMULATION :

SELECT * FROM P NATURAL JOIN S

Result heading has columns

CITY, PNO, PNAME, COLOR, WEIGHT, SNO, SNAME, STATUS

in that order ... but don’t write code that relies on this ordering!


POINTS ARISING :

Let r1 and r2 be joinable

Let common attributes (set theory intersection of headings) be {Y} ... Let other attributes of r1 and r2 be {X} and {Z}, resp. ... Join has heading = set theory union of {X}, {Y}, and {Z}

If {X} and {Z} are empty, {Y} = entire heading of r1 and r2, and r1 JOIN r2 degenerates to r1 INTERSECT r2

E.g.: S { CITY } JOIN P { CITY }

same as

S { CITY } INTERSECT P { CITY }


If {Y} is empty, r1 and r2 have no common attrib names, and r1 JOIN r2 degenerates to r1 TIMES r2

E.g.: S { ALL BUT CITY } JOIN P { ALL BUT CITY }

same as

S { ALL BUT CITY } TIMES P { ALL BUT CITY }

Direct support for TIMES included for psychological reasons rather than logical ones (likewise for INTERSECT)

Note: For TIMES, operand rels must have no common attrib names
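Both degenerate cases can be checked mechanically. The Python sketch below is a toy model, not a DBMS: a relation is a (heading, body) pair, each tuple is a frozenset of (attribute, value) pairs, and natural join is implemented literally as "set theory union of two tuples, kept only when it is itself a legal tuple". The relations are small hypothetical cut-downs of S and P.

```python
from itertools import product

def relation(heading, rows):
    """(heading, body); each tuple is a frozenset of (attribute, value) pairs."""
    return frozenset(heading), frozenset(frozenset(r.items()) for r in rows)

def project(r, attrs):
    attrs = frozenset(attrs)
    return attrs, frozenset(frozenset(p for p in t if p[0] in attrs) for t in r[1])

def join(r1, r2):
    body = set()
    for t1, t2 in product(r1[1], r2[1]):
        u = t1 | t2
        # u is a legal tuple only if t1 and t2 agree on every common attribute
        if len({a for a, _ in u}) == len(u):
            body.add(u)
    return r1[0] | r2[0], frozenset(body)

def intersect(r1, r2):
    assert r1[0] == r2[0], "operands must be of the same type"
    return r1[0], r1[1] & r2[1]

def times(r1, r2):
    assert not (r1[0] & r2[0]), "operands must have no common attribute names"
    return join(r1, r2)

S = relation({"SNO", "CITY"}, [{"SNO": "S1", "CITY": "London"},
                               {"SNO": "S2", "CITY": "Paris"},
                               {"SNO": "S5", "CITY": "Athens"}])
P = relation({"PNO", "CITY"}, [{"PNO": "P1", "CITY": "London"},
                               {"PNO": "P2", "CITY": "Paris"},
                               {"PNO": "P3", "CITY": "Oslo"}])

# Common attributes = entire heading: JOIN degenerates to INTERSECT
city_join = join(project(S, {"CITY"}), project(P, {"CITY"}))
same_as_intersect = city_join == intersect(project(S, {"CITY"}), project(P, {"CITY"}))

# No common attributes: JOIN degenerates to TIMES (3 x 3 = 9 tuples)
cart = join(project(S, {"SNO"}), project(P, {"PNO"}))
same_as_times = cart == times(project(S, {"SNO"}), project(P, {"PNO"}))
print(same_as_intersect, same_as_times, len(cart[1]))
```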


Can usefully define n-adic JOIN also (n ≥ 0)*

JOIN { r1 , r2 , ... , rn }

JOIN { r } ≡ r

JOIN { } ??? Answer: TABLE_DEE !!!

TABLE_DEE is the identity with respect to JOIN /* important! */

* Why exactly is this possible? See later ...
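Using the same toy representation (redefined here so the snippet stands alone), n-adic JOIN is simply a fold of dyadic JOIN that starts from TABLE_DEE, the relation with no attributes and exactly one tuple (the empty tuple). Since JOIN is commutative and associative, the order of the fold doesn’t affect the result. TABLE_DUM (no attributes, no tuples) is included for contrast. The names `join` and `join_all` are illustrative only.

```python
from functools import reduce
from itertools import product

def join(r1, r2):
    """Natural join of (heading, body) pairs; tuples are frozensets of (attr, value)."""
    body = set()
    for t1, t2 in product(r1[1], r2[1]):
        u = t1 | t2
        if len({a for a, _ in u}) == len(u):  # t1 and t2 agree on common attributes
            body.add(u)
    return r1[0] | r2[0], frozenset(body)

TABLE_DEE = (frozenset(), frozenset({frozenset()}))  # no attributes, one (empty) tuple
TABLE_DUM = (frozenset(), frozenset())               # no attributes, no tuples

def join_all(*relations):
    """JOIN { r1 , r2 , ... , rn }, including the cases n = 1 and n = 0."""
    return reduce(join, relations, TABLE_DEE)

SP = (frozenset({"SNO", "PNO"}),
      frozenset({frozenset({("SNO", "S1"), ("PNO", "P1")}),
                 frozenset({("SNO", "S2"), ("PNO", "P1")})}))

print(join_all() == TABLE_DEE, join_all(SP) == SP)
```

Joining with TABLE_DUM, by contrast, always yields an empty result, which is why TABLE_DEE (not TABLE_DUM) is the identity.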


EXPLICIT JOINS IN SQL :

1. t1 NATURAL JOIN t2 /* already explained */

2. t1 JOIN t2 ON bx

3. t1 JOIN t2 USING ( C1 , C2 , ... , Cn )

4. t1 CROSS JOIN t2 /* ( SELECT * FROM t1 , t2 ) */

2. t1 JOIN t2 ON bx ... logically equivalent to:

( SELECT * FROM t1 , t2 WHERE bx )


EXPLICIT JOINS IN SQL (cont.) :

3. t1 JOIN t2 USING ( C1 , C2 , ... , Cn ) equivalent to:

( SELECT * FROM t1 , t2 WHERE t1.C1 = t2.C1 AND ... AND t1.Cn = t2.Cn )

... except that columns C1, C2, ..., Cn appear only once in result, and result column ordering is:

• first C1, C2, ..., Cn (in that order)
• then other columns of t1 (in same order as in t1)
• then other columns of t2 (in same order as in t2)

/* Do you begin to see what a pain this left to right */
/* ordering business is ??? */


RECOMMENDATIONS :

1. NATURAL JOIN: First choice ... Usually most succinct if other recommendations followed ... But make sure columns with same name are of same type (joinability)

2. Avoid JOIN ON: Virtually guaranteed to produce duplicate column names (unless ... ???) ... If you must use it, do renaming as well

3. JOIN USING: Make sure columns with same name are of same type

4. CROSS JOIN: Make sure no common column names

5. WHERE (original syntax): As Case 2 (JOIN ON)


UNION, INTERSECT, MINUS :

Operands must be of same type, result is of same type also ... Suppose parts have extra attribute STATUS, of type INTEGER:

Tutorial D:

P { STATUS , CITY } UNION S { CITY , STATUS }

SQL:

SELECT STATUS , CITY FROM P
UNION CORRESPONDING
SELECT CITY , STATUS FROM S

Note: Duplicates eliminated! (unless ALL specified, in SQL); result has attributes (columns) STATUS and CITY (in that order, in SQL)

If CORRESPONDING not specified, column matching done on basis of ordinal position ... Don’t do this!
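Here is what "matching on the basis of ordinal position" does in practice. This is a minimal sketch with Python’s `sqlite3` and hypothetical cut-down tables (parts given a STATUS column, as on the slide). SQLite does not support CORRESPONDING, so the columns have to be lined up by hand, and when they aren’t, nothing stops CITY values from landing under STATUS.

```python
import sqlite3

# Hypothetical cut-down tables: parts given a STATUS column, as on the slide
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P (PNO TEXT, STATUS INTEGER, CITY TEXT)")
con.execute("CREATE TABLE S (SNO TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO P VALUES (?, ?, ?)", [("P1", 20, "London"), ("P2", 10, "Paris")])
con.executemany("INSERT INTO S VALUES (?, ?, ?)", [("S1", 20, "London"), ("S5", 30, "Athens")])

# Operands written with different column orders: matched by position, so
# CITY values from S silently end up under STATUS, and vice versa
mixed = set(con.execute("SELECT STATUS, CITY FROM P UNION SELECT CITY, STATUS FROM S"))

# Columns lined up by hand (no CORRESPONDING available): the intended result
aligned = set(con.execute("SELECT STATUS, CITY FROM P UNION SELECT STATUS, CITY FROM S"))

# UNION ALL keeps the duplicate (20, 'London') row
bag = list(con.execute("SELECT STATUS, CITY FROM P UNION ALL SELECT STATUS, CITY FROM S"))
print(mixed, aligned, len(bag))
```

SQLite’s flexible column typing makes the mix-up silent. A more strictly typed product might reject the first query instead, but only when the misaligned columns happen to have incompatible types.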


UNION, INTERSECT, MINUS (cont.) :

Tutorial D:

P { STATUS , CITY } INTERSECT S { CITY , STATUS }

SQL:

SELECT STATUS , CITY FROM P
INTERSECT CORRESPONDING
SELECT CITY , STATUS FROM S

Tutorial D:

P { STATUS , CITY } MINUS S { CITY , STATUS }

SQL:

SELECT STATUS , CITY FROM P
EXCEPT CORRESPONDING
SELECT CITY , STATUS FROM S


RECOMMENDATIONS :

Make sure corresponding columns have same name and type

Always specify CORRESPONDING if possible ...

... otherwise, make sure columns line up properly (because matching done by ordinal position): e.g.,

SELECT STATUS , CITY FROM P
UNION
SELECT STATUS , CITY FROM S /* note reordering */

Don’t use the "BY (column name commalist)" option of CORRESPONDING

Never specify ALL! Note: Usual "justification" for ALL is performance ...


ONE LAST POINT :

Tutorial D also supports:

“Disjoint union” (D_UNION) /* see defn of INSERT earlier */

n-adic UNION, INTERSECT, D_UNION (n > 0) /* but not MINUS !!! */


WHICH OPERATORS ARE PRIMITIVE ???

Already seen that INTERSECT and TIMES can be defined in terms of join ... i.e., not all ops primitive

Difference between primitive and useful !!!

One possible primitive set:

• restrict
• project
• join
• union
• difference

But what about rename?


"WITH" SPECIFICATIONS /* very useful feature */ :

Get pairs of supplier numbers such that the suppliers are colocated (i.e., in same city):

( ( ( S RENAME ( SNO AS SA ) ) { SA , CITY } JOIN
    ( S RENAME ( SNO AS SB ) ) { SB , CITY } )
  WHERE SA < SB ) { SA , SB }

Or:

WITH ( S RENAME ( SNO AS SA ) ) { SA , CITY } AS R1 ,
     ( S RENAME ( SNO AS SB ) ) { SB , CITY } AS R2 ,
     R1 JOIN R2 AS R3 ,
     R3 WHERE SA < SB AS R4 :
R4 { SA , SB }


"WITH" IN SQL :

Operands the other way around: WITH name AS exp

No colon separator

In Tutorial D, WITH can be used with exps of any kind; in SQL, WITH can be used with table exps only

WITH T1 AS ( SELECT SNO AS SA , CITY FROM S ) ,
     T2 AS ( SELECT SNO AS SB , CITY FROM S ) ,
     T3 AS ( SELECT * FROM T1 NATURAL JOIN T2 ) ,
     T4 AS ( SELECT * FROM T3 WHERE SA < SB )
SELECT SA , SB FROM T4
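The SQL formulation can be run as written against SQLite, which supports both WITH and NATURAL JOIN. Below is a minimal sketch via Python’s `sqlite3` using the usual supplier data. The only colocated pairs are S1/S4 (London) and S2/S3 (Paris), and the SA < SB restriction keeps exactly one ordering of each pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
])

QUERY = """
WITH T1 AS ( SELECT SNO AS SA , CITY FROM S ) ,
     T2 AS ( SELECT SNO AS SB , CITY FROM S ) ,
     T3 AS ( SELECT * FROM T1 NATURAL JOIN T2 ) ,
     T4 AS ( SELECT * FROM T3 WHERE SA < SB )
SELECT SA , SB FROM T4
"""
pairs = set(con.execute(QUERY))
print(sorted(pairs))
```

Each named subexpression is defined before it is used, which is the readability benefit the slide is pointing at.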


WHAT DO RELATIONAL EXPRESSIONS MEAN?

Recall: Every relvar has a relvar predicate (i.e., what the relvar means)

This notion extends naturally to arbitrary rel exps! E.g., consider projection S {SNO,SNAME,STATUS} ... Denotes rel containing all tuples of the form

TUPLE { SNO sno , SNAME sn , STATUS st }

such that a tuple of the form

TUPLE { SNO sno , SNAME sn , STATUS st , CITY sc }

currently exists in relvar S for some CITY value sc ... In other words:


Specified exp denotes current extension of predicate:

There exists some city CITY such that supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY

Or just: Supplier SNO is under contract, is named SNAME, has status STATUS, and is located somewhere

This predicate = meaning of S {SNO,SNAME,STATUS} ... Has three parameters (relation has three attributes); CITY is a bound variable, not a param /* see later */

Pred for arb rel exp can be determined from preds for relvars involved plus semantics of rel ops involved


THETA JOIN :

E.g.: "unequal" join of S and P on cities /* SQL only */ :

SELECT SNO , SNAME , STATUS , S.CITY AS SCITY ,
       PNO , PNAME , COLOR , WEIGHT , P.CITY AS PCITY   /* 3. "project" */
FROM   S , P                                            /* 1. cartesian product */
WHERE  S.CITY <> P.CITY                                 /* 2. restrict */

Note the conceptual algorithm for evaluating a SELECT - FROM - WHERE exp (i.e., formal definition of semantics of such exps)

By the way: What if theta had been "=" ???


EXPRESSION TRANSFORMATION ("query rewrite") :

Example: Suppliers who supply part P2, with corresp quantities (Tutorial D):

( ( S JOIN SP ) WHERE PNO = 'P2' ) { ALL BUT PNO }

DB : 100 suppliers, 100,000 shipments (500 for P2)

No optimization at all (worst case) :

1. Join                              10,000,100 reads, 100,000 writes
2. Restrict (result 500 tuples)      100,000 reads, no writes
3. Project                           No reads, no writes

TOTAL: 10,200,100 tuple I/Os


AN OBVIOUS IMPROVEMENT :

1. Restrict (result 500 tuples)      100,000 reads, no writes
2. Join (result 500 tuples)          100 reads, no writes
3. Project                           No reads, no writes

TOTAL: 100,100 tuple I/Os (100 times better)


In effect, optimizer has transformed original exp into

S JOIN ( SP WHERE PNO = 'P2' ) /* ignore projection */

Such transformations are one of the two great ideas at the heart of optimization

Other = cost based optimizing: E.g., index or hash on SP.PNO will reduce the 100,000 reads in Step 1 to 500, and overall procedure is now some 17,000 times better than the original (600 vs. 10,200,100 tuple I/Os)

But such optimizing has little to do with RM per se, except for strong logical vs. physical separation, which keeps access strategies out of applications
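The tuple I/O arithmetic for the three strategies can be reproduced in a few lines of Python. This is a sketch under the slides’ own simplifying assumptions: nested-loop join with S as the outer relation, every tuple read or written counts as one I/O, the 500 P2 shipments fit in memory, and an index or hash lookup costs one read per qualifying tuple.

```python
# Figures from the example
N_S = 100          # suppliers
N_SP = 100_000     # shipments
N_P2 = 500         # shipments for part P2

# Plan 1, no optimization: nested-loop join (S outer, SP rescanned for each supplier),
# join result written out, then read back for the restriction
join_reads = N_S + N_S * N_SP
join_writes = N_SP                # each shipment matches exactly one supplier
restrict_reads = N_SP
plan1 = join_reads + join_writes + restrict_reads

# Plan 2, restrict first: scan SP once, keep the 500 P2 shipments in memory,
# then read the 100 suppliers to do the join
plan2 = N_SP + N_S

# Plan 3: as plan 2, but an index or hash on SP.PNO fetches only the 500 P2 shipments
plan3 = N_P2 + N_S

print(plan1, plan2, plan3)                         # 10200100 100100 600
print(round(plan1 / plan2), round(plan1 / plan3))  # 102 17000
```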


THE DISTRIBUTIVE LAW :

E.g., SQRT ( a * b ) ≡ SQRT ( a ) * SQRT ( b ) /* for a, b ≥ 0 */

"SQRT distributes over multiplication" /* but not over addition */

In RM, restrict distributes over UNION / INTERSECT / MINUS ... also JOIN if restriction condition = AND of two separate conditions, one for each join operand

I.e., ( r1 WHERE bx1 ) JOIN ( r2 WHERE bx2 ) ≡ ( r1 JOIN r2 ) WHERE bx1 AND bx2

This law was used in the example

Net effect: Can do restrictions early


Project distributes over UNION

I.e., ( r1 UNION r2 ) { X } ≡ r1 { X } UNION r2 { X }

Also distributes over JOIN provided all joining attribs are included in the projection

Can do projections early
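Both distributive laws can be spot-checked on sample data. The following Python sketch illustrates them; it is not a proof. It builds small hypothetical relations (tuples as frozensets of (attribute, value) pairs), evaluates both sides of each law, and compares the results. Note that bx1 mentions only attributes of S and bx2 only attributes of SP, as the JOIN law requires.

```python
from itertools import product

def rel(rows):
    return frozenset(frozenset(r.items()) for r in rows)

def restrict(body, cond):
    return frozenset(t for t in body if cond(dict(t)))

def project(body, attrs):
    return frozenset(frozenset(p for p in t if p[0] in attrs) for t in body)

def join(b1, b2):
    out = set()
    for t1, t2 in product(b1, b2):
        u = t1 | t2
        if len({a for a, _ in u}) == len(u):  # tuples agree on common attributes
            out.add(u)
    return frozenset(out)

S = rel([{"SNO": "S1", "STATUS": 20, "CITY": "London"},
         {"SNO": "S2", "STATUS": 10, "CITY": "Paris"},
         {"SNO": "S3", "STATUS": 30, "CITY": "Paris"}])
SP = rel([{"SNO": "S1", "PNO": "P1", "QTY": 300},
          {"SNO": "S1", "PNO": "P6", "QTY": 100},
          {"SNO": "S2", "PNO": "P1", "QTY": 300},
          {"SNO": "S3", "PNO": "P2", "QTY": 200}])

bx1 = lambda t: t["STATUS"] > 10   # refers only to attributes of S
bx2 = lambda t: t["QTY"] > 150     # refers only to attributes of SP

# Restrict distributes over JOIN: the restrictions can be done early
early = join(restrict(S, bx1), restrict(SP, bx2))
late = restrict(join(S, SP), lambda t: bx1(t) and bx2(t))

# Project distributes over UNION (two bodies of the same type)
s_london = restrict(S, lambda t: t["CITY"] == "London")
s_paris = restrict(S, lambda t: t["CITY"] == "Paris")
left = project(s_london | s_paris, {"CITY"})
right = project(s_london, {"CITY"}) | project(s_paris, {"CITY"})
print(early == late, left == right)
```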


THE COMMUTATIVE LAW :

Dyadic Op is commutative iff a Op b ≡ b Op a

• In arith, "+" and "*" are commutative, "-" and "/" aren’t

• In RM, UNION / INTERSECT / JOIN are commutative, MINUS isn’t

• Hence, in (e.g.) r1 JOIN r2, system is free to choose the smaller of r1 and r2 (say) as "outer" rel and the other as "inner" rel


THE ASSOCIATIVE LAW :

Dyadic Op is associative iff a Op ( b Op c ) ≡ ( a Op b ) Op c

• In arith, "+" and "*" are associative, "-" and "/" aren’t

• In RM, UNION / INTERSECT / JOIN are associative, MINUS isn’t

• Hence, in (e.g.) r1 JOIN r2 JOIN r3:
  - No parens necessary
  - System is free to choose join sequence


THE IDEMPOTENCE AND ABSORPTION LAWS :

Dyadic Op is idempotent iff a Op a ≡ a

In logic, AND and OR are idempotent

In RM, UNION / INTERSECT / JOIN are idempotent, MINUS isn’t

Absorption laws:

r1 UNION ( r1 INTERSECT r2 ) ≡ r1
r1 INTERSECT ( r1 UNION r2 ) ≡ r1


All such transformations can be done without regard for actual data values or access paths!

Important note:

Many such transformations available for sets ...

But fewer for bags ...

And fewer still if column ordinal position has to be taken into account ...

And far fewer if nulls and 3VL have to be taken into account ...

What do you conclude?
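Two concrete casualties of that list can be shown in a minimal sketch with Python’s `sqlite3` and a hypothetical one-column table T. Idempotence (r UNION r ≡ r) fails once bags are allowed, and the trivially true rewrite r WHERE X = X ≡ r fails once nulls and 3VL are involved, because NULL = NULL evaluates to UNKNOWN.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (X INTEGER)")
con.executemany("INSERT INTO T VALUES (?)", [(1,), (2,), (None,)])  # None is stored as NULL

def count(sql):
    return con.execute(sql).fetchone()[0]

total = count("SELECT COUNT(*) FROM T")

# Sets: r UNION r = r. Bags: UNION ALL doubles the row count.
doubled = count("SELECT COUNT(*) FROM (SELECT X FROM T UNION ALL SELECT X FROM T)")

# Two-valued logic: r WHERE X = X = r. With a null, that row is silently lost.
kept = count("SELECT COUNT(*) FROM T WHERE X = X")
print(total, doubled, kept)  # 3 6 2
```

An optimizer that must preserve such behavior cannot apply either transformation blindly.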


BUT DOESN’T RELYING ON ATTRIBUTE NAMES MAKE FOR FRAGILE CODE ???

E.g., P JOIN S ... What if STATUS attribute added to P?

Popular misconception!

1960s/1970s: [diagram: pgm, DB, DB def] ... Not much data independence

Today: [diagram: pgm, DB, DB def] ... More data independence (but ...)


The right way: [diagram: pgm, DB, DB def, DB def] ... Full data independence*

Note: Views should have solved this problem but didn’t ... because mapping specified as part of the view definition instead of separately

Recommendation: Adopt the "operate via views strategy"!

* Full logical data independence, to be precise


STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design


ADDITIONAL OPERATORS :

MATCHING, NOT MATCHING

EXTEND

image relations

DIVIDEBY

aggregate operators

SUMMARIZE

GROUP, UNGROUP

"what if"

ORDER BY (?)


SEMIJOIN AND SEMIDIFFERENCE :

Most exps involving join or difference really require semijoin or semidifference

r1 MATCHING r2 ≡ ( r1 JOIN r2 ) { H1 } where {H1} = heading of r1

Tutorial D:

S MATCHING SP

SQL:

SELECT S.* FROM S
WHERE SNO IN ( SELECT SNO FROM SP )

r1 NOT MATCHING r2 ≡ r1 MINUS ( r1 MATCHING r2 )

Tutorial D:

S NOT MATCHING SP

SQL:

SELECT S.* FROM S
WHERE SNO NOT IN ( SELECT SNO FROM SP )

If r1 and r2 of same type, r1 NOT MATCHING r2 degenerates to r1 MINUS r2 /* analogous remark NOT true of semijoin */
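The two definitions translate directly into code. This Python sketch uses a toy model: bodies are frozensets of tuples, tuples are frozensets of (attribute, value) pairs, and the helper names are illustrative. It defines MATCHING as "project the join back onto r1’s heading" and NOT MATCHING as "r1 MINUS (r1 MATCHING r2)". The data is a hypothetical cut-down of S and SP in which S5 has no shipments.

```python
from itertools import product

def rel(rows):
    return frozenset(frozenset(r.items()) for r in rows)

def project(body, attrs):
    return frozenset(frozenset(p for p in t if p[0] in attrs) for t in body)

def join(b1, b2):
    out = set()
    for t1, t2 in product(b1, b2):
        u = t1 | t2
        if len({a for a, _ in u}) == len(u):  # tuples agree on common attributes
            out.add(u)
    return frozenset(out)

def matching(b1, h1, b2):
    """r1 MATCHING r2 = ( r1 JOIN r2 ) { H1 }"""
    return project(join(b1, b2), h1)

def not_matching(b1, h1, b2):
    """r1 NOT MATCHING r2 = r1 MINUS ( r1 MATCHING r2 )"""
    return b1 - matching(b1, h1, b2)

S_HEADING = {"SNO", "CITY"}
S = rel([{"SNO": s, "CITY": c} for s, c in
         [("S1", "London"), ("S2", "Paris"), ("S3", "Paris"), ("S4", "London"), ("S5", "Athens")]])
SP = rel([{"SNO": s, "PNO": p} for s, p in
          [("S1", "P1"), ("S2", "P1"), ("S3", "P2"), ("S4", "P2")]])

def snos(body):
    return {dict(t)["SNO"] for t in body}

print(sorted(snos(matching(S, S_HEADING, SP))), sorted(snos(not_matching(S, S_HEADING, SP))))
```

Because the join result is projected back onto r1’s heading, the semijoin result is always a subset of r1 itself.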


EXTEND :

Tutorial D:

EXTEND P ADD ( WEIGHT * 454 AS GMWT )

SQL:

SELECT P.* , WEIGHT * 454 AS GMWT
FROM   P

PNO  PNAME  COLOR  WEIGHT  CITY    GMWT
P1   Nut    Red    12.0    London  5448.0
P2   Bolt   Green  17.0    Paris   7718.0
P3   Screw  Blue   17.0    Oslo    7718.0
P4   Screw  Red    14.0    London  6356.0
P5   Cam    Blue   12.0    Paris   5448.0
P6   Cog    Red    19.0    London  8626.0

Note: Relvar P not changed in the DB! ... not like ALTER TABLE in SQL


HENCE :

Get PNO and gram weight for parts with gram weight > 7000.0:

( ( EXTEND P ADD ( WEIGHT * 454 AS GMWT ) ) WHERE GMWT > 7000.0 ) { PNO, GMWT }

Contrast SQL:

SELECT PNO , ( WEIGHT * 454 ) AS GMWT
FROM   P
WHERE  ( WEIGHT * 454 ) > 7000.0 /* not GMWT > 7000.0 */

SELECT - FROM - WHERE template too rigid! (Lack of orthogonality) ... Need to apply WHERE to SELECT result, not FROM result


Actually the standard does allow:

SELECT TEMP.PNO , TEMP.GMWT
FROM   ( SELECT P.PNO , ( WEIGHT * 454 ) AS GMWT
         FROM P ) AS TEMP
WHERE  TEMP.GMWT > 7000.0

But does your favorite product support subqueries in the FROM clause?

Also, this style leads to references appearing (possibly a long way) before definitions ...
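SQLite is one product that does accept subqueries in the FROM clause. Here is a minimal sketch via Python’s `sqlite3`, using a cut-down P with only the columns needed. It runs both SQL formulations from the preceding slides and confirms that they agree: only P2, P3, and P6 exceed 7000 grams.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P (PNO TEXT, WEIGHT REAL)")  # cut-down P: only the columns needed
con.executemany("INSERT INTO P VALUES (?, ?)", [
    ("P1", 12.0), ("P2", 17.0), ("P3", 17.0), ("P4", 14.0), ("P5", 12.0), ("P6", 19.0)])

# WHERE applied to the result of the "extension", via a subquery in FROM
nested = set(con.execute("""
    SELECT TEMP.PNO , TEMP.GMWT
    FROM ( SELECT P.PNO , ( WEIGHT * 454 ) AS GMWT FROM P ) AS TEMP
    WHERE TEMP.GMWT > 7000.0"""))

# Flat SELECT - FROM - WHERE: the expression has to be repeated in WHERE
flat = set(con.execute("""
    SELECT PNO , ( WEIGHT * 454 ) AS GMWT FROM P
    WHERE ( WEIGHT * 454 ) > 7000.0"""))
print(sorted(nested))
```

The nested form avoids repeating the expression, but the name GMWT is used in the outer SELECT before the reader reaches its definition.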


IMAGE RELATIONS :

Image relation = "image" in some rel of some tuple (usually a tuple in some other rel)

E.g., image in SP of tuple in S for S4:

( SP WHERE SNO = 'S4' ) { ALL BUT SNO }

PNO  QTY
P2   200
P4   300
P5   400

Very useful and widely applicable concept! So we define a shorthand ...


S WHERE ( !!SP ) { PNO } = P { PNO }

Here ( !!SP ) is the image in SP of the "current" tuple in S, and "=" is relational equality

I.e., get suppliers who supply all parts!

SNO  SNAME  STATUS  CITY
S1   Smith  20      London

Image relation ref can’t appear wherever rel exp in general can appear, only in contexts where pertinent tuple well defined (e.g., WHERE clause)


SQL has no direct support for image rels ... SQL analog of foregoing example: /* can be simplified */

SELECT *
FROM   S
WHERE  NOT EXISTS
       ( SELECT PNO
         FROM   SP
         WHERE  SP.SNO = S.SNO
         EXCEPT
         SELECT PNO
         FROM   P )
AND    NOT EXISTS
       ( SELECT PNO
         FROM   P
         EXCEPT
         SELECT PNO
         FROM   SP
         WHERE  SP.SNO = S.SNO )
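To check that the two formulations agree, this sketch evaluates the image-relation version directly in Python and runs the SQL analog in SQLite (via `sqlite3`) on the usual shipment data. Only S1 supplies every part. The Python helper `image_pnos` is hypothetical and merely mimics ( !!SP ) { PNO } for one supplier at a time.

```python
import sqlite3

S = ["S1", "S2", "S3", "S4", "S5"]
P = ["P1", "P2", "P3", "P4", "P5", "P6"]
SP = [("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
      ("S2", "P1"), ("S2", "P2"), ("S3", "P2"), ("S4", "P2"), ("S4", "P4"), ("S4", "P5")]

def image_pnos(sno):
    """( !!SP ) { PNO } for the "current" supplier sno."""
    return {pno for s, pno in SP if s == sno}

# S WHERE ( !!SP ) { PNO } = P { PNO }
td_style = {sno for sno in S if image_pnos(sno) == set(P)}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT)")
con.execute("CREATE TABLE P (PNO TEXT)")
con.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT)")
con.executemany("INSERT INTO S VALUES (?)", [(s,) for s in S])
con.executemany("INSERT INTO P VALUES (?)", [(p,) for p in P])
con.executemany("INSERT INTO SP VALUES (?, ?)", SP)

sql_style = {row[0] for row in con.execute("""
    SELECT * FROM S
    WHERE NOT EXISTS ( SELECT PNO FROM SP WHERE SP.SNO = S.SNO
                       EXCEPT
                       SELECT PNO FROM P )
    AND   NOT EXISTS ( SELECT PNO FROM P
                       EXCEPT
                       SELECT PNO FROM SP WHERE SP.SNO = S.SNO )""")}
print(td_style, sql_style)
```

The two NOT EXISTS tests together express set equality as mutual inclusion, which is the work the single relational "=" does in the Tutorial D version.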


ANOTHER EXAMPLE :

S { SNO }         /* suppliers */
SP { SNO, PNO }   /* supplier supplies part */
PJ { PNO, JNO }   /* part is used in project */
J { JNO }         /* projects */

Get all sno/jno pairs such that:

• SNO sno currently appears in S
• JNO jno currently appears in J
• Supplier sno supplies all parts used in project jno

( S JOIN J ) WHERE !!PJ ⊆ !!SP

Easy ... but try it in SQL!


DIVIDEBY :

Should be dropped, IMHO /* so can skip this topic if you like */

• Any query that can be done via divide can be done better via image rels

• There are at least seven different divides!

• Doesn’t solve the problem it was originally, and specifically, meant to address

Original and simplest version:

Let heading of r2 be subset of heading of r1 (so r1 and r2 definitely joinable, by the way)


Dividend r1 has attribs {X} and {Y}; divisor r2 has attribs {Y}; result has attribs {X}

r1 DIVIDEBY r2 ≡ r1 { X } NOT MATCHING ( ( r1 { X } JOIN r2 ) NOT MATCHING r1 )

E.g., let RP be ( P WHERE COLOR = 'Red' ) ... Then

SP { SNO , PNO } DIVIDEBY RP { PNO }

gives:

SNO
S1

Loosely (?): SNOs for suppliers who supply all red parts ... Probably needs to be joined to S (?)
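The DIVIDEBY definition can be executed literally. This Python sketch uses the same toy tuple representation as before, with illustrative helper names. It computes SP { SNO , PNO } DIVIDEBY RP { PNO }, where RP holds the red parts P1, P4, and P6. Only S1 ships all three. S5, which ships nothing, can never appear, because the result is drawn from SP { SNO }.

```python
from itertools import product

def rel(rows):
    return frozenset(frozenset(r.items()) for r in rows)

def project(body, attrs):
    return frozenset(frozenset(p for p in t if p[0] in attrs) for t in body)

def join(b1, b2):
    out = set()
    for t1, t2 in product(b1, b2):
        u = t1 | t2
        if len({a for a, _ in u}) == len(u):  # tuples agree on common attributes
            out.add(u)
    return frozenset(out)

def not_matching(b1, b2):
    """Tuples of b1 that join with no tuple of b2."""
    return frozenset(t1 for t1 in b1 if not join(frozenset({t1}), b2))

def divideby(r1, r2, x):
    """r1 DIVIDEBY r2 = r1 { X } NOT MATCHING ( ( r1 { X } JOIN r2 ) NOT MATCHING r1 )"""
    r1x = project(r1, x)
    return not_matching(r1x, not_matching(join(r1x, r2), r1))

SP = rel([{"SNO": s, "PNO": p} for s, p in
          [("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"), ("S1", "P5"), ("S1", "P6"),
           ("S2", "P1"), ("S2", "P2"), ("S3", "P2"), ("S4", "P2"), ("S4", "P4"), ("S4", "P5")]])
RP = rel([{"PNO": p} for p in ("P1", "P4", "P6")])  # red parts

result = {dict(t)["SNO"] for t in divideby(SP, RP, {"SNO"})}
print(result)
```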


AGGREGATE OPERATORS /* digression (?) */ :

In RM, agg op = op that derives a single value from the bag or set of values of some attribute of some relation (or, for COUNT, from the entire rel). E.g.:

Tutorial D:

X := COUNT ( S ) ;              /* X = 5 */
Y := COUNT ( S { STATUS } ) ;   /* Y = 3 */

SQL:

SELECT COUNT ( * ) AS X FROM S
SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S

Tutorial D syntax:

<agg op name> ( <relation exp> [, <exp> ] )


Tutorial D EXAMPLES :

SUM ( SP { QTY } )     /* 1000 */

SUM ( SP , QTY )       /* 3100 */

AVG ( SP , 3 * QTY )   /* 775 */

Legal <agg op name>s include:

COUNT SUM AVG MAX MIN AND OR XOR

The <exp> can include <attribute ref>s (in practice, almost always does)

The <exp> must be omitted for COUNT ... Otherwise, can be omitted only if rel denoted by <relation exp> is of degree one, as in first example above
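The numbers in the comments above follow from the usual sample data, as this short Python sketch confirms. The key difference is between SUM ( SP { QTY } ), which sums the projection and therefore only the distinct quantities, and SUM ( SP , QTY ), which takes one QTY per shipment tuple.

```python
STATUS = {"S1": 20, "S2": 10, "S3": 30, "S4": 20, "S5": 30}  # SNO -> STATUS
SP = {("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)}

x = len(STATUS)                                        # COUNT ( S )
y = len(set(STATUS.values()))                          # COUNT ( S { STATUS } ): projection drops duplicates
sum_proj = sum({qty for _, _, qty in SP})              # SUM ( SP { QTY } ): distinct QTY values only
sum_qty = sum(qty for _, _, qty in SP)                 # SUM ( SP , QTY ): one QTY per shipment tuple
avg_3qty = sum(3 * qty for _, _, qty in SP) / len(SP)  # AVG ( SP , 3 * QTY )
print(x, y, sum_proj, sum_qty, avg_3qty)  # 5 3 1000 3100 775.0
```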


WHAT ABOUT SQL ???

SELECT COUNT ( * ) AS X FROM S

SELECT COUNT ( DISTINCT STATUS ) AS Y FROM S

SQL doesn’t really support agg ops at all!

Foregoing exps are summarizations, not aggregations; they don’t evaluate to 5 and 3, resp. ... instead, they evaluate to tables containing those counts:

X        Y
5        3

/* COUNT invocations are agg op invocations, perhaps ... but they can’t appear as "stand alone" exps ... only inside table exps */


IN OTHER WORDS :

Aggregation is treated in SQL as a special case of summarization (i.e., loosely, what’s represented by a SELECT exp with a GROUP BY) ... Note that the foregoing SELECT exps do have implicit GROUP BYs:

SELECT COUNT ( * ) AS X
FROM   S
GROUP BY ( )

SELECT COUNT ( DISTINCT STATUS ) AS Y
FROM   S
GROUP BY ( )

SQL "aggregation" is, loosely, a SELECT exp without an explicit GROUP BY


Aggregation and summarization are often confused! ... Perhaps you can begin to see why

Picture confused still further because SQL often coerces the table resulting from an "aggregation" to the single row it contains, or even doubly coerces it to the single value that row contains, as here:

SET X = ( SELECT COUNT ( * ) FROM S ) ;

SET Y = ( SELECT COUNT ( DISTINCT STATUS ) FROM S ) ;

Another oddity: Logical error in connection with SQL-style aggregation and empty tables (I don’t mean the nulls problem) ... Details beyond the scope of this seminar


BACK TO Tutorial D :

Image rels can be very useful in connection with agg ops ... e.g.:

Suppliers for whom total shipment quantity, taken over all shipments, is less than 1000

S WHERE SUM ( !!SP , QTY ) < 1000

SQL "analog" (but note the trap!):

SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
FROM   S , SP
WHERE  S.SNO = SP.SNO
GROUP BY S.SNO , S.SNAME , S.STATUS , S.CITY
HAVING SUM ( SP.QTY ) < 1000
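One trap is easy to demonstrate. Supplier S5 has no shipments, so its total (the sum over an empty image relation) is 0, which is less than 1000, and the Tutorial D expression includes S5. In the SQL "analog", however, S5 disappears in the join before any grouping happens. The sketch below runs against SQLite via `sqlite3`. The COALESCE rewrite is just one possible fix, shown for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
    ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)])

# The slide's join + GROUP BY formulation: suppliers with no shipments vanish
grouped = {row[0] for row in con.execute("""
    SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
    FROM S , SP
    WHERE S.SNO = SP.SNO
    GROUP BY S.SNO , S.SNAME , S.STATUS , S.CITY
    HAVING SUM ( SP.QTY ) < 1000""")}

# One possible fix: correlated subquery, with COALESCE turning the NULL sum of no rows into 0
fixed = {row[0] for row in con.execute("""
    SELECT S.SNO FROM S
    WHERE COALESCE ( ( SELECT SUM ( QTY ) FROM SP WHERE SP.SNO = S.SNO ) , 0 ) < 1000""")}
print(sorted(grouped), sorted(fixed))
```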


• Suppliers with fewer than three shipments:

S WHERE COUNT ( !!SP ) < 3

• Suppliers where maximum shipment quantity < twice minimum shipment quantity:

S WHERE MAX ( !!SP , QTY ) < 2 * MIN ( !!SP , QTY )

• Update suppliers where total shipment quantity < 1000, halving their status:

UPDATE S WHERE SUM ( !!SP , QTY ) < 1000 : { STATUS := 0.5 * STATUS } ;


SUMMARIZE :

SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( PNO ) AS PCT )

/* Tutorial D (see later for SQL analog) ... */
/* call this "SX1" for subsequent reference */

SNO  PCT
S1   6
S2   2
S3   1
S4   3
S5   0   /* note this tuple in particular! */

Note: COUNT ( PNO ) is not an invocation of the agg op called COUNT! (which takes a rel as its argument) ... So what is it ??? Hmm ...


SUMMARIZE (cont.) :

Heading of PER rel must = that of some projection of SUMMARIZE rel ... If it actually is such a projection, can replace PER spec by BY spec as in SX2 here:

SUMMARIZE SP BY { SNO } ADD ( COUNT ( PNO ) AS PCT )

SNO  PCT
S1   6
S2   2
S3   1
S4   3

Misses S5, with count of 0 ... because BY { SNO } is shorthand for PER ( SP { SNO } )
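The PER/BY difference comes down to which relation supplies the SNO values. Here is a minimal Python sketch (shipments as assumed sample triples; `summarize_count_per` is a hypothetical helper) that computes SX1 and SX2 by changing only the PER argument:

```python
from collections import Counter

# Assumed sample shipments (SNO, PNO, QTY) and supplier numbers S { SNO }.
SP = {("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)}
S_SNO = {"S1", "S2", "S3", "S4", "S5"}


def summarize_count_per(sp, per):
    """SUMMARIZE SP PER ( per ) ADD ( COUNT ( PNO ) AS PCT ): one result tuple
    for every SNO in the PER relation, whether or not it has any shipments."""
    counts = Counter(sno for sno, _pno, _qty in sp)
    return {sno: counts[sno] for sno in per}  # a Counter yields 0 for missing keys


SX1 = summarize_count_per(SP, S_SNO)                      # PER ( S { SNO } )
SX2 = summarize_count_per(SP, {sno for sno, _, _ in SP})  # BY { SNO }, i.e. PER ( SP { SNO } )

print(sorted(SX1.items()))  # [('S1', 6), ('S2', 2), ('S3', 1), ('S4', 3), ('S5', 0)]
print(sorted(SX2.items()))  # [('S1', 6), ('S2', 2), ('S3', 1), ('S4', 3)]
```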

Copyright C. J. Date 2008 page 216

EXAMPLE SX2 HAS A DIRECT SQL ANALOG :

SELECT SNO , COUNT ( ALL PNO ) AS PCT
FROM   SP
GROUP  BY SNO

Summarizations typically formulated in SQL by means of SELECT exp with explicit GROUP BY /* but see later */

(Recall that "aggregations" typically have implicit GROUP BY)

But what about Example SX1 ??? Straightforward GROUP BY doesn’t do the job ... Instead:

Copyright C. J. Date 2008 page 217

EXAMPLE SX1 IN SQL :

SELECT S.SNO , ( SELECT COUNT ( ALL PNO )   /* AS PCT ??? */
                 FROM   SP
                 WHERE  SP.SNO = S.SNO ) AS PCT
FROM   S                                    /* double coercion */

Example SX2 could be done the same way:

SELECT DISTINCT SPX.SNO , ( SELECT COUNT ( ALL SPY.PNO )
                            FROM   SP AS SPY
                            WHERE  SPY.SNO = SPX.SNO ) AS PCT
FROM   SP AS SPX

GROUP BY is logically redundant!
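Both claims can be checked mechanically. The sketch below is a minimal sqlite3 run that assumes the sample data (only SNO is kept for S). It omits the ALL keyword because ALL is the default. It shows that the scalar-subquery form of SX1 keeps S5 with a count of 0, and that the GROUP BY and subquery forms of SX2 agree.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO S VALUES ('S1'), ('S2'), ('S3'), ('S4'), ('S5');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

# SX1: one scalar subquery per supplier; COUNT over no rows is 0, so S5 appears.
sx1 = con.execute("""
    SELECT S.SNO , ( SELECT COUNT ( PNO ) FROM SP
                     WHERE SP.SNO = S.SNO ) AS PCT
    FROM   S
    ORDER  BY S.SNO""").fetchall()

# SX2 two ways: with GROUP BY, and with a scalar subquery instead.
sx2_group_by = con.execute("""
    SELECT SNO , COUNT ( PNO ) AS PCT
    FROM   SP
    GROUP  BY SNO
    ORDER  BY SNO""").fetchall()
sx2_no_group_by = con.execute("""
    SELECT DISTINCT SPX.SNO , ( SELECT COUNT ( SPY.PNO ) FROM SP AS SPY
                                WHERE SPY.SNO = SPX.SNO ) AS PCT
    FROM   SP AS SPX
    ORDER  BY SPX.SNO""").fetchall()

print(sx1)                              # [('S1', 6), ('S2', 2), ('S3', 1), ('S4', 3), ('S5', 0)]
print(sx2_group_by == sx2_no_group_by)  # True
```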

Copyright C. J. Date 2008 page 218

/* SX3 : Slight variation on SX1 */

SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ )

/* SQL analog ... or is it? */

SELECT S.SNO , ( SELECT SUM ( ALL QTY )
                 FROM   SP
                 WHERE  SP.SNO = S.SNO ) AS TOTQ
FROM   S
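Why "or is it?": in SQL, SUM over an empty set yields NULL, while the Tutorial D sum of an empty image is 0. A minimal sqlite3 sketch with the same assumed sample data shows the difference and one possible fix, COALESCE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO S VALUES ('S1'), ('S2'), ('S3'), ('S4'), ('S5');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

# The SQL "analog" of SX3 as written on the slide.
sx3 = con.execute("""
    SELECT S.SNO , ( SELECT SUM ( QTY ) FROM SP
                     WHERE SP.SNO = S.SNO ) AS TOTQ
    FROM   S
    ORDER  BY S.SNO""").fetchall()

# Same query with the empty-set case mapped to 0 explicitly.
sx3_fixed = con.execute("""
    SELECT S.SNO , COALESCE ( ( SELECT SUM ( QTY ) FROM SP
                                WHERE SP.SNO = S.SNO ) , 0 ) AS TOTQ
    FROM   S
    ORDER  BY S.SNO""").fetchall()

print(sx3[-1])        # ('S5', None)  SQL's SUM over no rows is NULL
print(sx3_fixed[-1])  # ('S5', 0)     matches the Tutorial D result
```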

/* SX4 : Slight variation on SX3 */

( SUMMARIZE SP PER ( S { SNO } ) ADD ( SUM ( QTY ) AS TOTQ ) )
WHERE TOTQ > 250

Copyright C. J. Date 2008 page 219

SQL ANALOG /* or is it? */ :

SELECT SNO , SUM ( ALL QTY ) AS TOTQ
FROM   SP
GROUP  BY SNO
HAVING SUM ( ALL QTY ) > 250   /* not TOTQ > 250 !!! */

Or:

SELECT DISTINCT SPX.SNO , ( SELECT SUM ( ALL SPY.QTY )
                            FROM   SP AS SPY
                            WHERE  SPY.SNO = SPX.SNO ) AS TOTQ
FROM   SP AS SPX
WHERE  ( SELECT SUM ( ALL SPY.QTY )
         FROM   SP AS SPY
         WHERE  SPY.SNO = SPX.SNO ) > 250

HAVING is logically redundant!
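A quick sqlite3 check, assuming the usual sample shipments, suggests that the two formulations return the same rows. The HAVING version is simply shorter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

with_having = con.execute("""
    SELECT SNO , SUM ( QTY ) AS TOTQ
    FROM   SP
    GROUP  BY SNO
    HAVING SUM ( QTY ) > 250
    ORDER  BY SNO""").fetchall()

# Same result with neither GROUP BY nor HAVING: the per-supplier sum is a
# scalar subquery, repeated in the WHERE clause.
without_having = con.execute("""
    SELECT DISTINCT SPX.SNO , ( SELECT SUM ( SPY.QTY ) FROM SP AS SPY
                                WHERE SPY.SNO = SPX.SNO ) AS TOTQ
    FROM   SP AS SPX
    WHERE  ( SELECT SUM ( SPY.QTY ) FROM SP AS SPY
             WHERE SPY.SNO = SPX.SNO ) > 250
    ORDER  BY SPX.SNO""").fetchall()

print(with_having)                    # [('S1', 1300), ('S2', 700), ('S4', 900)]
print(with_having == without_having)  # True
```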

Copyright C. J. Date 2008 page 220

GROUP BY / HAVING formulations often more succinct

On the other hand, they sometimes give the "wrong" answer, or at least not the answer really wanted

Recommendations:

• If you use GROUP BY or HAVING, make sure you’re summarizing the right table (typically suppliers rather than shipments, in terms of our example)

• Watch out for empty sets ... Use COALESCE wherever necessary
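One way to apply both recommendations at once is to group the supplier table through an outer join, so that every supplier gets a row, and then COALESCE the sum. The outer join is a choice made for this sketch, not something the slide prescribes. The data is the assumed sample again.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO S VALUES ('S1'), ('S2'), ('S3'), ('S4'), ('S5');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

# Summarize suppliers, not shipments: S5 survives the LEFT JOIN as one row
# whose SP columns are NULL.
summary = con.execute("""
    SELECT S.SNO ,
           COUNT ( SP.PNO ) AS PCT ,               -- not COUNT(*): that would give S5 a count of 1
           COALESCE ( SUM ( SP.QTY ) , 0 ) AS TOTQ  -- SUM over only NULLs is NULL
    FROM   S LEFT JOIN SP ON SP.SNO = S.SNO
    GROUP  BY S.SNO
    ORDER  BY S.SNO""").fetchall()

print(summary)
# [('S1', 6, 1300), ('S2', 2, 700), ('S3', 1, 200), ('S4', 3, 900), ('S5', 0, 0)]
```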

Copyright C. J. Date 2008 page 221

BACK TO Tutorial D :

• Image rels can be very useful in connection with summarization ... In fact, they make SUMMARIZE logically redundant!

SUMMARIZE SP PER ( S { SNO } ) ADD ( COUNT ( PNO ) AS PCT )

Or: EXTEND S { SNO } ADD ( COUNT ( !!SP ) AS PCT )

• For each supplier, get supplier details and total, maximum, and minimum shipment quantity:

EXTEND S ADD ( SUM ( !!SP , QTY ) AS TOTQ ,
               MAX ( !!SP , QTY ) AS MAXQ ,
               MIN ( !!SP , QTY ) AS MINQ )

/* note use of "multiple EXTEND" */

Copyright C. J. Date 2008 page 222

• For each supplier, get supplier details, total shipment quantity, and grand total shipment quantity:

EXTEND S ADD ( SUM ( !!SP , QTY ) AS TOTQ ,
               SUM ( SP , QTY ) AS GTOTQ )

SNO  TOTQ  GTOTQ
S1   1300  3100
..   ....  ....
S5   0     3100

• For each city c, get c and total and average shipment quantities for all shipments for which supplier and part city are both c

WITH ( S JOIN SP JOIN P ) AS TEMP :
EXTEND TEMP { CITY } ADD ( SUM ( !!TEMP , QTY ) AS TOTQ ,
                           AVG ( !!TEMP , QTY ) AS AVGQ )
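Both examples can be reproduced in SQL. The following sqlite3 sketch assumes the sample data, keeping only the columns needed. It writes out the natural join on SNO, PNO and CITY with explicit ON conditions. Note that the join on CITY restricts TEMP to shipments where the supplier and the part are in the same city.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT, CITY TEXT );
    CREATE TABLE P  ( PNO TEXT, CITY TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO S VALUES ('S1','London'), ('S2','Paris'), ('S3','Paris'),
                         ('S4','London'), ('S5','Athens');
    INSERT INTO P VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo'),
                         ('P4','London'), ('P5','Paris'), ('P6','London');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

# TOTQ per supplier (0 for no shipments) alongside the grand total GTOTQ.
totals = con.execute("""
    SELECT S.SNO ,
           COALESCE ( ( SELECT SUM ( QTY ) FROM SP WHERE SP.SNO = S.SNO ) , 0 ) AS TOTQ ,
           ( SELECT SUM ( QTY ) FROM SP ) AS GTOTQ
    FROM   S
    ORDER  BY S.SNO""").fetchall()

# Per-city totals and averages over S JOIN SP JOIN P (common columns SNO, PNO, CITY).
by_city = con.execute("""
    SELECT S.CITY , SUM ( SP.QTY ) AS TOTQ , AVG ( SP.QTY ) AS AVGQ
    FROM   S JOIN SP ON SP.SNO = S.SNO
             JOIN P  ON P.PNO = SP.PNO AND P.CITY = S.CITY
    GROUP  BY S.CITY
    ORDER  BY S.CITY""").fetchall()

print(totals[0], totals[-1])  # ('S1', 1300, 3100) ('S5', 0, 3100)
print(by_city)                # [('London', 900, 225.0), ('Paris', 600, 300.0)]
```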

Copyright C. J. Date 2008 page 223

RECALL THESE RELATIONS :

R1:  SNO  PNO
     S2   P1
     S2   P2
     S3   P2
     S4   P2
     S4   P4
     S4   P5

R4:  SNO  PNO_REL
     S2   PNO: P1, P2
     S3   PNO: P2
     S4   PNO: P2, P4, P5

Type of R4 =

RELATION { SNO CHAR , PNO_REL RELATION { PNO CHAR } }

Copyright C. J. Date 2008 page 224

GROUP AND UNGROUP :

R1 GROUP ( { PNO } AS PNO_REL ) : gives R4

R4 UNGROUP ( PNO_REL ) : gives R1

SQL has no direct counterparts

Exercise: What does this do?

EXTEND R1 { SNO } ADD ( !!R1 AS PNO_REL )
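As a rough illustration, GROUP and UNGROUP can be mimicked in Python. In this sketch R1 is a set of (SNO, PNO) pairs, and a relation-valued attribute is modelled as a frozenset of PNO values. The functions `group` and `ungroup` are hypothetical names for this sketch.

```python
# R1 as a set of (SNO, PNO) pairs.
R1 = {("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
      ("S4", "P2"), ("S4", "P4"), ("S4", "P5")}


def group(r):
    """r GROUP ( { PNO } AS PNO_REL ): one tuple per SNO, paired with the
    relation of PNO values that appear with that SNO in r."""
    return {(sno, frozenset(pno for s, pno in r if s == sno)) for sno, _ in r}


def ungroup(r4):
    """r4 UNGROUP ( PNO_REL ): flatten each PNO_REL value back into pairs."""
    return {(sno, pno) for sno, pno_rel in r4 for pno in pno_rel}


R4 = group(R1)
print(sorted((sno, sorted(rel)) for sno, rel in R4))
# [('S2', ['P1', 'P2']), ('S3', ['P2']), ('S4', ['P2', 'P4', 'P5'])]
print(ungroup(R4) == R1)  # True
```

Grouping and then ungrouping always gives back the original relation. The reverse round trip can lose information: for example, an R4 tuple whose PNO_REL value is empty contributes nothing to the result of UNGROUP.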

Copyright C. J. Date 2008 page 225

"WHAT IF" QUERIES :

What if parts in Paris were in Nice and their weight doubled?

UPDATE P WHERE CITY = ‘Paris’ :
       { CITY := ‘Nice’ , WEIGHT := 2 * WEIGHT }
/* read-only op !!! */

SQL analog:

WITH T1 AS ( SELECT P.*
             FROM   P
             WHERE  CITY = ‘Paris’ ) ,
     T2 AS ( SELECT T1.* , ‘Nice’ AS NC , 2 * WEIGHT AS NW
             FROM   T1 )
SELECT PNO , PNAME , COLOR , NW AS WEIGHT , NC AS CITY
FROM   T2
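A minimal sqlite3 run of the SQL "what if" query, assuming the usual sample parts, confirms that it only reads P. The sketch writes T1.* in the definition of T2 because only T1, not P, is in scope in that FROM clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P ( PNO TEXT, PNAME TEXT, COLOR TEXT, WEIGHT REAL, CITY TEXT );
    INSERT INTO P VALUES ('P1','Nut','Red',12.0,'London'),  ('P2','Bolt','Green',17.0,'Paris'),
                         ('P3','Screw','Blue',17.0,'Oslo'), ('P4','Screw','Red',14.0,'London'),
                         ('P5','Cam','Blue',12.0,'Paris'),  ('P6','Cog','Red',19.0,'London');
""")

what_if = con.execute("""
    WITH T1 AS ( SELECT P.* FROM P WHERE CITY = 'Paris' ) ,
         T2 AS ( SELECT T1.* , 'Nice' AS NC , 2 * WEIGHT AS NW FROM T1 )
    SELECT PNO , PNAME , COLOR , NW AS WEIGHT , NC AS CITY
    FROM   T2
    ORDER  BY PNO""").fetchall()

# The query is read-only: the base table still has its Paris parts.
paris_left = con.execute("SELECT COUNT(*) FROM P WHERE CITY = 'Paris'").fetchone()[0]

print(what_if)     # [('P2', 'Bolt', 'Green', 34.0, 'Nice'), ('P5', 'Cam', 'Blue', 24.0, 'Nice')]
print(paris_left)  # 2
```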

Copyright C. J. Date 2008 page 226

Tutorial D EXPRESSION IS SHORTHAND FOR :

WITH ( P WHERE CITY = ‘Paris’ ) AS R1 ,
     ( EXTEND R1 ADD ( ‘Nice’ AS NC , 2 * WEIGHT AS NW ) ) AS R2 ,
     R2 { ALL BUT CITY , WEIGHT } AS R3 :
R3 RENAME ( NC AS CITY , NW AS WEIGHT )

/* can now explain expansion of UPDATE statement: */

UPDATE P WHERE CITY = ‘Paris’ : { CITY := ‘Nice’ , WEIGHT := 2 * WEIGHT } ;

Expansion:

P := ( P WHERE CITY ≠ ‘Paris’ )
     UNION
     ( UPDATE P WHERE CITY = ‘Paris’ :
              { CITY := ‘Nice’ , WEIGHT := 2 * WEIGHT } ) ;
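The expansion treats the UPDATE statement as an ordinary relational assignment. The sketch below is a minimal Python illustration, assuming sample parts stored as (PNO, PNAME, COLOR, WEIGHT, CITY) tuples; `update_expr` is a hypothetical name for the read-only UPDATE operator.

```python
# Assumed sample parts as (PNO, PNAME, COLOR, WEIGHT, CITY) tuples.
P = {("P1", "Nut", "Red", 12.0, "London"), ("P2", "Bolt", "Green", 17.0, "Paris"),
     ("P3", "Screw", "Blue", 17.0, "Oslo"), ("P4", "Screw", "Red", 14.0, "London"),
     ("P5", "Cam", "Blue", 12.0, "Paris"), ("P6", "Cog", "Red", 19.0, "London")}


def update_expr(p):
    """Read-only UPDATE P WHERE CITY = 'Paris' : { CITY := 'Nice', WEIGHT := 2 * WEIGHT }:
    returns only the restricted tuples, modified; p itself is not changed."""
    return {(pno, pname, color, 2 * weight, "Nice")
            for pno, pname, color, weight, city in p if city == "Paris"}


# P := ( P WHERE CITY ≠ 'Paris' ) UNION ( UPDATE P WHERE CITY = 'Paris' : { ... } ) ;
# The right-hand side is evaluated in full, against the old P, before the assignment.
P = {t for t in P if t[4] != "Paris"} | update_expr(P)

print(sorted(t[0] for t in P if t[4] == "Nice"))  # ['P2', 'P5']
print(len(P))                                     # 6
```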

Copyright C. J. Date 2008 page 227

WHAT ABOUT "ORDER BY" ???

Not a relational op (because result is not a relation) ... So not legal in relational exps, and hence not in view definitions etc.

Produces ordered list or sequence of tuples

Also, not a function

Result indeterminate (in general) … /* like many SQL expressions, in fact */

Also, produces a sequence of tuples, yet "<" and ">" aren't defined for tuples!

Copyright C. J. Date 2008 page 228

STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

Copyright C. J. Date 2008 page 229

INTEGRITY CONSTRAINTS :

An integrity constraint is, loosely, a boolean expression that must evaluate to TRUE

Two basic kinds: Type constraints / database constraints

Constraints = really what DB management is all about!

• Talking of poor quality of education ...

Constraints are vital, and proper DBMS support for them is vital as well

• I don’t care how fast your system runs if I can’t trust the answers it’s giving me!

Copyright C. J. Date 2008 page 230

TYPE CONSTRAINTS :

Define values that make up a given type ... For system defined types, not much to say ... So suppose for sake of example that quantities are of a user defined type, say QTY:

TYPE QTY /* quantities */
     POSSREP QPR { Q INTEGER
                   CONSTRAINT Q ≥ 0 AND Q ≤ 5000 } ;

TYPE POINT /* geometric points in 2D space */
     POSSREP CARTESIAN { X FIXED , Y FIXED
                         CONSTRAINT SQRT ( X ** 2 + Y ** 2 ) ≤ 100.0 } ;

Checked "immediately" (actually during selector operator invocations ... see next page)

Copyright C. J. Date 2008 page 231

SELECTORS AND THE_ OPERATORS :

One selector per possrep
One THE_ op per possrep component

Examples:

• QPR ( 250 ) /* selector invocation */ /* ... actually a literal */

Simplify QTY type def:

TYPE QTY POSSREP { Q INTEGER CONSTRAINT Q ≥ 0 AND Q ≤ 5000 } ;

Selector invocation becomes:

QTY ( 250 )

Copyright C. J. Date 2008 page 232

SELECTORS AND THE_ OPERATORS (cont.) :

Examples (cont.):

• THE_Q ( QZ ) /* THE_ op invocation */ /* (QZ is of type QTY) */

Simplify POINT type def:

TYPE POINT POSSREP { X FIXED , Y FIXED CONSTRAINT ... } ;

• POINT ( PX , PY ) /* POINT selector invocation */

• POINT ( 5.7 , -3.9 ) /* POINT literal */

• THE_X ( P ) /* THE_ op invocation */
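The roles of selectors and THE_ operators map naturally onto an immutable value class whose constructor plays the selector and checks the type constraint. This is a minimal Python sketch under stated assumptions: FIXED is modelled as float, the POINT constraint is taken to be SQRT ( X ** 2 + Y ** 2 ) ≤ 100.0, and the classes are illustrations, not Tutorial D.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QTY:
    """TYPE QTY POSSREP { Q INTEGER CONSTRAINT Q ≥ 0 AND Q ≤ 5000 }."""
    Q: int

    def __post_init__(self):
        # The type constraint is checked on every selector invocation.
        if not 0 <= self.Q <= 5000:
            raise ValueError(f"QTY ( {self.Q} ) violates the QTY type constraint")


def THE_Q(qz: QTY) -> int:
    return qz.Q


@dataclass(frozen=True)
class POINT:
    """TYPE POINT POSSREP { X FIXED , Y FIXED CONSTRAINT ... } (FIXED modelled as float)."""
    X: float
    Y: float

    def __post_init__(self):
        # Assumed constraint: the point lies within distance 100.0 of the origin.
        if math.sqrt(self.X ** 2 + self.Y ** 2) > 100.0:
            raise ValueError("POINT selector invocation violates the type constraint")


def THE_X(p: POINT) -> float:
    return p.X


def THE_Y(p: POINT) -> float:
    return p.Y


QZ = QTY(250)                   # selector invocation
print(THE_Q(QZ))                # 250
print(THE_X(POINT(5.7, -3.9)))  # 5.7

try:
    QTY(6000)                   # rejected "immediately", by the selector
    qty_rejected = False
except ValueError as err:
    qty_rejected = True
    print(err)                  # QTY ( 6000 ) violates the QTY type constraint
```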

Copyright C. J. Date 2008 page 233

WHAT ABOUT SQL ???

SQL doesn’t support type constraints at all!

E.g.: CREATE TYPE QTY AS INTEGER FINAL ;

/* all available integers denote valid quantities ?!? */

So to constrain quantities further, must specify approp database constraint on every use of the type ... E.g.:

CREATE TABLE SP
     ( SNO VARCHAR(5) NOT NULL ,
       PNO VARCHAR(6) NOT NULL ,
       QTY QTY NOT NULL , … ,
       CONSTRAINT SPQC CHECK ( QTY >= QTY(0) AND
                               QTY <= QTY(5000) ) ) ;

Copyright C. J. Date 2008 page 234

SQL does support selectors and THE_ ops (in effect), but doesn’t use these terms and support not entirely straightforward ... Further details beyond scope of this seminar

POINT example in SQL:

CREATE TYPE POINT AS ( X NUMERIC(5,1) , Y NUMERIC(5,1) ) NOT FINAL ;

Recommendation: Use database constraints to make up for SQL’s lack of type constraints

Duplication of effort much better than having bad data in the database!
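A minimal sketch of that recommendation in SQLite, via Python's sqlite3 module. SQLite has no user-defined types at all, so QTY is a plain INTEGER column here, and the intended type constraint has to be restated as a base table constraint. Every other table with a quantity column would need its own copy of SPQC.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The "type constraint" on quantities, restated as a table constraint on SP.
con.execute("""
    CREATE TABLE SP ( SNO TEXT NOT NULL ,
                      PNO TEXT NOT NULL ,
                      QTY INTEGER NOT NULL ,
                      CONSTRAINT SPQC CHECK ( QTY >= 0 AND QTY <= 5000 ) )""")

con.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")  # accepted
try:
    con.execute("INSERT INTO SP VALUES ('S1', 'P2', 6000)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # CHECK constraint SPQC failed

print(rejected)  # True
```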

Copyright C. J. Date 2008 page 235

DATABASE CONSTRAINTS :

CONSTRAINT CX1 IS_EMPTY
     ( S WHERE STATUS < 1 OR STATUS > 100 ) ;

CREATE ASSERTION CX1 CHECK
     ( NOT EXISTS ( SELECT * FROM S
                    WHERE STATUS < 1 OR STATUS > 100 ) ) ;

CONSTRAINT CX2 IS_EMPTY
     ( S WHERE CITY = ‘London’ AND STATUS ≠ 20 ) ;

CREATE ASSERTION CX2 CHECK
     ( NOT EXISTS ( SELECT * FROM S
                    WHERE CITY = ‘London’ AND STATUS <> 20 ) ) ;

• CX1 and CX2 are "tuple" (or "row") constraints: Deprecated terms

Copyright C. J. Date 2008 page 236

CONSTRAINT CX3 COUNT ( S ) = COUNT ( S { SNO } ) ;

CREATE ASSERTION CX3 CHECK ( UNIQUE ( SELECT SNO FROM S ) ) ;

• {SNO} is a superkey for S

• In practice would use KEY or UNIQUE shorthand

• Note: UNIQUE in SQL returns TRUE iff every row in its argument table is distinct /* more later */

• Alternative SQL formulation:

CREATE ASSERTION CX3 CHECK ( ( SELECT COUNT ( SNO ) FROM S ) =

( SELECT COUNT ( DISTINCT SNO ) FROM S ) ) ;

Copyright C. J. Date 2008 page 237

CONSTRAINT CX4 COUNT ( S { SNO } ) = COUNT ( S { SNO , CITY } ) ;

CREATE ASSERTION CX4 CHECK
     ( NOT EXISTS ( SELECT *
                    FROM   S AS SX
                    WHERE  EXISTS ( SELECT *
                                    FROM   S AS SY
                                    WHERE  SX.SNO = SY.SNO
                                    AND    SX.CITY <> SY.CITY ) ) ) ;

• Functional dependence {SNO} → {CITY}

• In practice this FD implied by fact that {SNO} is a superkey, so no need to state CX4 explicitly ... but not all FDs are consequences of keys

• But most will be, if DB well designed!

Copyright C. J. Date 2008 page 238

CONSTRAINT CX5 IS_EMPTY
     ( ( S JOIN SP ) WHERE STATUS < 20 AND PNO = ‘P6’ ) ;

CREATE ASSERTION CX5 CHECK
     ( NOT EXISTS ( SELECT *
                    FROM   S NATURAL JOIN SP
                    WHERE  STATUS < 20
                    AND    PNO = ‘P6’ ) ) ;

• "Multi-relvar" constraint: Slightly deprecated term

• CX1-CX4 were single-relvar constraints, or just relvar constraints for short: Slightly deprecated terms

Copyright C. J. Date 2008 page 239

CONSTRAINT CX6 SP { SNO } ⊆ S { SNO } ;

CREATE ASSERTION CX6 CHECK
     ( NOT EXISTS ( SELECT SNO FROM SP
                    EXCEPT
                    SELECT SNO FROM S ) ) ;

• Foreign key constraint from SP to S

• In practice would use FOREIGN KEY shorthand (at least in SQL)
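SQLite, like most products, has no CREATE ASSERTION. One rough stand-in, sketched below with the sample data assumed, keeps each assertion's CHECK condition as a query that yields 1 (TRUE) or 0 (FALSE) and evaluates it on demand. A DBMS that supported assertions would check them on every update and reject the offending INSERT. Here the check is manual and after the fact. The `ASSERTIONS` dictionary and the `violated` helper are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S  ( SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    CREATE TABLE SP ( SNO TEXT, PNO TEXT, QTY INTEGER );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

# Each assertion's condition, kept as a query returning 1 (TRUE) or 0 (FALSE).
ASSERTIONS = {
    "CX3": """SELECT ( SELECT COUNT ( SNO ) FROM S ) =
                     ( SELECT COUNT ( DISTINCT SNO ) FROM S )""",
    "CX5": """SELECT NOT EXISTS ( SELECT * FROM S NATURAL JOIN SP
                                  WHERE STATUS < 20 AND PNO = 'P6' )""",
    "CX6": """SELECT NOT EXISTS ( SELECT SNO FROM SP
                                  EXCEPT
                                  SELECT SNO FROM S )""",
}


def violated(con):
    """Hypothetical helper: names of the assertions the current state violates."""
    return [name for name, query in ASSERTIONS.items()
            if not con.execute(query).fetchone()[0]]


before = violated(con)
con.execute("INSERT INTO SP VALUES ('S9', 'P1', 100)")  # there is no supplier S9
after = violated(con)

print(before)  # []
print(after)   # ['CX6']
```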

Copyright C. J. Date 2008 page 240

DATABASE CONSTRAINTS IN SQL :

Any DB constraint expressible in Tutorial D can be expressed in SQL via CREATE ASSERTION (unless "possibly nondeterministic" ???)

But SQL also supports base table constraints ... e.g.:

CREATE TABLE SP
     ( ... , CONSTRAINT CX5 CHECK
             ( PNO <> ‘P6’ OR ( SELECT STATUS FROM S
                                WHERE SNO = SP.SNO ) >= 20 ) ) ;

Equivalent formulation could be specified on base table S instead—or any base table in the database!

Useful for "row constraints" but not for other kinds

Copyright C. J. Date 2008 page 241

CREATE TABLE S ( ... , CONSTRAINT CX1 CHECK ( STATUS >= 1 AND STATUS <= 100 ) ) ;

CREATE TABLE S ( ... , CONSTRAINT CX2 CHECK ( STATUS = 20 OR CITY <> ‘London’ ) ) ;

SQL also supports column constraints ... e.g., NOT NULL, and key constraints for keys of degree one

Note:

Base table constraint for T automatically satisfied if T is empty (!)

(Important) Most current products support simple row constraints (plus key and FK constraints) only !!!

Copyright C. J. Date 2008 page 242

OK, so I saved the bad news till last ...

Recommendations:

State constraints declaratively wherever possible

Use triggered procedures to enforce constraints that can’t be stated declaratively

See Applied Mathematics for Database Professionals, by Lex de Haan and Toon Koppelaars (Apress, 2007)

Lobby the vendors!

Copyright C. J. Date 2008 page 243

Distinction single- vs. multi-relvar constraints is more pragmatic than logical ... because:

Like single-relvar constraints, multi-relvar constraints must be checked "immediately" !!!

All constraints must be satisfied at statement boundaries —no "deferred" or COMMIT-time checking at all! (contrary to SQL standard and some commercial products)

In order to explain this unorthodox view, I need to digress for a moment and talk about transactions ...

Copyright C. J. Date 2008 page 244

THE "ACID" PROPERTIES :

Atomicity: Transactions are "all or nothing"

Consistency: Transactions transform a consistent state of the DB into another consistent state, without necessarily preserving consistency at all intermediate points

Isolation: Any given transaction's updates are concealed from all other transactions until the given transaction commits

Durability: Once a transaction commits, its updates survive in the DB, even if there's a subsequent system crash

Copyright C. J. Date 2008 page 245

One argument in favor of transactions has always been that transactions are supposed to be a unit of integrity (see "Consistency" on previous page)

But I no longer believe this argument! I now think statements have to be that "unit of integrity" (i.e., to repeat, constraints must be satisfied at statement boundaries)

Why have I changed my mind?

For at least five reasons:

Copyright C. J. Date 2008 page 246

FIRST AND MOST IMPORTANT :

As we have seen, a DB can be regarded as a collection of propositions, assumed by convention to be ones that evaluate to TRUE

And if that collection is ever allowed to include any inconsistencies, then all bets are off!

I'll come back to this point later ...

The "I" property might mean that only one transaction ever sees any particular inconsistency, but that particular transaction does see the inconsistency and can thus produce wrong answers

Copyright C. J. Date 2008 page 247

SECOND :

I don't agree that any given inconsistency can be seen by only one transaction, anyway ... E.g.:

Suppose transaction TX1 obtains some incorrect information from the DB and writes it to file F

Suppose transaction TX2 now reads that same information from file F

TX1 has "infected" TX2 ... TX1 and TX2 aren't really isolated from each other ... Even if they run at totally different times!

I don't believe in the "I" property of transactions

Copyright C. J. Date 2008 page 248

THIRD :

Don't want every program or other “code unit” to have to cater for the possibility that the DB might be inconsistent when it runs!

Severe loss of orthogonality if a procedure that assumes consistency becomes unsafe to use when checking is deferred

Desirable to be able to specify a code unit independently of whether that unit is to run as a transaction per se or as part of a transaction

In fact, I’d like nested transactions ... but that's a topic for another day

Copyright C. J. Date 2008 page 249

FOURTH :

The Principle of Interchangeability (of base relvars and views; see later) implies that the very same constraint might be a single-relvar constraint with one design for the DB and a multi-relvar constraint with another

E.g.:

VAR LS VIRTUAL ( S WHERE CITY = ‘London’ ) ;
VAR NLS VIRTUAL ( S WHERE CITY ≠ ‘London’ ) ;

Instead of S being real and LS and NLS virtual, we could make LS and NLS real and S virtual!—S is the union of restrictions LS and NLS, and mapping works both ways

/* more on interchangeability later */

Copyright C. J. Date 2008 page 250

SNO unique in S — single-relvar constraint

SNO unique across LS and NLS — multi-relvar constraint

CONSTRAINT CX7 IS_EMPTY
     ( LS { SNO } JOIN NLS { SNO } ) ;

CREATE ASSERTION CX7 CHECK
     ( NOT EXISTS ( SELECT *
                    FROM   LS , NLS
                    WHERE  LS.SNO = NLS.SNO ) ) ;

Copyright C. J. Date 2008 page 251

FIFTH :

Semantic optimization uses constraints to simplify queries (for performance reasons) ... E.g.:

Constraint: All red parts must be stored in London

Query: Find suppliers who supply only red parts and are located in the same city as at least one of the parts they supply

Simplified query: Find London suppliers who supply only red parts

Payoff could be orders of magnitude greater than that from conventional optimization ... but it requires DB to be consistent at all times, not just transaction boundaries (if constraints aren’t satisfied, simplifications will be invalid, and answers will be wrong)

Copyright C. J. Date 2008 page 252

BUT DOESN'T SOME CHECKING HAVE TO BE DEFERRED ???

E.g., "Supplier S1 and part P1 are in the same city":

If supplier S1 moves from London to Paris, then part P1 must move from London to Paris as well

Conventional solution /* SQL */ :

START TRANSACTION ;
UPDATE S SET CITY = ‘Paris’ WHERE SNO = ‘S1’ ;
UPDATE P SET CITY = ‘Paris’ WHERE PNO = ‘P1’ ;
COMMIT ; /* integrity check done here */

If this transaction asks "Are supplier S1 and part P1 in the same city?" between the two UPDATEs, it will get the answer no

Copyright C. J. Date 2008 page 253

Tutorial D SOLUTION :

The multiple assignment operator lets us carry out several assignments as a single operation, without any integrity checking being done until all assignments have been executed:

UPDATE S WHERE SNO = ‘S1’ : { CITY := ‘Paris’ } ,
UPDATE P WHERE PNO = ‘P1’ : { CITY := ‘Paris’ } ;

Note comma separator … One statement, not two!

Shorthand for:

S := … , P := … ;

Copyright C. J. Date 2008 page 254

SEMANTICS /* slightly simplified */ :

1. Evaluate source expressions

2. Execute individual assignments in sequence

3. Do integrity checking

No individual assignment depends on any other ... No way for the transaction to see an inconsistent state of the DB between the two UPDATEs, because notion of "between the two UPDATEs" has no meaning ... Now no need for deferred checking at all!

Note: I’m not saying we don’t need transactions !!!

By the way: SQL already has some multiple assignment!
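The three-step semantics can be mimicked in a few lines of Python. In this sketch a hypothetical database state maps relvar names to their values, reduced here to each key's CITY. The functions `total_constraint` and `assign` are illustrations, not an existing API. The sketch shows that moving S1 on its own is rejected, while moving S1 and P1 in one multiple assignment is accepted.

```python
# Hypothetical database state: relvars S and P, reduced to the CITY of each key.
db = {"S": {"S1": "London", "S2": "Paris"},
      "P": {"P1": "London", "P2": "Paris"}}


def total_constraint(state):
    """Supplier S1 and part P1 are in the same city."""
    return state["S"]["S1"] == state["P"]["P1"]


def assign(state, *assignments):
    """Multiple assignment sketch: (1) evaluate every source expression against
    the old state, (2) perform the assignments in sequence, (3) check integrity once."""
    values = [(target, source(state)) for target, source in assignments]  # step 1
    new_state = dict(state)
    for target, value in values:                                           # step 2
        new_state[target] = value
    if not total_constraint(new_state):                                    # step 3
        raise ValueError("integrity violation: assignment rejected")
    return new_state


move_s1 = ("S", lambda st: {**st["S"], "S1": "Paris"})
move_p1 = ("P", lambda st: {**st["P"], "P1": "Paris"})

# As a statement on its own, the first UPDATE is checked immediately and rejected.
try:
    assign(db, move_s1)
    first_rejected = False
except ValueError:
    first_rejected = True

# One statement, two assignments: checked only after both have been performed.
db = assign(db, move_s1, move_p1)

print(first_rejected)                # True
print(db["S"]["S1"], db["P"]["P1"])  # Paris Paris
```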

Copyright C. J. Date 2008 page 255

Recommendation:

Given the state of today’s SQL products, some constraint checking will probably have to be deferred ...

In which case, you should do whatever it takes (probably terminate the transaction) to force the check to be done before performing any operation that might rely on the constraint being satisfied

Copyright C. J. Date 2008 page 256

CONSTRAINTS AND PREDICATES :

Relvar predicate for R is "intended interpretation" for R … but it (and corresp propositions) aren’t and can’t be understood by the system

System can't know what it means for a "supplier" to "be located" somewhere, etc.—that's interpretation

System can't know a priori whether what the user tells it is true!—can only check the integrity constraints ...

If OK, system accepts user assertion as true from this point forward

System can't enforce truth, only consistency !!!

Copyright C. J. Date 2008 page 257

Correct implies consistent

Converse not true

Inconsistent implies incorrect

Converse not true

DB is correct iff it fully reflects the true state of affairs in the real world ... but the best the system can do is ensure the DB is consistent (= satisfies all known integrity constraints)

Copyright C. J. Date 2008 page 258

Let C1, C2, ..., Cn be all of the DB constraints that mention base relvar R. Then:

( C1 ) AND ( C2 ) AND ... AND ( Cn ) AND TRUE

is THE (total) relvar constraint for R

Let R1, R2, ..., Rm be all of the base relvars in DB, and let corresp (total) relvar constraints be RC1, RC2, ..., RCm, respectively. Then:

( RC1 ) AND ( RC2 ) AND ... AND ( RCm ) AND TRUE

is THE (total) database constraint for DB

Copyright C. J. Date 2008 page 259

The Golden Rule:

No database is ever allowed to violate its total DB constraint

/* and therefore: */

No relvar is ever allowed to violate its total relvar constraint

Criterion for acceptability of updates ...

Total relvar constraint for R is system’s best approximation to relvar predicate for R

Copyright C. J. Date 2008 page 260

CONSTRAINTS ARE VITAL !!!

Recall that a DB can be regarded as a collection of propositions ... and if that collection is ever allowed to include any inconsistencies, all bets are off!

Proof:

Suppose DB implies both p and NOT p are TRUE (there's the inconsistency)

Let q be any arbitrary proposition

From truth of p, infer truth of p OR q

From truth of p OR q and truth of NOT p, infer truth of q ... but q was arbitrary !!!
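The argument can also be checked mechanically. The following minimal Lean 4 rendering is an illustration, not part of the original material. It follows the same two inference steps: from p, infer p ∨ q; then from p ∨ q and ¬p, infer q.

```lean
-- From p and ¬p, an arbitrary proposition q follows.
example (p q : Prop) (hp : p) (hnp : ¬p) : q := by
  have hpq : p ∨ q := Or.inl hp   -- from p, infer p ∨ q
  cases hpq with                  -- from p ∨ q and ¬p, infer q
  | inl h => exact absurd h hnp
  | inr h => exact h
```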

Copyright C. J. Date 2008 page 261

STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

Copyright C. J. Date 2008 page 262

VIRTUAL RELVARS ("VIEWS") :

A view is a relvar that "looks and feels" just like a base relvar but doesn’t exist independently of other relvars (it’s defined in terms of them)

Repeat: A view is a relvar! ("CREATE TABLE" vs. "CREATE VIEW" was at least a psychological mistake)

A view is a derived relvar

All virtual relvars are derived but some derived ones aren’t virtual /* see snapshots, later */

A view is a window into underlying relvars ... Ops on view are "really" ops on those underlying relvars

A view is a "canned query" (i.e., named rel exp)

Copyright C. J. Date 2008 page 263

VIEWS ARE RELVARS :

A view V is a relvar whose value at time t = result of evaluating certain rel exp at time t ... View defining expression specified when V is defined and must mention at least one relvar

VAR LS VIRTUAL ( S WHERE CITY = ‘London’ ) ;

CREATE VIEW LS AS
     ( SELECT *
       FROM   S
       WHERE  CITY = ‘London’ )
     WITH CHECK OPTION ;

VAR NLS VIRTUAL ( S WHERE CITY ≠ ‘London’ ) ;

CREATE VIEW NLS AS
     ( SELECT *
       FROM   S
       WHERE  CITY <> ‘London’ )
     WITH CHECK OPTION ;

Copyright C. J. Date 2008 page 264

CREATE VIEW allows parenthesized column name commalist after view name ... E.g.

CREATE VIEW SDS ( SNAME , DOUBLE_STATUS )
     AS ( SELECT DISTINCT SNAME , 2 * STATUS
          FROM S ) ;

Recommendation: Don’t do this. Instead:

CREATE VIEW SDS AS ( SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS

FROM S ) ;

Tell DBMS once not twice that SNAME column is called SNAME!

Copyright C. J. Date 2008 page 265

THE PRINCIPLE OF INTERCHANGEABILITY :

Instead of S being real and LS and NLS virtual, we could make LS and NLS real and S virtual—S is the union of restrictions LS and NLS, and mapping works both ways:

VAR LS BASE RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } KEY { SNO } ;

VAR NLS BASE RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } KEY { SNO } ;

VAR S VIRTUAL ( LS D_UNION NLS ) ; /* disjoint union */

/* plus certain constraints on, e.g., CITY */
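The reverse design can be tried in SQLite. In the sketch below LS and NLS are base tables, with column CHECKs standing in for the constraints on CITY, and S is a view over them. SQLite has no D_UNION, so UNION ALL is used and disjointness is checked separately as CX7. The data is the assumed sample. The last step shows the point made earlier about interchangeability: each table's own key is still satisfied, yet SNO is no longer unique across S.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE LS  ( SNO TEXT PRIMARY KEY , SNAME TEXT , STATUS INTEGER ,
                       CITY TEXT CHECK ( CITY = 'London' ) );
    CREATE TABLE NLS ( SNO TEXT PRIMARY KEY , SNAME TEXT , STATUS INTEGER ,
                       CITY TEXT CHECK ( CITY <> 'London' ) );
    -- UNION ALL stands in for D_UNION; disjointness is checked separately (CX7).
    CREATE VIEW S AS SELECT * FROM LS UNION ALL SELECT * FROM NLS;
    INSERT INTO LS  VALUES ('S1','Smith',20,'London'), ('S4','Clark',20,'London');
    INSERT INTO NLS VALUES ('S2','Jones',10,'Paris'), ('S3','Blake',30,'Paris'),
                           ('S5','Adams',30,'Athens');
""")

CX7 = "SELECT NOT EXISTS ( SELECT * FROM LS , NLS WHERE LS.SNO = NLS.SNO )"

# The virtual S looks just like the former base table ...
snos = [r[0] for r in con.execute("SELECT SNO FROM S ORDER BY SNO")]
# ... and restricting it gives back LS, so the mapping works both ways.
ls_rows = sorted(con.execute("SELECT * FROM LS"))
s_london = sorted(con.execute("SELECT * FROM S WHERE CITY = 'London'"))
cx7_before = con.execute(CX7).fetchone()[0]

# Each table's own key is satisfied, yet S now has two tuples for S1.
con.execute("INSERT INTO NLS VALUES ('S1', 'Smith', 20, 'Paris')")
cx7_after = con.execute(CX7).fetchone()[0]

print(snos)                   # ['S1', 'S2', 'S3', 'S4', 'S5']
print(ls_rows == s_london)    # True
print(cx7_before, cx7_after)  # 1 0
```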

Copyright C. J. Date 2008 page 266

Designs are information equivalent ... So: Which relvars are base ones and which virtual is arbitrary (formally speaking, at least) ... Hence:

The Principle of Interchangeability: There must be no arbitrary and unnecessary distinctions between base and virtual relvars ... Virtual relvars should "look and feel" just like base ones to the user, e.g., with respect to:

• Having keys or not
• Integrity in general
• "Entity integrity"
• Tuple IDs

... and we MUST be able to "update views" !!!

Copyright C. J. Date 2008 page 267

RELATION CONSTANTS /* digression */ :

View defining exp must mention at least one relvar ... Otherwise the "variable" isn’t a variable! Consider, e.g., the following SQL view defn:

CREATE VIEW S_CONST ( SNO , SNAME , STATUS , CITY )

AS VALUES ( ‘S1’ , ‘Smith’ , 20 , ‘London’ ) ,

( ‘S2’ , ‘Jones’ , 10 , ‘Paris’ ) ,

( ‘S3’ , ‘Blake’ , 30 , ‘Paris’ ) ,

( ‘S4’ , ‘Clark’ , 20 , ‘London’ ) ,

( ‘S5’ , ‘Adams’, 30 , ‘Athens’ ) ;

Not updatable! Really a named relation constant

Copyright C. J. Date 2008 page 268

NAMED CONSTANTS ARE USEFUL :

CONST PERIODIC_TABLE INIT ( RELATION {
      TUPLE { ELEMENT ‘Hydrogen’ , SYMBOL ‘H’ , ATOMICNO 1 } ,
      TUPLE { ELEMENT ‘Helium’ , SYMBOL ‘He’ , ATOMICNO 2 } ,
      ...
      TUPLE { ELEMENT ‘Uranium’ , SYMBOL ‘U’ , ATOMICNO 92 } } ) ;

Note: TABLE_DUM and TABLE_DEE are system defined "relcons"

Can simulate relcons via view mechanism, but there’s a logical difference between variables and constants ...

... also between constants and literals

Copyright C. J. Date 2008 page 269

VIEWS AND PREDICATES :

A view is a relvar and has a relvar predicate, derived from preds for underlying relvars ... E.g., view LS:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY AND city CITY is London

More colloquially:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in London

But latter obscures fact that CITY is a parameter ... It is a parameter, but corresp argument is constant (in practice, would probably project away CITY attribute)

Copyright C. J. Date 2008 page 270

RETRIEVAL OPERATIONS :

User operates on views as if they were real ... DBMS maps operations into corresponding operations on base relvars in terms of which views are (ultimately) defined

Read-only operations are straightforward: e.g.,

SELECT SNO
FROM   LS
WHERE  STATUS > 10

maps to

SELECT LS.SNO
FROM ( SELECT S.*
       FROM   S
       WHERE  S.CITY = ‘London’ ) AS LS
WHERE  LS.STATUS > 10

and then (?) to

SELECT S.SNO
FROM   S
WHERE  S.CITY = ‘London’ AND S.STATUS > 10
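The three forms of the query can be checked against each other. This sketch assumes Python's sqlite3 module as the host and uses the five sample suppliers from the S_CONST example. It only shows that the forms agree on this data; it says nothing about how any particular product actually performs the rewrite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])
con.execute("CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London'")

queries = {
    # what the user writes, against the view
    "on view": "SELECT SNO FROM LS WHERE STATUS > 10",
    # view name replaced by its defining expression (possible thanks to closure)
    "substituted": """SELECT LS.SNO
                      FROM (SELECT S.* FROM S WHERE S.CITY = 'London') AS LS
                      WHERE LS.STATUS > 10""",
    # nested form simplified to a single restriction of S
    "flattened": "SELECT S.SNO FROM S WHERE S.CITY = 'London' AND S.STATUS > 10",
}
results = {name: sorted(con.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each line ends with [('S1',), ('S4',)]
```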

Copyright C. J. Date 2008 page 271

RETRIEVAL OPERATIONS (cont.) :

Foregoing substitution procedure works because of closure! Didn’t always work in early versions of SQL ... E.g.:

CREATE VIEW V AS ( SELECT CITY , SUM ( STATUS ) AS ST
                   FROM   S
                   GROUP  BY CITY ) ;

SELECT CITY
FROM   V
WHERE  ST > 25

maps to (???) the following, which is illegal (aggregate operator in WHERE clause):

SELECT S.CITY
FROM   S
WHERE  SUM ( S.STATUS ) > 25
GROUP  BY S.CITY

So some products implement some view retrievals by materialization instead of substitution (!)
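Here is a small experiment, again assuming Python's sqlite3 module and the same five sample suppliers. SQLite rejects the naive substitution outright. The HAVING rewrite and the nested (derived table) rewrite shown here are two standard correct alternatives. They are offered as illustrations, not as what any given product's rewriter actually does.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])
con.execute("CREATE VIEW V AS SELECT CITY, SUM(STATUS) AS ST FROM S GROUP BY CITY")

via_view = sorted(con.execute("SELECT CITY FROM V WHERE ST > 25").fetchall())

# Naive textual substitution puts an aggregate in WHERE, which SQL does not allow.
try:
    con.execute("SELECT S.CITY FROM S WHERE SUM(S.STATUS) > 25 GROUP BY S.CITY")
    naive_ok = True
except sqlite3.OperationalError as err:
    naive_ok = False
    print("naive form rejected:", err)

# Correct alternative 1: the condition moves into HAVING.
having = sorted(con.execute(
    "SELECT S.CITY FROM S GROUP BY S.CITY HAVING SUM(S.STATUS) > 25").fetchall())
# Correct alternative 2: nest the view definition as a derived table.
nested = sorted(con.execute(
    "SELECT V.CITY FROM (SELECT CITY, SUM(STATUS) AS ST FROM S GROUP BY CITY) AS V "
    "WHERE V.ST > 25").fetchall())

print(via_view == having == nested, via_view)
# True [('Athens',), ('London',), ('Paris',)]
```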

Copyright C. J. Date 2008 page 272

VIEWS AND CONSTRAINTS :

A view is a relvar and has a (total) relvar constraint, derived from constraints for underlying relvars

E.g., view LS: {SNO} is a key ... AND CITY = ‘London’

Even though derived, nice to be able to declare such view constraints explicitly ... (a) DBMS might not be able to do the derivation; (b) documentation (explain semantics); (c) another reason to come! E.g.:

VAR LS VIRTUAL ( S WHERE CITY = ‘London’ ) KEY { SNO } ;

Copyright C. J. Date 2008 page 273

Recommendation: In SQL, include such specifications as comments. E.g.:

CREATE VIEW LS AS ( SELECT *
                    FROM   S
                    WHERE  CITY = ‘London’ )
       /* UNIQUE ( SNO ) */
       WITH CHECK OPTION ;

Note: "View constraints" can always be formulated via CREATE ASSERTION (if supported!)

Of course, we don’t want "the same" constraint to be checked twice ...

Copyright C. J. Date 2008 page 274

A MORE COMPLEX EXAMPLE :

CREATE TABLE FDH ( FLIGHT ... , DESTINATION ... , HOUR ... ,
                   UNIQUE ( FLIGHT ) ) ;

CREATE TABLE DFGP ( DAY ... , FLIGHT ... , GATE ... , PILOT ... ,
                    UNIQUE ( DAY , FLIGHT ) ) ;

Constraints:

BTCX1: IF ( f1,n1,h ), ( f2,n2,h ) IN FDH AND ( d,f1,g,p1 ), ( d,f2,g,p2 ) IN DFGP
       THEN f1 = f2 AND p1 = p2

BTCX2: IF ( f1,n1,h ), ( f2,n2,h ) IN FDH AND ( d,f1,g1,p ), ( d,f2,g2,p ) IN DFGP
       THEN f1 = f2 AND g1 = g2

Copyright C. J. Date 2008 page 275

CREATE ASSERTION BTCX1 CHECK
  ( NOT ( EXISTS ( SELECT * FROM FDH AS FX WHERE
          EXISTS ( SELECT * FROM FDH AS FY WHERE
          EXISTS ( SELECT * FROM DFGP AS DX WHERE
          EXISTS ( SELECT * FROM DFGP AS DY WHERE
                   FY.HOUR = FX.HOUR AND
                   DX.FLIGHT = FX.FLIGHT AND
                   DY.FLIGHT = FY.FLIGHT AND
                   DY.DAY = DX.DAY AND
                   DY.GATE = DX.GATE AND
                   ( FX.FLIGHT <> FY.FLIGHT OR
                     DX.PILOT <> DY.PILOT ) ) ) ) ) ) ) ;

BTCX2 is analogous

Copyright C. J. Date 2008 page 276

BUT :

CREATE VIEW V AS ( FDH NATURAL JOIN DFGP ,
                   UNIQUE ( DAY , HOUR , GATE ) ,    /* hypothetical */
                   UNIQUE ( DAY , HOUR , PILOT ) ) ; /* syntax !!! */

Or /* valid syntax */ :

CREATE VIEW V AS FDH NATURAL JOIN DFGP ;

CREATE ASSERTION VCX1 CHECK ( UNIQUE ( SELECT DAY , HOUR , GATE FROM V ) ) ;

CREATE ASSERTION VCX2 CHECK ( UNIQUE ( SELECT DAY , HOUR , PILOT FROM V ) ) ;

/* Could replace "V" by defn */
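To make BTCX1 concrete, here is a sketch in Python with sqlite3; the table contents are hypothetical. It evaluates BTCX1 directly. It also checks UNIQUE ( DAY , HOUR , GATE ) and UNIQUE ( DAY , HOUR , PILOT ) on the join view V by looking for duplicate groups, since SQLite supports neither CREATE ASSERTION nor the UNIQUE predicate. Flights F1 and F2 deliberately share a day, hour, and gate, so both approaches should report a violation of the first constraint and none of the second.

```python
import sqlite3
from itertools import product

def btcx1_holds(fdh, dfgp):
    """Direct transcription of BTCX1: same day, hour and gate imply same flight and pilot."""
    for (f1, _n1, h1), (f2, _n2, h2) in product(fdh, repeat=2):
        if h1 != h2:
            continue
        for (d1, fa, g1, p1), (d2, fb, g2, p2) in product(dfgp, repeat=2):
            if (fa, fb) == (f1, f2) and d1 == d2 and g1 == g2:
                if not (f1 == f2 and p1 == p2):
                    return False
    return True

# Hypothetical sample data: F1 and F2 leave at hour 9 from gate G1 on Monday.
FDH = [("F1", "Paris", 9), ("F2", "Rome", 9), ("F3", "Oslo", 10)]
DFGP = [("Mon", "F1", "G1", "P1"), ("Mon", "F2", "G1", "P2"), ("Tue", "F3", "G2", "P1")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FDH (FLIGHT TEXT, DESTINATION TEXT, HOUR INTEGER, UNIQUE (FLIGHT))")
con.execute("CREATE TABLE DFGP (DAY TEXT, FLIGHT TEXT, GATE TEXT, PILOT TEXT, "
            "UNIQUE (DAY, FLIGHT))")
con.executemany("INSERT INTO FDH VALUES (?, ?, ?)", FDH)
con.executemany("INSERT INTO DFGP VALUES (?, ?, ?, ?)", DFGP)
con.execute("CREATE VIEW V AS SELECT * FROM FDH NATURAL JOIN DFGP")

def violations(cols):
    """Groups of V rows sharing the given columns; an empty list means UNIQUE(cols) holds."""
    sql = f"SELECT {cols}, COUNT(*) FROM V GROUP BY {cols} HAVING COUNT(*) > 1"
    return con.execute(sql).fetchall()

print(btcx1_holds(FDH, DFGP))          # False
print(violations("DAY, HOUR, GATE"))   # [('Mon', 9, 'G1', 2)]
print(violations("DAY, HOUR, PILOT"))  # []
```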

Copyright C. J. Date 2008 page 277

UPDATE OPERATIONS :

The Principle of Interchangeability implies that views must be updatable!

(What? Really? Even views like S JOIN P?)

Well, certain updates on certain base relvars can’t be done, either! ... Fail on violations of either The Golden Rule or The Assignment Principle (ignore latter possibility for simplicity)

So to support updates on view V, DBMS needs to know total relvar constraint VC for V ... i.e., needs to do constraint inference

Today’s products don’t and are therefore very weak on view updating

Copyright C. J. Date 2008 page 278

UPDATE OPERATIONS (cont.) :

Today’s products typically don’t allow updating views any more complex than simple restrictions and/or projections of a single underlying base table (and even here there are problems) ... e.g., DELETE on view LS probably OK ... but what about INSERT ???

Recommendation: Specify WITH CASCADED CHECK OPTION on view definitions whenever possible
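SQLite, used here only to get something runnable, supports neither updatable views nor WITH CHECK OPTION. A hand-written INSTEAD OF trigger can approximate the check option for INSERT on LS, as the sketch below shows. The trigger name and the sample rows are hypothetical. A real check option also governs UPDATE, which this sketch does not cover.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.execute("INSERT INTO S VALUES ('S1', 'Smith', 20, 'London')")
con.execute("CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London'")

# Hypothetical emulation of WITH CHECK OPTION: reject any row that would not be
# visible through LS; otherwise map the INSERT onto the base table S.
con.executescript("""
CREATE TRIGGER LS_INSERT INSTEAD OF INSERT ON LS
BEGIN
    SELECT RAISE(ABORT, 'row violates LS check option')
    WHERE NEW.CITY IS NOT 'London';
    INSERT INTO S (SNO, SNAME, STATUS, CITY)
    VALUES (NEW.SNO, NEW.SNAME, NEW.STATUS, NEW.CITY);
END;
""")

con.execute("INSERT INTO LS VALUES ('S4', 'Clark', 20, 'London')")  # accepted
try:
    # Without the check, this row would go into S and then vanish from LS.
    con.execute("INSERT INTO LS VALUES ('S2', 'Jones', 10, 'Paris')")
    paris_rejected = False
except sqlite3.DatabaseError as err:  # RAISE(ABORT, ...) surfaces as an exception
    paris_rejected = True
    print("rejected:", err)

print(con.execute("SELECT SNO FROM S ORDER BY SNO").fetchall())  # [('S1',), ('S4',)]
```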

Note: SQL’s support for view updating is not only limited and ad hoc—it’s also extremely hard to understand

From the SQL standard:

Copyright C. J. Date 2008 page 279

[The] <query expression> QE1 is updatable if and only if for every <query expression> or <query specification> QE2 that is simply contained in QE1:

a) QE1 contains QE2 without an intervening <non join query expression> that specifies UNION DISTINCT, EXCEPT ALL, or EXCEPT DISTINCT.

b) If QE1 simply contains a <non join query expression> NJQE that specifies UNION ALL, then:

i) NJQE immediately contains <query expression> LO and a <query term> RO such that no leaf generally underlying table of LO is also a leaf generally underlying table of RO.

(cont.)

Copyright C. J. Date 2008 page 280

ii) For every column of NJQE, the underlying columns in the tables identified by LO and RO, respectively, are either both updatable or not updatable.

c) QE1 contains QE2 without an intervening <non join query term> that specifies INTERSECT.

d) QE2 is updatable.

Copyright C. J. Date 2008 page 281

OBSERVE THAT :

Foregoing is just one of many rules that have to be taken in combination in order to determine whether a given SQL view is updatable

Rules scattered over many different parts of the document

Rules rely on many additional concepts and constructs (e.g., updatable columns, leaf generally underlying tables, <non join query term>s) defined in still further parts of the document

Copyright C. J. Date 2008 page 282

LOOSELY, FOLLOWING SQL VIEWS ARE UPDATABLE :

1. Restriction and/or projection of single base table

2. One-to-one or one-to-many join of two base tables (many side only, in latter case)

3. UNION ALL or INTERSECT of two distinct base tables

4. Certain combinations of Cases 1-3 above

Even these cases are treated incorrectly, because of (a) lack of constraint inference; (b) duplicates; (c) nulls

Copyright C. J. Date 2008 page 283

Picture complicated still further ... A view can be:

• Updatable

• Potentially updatable

• Simply updatable

• Insertable into

Note implication that some views might permit some updates but not others ... and further implication that DELETE and INSERT might not be inverses

Recommendation: Lobby the vendors!

Copyright C. J. Date 2008 page 284

WHAT ARE VIEWS FOR ???

1. User U1 who defines view V is aware of exp X that defines V ... U1 can use name V wherever exp X is intended, but such uses are really just shorthand

E.g., U1 might have perception

S and SP (for updates) plus V ≡ S JOIN SP (for retrievals)

but U1 knows these relvars aren’t all independent

2. User U2 who is merely informed that V is available for use should typically not be aware of exp X ... To U2, V should look just like a base relvar (logical data independence) /* have been assuming this case */

Copyright C. J. Date 2008 page 285

VIEWS AND SNAPSHOTS :

Contrast views and snapshots: also derived, but real not virtual ... e.g.:

VAR LSS SNAPSHOT ( S WHERE CITY = ‘London’ )
    KEY { SNO }
    REFRESH EVERY DAY ;

SQL has CREATE TABLE AS ... but no REFRESH

Many applications can tolerate (might even require) data "as of" some point in time (e.g., end of an accounting period)
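The sketch below illustrates the difference using Python's sqlite3 module. LS is a view, and LSS is a CREATE TABLE AS snapshot. The function refresh_lss() is a hypothetical manual stand-in for the REFRESH clause that SQL lacks. The extra London supplier S6 is made-up data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)",
                [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
                 ("S4", "Clark", 20, "London")])

con.execute("CREATE VIEW LS AS SELECT * FROM S WHERE CITY = 'London'")    # virtual
con.execute("CREATE TABLE LSS AS SELECT * FROM S WHERE CITY = 'London'")  # snapshot

def refresh_lss():
    """Hypothetical manual REFRESH: recompute the snapshot from the current S."""
    con.execute("DELETE FROM LSS")
    con.execute("INSERT INTO LSS SELECT * FROM S WHERE CITY = 'London'")

def count(name):
    return con.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]

con.execute("INSERT INTO S VALUES ('S6', 'Baker', 30, 'London')")  # made-up row
before = (count("LS"), count("LSS"))  # view sees S6 at once; snapshot does not
refresh_lss()
after = (count("LS"), count("LSS"))
print(before, after)  # (3, 2) (3, 3)
```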

Copyright C. J. Date 2008 page 286

WATCH OUT FOR TERMINOLOGY !

Much current DB literature refers to snapshots as "materialized views" ... which is a contradiction in terms, pretty much (whole point about views as far as RM is concerned is that they’re virtual)

And then typically goes on to abbreviate "materialized view" to just view (!) ...

So ubiquitously, in fact, that the unqualified term view has come to mean, almost always, a snapshot instead (at least in the academic world), and we no longer have a good term for view in its original sense

Recommendations: Never use the term view, unqualified, to mean a snapshot; never use the term materialized view; and watch out for violations of these recommendations!

Copyright C. J. Date 2008 page 287

STRUCTURE OF PRESENTATION :

1. Setting the scene

2. Types and domains

3. Tuples and relations, rows and tables

4. No duplicates, no nulls

5. Base relvars, base tables

6. SQL and algebra I: The original operators

7. SQL and algebra II: Additional operators

8. SQL and constraints

9. SQL and views

10. SQL and logic I: Relational calculus

11. SQL and logic II: Using logic to write SQL

12. Further SQL topics

13. Appendix: The relational model

14. Appendix: DB design

Copyright C. J. Date 2008 page 288

SQL AND LOGIC :

Relational calculus:

• Alternative to relational algebra

• Queries, constraints, view definitions, etc. can be stated in calculus terms as well as algebraic ones /* sometimes one is easier, sometimes the other */

• Applied form of predicate calculus (aka predicate logic)

• RDB language can be based on either algebra or calculus ... Tutorial D? SQL?

Copyright C. J. Date 2008 page 289

LOGIC : PROPOSITIONS

A proposition is a declarative sentence, or statement, that’s categorically either true or false. Examples:

1. 2 + 3 = 5

2. 2 + 3 > 7

3. Jupiter is a star

4. Mars has two moons

5. Venus is between Earth and Mercury

Copyright C. J. Date 2008 page 290

POINTS ARISING :

Don’t fall into the common trap of thinking propositions are always true ... A false proposition is still a valid proposition

Informally, P is a valid proposition if and only if the following is a valid question: "Is it true that P?"

Very fine point (which I’m mostly going to ignore): The proposition isn’t really the declarative sentence as such—rather, it’s the assertion made by that sentence ... E.g., "It’s hot" and "Il fait chaud" denote the same proposition

Copyright C. J. Date 2008 page 291

SO HOW MANY OF THE FOLLOWING ARE PROPOSITIONS ???

1. Bach is the greatest musician who ever lived.

2. What’s the time?

3. Supplier S2 is located in some city x.

4. Some countries have a female president.

5. All politicians are corrupt.

6. Supplier S1 is located in London.

7. We both have the same favorite author x.

8. Nothing is heavier than lead.

9. It will rain tomorrow.

10. Supplier S6’s city is unknown.

Copyright C. J. Date 2008 page 292

LOGIC : CONNECTIVES

Operators for combining propositions to make further (compound) propositions ... Simple proposition = one with no connectives ... Truth tables:

Negation: E.g., NOT (Jupiter is a star) : TRUE

Disjunction: E.g., (Mars has two moons) OR (2 + 3 > 7) : TRUE

Conjunction: E.g., (Mars has two moons) AND (2 + 3 > 7) : FALSE

NOT |       OR | t f    AND | t f    IF | t f    IFF | t f
 t  | f      t | t t      t | t f     t | t f      t | t f
 f  | t      f | t f      f | f f     f | t t      f | f t

(Dyadic tables: rows give the value of the first operand p, columns the value of the second operand q; e.g., IF p THEN q is f when p is t and q is f.)

Copyright C. J. Date 2008 page 293

Implication (IMPLIES, also written IF ... THEN ...): E.g.,

IF (Mars has two moons) THEN (Venus is between Earth and Mercury) : TRUE

/* see later */

Bi-implication (BI-IMPLIES, also written IF AND ONLY IF or IFF or "≡") : E.g.,

(2 + 3 = 5) IFF (Jupiter is a star) : FALSE

In practice we use symbols for the connectives (usually) and adopt precedence rules that allow us to drop parens

Copyright C. J. Date 2008 page 294

CAVEAT :

Connectives are close but not identical to their natural language counterparts ... because they’re meant to be context independent

E.g., p AND q ≡ q AND p

But "and" is not necessarily commutative in natural language ... Contrast:

• I voted for a change in leadership and I was seriously disappointed

• I was seriously disappointed and I voted for a change in leadership

Copyright C. J. Date 2008 page 295

A NOTE ON IMPLICATION :

Truth table not symmetric (i.e., op not commutative): IF p THEN q is TRUE if p is FALSE and q is TRUE, but FALSE if p is TRUE and q is FALSE (so swapping the values of p and q can change the result)

FALSE implies anything!

IF p THEN q ≡ ( NOT p ) OR q

Aside: This latter is a tautology ... Evaluates to TRUE no matter what p and q stand for*

And here’s a contradiction: p AND NOT p

* Tautologies of form a ≡ b are particularly important

Copyright C. J. Date 2008 page 296

RE "FALSE IMPLIES ANYTHING" :

Consider integrity constraint on suppliers:

If supplier s is located in London, then supplier s must have status 20

Formally, this is an implication:*

IF s.CITY = ‘London’ THEN s.STATUS = 20

Don’t want the check to fail if the city isn’t London!

* Slightly simplified for sake of the example
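In SQL, the implication has to be written as (NOT p) OR q. The following minimal sketch uses Python's sqlite3 module; the host is our choice, and supplier S7 is made up. It shows that a non-London row passes the check whatever its status, exactly as "FALSE implies anything" requires.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# IF CITY = 'London' THEN STATUS = 20, rewritten as ( NOT p ) OR q
con.execute("""
    CREATE TABLE S (
        SNO    TEXT PRIMARY KEY,
        SNAME  TEXT,
        STATUS INTEGER,
        CITY   TEXT,
        CHECK (NOT (CITY = 'London') OR STATUS = 20)
    )""")

def try_insert(row):
    try:
        con.execute("INSERT INTO S VALUES (?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:  # raised when the CHECK evaluates to FALSE
        return "rejected"

rows = [("S1", "Smith", 20, "London"),  # p TRUE,  q TRUE
        ("S2", "Jones", 10, "Paris"),   # p FALSE: passes whatever q is
        ("S7", "Baker", 30, "London")]  # p TRUE,  q FALSE (hypothetical row)
outcomes = [try_insert(r) for r in rows]
print(outcomes)  # ['accepted', 'accepted', 'rejected']
```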

Copyright C. J. Date 2008 page 297

Again consider following constraint:

IF s.CITY = ‘London’ THEN s.STATUS = 20

Following is logically equivalent:

IF NOT ( s.STATUS = 20 ) THEN NOT ( s.CITY = ‘London’ )

i.e., IF s.STATUS ≠ 20 THEN s.CITY ≠ ‘London’

Contrapositive of original ... More generally:

IF p THEN q ≡ IF NOT q THEN NOT p

Copyright C. J. Date 2008 page 298

HOW MANY OF THE FOLLOWING PROPOSITIONS ARE LOGICALLY DISTINCT ???

1. ( P.WEIGHT > 17.0 ) IMPLIES ( P.CITY ≠ ‘Paris’ )

2. ( P.CITY = ‘Paris’ ) IMPLIES ( P.WEIGHT ≤ 17.0 )

3. ( P.WEIGHT ≤ 17.0 ) OR ( P.CITY ≠ ‘Paris’ )

4. NOT ( ( P.CITY = ‘Paris’ ) AND ( P.WEIGHT > 17.0 ) )

Copyright C. J. Date 2008 page 299

HOW MANY OF THE FOLLOWING PROPOSITIONS ARE LOGICALLY DISTINCT ???

Let x = ( P.WEIGHT > 17.0 ), y = ( P.CITY ≠ ‘Paris’ ). Then propositions 1-4 are, respectively:

• IF x THEN y

• IF NOT y THEN NOT x

• ( NOT x ) OR y

• NOT ( ( NOT y ) AND x )

Lessons learned:

• Manipulations can be done purely formally!

• Equivalences not always immediately obvious!
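Because the manipulations are purely formal, a brute-force truth-table comparison settles the question. In this Python sketch, the implies helper is ours, defined as (NOT p) OR q. The four forms produce identical truth tables, so they are one proposition written four ways.

```python
from itertools import product

def implies(p, q):
    # material implication: IF p THEN q  ≡  (NOT p) OR q
    return (not p) or q

forms = [
    lambda x, y: implies(x, y),            # 1. IF x THEN y
    lambda x, y: implies(not y, not x),    # 2. IF NOT y THEN NOT x
    lambda x, y: (not x) or y,             # 3. (NOT x) OR y
    lambda x, y: not ((not y) and x),      # 4. NOT ((NOT y) AND x)
]

rows = list(product([True, False], repeat=2))
tables = [tuple(f(x, y) for x, y in rows) for f in forms]
print(len(set(tables)))  # 1: all four truth tables are identical
```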

Copyright C. J. Date 2008 page 300

MORE CONNECTIVES :

XOR: p or q but not both

NOR: NOT (p OR q) = neither p nor q ... Peirce arrow: p ↓ q

NAND: NOT (p AND q) = not both p and q ... Sheffer stroke*: p | q

XOR | t f    NOR | t f    NAND | t f
  t | f t      t | f f       t | f t
  f | t f      f | f t       f | t t

Exactly 4 monadic / 16 dyadic connectives in total (not all named):

* Slightly unfortunate because "|" is also used for OR

Copyright C. J. Date 2008 page 301

THE 4 MONADICS :

p | -   -   NOT   -
t | t   t    f    f
f | t   f    t    f

("-" = unnamed)

Copyright C. J. Date 2008 page 302

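THE 16 DYADICS :

 - | t f    IF | t f    NAND | t f      - | t f
 t | t t     t | t f       t | f t      t | f f
 f | t t     f | t t       f | t t      f | t t

OR | t f     - | t f     XOR | t f      - | t f
 t | t t     t | t f       t | f t      t | f f
 f | t f     f | t f       f | t f      f | t f

 - | t f   IFF | t f       - | t f    NOR | t f
 t | t t     t | t f       t | f t      t | f f
 f | f t     f | f t       f | f t      f | f t

 - | t f   AND | t f       - | t f      - | t f
 t | t t     t | t f       t | f t      t | f f
 f | f f     f | f f       f | f f      f | f f

(Rows give p, columns give q; "-" = unnamed)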

Copyright C. J. Date 2008 page 303

COMPLETENESS :

A logical system is truth functionally complete if and only if all possible connectives can be expressed in terms of the given ones

The 20 possible connectives are not all primitive

Primitive sets: { NOT, OR }

{ NOT, AND }

{ NOR }

{ NAND }
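The claim that { NAND } is a primitive set can be checked mechanically. If NOT, AND, and OR can all be written with NAND alone, then anything expressible with { NOT, OR } can be too. A minimal Python sketch follows; the function names are ours.

```python
from itertools import product

def nand(p, q):
    return not (p and q)

# NOT, AND and OR expressed using NAND alone
def not_(p):
    return nand(p, p)

def and_(p, q):
    return nand(nand(p, q), nand(p, q))

def or_(p, q):
    return nand(nand(p, p), nand(q, q))

pairs = list(product([True, False], repeat=2))
ok = (all(not_(p) == (not p) for p in (True, False))
      and all(and_(p, q) == (p and q) for p, q in pairs)
      and all(or_(p, q) == (p or q) for p, q in pairs))
print(ok)  # True
```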

Copyright C. J. Date 2008 page 304

TRUTH TABLES REVISITED :

Alternative style (example):

p q | p AND q
t t |    t
t f |    f
f t |    f
f f |    f

This style can be used to show truth value of arb log exp in terms of truth values of components: e.g., (NOT q) IMPLIES (NOT p)

p q | NOT p | NOT q | (NOT q) IMPLIES (NOT p)
t t |   f   |   f   |           t
t f |   f   |   t   |           f
f t |   t   |   f   |           t
f f |   t   |   t   |           t

Copyright C. J. Date 2008 page 305

EXAMPLES :

Prove (NOT p) OR q ≡ p IMPLIES q

p q | NOT p | (NOT p) OR q | p IMPLIES q
t t |   f   |      t       |      t
t f |   f   |      f       |      f
f t |   t   |      t       |      t
f f |   t   |      t       |      t

Prove (NOT p) AND ( p OR q) IMPLIES q is a tautology

p q | NOT p | p OR q | (NOT p) AND (p OR q) | *
t t |   f   |   t    |          f           | t
t f |   f   |   t    |          f           | t
f t |   t   |   t    |          t           | t
f f |   t   |   f    |          f           | t

* = ( (NOT p) AND (p OR q) ) IMPLIES q
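The truth-table method lends itself to brute force. This Python sketch checks every assignment of t/f to p and q; the helper names are ours. An expression is a tautology if it comes out TRUE in every row. For contrast, the last check tests whether IF p THEN q is equivalent to its converse IF q THEN p, which it is not.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def truth_table(expr, arity=2):
    """One row per assignment (t before f), paired with the value of expr."""
    return [(vals, expr(*vals)) for vals in product([True, False], repeat=arity)]

def is_tautology(expr, arity=2):
    return all(value for _, value in truth_table(expr, arity))

def equiv(p, q):      # (NOT p) OR q  ≡  p IMPLIES q
    return ((not p) or q) == implies(p, q)

def syllogism(p, q):  # ( (NOT p) AND (p OR q) ) IMPLIES q
    return implies((not p) and (p or q), q)

def converse(p, q):   # IF p THEN q  ≡  IF q THEN p ?  (it is not)
    return implies(p, q) == implies(q, p)

for (p, q), value in truth_table(syllogism):
    print("t" if p else "f", "t" if q else "f", "t" if value else "f")
print(is_tautology(equiv), is_tautology(syllogism), is_tautology(converse))
# True True False
```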

Copyright C. J. Date 2008 page 306

CONNECTIVES REVISITED :

OR and AND are fundamentally dyadic ... but n-adic versions can be defined (why, exactly?). Let p1, p2 ..., pn (n > 0) be propositions. Then:

OR {p1,p2,...,pn} is equivalent to:

FALSE OR (p1) OR (p2) OR ... OR (pn)

Note: If none of the p’s involves any ORs, this prop is in disjunctive normal form (DNF)

AND {p1,p2,...,pn} is equivalent to:

TRUE AND (p1) AND (p2) AND ... AND (pn)

Note: If none of the p’s involves any ANDs, this prop is in conjunctive normal form (CNF)
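Here is a sketch of the n-adic forms in Python; the function names are ours. Each function is a fold that starts from FALSE or TRUE, the identity values for OR and AND respectively. As a result, the fold is well-defined even when no operands remain. Python's built-ins any() and all() behave the same way.

```python
from functools import reduce
import operator

def or_n(*props):
    """OR {p1, ..., pn}: FALSE OR (p1) OR ... OR (pn)."""
    return reduce(operator.or_, props, False)

def and_n(*props):
    """AND {p1, ..., pn}: TRUE AND (p1) AND ... AND (pn)."""
    return reduce(operator.and_, props, True)

print(or_n(False, True, False), and_n(True, True, False))  # True False
print(or_n(), and_n())  # False True  (no operands: just the starting value)
```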

Copyright C. J. Date 2008 page 307

LOGIC : PREDICATES

A predicate is a truth valued function. Examples:

1. x is a star

2. x has two moons

3. x has m moons

4. x is between Earth and y

5. x is between y and z

Note parameters (or placeholders or free variables) ... Invoking ("instantiating") predicate involves replacing parameters by arguments and yields a proposition (which evaluates to TRUE or FALSE, by definition)
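Predicates map naturally onto boolean-valued functions. In this Python sketch, the two lookup tables cover only four planets; they are our own assumption and are not part of the slide. Supplying arguments for every parameter yields a proposition, i.e., a truth value.

```python
# Hypothetical finite "world": order from the sun and number of moons.
ORDER = {"Mercury": 1, "Venus": 2, "Earth": 3, "Mars": 4}
MOONS = {"Mercury": 0, "Venus": 0, "Earth": 1, "Mars": 2}

def has_m_moons(x, m):
    """Predicate 3: x has m moons (two parameters)."""
    return MOONS[x] == m

def is_between(x, y, z):
    """Predicate 5: x is between y and z (three parameters)."""
    return min(ORDER[y], ORDER[z]) < ORDER[x] < max(ORDER[y], ORDER[z])

# Instantiating every parameter yields a proposition.
print(has_m_moons("Mars", 2))                   # True
print(has_m_moons("Earth", 2))                  # False
print(is_between("Venus", "Earth", "Mercury"))  # True
```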

Copyright C. J. Date 2008 page 308

Arguments satisfy predicate iff resulting proposition evaluates to TRUE ... E.g., the sun satisfies "x is a star," the moon doesn’t

Predicate with n parameters is n-place or n-adic (and if n = 0 the predicate is a proposition)

Connectives apply to predicates as well as propositions ... Simple/compound terminology applies too

Terminology: Predicate logic (aka predicate calculus) = study of predicates, connectives, and logical inferences that can be made using such predicates and connectives

Copyright C. J. Date 2008 page 309

LOGIC : INFERENCE

Logic includes rules of inference by which new truths (theorems) can be inferred from given truths (axioms and/or previously proved theorems)

Modus Ponens: If p IMPLIES q is true and p is true, we can infer that q is true ("direct reasoning")

E.g., given the truth of both "If I have no money then I will have to wash dishes" and "I have no money," we can infer truth of "I will have to wash dishes"

Modus Tollens: If p IMPLIES q is true and q is false, we can infer that p is false ("indirect reasoning")
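Both rules correspond to tautologies, and a quick truth-table check in Python confirms this. The implies helper is ours, defined as (NOT p) OR q.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

rows = list(product([True, False], repeat=2))
# Modus Ponens as a formula:  ( (p IMPLIES q) AND p ) IMPLIES q
modus_ponens = all(implies(implies(p, q) and p, q) for p, q in rows)
# Modus Tollens as a formula: ( (p IMPLIES q) AND NOT q ) IMPLIES NOT p
modus_tollens = all(implies(implies(p, q) and not q, not p) for p, q in rows)
print(modus_ponens, modus_tollens)  # True True
```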

Copyright C. J. Date 2008 page 310

LOGIC : QUANTIFICATION

Another way to get a proposition from a predicate ... Consider monadic predicate p(x) (parameter shown for clarity). Then these are propositions:

• EXISTS x ( p ( x ) ) /* existential quantifier */ /* ∃, "backward E" */

Meaning: At least one value a exists such that p(a) evaluates to TRUE

• FORALL x ( p ( x ) ) /* universal quantifier */ /* ∀, "upside down A" */

Meaning: All possible values a are such that p(a) evaluates to TRUE

Copyright C. J. Date 2008 page 311

EXAMPLES :

• EXISTS x ( x is a logician )

TRUE (e.g., take x to be Bertrand Russell)

Single example suffices to show truth

• FORALL x ( x is a logician )

FALSE (e.g., take x to be George W. Bush)

Single counterexample suffices to show falsity

Note: Parameter x must "range over" some set of permissible values—see later
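Over a finite range, EXISTS and FORALL correspond to Python's any() and all(). The two-person range below is a hypothetical stand-in for "all persons". The calls to next() then extract the witness and the counterexample that the slide mentions.

```python
# Hypothetical range for x: just the two people named on the slide.
PERSONS = {"Bertrand Russell": True, "George W. Bush": False}  # value: is a logician?

def is_logician(x):
    return PERSONS[x]

exists = any(is_logician(x) for x in PERSONS)  # EXISTS x ( x is a logician )
forall = all(is_logician(x) for x in PERSONS)  # FORALL x ( x is a logician )
print(exists, forall)  # True False

# A single example shows truth; a single counterexample shows falsity.
witness = next(x for x in PERSONS if is_logician(x))
counterexample = next(x for x in PERSONS if not is_logician(x))
print(witness, "|", counterexample)  # Bertrand Russell | George W. Bush
```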

Copyright C. J. Date 2008 page 312

LET x AND y RANGE OVER PERSONS :

Consider dyadic predicate "x is taller than y"

Quantify over x (using EXISTS, for definiteness):

EXISTS x ( x is taller than y )

Monadic predicate ... Invoke ("instantiate") with argument Steve:

EXISTS x ( x is taller than Steve )

Proposition: TRUE iff there exists at least one person, say Arnold, taller than Steve

Copyright C. J. Date 2008 page 313

ALTERNATIVELY :

Quantify over both parameters (using EXISTS, again for definiteness):

EXISTS x ( EXISTS y ( x is taller than y ) )

Proposition: TRUE iff there are at least two persons not of the same height

Given an n-adic predicate, quantifying over m parameters(m < n) yields a k-adic predicate, where k = n - m

EXISTS x ( EXISTS y ( x is taller than y ) ) ≡ EXISTS y ( EXISTS x ( x is taller than y ) )

Similarly for FORALL ... Series of like quantifiers can be written in any sequence without changing semantics

Copyright C. J. Date 2008 page 314

SIX POSSIBLE "FULL QUANTIFICATIONS" (and six distinct meanings) :

Assuming at least two distinct persons:

1. EXISTS x EXISTS y ( x is taller than y )

Meaning: Somebody is taller than somebody else; TRUE, unless everybody is the same height

2. EXISTS x FORALL y ( x is taller than y )

Meaning: Somebody is taller than everybody; FALSE

3. FORALL x EXISTS y ( x is taller than y )

Meaning: Everybody is taller than somebody; FALSE

Copyright C. J. Date 2008 page 315

4. EXISTS y FORALL x ( x is taller than y )

Meaning: Somebody is shorter than everybody; FALSE

/* But need to explain that predicates "x is taller than y" and "y is shorter than x" are logically equivalent! */

5. FORALL y EXISTS x ( x is taller than y )

Meaning: Everybody is shorter than somebody; FALSE

6. FORALL x FORALL y ( x is taller than y )

Meaning: Everybody is taller than everybody; FALSE
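Over a finite range, all six readings can be evaluated directly. The heights below are hypothetical. Note that x and y range over the same set, so each person is also compared with themselves. That is why readings 2 and 4 fail even for the tallest and the shortest person: nobody is taller than themselves.

```python
HEIGHT = {"Arnold": 188, "Steve": 175, "Danny": 163}  # hypothetical heights in cm
P = list(HEIGHT)

def taller(x, y):
    return HEIGHT[x] > HEIGHT[y]

results = [
    any(any(taller(x, y) for y in P) for x in P),  # 1. EXISTS x EXISTS y
    any(all(taller(x, y) for y in P) for x in P),  # 2. EXISTS x FORALL y
    all(any(taller(x, y) for y in P) for x in P),  # 3. FORALL x EXISTS y
    any(all(taller(x, y) for x in P) for y in P),  # 4. EXISTS y FORALL x
    all(any(taller(x, y) for x in P) for y in P),  # 5. FORALL y EXISTS x
    all(all(taller(x, y) for y in P) for x in P),  # 6. FORALL x FORALL y
]
print(results)  # [True, False, False, False, False, False]
```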

Copyright C. J. Date 2008 page 316

LOGIC : FREE AND BOUND VARIABLES

Recap: A free variable is just a parameter

Quantifying over a free variable makes it bound

E.g.:

x is taller than y /* x, y both free */

EXISTS x ( x is taller than y) /* x bound, y free */

EXISTS x EXISTS y ( x is taller than y) /* x, y both bound */

So a proposition is a predicate with no free variables!

Copyright C. J. Date 2008 page 317

THE TERMINOLOGY ISN’T VERY GOOD :

Free variables = parameters; but bound variables have no exact counterpart in conventional programming terms ... They serve as a kind of dummy, linking the predicate inside the parens to the quantifier outside. E.g.:

EXISTS x ( x > 3 ) vs. EXISTS y ( y > 3 )

By contrast, consider:

EXISTS x ( x > 3 ) AND x < 0   /* two different x’s !!! */
EXISTS y ( y > 3 ) AND x < 0
EXISTS y ( y > 3 ) AND y < 0
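Python 3 generator expressions give their loop variables a scope of their own, which makes a handy analogy, though only an analogy. In f1 below, the x inside any() is bound, and the x after AND is the free parameter. Renaming the bound one to y (f2) therefore changes nothing. The finite range D is hypothetical.

```python
D = range(-5, 6)  # hypothetical finite range for the bound variable

def f1(x):
    # EXISTS x ( x > 3 ) AND x < 0 : the x inside any() is bound, the last x is free
    return any(x > 3 for x in D) and x < 0

def f2(x):
    # EXISTS y ( y > 3 ) AND x < 0 : same predicate, bound variable renamed
    return any(y > 3 for y in D) and x < 0

same = all(f1(x) == f2(x) for x in range(-10, 10))
print(same, f1(-1), f1(2))  # True True False
```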

"Free" and "bound" really apply to variable occurrences in expressions, not to variables as such ... (sigh)

Copyright C. J. Date 2008 page 318

EXERCISE (Honest Abe) :

"You can fool some of the people some of the time, and some of the people all the time, but you cannot fool all the people all of the time."

Is this statement unambiguous? What does it mean?

Analysis: Statement involves three simple predicates (or propositions?) ANDed together:

you can fool some of the people some of the time
AND
you can fool some of the people all the time
AND /* "but" maps to AND */
you cannot fool all the people all of the time

Copyright C. J. Date 2008 page 319

EXERCISE (cont.) :

Denote "you can fool person x at time y" by fool(x,y)

"You can fool some of the people some of the time":

EXISTS x EXISTS y ( fool (x, y ) ) — easy enough

"You can fool some of the people all the time":

FORALL y EXISTS x ( fool (x, y ) ) — ???

EXISTS x FORALL y ( fool (x, y ) ) — ???

"You cannot fool all the people all of the time":

I’ll leave this one to you!

Copyright C. J. Date 2008 page 320

RELATIONAL CALCULUS :

SNO and STATUS for suppliers in Paris who supply part P2:

( S WHERE CITY = ‘Paris’ ) { SNO , STATUS } MATCHING ( SP WHERE PNO = ‘P2’ )

Relational calculus:

RANGEVAR SX RANGES OVER S ;
RANGEVAR SPX RANGES OVER SP ;

{ SX.SNO , SX.STATUS } WHERE SX.CITY = ‘Paris’ AND
                             EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P2’ )

Generic form /* of rel calc exp per se */ :

proto tuple WHERE predicate

SQL ANALOG OF EXAMPLE :

SELECT SX.SNO , SX.STATUS
FROM S AS SX
WHERE SX.CITY = ‘Paris’
AND EXISTS
    ( SELECT *
      FROM SP AS SPX
      WHERE SPX.SNO = SX.SNO
      AND SPX.PNO = ‘P2’ )

So SQL does support range variables /* see next page */

SQL also supports EXISTS, but indirectly: EXISTS sq gives TRUE if the table denoted by sq is nonempty, FALSE otherwise* /* sq usually "correlated" */

* Never UNKNOWN !!!
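The SQL analog can be tried directly. A minimal sketch using Python's standard `sqlite3` module (so SQLite's dialect, not full standard SQL) and a few hypothetical rows; the last two queries show EXISTS returning only TRUE or FALSE, which SQLite reports as 1 or 0, even when the subquery's only row is all null:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    -- Hypothetical sample rows
    INSERT INTO S VALUES ('S1', 'Smith', 20, 'London'), ('S2', 'Jones', 10, 'Paris'),
                         ('S3', 'Blake', 30, 'Paris');
    INSERT INTO SP VALUES ('S1', 'P2', 200), ('S2', 'P2', 400), ('S3', 'P1', 100);
""")

# SQL analog of the calculus expression: a correlated EXISTS subquery
rows = conn.execute("""
    SELECT SX.SNO , SX.STATUS
    FROM S AS SX
    WHERE SX.CITY = 'Paris'
    AND EXISTS ( SELECT *
                 FROM SP AS SPX
                 WHERE SPX.SNO = SX.SNO
                 AND SPX.PNO = 'P2' )
""").fetchall()
print(rows)  # [('S2', 10)]

# EXISTS looks only at whether rows exist, never at their values
exists_null = conn.execute("SELECT EXISTS ( SELECT NULL )").fetchone()[0]
exists_empty = conn.execute("SELECT EXISTS ( SELECT * FROM SP WHERE PNO = 'P9' )").fetchone()[0]
print(exists_null, exists_empty)  # 1 0
```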

SQL RANGE VARIABLES CAN BE IMPLICIT :

SELECT S.SNO , S.STATUS
FROM S /* implicit: AS S */
WHERE S.CITY = ‘Paris’
AND EXISTS
    ( SELECT *
      FROM SP /* implicit: AS SP */
      WHERE SP.SNO = S.SNO
      AND SP.PNO = ‘P2’ )

"S." and "SP." do not refer to tables S and SP !!!—they refer to implicit range variables (implicit correlation names, in SQL terms)

MORE EXAMPLES :

• SNAMEs for suppliers who supply all parts

/* range variable defns omitted */ :

{ SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

Quantifier order important!

SQL analog ??? /* see later */

• SNAMEs for suppliers who supply all red parts:

{ SX.SNAME } WHERE FORALL PX ( IF PX.COLOR = ‘Red’ THEN
    EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
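Over finite relations, FORALL and EXISTS behave like Python's `all` and `any`, so both calculus expressions can be transcribed almost literally. A sketch with hypothetical relation values (only the attributes needed are kept), and with IF a THEN b evaluated as (NOT a) OR b:

```python
# Hypothetical relation values, as Python sets of tuples
S = {("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake")}       # (SNO, SNAME)
P = {("P1", "Red"), ("P2", "Red"), ("P3", "Blue")}            # (PNO, COLOR)
SP = {("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
      ("S2", "P1"), ("S2", "P2")}                             # (SNO, PNO)

def supplies(sno, pno):
    # EXISTS SPX ( SPX.SNO = sno AND SPX.PNO = pno )
    return any(s == sno and p == pno for (s, p) in SP)

# { SX.SNAME } WHERE FORALL PX ( EXISTS SPX ( ... ) )
all_parts = {sname for (sno, sname) in S
             if all(supplies(sno, pno) for (pno, _) in P)}

# { SX.SNAME } WHERE FORALL PX ( IF PX.COLOR = 'Red' THEN EXISTS SPX ( ... ) )
all_red_parts = {sname for (sno, sname) in S
                 if all((color != "Red") or supplies(sno, pno) for (pno, color) in P)}

print(sorted(all_parts), sorted(all_red_parts))  # ['Smith'] ['Jones', 'Smith']
```

Jones supplies every red part but not every part, which is why the two answers differ.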

PRENEX NORMAL FORM :

{ SX.SNAME } WHERE FORALL PX
    ( EXISTS SPX ( IF PX.COLOR = ‘Red’ THEN SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

A predicate is in prenex normal form (PNF) iff (a) it’s quantifier free or (b) it’s of the form EXISTS x (p) or FORALL x (p), where p is in PNF in turn:

Q1 x1 ( Q2 x2 ( ... ( Qn xn ( q ) ) ... ) )

where n > 0, each Qi is either EXISTS or FORALL, and q is quantifier free

PNF is no more correct than any other form, but often easiest to write

MORE QUERIES :

• Pairs of SNOs where the suppliers are colocated:

{ SX.SNO AS SA , SY.SNO AS SB } WHERE SX.CITY = SY.CITY AND SX.SNO < SY.SNO

• SNAMEs for suppliers who don’t supply part P2:

{ SX.SNAME } WHERE NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P2’ )

• For each shipment, shipment details, including total shipment weight:

{ SPX , PX.WEIGHT * SPX.QTY AS SHIPWT } WHERE PX.PNO = SPX.PNO

• For each part, PNO and total shipment quantity:

{ PX.PNO , SUM ( SPX WHERE SPX.PNO = PX.PNO , QTY ) AS TOTQ } [ WHERE TRUE ]

• Cities that store more than five red parts:

{ PX.CITY } WHERE COUNT ( PY WHERE PY.CITY = PX.CITY AND PY.COLOR = ‘Red’ ) > 5
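A point worth checking when the first of these is mapped to SQL: SQL's SUM over no rows yields null, not zero, so a part with no shipments would get a null TOTQ unless the sum is wrapped in COALESCE. A minimal `sqlite3` sketch with hypothetical rows (the ORDER BY is there only to make the output deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO TEXT PRIMARY KEY);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows; part P3 has no shipments at all
    INSERT INTO P VALUES ('P1'), ('P2'), ('P3');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 200), ('S1', 'P2', 100);
""")

# Correlated scalar subquery computing TOTQ for each part
query = """
    SELECT PX.PNO,
           ( SELECT {agg} FROM SP AS SPX WHERE SPX.PNO = PX.PNO ) AS TOTQ
    FROM P AS PX
    ORDER BY PX.PNO
"""
plain = conn.execute(query.format(agg="SUM ( SPX.QTY )")).fetchall()
fixed = conn.execute(query.format(agg="COALESCE ( SUM ( SPX.QTY ) , 0 )")).fetchall()
print(plain)  # [('P1', 500), ('P2', 100), ('P3', None)]
print(fixed)  # [('P1', 500), ('P2', 100), ('P3', 0)]
```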

CONSTRAINTS :

• STATUS must be in the range 1 to 100 inclusive:

CONSTRAINT CX1 FORALL SX ( SX.STATUS > 0 AND SX.STATUS < 101 ) ;

SQL base table constraint (on base table S):

CONSTRAINT CX1 CHECK ( STATUS > 0 AND STATUS < 101 )

Elides the quantifier (and explicit range variable)

• Suppliers in London must have status 20:

CONSTRAINT CX2 FORALL SX ( IF SX.CITY = ‘London’ THEN SX.STATUS = 20 ) ;

• No two suppliers have same SNO:

CONSTRAINT CX3 FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN SX.SNAME = SY.SNAME AND
                                      SX.STATUS = SY.STATUS AND
                                      SX.CITY = SY.CITY ) ) ;

• No supplier with status less than 20 can supply part P6:

CONSTRAINT CX5 FORALL SX ( IF SX.STATUS < 20 THEN
    NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P6’ ) ) ;

• Every SNO in SP must appear in S:

CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

/* more on this one later */

• No SNO appears in both LS and NLS:

CONSTRAINT CX7 FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) ) ;

• There must always be at least one supplier:

CONSTRAINT CX9 EXISTS SX ( TRUE ) ;
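Each constraint is just a closed predicate that the database state must satisfy, so it can be evaluated directly. A Python sketch under stated assumptions: relvars are sets of tuples, the `check` function and the sample state are hypothetical, and LS and NLS are simplified to plain sets of supplier numbers:

```python
def implies(p, q):
    # IF p THEN q, written as (NOT p) OR q
    return (not p) or q

def check(S, SP, LS, NLS):
    """Evaluate each constraint as a closed predicate over the given relvar values."""
    return {
        # CX1: FORALL SX ( SX.STATUS > 0 AND SX.STATUS < 101 )
        "CX1": all(0 < st < 101 for (_, _, st, _) in S),
        # CX2: FORALL SX ( IF SX.CITY = 'London' THEN SX.STATUS = 20 )
        "CX2": all(implies(city == "London", st == 20) for (_, _, st, city) in S),
        # CX3: FORALL SX ( FORALL SY ( IF SX.SNO = SY.SNO THEN all other attributes agree ) )
        "CX3": all(implies(sx[0] == sy[0], sx == sy) for sx in S for sy in S),
        # CX5: FORALL SX ( IF SX.STATUS < 20 THEN NOT EXISTS SPX ( ... SPX.PNO = 'P6' ) )
        "CX5": all(implies(st < 20, not any(s == sno and p == "P6" for (s, p, _) in SP))
                   for (sno, _, st, _) in S),
        # CX6: FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) )
        "CX6": all(any(sx[0] == spx[0] for sx in S) for spx in SP),
        # CX7: FORALL LX ( FORALL NX ( LX.SNO ≠ NX.SNO ) )
        "CX7": all(lx != nx for lx in LS for nx in NLS),
        # CX9: EXISTS SX ( TRUE )
        "CX9": any(True for _ in S),
    }

# Hypothetical database state
S = {("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris")}   # (SNO, SNAME, STATUS, CITY)
SP = {("S1", "P1", 300), ("S2", "P2", 400)}                         # (SNO, PNO, QTY)
LS, NLS = {"S1"}, {"S2"}                                            # SNOs only

print(check(S, SP, LS, NLS))
print(check(S, SP | {("S9", "P1", 100)}, LS, NLS)["CX6"])  # False: shipment with no supplier
```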

MORE ON THE QUANTIFIERS :

1. WE DON’T NEED BOTH

EXISTS x ( x is taller than Steve )

NOT FORALL x ( NOT x is taller than Steve )

Say the same thing! More generally:

EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )

Likewise:

FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )

So we don’t need both ... but it’s nice to have both. E.g.:
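Over a finite range these equivalences can be checked exhaustively, treating EXISTS as Python's `any` and FORALL as `all`. A sketch, assuming a predicate is represented simply by its list of truth values over the range; ranges of size 0 to 3 are covered, the empty range included:

```python
from itertools import product

checked = 0
for n in range(4):                                   # range sizes 0, 1, 2, 3
    for truth in product([False, True], repeat=n):   # every predicate p on that range
        # EXISTS x ( p(x) ) ≡ NOT FORALL x ( NOT p(x) )
        assert any(truth) == (not all(not t for t in truth))
        # FORALL x ( p(x) ) ≡ NOT EXISTS x ( NOT p(x) )
        assert all(truth) == (not any(not t for t in truth))
        checked += 1
print(checked)  # 15
```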

"GET SUPPLIERS WHO SUPPLY ALL PARTS" :

Compare and contrast:

SX WHERE FORALL PX ( EXISTS SPX ( SX.SNO = SPX.SNO AND SPX.PNO = PX.PNO ) )

vs. SELECT SX.*
    FROM S AS SX
    WHERE NOT EXISTS
        ( SELECT PX.*
          FROM P AS PX
          WHERE NOT EXISTS
              ( SELECT SPX.*
                FROM SP AS SPX
                WHERE SX.SNO = SPX.SNO
                AND SPX.PNO = PX.PNO ) )
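The SQL formulation can be run to confirm that it does what the calculus version says. A `sqlite3` sketch with hypothetical rows in which only S1 supplies every part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, COLOR TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows
    INSERT INTO S VALUES ('S1', 'Smith'), ('S2', 'Jones');
    INSERT INTO P VALUES ('P1', 'Red'), ('P2', 'Green');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# "No part exists that this supplier doesn't supply"
result = conn.execute("""
    SELECT SX.*
    FROM S AS SX
    WHERE NOT EXISTS
        ( SELECT PX.*
          FROM P AS PX
          WHERE NOT EXISTS
              ( SELECT SPX.*
                FROM SP AS SPX
                WHERE SX.SNO = SPX.SNO
                AND SPX.PNO = PX.PNO ) )
""").fetchall()
print(result)  # [('S1', 'Smith')]
```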

MORE ON THE QUANTIFIERS :

2. EMPTY RANGES

EXISTS x ( p ( x ) ) ≡ NOT FORALL x ( NOT p ( x ) )

Suppose there are no x’s; then LHS evaluates to FALSE

So RHS evaluates to FALSE

So FORALL x ( NOT p ( x ) ) evaluates to TRUE

But p was arbitrary ...

So FORALL x ( q ( x ) ) evaluates to TRUE, regardless of what the predicate q(x) is!

SOME CONSEQUENCES :

Business rule or constraint of the form FORALL x (...) is "automatically" satisfied if there aren’t any x’s. E.g., "all taxpayers with taxable income > $1 billion must pay supertax" is automatically satisfied if no taxpayer has such a large taxable income

Certain queries produce "unexpected" results (if you don’t know logic). E.g., "get suppliers who supply all purple parts"—

SX WHERE FORALL PX ( IF PX.COLOR = ‘Purple’ THEN EXISTS SPX ( SX.SNO = SPX.SNO AND SPX.PNO = PX.PNO ) )

—returns all suppliers if there are no purple parts (!)
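The same effect in a few lines of Python (a sketch; the data is hypothetical and contains no purple parts at all). Every supplier qualifies, including S3, which supplies nothing:

```python
S = {"S1", "S2", "S3"}                 # hypothetical SNOs
P = {("P1", "Red"), ("P2", "Blue")}    # (PNO, COLOR): no purple parts
SP = {("S1", "P1")}                    # (SNO, PNO)

def supplies_all_purple(sno):
    # FORALL PX ( IF PX.COLOR = 'Purple' THEN EXISTS SPX ( ... ) )
    return all((color != "Purple") or ((sno, pno) in SP) for (pno, color) in P)

result = sorted(s for s in S if supplies_all_purple(s))
print(result)  # ['S1', 'S2', 'S3']
```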

MORE ON THE QUANTIFIERS :

3. DEFINITIONS

Consider p(x); let x range over {x1,x2,...,xn}. Then:

EXISTS x ( p ( x ) ) ≡ FALSE OR p ( x1 ) OR p ( x2 ) OR ... OR p ( xn )

FORALL x ( p ( x ) ) ≡ TRUE AND p ( x1 ) AND p ( x2 ) AND ... AND p ( xn )

E.g.: let p(x) = x has a moon; let x range over {Mercury, Venus, Earth, Mars}

But foregoing definitions are valid only because the sets are all finite! (And even though the quantifiers are thus "just shorthand," they’re very useful shorthand!)
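The definitions are folds: start with FALSE (respectively TRUE) and OR (respectively AND) in p(x) for each member of the range. A Python sketch of the planets example using `functools.reduce`; the starting value is exactly what an empty range yields:

```python
from functools import reduce

# p(x) = "x has a moon", over the four inner planets
planets = ["Mercury", "Venus", "Earth", "Mars"]
has_moon = {"Mercury": False, "Venus": False, "Earth": True, "Mars": True}

# EXISTS x ( p(x) ) = FALSE OR p(x1) OR ... OR p(xn)
exists_moon = reduce(lambda acc, x: acc or has_moon[x], planets, False)
# FORALL x ( p(x) ) = TRUE AND p(x1) AND ... AND p(xn)
forall_moon = reduce(lambda acc, x: acc and has_moon[x], planets, True)
print(exists_moon, forall_moon)  # True False
```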

MORE ON THE QUANTIFIERS :

4. ADDITIONAL KINDS

Possibilities include:

There exist at least three x’s such that

A majority of x’s are such that

An odd number of x’s are such that

and so on ... One important one:

There exists exactly one x such that ("UNIQUE")

E.g.: UNIQUE x ( x has social security number y )

Meaning: Exactly one person has social security number y
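Over finite ranges, all of these can be defined by counting. A Python sketch; the function names and the social security data are hypothetical:

```python
def count_true(p, xs):
    return sum(1 for x in xs if p(x))

def at_least(k, p, xs):   # "there exist at least k x's such that"
    return count_true(p, xs) >= k

def majority(p, xs):      # "a majority of x's are such that"
    return 2 * count_true(p, xs) > len(xs)

def odd_number(p, xs):    # "an odd number of x's are such that"
    return count_true(p, xs) % 2 == 1

def unique(p, xs):        # UNIQUE: "there exists exactly one x such that"
    return count_true(p, xs) == 1

# Hypothetical data: person -> social security number
ssn = {"Ann": "123", "Bob": "456", "Cy": "123"}
print(unique(lambda x: ssn[x] == "456", ssn))  # True
print(unique(lambda x: ssn[x] == "123", ssn))  # False
```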

CONSTRAINT CX6 REVISITED :

• Every shipment must have a supplier:

CONSTRAINT CX6 FORALL SPX ( EXISTS SX ( SX.SNO = SPX.SNO ) ) ;

Better:

CONSTRAINT CX6 FORALL SPX ( UNIQUE SX ( SX.SNO = SPX.SNO ) ) ;

• SQL has very indirect support:

UNIQUE sq where sq is (SELECT * FROM T WHERE bx) gives TRUE if at most one row in T satisfies bx, else FALSE

So CX6 becomes:

CREATE ASSERTION CX6 CHECK ( NOT EXISTS ( SELECT *
                                          FROM SP AS SPX
                                          WHERE NOT EXISTS ( SELECT *
                                                             FROM S AS SX
                                                             WHERE SX.SNO = SPX.SNO )
                                          OR NOT UNIQUE ( SELECT *
                                                          FROM S AS SX
                                                          WHERE SX.SNO = SPX.SNO ) ) ) ;

/* but "OR ... (...)" could be dropped */
/* because (SNO) is key for S */
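SQLite doesn't support CREATE ASSERTION, but the assertion's boolean expression can still be evaluated as a query to see whether the current database satisfies it. A sketch with hypothetical rows, using the simplified form the comment above permits (the OR NOT UNIQUE part dropped):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows
    INSERT INTO S VALUES ('S1', 'Smith'), ('S2', 'Jones');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S2', 'P2');
""")

# Body of CX6, evaluated as an ordinary query: 1 means satisfied, 0 means violated
CX6 = """
    SELECT NOT EXISTS ( SELECT *
                        FROM SP AS SPX
                        WHERE NOT EXISTS ( SELECT *
                                           FROM S AS SX
                                           WHERE SX.SNO = SPX.SNO ) )
"""
before = conn.execute(CX6).fetchone()[0]
conn.execute("INSERT INTO SP VALUES ('S9', 'P1')")   # shipment with no matching supplier
after = conn.execute(CX6).fetchone()[0]
print(before, after)  # 1 0
```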

SOME EQUIVALENCES :

If IS_EMPTY supported, quantifiers need not be:

EXISTS x ( p ) ≡ NOT ( IS_EMPTY ( X WHERE p ) )
FORALL x ( p ) ≡ IS_EMPTY ( X WHERE NOT ( p ) )

/* x ranges over X */

These equivalences explain SQL’s EXISTS (which is really an operator, not a quantifier, in SQL) ... and SQL’s lack of support for FORALL

EXISTS x ( p ) ≡ COUNT ( X WHERE p ) > 0
FORALL x ( p ) ≡ COUNT ( X WHERE p ) = COUNT ( X )
UNIQUE x ( p ) ≡ COUNT ( X WHERE p ) = 1

Recommendation: Don’t use COUNT in preference to EXISTS

RELATIONAL COMPLETENESS :

For every expression of the rel algebra, there exists an expression of the rel calculus that’s logically equivalent (i.e., has same semantics) ...

So rel calculus is at least as “powerful” (better: expressive) as rel algebra

Not obvious (?), but converse is true too

Both are relationally complete

/* basic measure of expressive power of lang */

What about SQL ???

TO SUM UP :

DB professionals in general and SQL practitioners in particular should have at least a basic understanding of logic or relational calculus (it comes to the same thing) !!! Here’s a quote:

Surely it’s worth investing a little effort up front in becoming familiar with [basic logic] in order to avoid the problems associated with ambiguous business rules. Ambiguity in business rules leads to implementation delays at best or implementation errors at worst (possibly both). And such delays and errors certainly have costs associated with them, costs that are likely to outweigh those initial learning costs many times over. In other words, framing business rules properly is a serious matter, and it requires a certain level of technical competence.

These remarks are set in the context of business rules specifically, but they’re of wider applicability, as we’ll see

Yes, I know the counterarguments ... but I don’t agree with them

Reviewer: "Counterarguments to what? Surely not to the assertion that it would be better if the rule designer were trained in logic? If so, I’d like to be told them, and perhaps some others would feel the same."

Yes, that’s what I meant ... Claim is:

Logic is simply too difficult for most people to deal with

Might be true in general (big subject!) ... but don’t need to understand the whole of logic for the purpose at hand ... and the benefits are so huge!

Small effort up front pays for itself many times over in avoiding errors in rules, and constraints, and queries, and on and on

A FINAL REMARK :

Logic is very solid !!!

Began with the ancient Greeks: Aristotle 384-322 BCE

Leibniz 1646-1716: Laid foundations of modern logic

Boole 1815-1864: Laws of Thought (1854)

Frege 1848-1925: Quantifiers (1879)

Wittgenstein 1889-1951: Truth tables (1922)

Etc., etc., etc.

STRUCTURE OF PRESENTATION :

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

HOW TO WRITE CORRECT SQL AND KNOW IT :

SQL is complicated and difficult—much more so than SQL advocates would have you believe ... In fact, it’s unteachable !!! (so my title might be an overclaim)

So to have a hope of writing correct SQL, you must follow some discipline

Logic is a HUGE help!

Formulate query (or ...) in logic or rel calc

Map that formulation systematically to SQL

In other words, expression transformation once again

SOME IMPORTANT TRANSFORMATION LAWS :

Law of the form exp1 ≡ exp2 implies that if some exp contains an occurrence of exp1, it can be rewritten as an exp containing an occurrence of exp2 without changing the meaning /* crucial point */ ... E.g.

SELECT SNO
FROM S
WHERE ( STATUS > 10 AND CITY = ‘London’ )
OR ( STATUS > 10 AND CITY = ‘Athens’ )

Boolean exp here clearly equivalent to:

STATUS > 10 AND ( CITY = ‘London’ OR CITY = ‘Athens’ )

Thanks to distributivity (of AND over OR)

The distributive laws:

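p AND ( q OR r ) ≡ ( p AND q ) OR ( p AND r )

p OR ( q AND r ) ≡ ( p OR q ) AND ( p OR r )

Here and elsewhere p, q, r denote arbitrary boolean expressions. Truth table for the first law (the sixth and eighth columns agree in every row):

p | q | r | p AND q | p AND r | (p AND q) OR (p AND r) | q OR r | p AND (q OR r)
T | T | T |    T    |    T    |           T            |   T    |       T
T | T | F |    T    |    F    |           T            |   T    |       T
T | F | T |    F    |    T    |           T            |   T    |       T
T | F | F |    F    |    F    |           F            |   F    |       F
F | T | T |    F    |    F    |           F            |   T    |       F
F | T | F |    F    |    F    |           F            |   T    |       F
F | F | T |    F    |    F    |           F            |   T    |       F
F | F | F |    F    |    F    |           F            |   F    |       F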

The implication law:

IF p THEN q ≡ ( NOT p ) OR q

The double negation law:

NOT ( NOT p ) ≡ p

De Morgan’s laws:

NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )

NOT ( p OR q ) ≡ ( NOT p ) AND ( NOT q )

The quantification law:

FORALL x ( p ( x ) ) ≡ NOT EXISTS x ( NOT p ( x ) )

/* repeated application of De Morgan */

De Morgan’s "first" law revisited:

NOT ( p AND q ) ≡ ( NOT p ) OR ( NOT q )

Often applied to result of prior application of implication law ... So restate, replacing q by NOT q:

NOT ( p AND NOT q ) ≡ ( NOT p ) OR q
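Each of these laws can be verified mechanically by checking that both sides agree for every combination of truth values, which is what the truth table above does by hand for the first distributive law. A Python sketch (two-valued logic only, so nulls are out of scope), with IF p THEN q given by its truth table rather than by the law being tested:

```python
from itertools import product

# IF p THEN q, defined by its truth table: FALSE only when p is TRUE and q is FALSE
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

laws = {
    "p AND (q OR r) ≡ (p AND q) OR (p AND r)":
        lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r)),
    "p OR (q AND r) ≡ (p OR q) AND (p OR r)":
        lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)),
    "IF p THEN q ≡ (NOT p) OR q":
        lambda p, q, r: IMPLIES[(p, q)] == ((not p) or q),
    "NOT (NOT p) ≡ p":
        lambda p, q, r: (not (not p)) == p,
    "NOT (p AND q) ≡ (NOT p) OR (NOT q)":
        lambda p, q, r: (not (p and q)) == ((not p) or (not q)),
    "NOT (p OR q) ≡ (NOT p) AND (NOT q)":
        lambda p, q, r: (not (p or q)) == ((not p) and (not q)),
    "NOT (p AND NOT q) ≡ (NOT p) OR q":
        lambda p, q, r: (not (p and not q)) == ((not p) or q),
}

# A law holds iff both sides agree on all 8 combinations of p, q, r
results = {name: all(law(p, q, r) for p, q, r in product([True, False], repeat=3))
           for name, law in laws.items()}
for name, ok in results.items():
    print(ok, name)
```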

EXAMPLE 1: LOGICAL IMPLICATION

All red parts must be stored in London ... i.e.:

IF COLOR = ‘Red’ THEN CITY = ‘London’ /* for given part */

Apply implication law /* add parens for clarity */ :

( NOT ( COLOR = ‘Red’ ) ) OR CITY = ‘London’

Map to base table constraint (SQL):

CONSTRAINT BTCX1 CHECK( NOT ( COLOR = ‘Red’ ) OR CITY = ‘London’ )

Simplify /* i.e., more transformations! */ :

CONSTRAINT BTCX1 CHECK ( COLOR <> ‘Red’ OR CITY = ‘London’ )
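The simplified constraint can be tried out in SQLite, which supports named CHECK constraints and rejects violating rows with an `IntegrityError`. A sketch with a hypothetical cut-down P table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE P ( PNO TEXT PRIMARY KEY, COLOR TEXT, CITY TEXT,
                     CONSTRAINT BTCX1 CHECK ( COLOR <> 'Red' OR CITY = 'London' ) )
""")
conn.execute("INSERT INTO P VALUES ('P1', 'Red', 'London')")    # red and in London: fine
conn.execute("INSERT INTO P VALUES ('P2', 'Blue', 'Paris')")    # not red: satisfied trivially
try:
    conn.execute("INSERT INTO P VALUES ('P3', 'Red', 'Paris')")  # red but not in London
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```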

EXAMPLE 2: UNIVERSAL QUANTIFICATION

FORALL PX ( IF PX.COLOR = ‘Red’ THEN PX.CITY = ‘London’ )

Apply quantification law:

NOT EXISTS PX ( NOT ( IF PX.COLOR = ‘Red’ THEN PX.CITY = ‘London’ ) )

/* henceforth add/drop parens freely */

Implication law:

NOT EXISTS PX ( NOT ( NOT ( PX.COLOR = ‘Red’ ) OR PX.CITY = ‘London’ ) )

Could now map to SQL, but let’s tidy it up first:

De Morgan:

NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = ‘Red’ ) AND NOT ( PX.CITY = ‘London’ ) ) ) )

Double negation (and drop some parens):

NOT EXISTS PX ( PX.COLOR = ‘Red’ AND NOT ( PX.CITY = ‘London’ ) )

One more obvious transformation:

NOT EXISTS PX ( PX.COLOR = ‘Red’ AND PX.CITY ≠ ‘London’ )

TRANSFORM FINAL EXP TO SQL :

NOT maps to NOT

EXISTS PX ( bx ) maps to EXISTS ( SELECT * FROM P AS PX WHERE ( sbx ) )

/* sbx is SQL analog of bx */

Parens around sbx can be dropped

Wrap up entire exp inside CREATE ASSERTION

CREATE ASSERTION ... CHECK ( NOT EXISTS ( SELECT *
                                          FROM P AS PX
                                          WHERE PX.COLOR = ‘Red’
                                          AND PX.CITY <> ‘London’ ) ) ;
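Because each step applies a law that preserves meaning, every intermediate expression should give the same truth value on every possible database. A brute-force Python sketch checks this for all hypothetical P tables of up to three parts, each part reduced to a (COLOR, CITY) pair:

```python
from itertools import product

# IF p THEN q, defined by its truth table
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

# Each step of the derivation, as a function of a list of (COLOR, CITY) parts
steps = [
    # FORALL PX ( IF PX.COLOR = 'Red' THEN PX.CITY = 'London' )
    lambda P: all(IMPLIES[(c == "Red", t == "London")] for c, t in P),
    # quantification law
    lambda P: not any(not IMPLIES[(c == "Red", t == "London")] for c, t in P),
    # implication law
    lambda P: not any(not ((not c == "Red") or t == "London") for c, t in P),
    # De Morgan
    lambda P: not any(not (not (c == "Red" and not t == "London")) for c, t in P),
    # double negation
    lambda P: not any(c == "Red" and not t == "London" for c, t in P),
    # final form: NOT EXISTS PX ( PX.COLOR = 'Red' AND PX.CITY <> 'London' )
    lambda P: not any(c == "Red" and t != "London" for c, t in P),
]

KINDS = list(product(["Red", "Blue"], ["London", "Paris"]))
states_checked = 0
mismatches = 0
for n in range(4):                        # every table of 0 to 3 parts
    for P in product(KINDS, repeat=n):
        values = {step(P) for step in steps}
        if len(values) > 1:               # some step disagrees with the others
            mismatches += 1
        states_checked += 1
print(states_checked, mismatches)  # 85 0
```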

EXAMPLE 3: IMPLIES AND FORALL

PNAMEs for parts whose weight is different from that of every part in Paris:

{ PX.PNAME } WHERE FORALL PY ( IF PY.CITY = ‘Paris’ THEN PY.WEIGHT ≠ PX.WEIGHT )

Quantification law:

{ PX.PNAME } WHERE NOT EXISTS PY ( NOT ( IF PY.CITY = ‘Paris’ THEN PY.WEIGHT ≠ PX.WEIGHT ) )

Implication law:

{ PX.PNAME } WHERE NOT EXISTS PY ( NOT ( NOT ( PY.CITY = ‘Paris’ )
                                   OR ( PY.WEIGHT ≠ PX.WEIGHT ) ) )

De Morgan:

{ PX.PNAME } WHERE NOT EXISTS PY ( NOT ( NOT ( ( PY.CITY = ‘Paris’ )
                                   AND NOT ( PY.WEIGHT ≠ PX.WEIGHT ) ) ) )

Tidy up:

{ PX.PNAME } WHERE NOT EXISTS PY ( PY.CITY = ‘Paris’ AND PY.WEIGHT = PX.WEIGHT )

Map to SQL:

SELECT DISTINCT PX.PNAME /* DISTINCT needed here! */
FROM P AS PX
WHERE NOT EXISTS
    ( SELECT *
      FROM P AS PY
      WHERE PY.CITY = ‘Paris’
      AND PY.WEIGHT = PX.WEIGHT )

But ... suppose there’s at least one part in Paris, but such parts all have a null weight

Original query now can’t be answered ... Any definite result is a lie!

But foregoing SQL exp will return all PNAMEs in table P

WHAT’S MORE :

SELECT DISTINCT PX.PNAME
FROM P AS PX
WHERE PX.WEIGHT NOT IN ( SELECT PY.WEIGHT
                         FROM P AS PY
                         WHERE PY.CITY = ‘Paris’ )

Looks equivalent ...

Is equivalent in 2VL ...

But gives a different but equally incorrect result, viz., an empty table! (under the same conditions as before)

Moral ???
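Both failure modes can be reproduced. A `sqlite3` sketch with hypothetical rows in which the only Paris part has a null weight; the NOT EXISTS form returns every PNAME, and the NOT IN form returns nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, WEIGHT REAL, CITY TEXT);
    -- Hypothetical rows: the only Paris part has a null weight
    INSERT INTO P VALUES ('P1', 'Nut', 12.0, 'London'),
                         ('P2', 'Bolt', NULL, 'Paris');
""")

not_exists = conn.execute("""
    SELECT DISTINCT PX.PNAME
    FROM P AS PX
    WHERE NOT EXISTS ( SELECT *
                       FROM P AS PY
                       WHERE PY.CITY = 'Paris'
                       AND PY.WEIGHT = PX.WEIGHT )
""").fetchall()

# x NOT IN ( NULL ) is UNKNOWN for every x, so no row qualifies
not_in = conn.execute("""
    SELECT DISTINCT PX.PNAME
    FROM P AS PX
    WHERE PX.WEIGHT NOT IN ( SELECT PY.WEIGHT
                             FROM P AS PY
                             WHERE PY.CITY = 'Paris' )
""").fetchall()

print(sorted(not_exists))  # [('Bolt',), ('Nut',)]
print(not_in)              # []
```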

EXAMPLE 4: CORRELATED SUBQUERIES

Names of suppliers who supply both part P1 and part P2:

{ SX.SNAME } WHERE EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P1’ ) AND EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = ‘P2’ )

SELECT DISTINCT SX.SNAME
FROM S AS SX
WHERE EXISTS ( SELECT *
               FROM SP AS SPX
               WHERE SPX.SNO = SX.SNO
               AND SPX.PNO = ‘P1’ )
AND EXISTS ( SELECT *
             FROM SP AS SPX
             WHERE SPX.SNO = SX.SNO
             AND SPX.PNO = ‘P2’ )

Correlated subqueries often contraindicated from a performance point of view,* because (conceptually, at least) they have to be evaluated once for each row in the outer table, instead of just once and for all

So eliminate them? ... Easy (for subqueries in EXISTS):

SELECT DISTINCT SX.SNAME
FROM S AS SX
WHERE SX.SNO IN ( SELECT SPX.SNO
                  FROM SP AS SPX
                  WHERE SPX.PNO = ‘P1’ )
AND SX.SNO IN ( SELECT SPX.SNO
                FROM SP AS SPX
                WHERE SPX.PNO = ‘P2’ )

* Mirabile dictu ...

SELECT sic /* "select item commalist" */
FROM T1
WHERE [ NOT ] EXISTS ( SELECT *
                       FROM T2
                       WHERE T2.C = T1.C
                       AND bx )

Maps to:

SELECT sic
FROM T1
WHERE T1.C [ NOT ] IN ( SELECT T2.C
                        FROM T2
                        WHERE bx )

But what if there are nulls?
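One way to explore the question: run both forms against a T2 whose C column contains a null. A `sqlite3` sketch with hypothetical tables T1 and T2, taking bx to be T2.FLAG = 1. In this example the positive forms agree, but NOT EXISTS and NOT IN do not:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (C TEXT);
    CREATE TABLE T2 (C TEXT, FLAG INTEGER);
    -- Hypothetical rows: one row of T2 has a null C
    INSERT INTO T1 VALUES ('a'), ('b');
    INSERT INTO T2 VALUES ('a', 1), (NULL, 1);
""")

def run(sql):
    return sorted(conn.execute(sql).fetchall())

# bx is taken to be T2.FLAG = 1 (a hypothetical condition)
exists_form = run("SELECT T1.C FROM T1 WHERE EXISTS "
                  "( SELECT * FROM T2 WHERE T2.C = T1.C AND T2.FLAG = 1 )")
in_form = run("SELECT T1.C FROM T1 WHERE T1.C IN "
              "( SELECT T2.C FROM T2 WHERE T2.FLAG = 1 )")
not_exists_form = run("SELECT T1.C FROM T1 WHERE NOT EXISTS "
                      "( SELECT * FROM T2 WHERE T2.C = T1.C AND T2.FLAG = 1 )")
# 'b' NOT IN ( 'a', NULL ) is UNKNOWN, not TRUE, so 'b' is lost here
not_in_form = run("SELECT T1.C FROM T1 WHERE T1.C NOT IN "
                  "( SELECT T2.C FROM T2 WHERE T2.FLAG = 1 )")
print(exists_form, in_form)          # [('a',)] [('a',)]
print(not_exists_form, not_in_form)  # [('b',)] []
```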

EXAMPLE 5: NAMING SUBEXPRESSIONS

Get supplier details for suppliers who supply all purple parts

{ SX } WHERE FORALL PX ( IF PX.COLOR = ‘Purple’ THEN EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

Implication law:

{ SX } WHERE FORALL PX ( NOT ( PX.COLOR = ‘Purple’ ) OR EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

De Morgan:

{ SX } WHERE FORALL PX ( NOT ( ( PX.COLOR = ‘Purple’ ) AND NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) )

Quantification law:

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = ‘Purple’ ) AND

NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ) )

Double negation:

{ SX } WHERE NOT EXISTS PX ( ( PX.COLOR = ‘Purple’ ) AND NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )

Drop some parens and map to SQL:

SELECT *
FROM S AS SX
WHERE NOT EXISTS
    ( SELECT *
      FROM P AS PX
      WHERE PX.COLOR = ‘Purple’
      AND NOT EXISTS ( SELECT *
                       FROM SP AS SPX
                       WHERE SPX.SNO = SX.SNO
                       AND SPX.PNO = PX.PNO ) )

A BETTER APPROACH :

Introduce names for subexpressions:

exp1 : PX.COLOR = ‘Purple’

exp2 : EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )

/* both map fairly directly to SQL */

Original rel calc formulation:

{ SX } WHERE FORALL PX ( IF exp1 THEN exp2 )

Can see the forest as well as the trees! ... and can apply the usual transformations, but in a different sequence, because we now have a better grasp of the big picture

Quantification law:

{ SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )

Implication law:

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR ( exp2 ) ) )

De Morgan:

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 AND NOT exp2 ) ) )

Double negation:

{ SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )

Can now expand exp1 and exp2 and map to SQL
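Named subexpressions translate naturally into named functions. A Python sketch with hypothetical data (suppliers as plain SNOs, parts as (PNO, COLOR) pairs) evaluates the original and the final formulation and gets the same answer:

```python
# Hypothetical data
S = ["S1", "S2", "S3"]
P = [("P1", "Purple"), ("P2", "Purple"), ("P3", "Red")]   # (PNO, COLOR)
SP = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}           # (SNO, PNO)

def exp1(px):
    # PX.COLOR = 'Purple'
    return px[1] == "Purple"

def exp2(sx, px):
    # EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
    return (sx, px[0]) in SP

# Original: FORALL PX ( IF exp1 THEN exp2 )
original = [sx for sx in S if all((not exp1(px)) or exp2(sx, px) for px in P)]
# Final: NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )
final = [sx for sx in S if not any(exp1(px) and not exp2(sx, px) for px in P)]
print(original, final)  # ['S1'] ['S1']
```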

EXAMPLE 6: NAMING SUBEXPRESSIONS bis

Get suppliers such that every part they supply is in the same city as that supplier

{ SX } WHERE FORALL PX ( IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
                         THEN PX.CITY = SX.CITY )

{ SX } WHERE FORALL PX ( IF exp1 THEN exp2 )

{ SX } WHERE NOT EXISTS PX ( NOT ( IF exp1 THEN exp2 ) )

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 ) OR exp2 ) )

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( exp1 AND NOT ( exp2 ) ) ) )

{ SX } WHERE NOT EXISTS PX ( exp1 AND NOT ( exp2 ) )

Expand exp1 and exp2 and map to SQL:

SELECT *
FROM S AS SX
WHERE NOT EXISTS
    ( SELECT *
      FROM P AS PX
      WHERE EXISTS ( SELECT *
                     FROM SP AS SPX
                     WHERE SPX.SNO = SX.SNO
                     AND SPX.PNO = PX.PNO )
      AND PX.CITY <> SX.CITY )
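A `sqlite3` run of this query on hypothetical data (ORDER BY added only for deterministic output). S3 supplies nothing, so it qualifies vacuously, in line with the earlier discussion of empty ranges:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows
    INSERT INTO S VALUES ('S1', 'London'), ('S2', 'Paris'), ('S3', 'Oslo');
    INSERT INTO P VALUES ('P1', 'London'), ('P2', 'Paris'), ('P3', 'London');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P3'), ('S2', 'P1'), ('S2', 'P2');
""")

rows = conn.execute("""
    SELECT *
    FROM S AS SX
    WHERE NOT EXISTS
        ( SELECT *
          FROM P AS PX
          WHERE EXISTS ( SELECT *
                         FROM SP AS SPX
                         WHERE SPX.SNO = SX.SNO
                         AND SPX.PNO = PX.PNO )
          AND PX.CITY <> SX.CITY )
    ORDER BY SX.SNO
""").fetchall()
print(rows)  # [('S1', 'London'), ('S3', 'Oslo')]
```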

EXAMPLE 7: DEALING WITH AMBIGUITY

Get suppliers such that every part they supply is in the same city

Possible interpretations include:

• Get suppliers SX such that for all parts PX and PY, if SX supplies both of them, then PX.CITY = PY.CITY

• Get suppliers SX such that for all parts PX and PY, if SX supplies both of them and they’re distinct, then PX.CITY = PY.CITY

Assume first interpretation ...

{ SX } WHERE FORALL PX ( FORALL PY
    ( IF EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
      AND EXISTS SPY ( SPY.SNO = SX.SNO AND SPY.PNO = PY.PNO )
      THEN PX.CITY = PY.CITY ) )

{ SX } WHERE FORALL PX ( FORALL PY ( IF exp1 AND exp2 THEN exp3 ) )

{ SX } WHERE NOT EXISTS PX ( NOT FORALL PY ( IF exp1 AND exp2 THEN exp3 ) )

{ SX } WHERE NOT EXISTS PX ( NOT ( NOT EXISTS PY ( NOT ( IF exp1 AND exp2 THEN exp3 ) ) ) )

{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( IF exp1 AND exp2 THEN exp3 ) ) )

{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( NOT ( exp1 AND exp2 ) OR exp3 ) ) )

{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( NOT ( NOT ( exp1 ) OR NOT ( exp2 ) OR ( exp3 ) ) ) )

{ SX } WHERE NOT EXISTS PX ( EXISTS PY ( ( exp1 AND exp2 AND NOT ( exp3 ) ) ) )

SELECT *
FROM S AS SX
WHERE NOT EXISTS
    ( SELECT *
      FROM P AS PX
      WHERE EXISTS ( SELECT *
                     FROM P AS PY
                     WHERE EXISTS ( SELECT *
                                    FROM SP AS SPX
                                    WHERE SPX.SNO = SX.SNO
                                    AND SPX.PNO = PX.PNO )
                     AND EXISTS ( SELECT *
                                  FROM SP AS SPY
                                  WHERE SPY.SNO = SX.SNO
                                  AND SPY.PNO = PY.PNO )
                     AND PX.CITY <> PY.CITY ) )

EXAMPLE 8: USING COUNT

Get suppliers such that every part they supply is in the same city /* same as Example 7 */ ... Or:

Get suppliers SX such that the number of cities for parts supplied by SX is less than or equal to one

{ SX } WHERE COUNT ( PX.CITY WHERE EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ≤ 1

SELECT *
FROM S AS SX
WHERE ( SELECT COUNT ( DISTINCT PX.CITY )
        FROM P AS PX
        WHERE EXISTS ( SELECT *
                       FROM SP AS SPX
                       WHERE SPX.SNO = SX.SNO
                       AND SPX.PNO = PX.PNO ) ) <= 1
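The Example 7 and Example 8 formulations should agree. A `sqlite3` sketch that runs both on the same hypothetical data (S2 supplies parts in two cities; S3 supplies nothing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows
    INSERT INTO S VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO P VALUES ('P1', 'London'), ('P2', 'Paris'), ('P3', 'London');
    INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P3'), ('S2', 'P1'), ('S2', 'P2');
""")

EXAMPLE_7 = """
    SELECT * FROM S AS SX
    WHERE NOT EXISTS
      ( SELECT * FROM P AS PX
        WHERE EXISTS ( SELECT * FROM P AS PY
                       WHERE EXISTS ( SELECT * FROM SP AS SPX
                                      WHERE SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
                       AND EXISTS ( SELECT * FROM SP AS SPY
                                    WHERE SPY.SNO = SX.SNO AND SPY.PNO = PY.PNO )
                       AND PX.CITY <> PY.CITY ) )
"""
EXAMPLE_8 = """
    SELECT * FROM S AS SX
    WHERE ( SELECT COUNT ( DISTINCT PX.CITY )
            FROM P AS PX
            WHERE EXISTS ( SELECT * FROM SP AS SPX
                           WHERE SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) <= 1
"""
r7 = sorted(conn.execute(EXAMPLE_7).fetchall())
r8 = sorted(conn.execute(EXAMPLE_8).fetchall())
print(r7, r8)  # [('S1',), ('S3',)] [('S1',), ('S3',)]
```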

Reminder: Don’t use COUNT when EXISTS is what you mean

Is that DISTINCT in the COUNT invocation necessary?

Can you formulate the query in terms of GROUP BY and HAVING?

If so, what are the logical steps involved in constructing that formulation?

EXAMPLE 11*: ALL OR ANY COMPARISON

rx theta sq

rx : row expression (usually scalar in practice: coercion)

theta : =, <, (etc.) followed by ALL or ANY

sq : subquery, denoting table t

E.g., P.WEIGHT >ALL ( SELECT ... )

ALL : TRUE iff comparison without ALL returns TRUE for all rows in t (hence, TRUE if t empty)

ANY : TRUE iff comparison without ANY returns TRUE for at least one row in t (hence, FALSE if t empty)

* For Examples 9 and 10, see the book

PNAMEs for parts with weight > that of every blue part:

SELECT DISTINCT PX.PNAME
FROM P AS PX
WHERE PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                       FROM P AS PY
                       WHERE PY.COLOR = ‘Blue’ )

Recommendation: Don’t use ALL or ANY comparisons!

Error prone (e.g., replace "every" by "any" in example?)

Redundant ... e.g., consider:

SELECT DISTINCT SNAME
FROM S
WHERE CITY <>ANY ( SELECT CITY FROM P )

SNAMEs for suppliers whose city isn’t equal to any part city? Wrong! Actually equivalent* to:

SELECT DISTINCT SNAME
FROM   S
WHERE  EXISTS ( SELECT *
                FROM   P
                WHERE  P.CITY <> S.CITY )

ALL or ANY comparisons can always be transformed into equivalent expressions involving EXISTS (as above) ... They can also usually be transformed into expressions involving MAX or MIN

* Is it? What if cities could be null?
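The trap can be reproduced in a minimal sqlite3 sketch that uses assumed suppliers-and-parts sample values. SQLite has no ANY/ALL comparisons, so the <>ANY query appears in its EXISTS form. The cities are declared NOT NULL, which sidesteps the footnote's question about nulls.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S ( SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL, CITY TEXT NOT NULL );
    CREATE TABLE P ( PNO TEXT PRIMARY KEY, CITY TEXT NOT NULL );
    INSERT INTO S VALUES ('S1','Smith','London'), ('S2','Jones','Paris'),
                         ('S3','Blake','Paris'),  ('S4','Clark','London'),
                         ('S5','Adams','Athens');
    INSERT INTO P VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo'),
                         ('P4','London'), ('P5','Paris'), ('P6','London');
""")

# What "CITY <>ANY ( SELECT CITY FROM P )" actually means:
# "some part is located in a different city".
differs_from_some = con.execute("""
    SELECT DISTINCT SNAME FROM S
    WHERE EXISTS ( SELECT * FROM P WHERE P.CITY <> S.CITY )
    ORDER BY SNAME""").fetchall()

# What the English "city isn't equal to any part city" usually means.
equals_none = con.execute("""
    SELECT DISTINCT SNAME FROM S
    WHERE CITY NOT IN ( SELECT CITY FROM P )
    ORDER BY SNAME""").fetchall()

print(differs_from_some)  # every supplier
print(equals_none)        # [('Adams',)]: only the Athens supplier
```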

=ANY equivalent to IN

<>ALL equivalent to NOT IN

=ALL, <>ANY ... Use EXISTS

           ANY               ALL
  =        IN                (use EXISTS)
  <>       (use EXISTS)      NOT IN
  <        < MAX             < MIN
  <=       <= MAX            <= MIN
  >        > MIN             > MAX
  >=       >= MIN            >= MAX
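The equivalences in the table can be spot-checked by brute force. This hypothetical Python sketch compares each quantified form with its IN / MIN / MAX rewrite on random nonempty lists of integers (no nulls). It then shows the empty-table case, where the MAX/MIN rewrites stop working.

```python
import operator
import random

# (comparison, quantifier, aggregate that should give the same answer)
REWRITES = [
    (operator.lt, "ANY", max), (operator.lt, "ALL", min),
    (operator.le, "ANY", max), (operator.le, "ALL", min),
    (operator.gt, "ANY", min), (operator.gt, "ALL", max),
    (operator.ge, "ANY", min), (operator.ge, "ALL", max),
]

def quantified(op, quantifier, x, t):
    """Two-valued 'x op ANY/ALL (t)' for a list t with no nulls."""
    test = all if quantifier == "ALL" else any
    return test(op(x, y) for y in t)

random.seed(1)
mismatches = 0
for _ in range(2000):
    t = [random.randint(0, 9) for _ in range(random.randint(1, 5))]  # never empty
    x = random.randint(0, 9)
    for op, quantifier, agg in REWRITES:
        if quantified(op, quantifier, x, t) != op(x, agg(t)):
            mismatches += 1
    # =ANY behaves like IN, and <>ALL like NOT IN
    if quantified(operator.eq, "ANY", x, t) != (x in t):
        mismatches += 1
    if quantified(operator.ne, "ALL", x, t) != (x not in t):
        mismatches += 1

print("mismatches:", mismatches)   # mismatches: 0

# With an EMPTY t the rewrites break down: >ALL is TRUE, but max([]) has no value.
empty_all = quantified(operator.gt, "ALL", 5, [])
print(empty_all)   # True
```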

FOR EXAMPLE:

1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )

2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )

Exercise: What coercions are involved in the above?

BUT:

MAX gives null if its argument is empty ...

1. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT >ALL ( SELECT PY.WEIGHT
                           FROM   P AS PY
                           WHERE  PY.COLOR = 'Blue' )

2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )

No blue parts: Exp 1 gives all PNAMEs ... Exp 2 gives empty (a comparison with null never yields TRUE) !!!

2. SELECT DISTINCT PX.PNAME
   FROM   P AS PX
   WHERE  PX.WEIGHT > ( SELECT COALESCE ( MAX ( PY.WEIGHT ) , 0.0 )
                        FROM   P AS PY
                        WHERE  PY.COLOR = 'Blue' )
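The empty-set problem and the COALESCE repair can be reproduced with sqlite3 on a hypothetical parts table that contains no blue parts. SQLite lacks >ALL, so formulation 1 is written in its EXISTS form ("no blue part weighs at least as much as PX"). That form is equivalent when weights can't be null. The COALESCE version relies on all weights being positive, which is what its 0.0 default assumes.

```python
import sqlite3

# Hypothetical parts table containing NO blue parts.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P ( PNO TEXT PRIMARY KEY, PNAME TEXT NOT NULL,
                     COLOR TEXT NOT NULL, WEIGHT REAL NOT NULL );
    INSERT INTO P VALUES ('P1','Nut','Red',12.0), ('P2','Bolt','Green',17.0),
                         ('P6','Cog','Red',19.0);
""")

# Formulation 1 (">ALL") in its EXISTS form: vacuously true for every part here.
all_form = con.execute("""
    SELECT DISTINCT PX.PNAME FROM P AS PX
    WHERE NOT EXISTS ( SELECT * FROM P AS PY
                       WHERE PY.COLOR = 'Blue' AND PY.WEIGHT >= PX.WEIGHT )
    ORDER BY PX.PNAME""").fetchall()

# Formulation 2: MAX over no rows is null, so the comparison is never TRUE.
max_form = con.execute("""
    SELECT DISTINCT PX.PNAME FROM P AS PX
    WHERE PX.WEIGHT > ( SELECT MAX ( PY.WEIGHT ) FROM P AS PY
                        WHERE PY.COLOR = 'Blue' )""").fetchall()

# Repaired formulation 2 (assumes every weight is greater than 0.0).
coalesce_form = con.execute("""
    SELECT DISTINCT PX.PNAME FROM P AS PX
    WHERE PX.WEIGHT > ( SELECT COALESCE ( MAX ( PY.WEIGHT ) , 0.0 ) FROM P AS PY
                        WHERE PY.COLOR = 'Blue' )
    ORDER BY PX.PNAME""").fetchall()

print(all_form)       # [('Bolt',), ('Cog',), ('Nut',)]
print(max_form)       # []
print(coalesce_form)  # [('Bolt',), ('Cog',), ('Nut',)]
```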

EXAMPLE 12: GROUP BY AND HAVING

For each part supplied by no more than two suppliers, get part number, city, and total quantity supplied

{ PX.PNO , PX.CITY , SUM ( SPX.QTY WHERE SPX.PNO = PX.PNO , QTY ) AS TPQ }
WHERE COUNT ( SPY WHERE SPY.PNO = PX.PNO ) ≤ 2

SELECT PX.PNO , PX.CITY , ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
                            FROM   SP AS SPX
                            WHERE  SPX.PNO = PX.PNO ) AS TPQ
FROM   P AS PX
WHERE  ( SELECT COUNT ( * )
         FROM   SP AS SPY
         WHERE  SPY.PNO = PX.PNO ) <= 2
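Here is the subquery formulation run under sqlite3 against the usual suppliers-and-parts sample values (assumed here; columns the query doesn't need are dropped). Part P2 has four suppliers and so is excluded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P  ( PNO TEXT PRIMARY KEY, CITY TEXT NOT NULL );
    CREATE TABLE SP ( SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                      PRIMARY KEY ( SNO, PNO ) );
    INSERT INTO P  VALUES ('P1','London'), ('P2','Paris'), ('P3','Oslo'),
                          ('P4','London'), ('P5','Paris'), ('P6','London');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S1','P4',200), ('S1','P5',100), ('S1','P6',100),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200),
                          ('S4','P2',200), ('S4','P4',300), ('S4','P5',400);
""")

rows = con.execute("""
    SELECT PX.PNO , PX.CITY ,
           ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 )
             FROM SP AS SPX WHERE SPX.PNO = PX.PNO ) AS TPQ
    FROM   P AS PX
    WHERE  ( SELECT COUNT ( * )
             FROM SP AS SPY WHERE SPY.PNO = PX.PNO ) <= 2
    ORDER BY PX.PNO""").fetchall()

for row in rows:
    print(row)
```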

OR:

SELECT   PX.PNO , PX.CITY , COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
FROM     P AS PX , SP AS SPX
WHERE    PX.PNO = SPX.PNO
GROUP BY PX.PNO
HAVING   COUNT ( * ) <= 2

Easier to understand?

Is PX.CITY in SELECT clause legal?

Correct for parts supplied by no suppliers at all? /* No */

Are formulations equivalent in presence of nulls? Or duplicates?
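The "/* No */" answer about parts supplied by no suppliers can be seen directly. In this hypothetical sample, part P7 has no shipments at all. The subquery form reports it with TPQ 0, while the GROUP BY form starts from the join and so never sees it. PX.CITY is added to the GROUP BY clause only to sidestep the legality question. Since PNO is a key, this doesn't change the grouping.

```python
import sqlite3

# Hypothetical data: P1 has two shipments, P2 has three, P7 has none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P  ( PNO TEXT PRIMARY KEY, CITY TEXT NOT NULL );
    CREATE TABLE SP ( SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL,
                      PRIMARY KEY ( SNO, PNO ) );
    INSERT INTO P  VALUES ('P1','London'), ('P2','Paris'), ('P7','Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S2','P1',300),
                          ('S1','P2',200), ('S2','P2',400), ('S3','P2',200);
""")

subquery_form = con.execute("""
    SELECT PX.PNO , PX.CITY ,
           ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 )
             FROM SP AS SPX WHERE SPX.PNO = PX.PNO ) AS TPQ
    FROM   P AS PX
    WHERE  ( SELECT COUNT ( * ) FROM SP AS SPY WHERE SPY.PNO = PX.PNO ) <= 2
    ORDER BY PX.PNO""").fetchall()

group_by_form = con.execute("""
    SELECT   PX.PNO , PX.CITY , COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ
    FROM     P AS PX , SP AS SPX
    WHERE    PX.PNO = SPX.PNO
    GROUP BY PX.PNO , PX.CITY
    HAVING   COUNT ( * ) <= 2
    ORDER BY PX.PNO""").fetchall()

print(subquery_form)   # [('P1', 'London', 600), ('P7', 'Rome', 0)]
print(group_by_form)   # [('P1', 'London', 600)]: P7 has silently vanished
```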

STRUCTURE OF PRESENTATION:

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

FURTHER SQL TOPICS:

• Implementation defined vs. implementation dependent
• SELECT *
• Explicit tables
• Dot qualification
• Range variables
• Subqueries
• "Possibly nondeterministic" expressions
• Empty sets
• BNF grammar for SQL table expressions

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

THANK YOU FOR LISTENING !!!

THESIS (stake in the ground) :

DBs are not just "data stores" !!!

I claim that if you think about the issue at the appropriate level of abstraction, you’re inexorably led to the position:

DBs must be relational

All other "models" are simply ad hoc storage structures that have been elevated above their station and will not endure: inverted lists, IMS-style hierarchies, CODASYL-style networks, objects (= CODASYL warmed over), XML or the "semistructured model" (= IMS warmed over), etc., etc.

JUSTIFICATION: We want to record "true facts" (e.g., Joe’s salary is 50K), i.e., true propositions

Easily encoded as ordered pairs: e.g., <Joe,50K>, where Joe is a value of type NAME and 50K is a value of type MONEY

But not just arbitrary propositions ... Rather, all true instantiations of certain predicates ... In the example:

x’s salary is y

where x is a value of type NAME and y is a value of type MONEY

JUSTIFICATION (cont.): In other words, we want to record the extension of "x’s salary is y" (i.e., a set of ordered pairs, i.e., a binary relation!), which we can depict as a table:

values of type NAME    values of type MONEY
Joe                    50K
Amy                    60K
Sue                    45K
Ron                    60K

Actually a function, because each person has just one salary

A subset of the Cartesian product of the set of all names ("type NAME") and the set of all money values ("type MONEY"), in that order
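To illustrate the "actually a function" remark, the relation can be held as a Python set of ordered pairs and tested for the function property. The salary values stay as the slide's strings, since no MONEY type is assumed. is_function is a hypothetical helper.

```python
# The extension of "x's salary is y", as a set of ordered pairs.
salary = {("Joe", "50K"), ("Amy", "60K"), ("Sue", "45K"), ("Ron", "60K")}

def is_function(pairs):
    """A binary relation is a function iff no first component appears twice."""
    firsts = [x for x, _ in pairs]
    return len(firsts) == len(set(firsts))

print(is_function(salary))                      # True: one salary per person
print(is_function(salary | {("Joe", "55K")}))   # False: Joe would have two
```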

JUSTIFICATION (cont.):

Humble (but very solid) beginnings! But Codd realized:

1. Need n-adic predicates and propositions (not just dyadic); hence n-ary relations (not just binary) and n-tuples (not just pairs), or tuples for short

2. Ordering OK for pairs but soon gets unwieldy for n > 2 ... So replace attribute ordinal positions by attribute names and (re)define the relation concept accordingly

3. Representation obviously not the end of the story ... Need operators for deriving further relations from given ("base") ones, for queries etc. (e.g., "Find all persons with salary 60K") ... Hence relational calculus (logic) / relational algebra (set theory)
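Point 2 (names instead of positions) and point 3 (operators that derive new relations) can be sketched in Python by representing each tuple as a frozenset of (attribute, value) pairs. tup, restrict and project are hypothetical, deliberately minimal stand-ins for the algebra's operators.

```python
def tup(**attrs):
    """A tuple as a set of (attribute name, value) pairs: no attribute ordering."""
    return frozenset(attrs.items())

SALARIES = {
    tup(PERSON="Joe", SALARY="50K"),
    tup(PERSON="Amy", SALARY="60K"),
    tup(PERSON="Sue", SALARY="45K"),
    tup(PERSON="Ron", SALARY="60K"),
}

def restrict(r, pred):
    """Tuples of r satisfying pred (pred sees each tuple as a dict)."""
    return {t for t in r if pred(dict(t))}

def project(r, *names):
    """Keep only the named attributes; duplicates vanish because r is a set."""
    return {frozenset((n, v) for n, v in t if n in names) for t in r}

# "Find all persons with salary 60K": a new relation derived from a given one.
result = project(restrict(SALARIES, lambda t: t["SALARY"] == "60K"), "PERSON")
print(sorted(dict(t)["PERSON"] for t in result))   # ['Amy', 'Ron']

# Attributes are identified by name, so the order of writing is irrelevant.
print(tup(PERSON="Joe", SALARY="50K") == tup(SALARY="50K", PERSON="Joe"))  # True
```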

EXAMPLE REVISITED:

heading:  PERSON (attribute of type NAME)    SALARY (attribute of type MONEY)
body:     Joe                                50K
          Amy                                60K
          Sue                                45K
          Ron                                60K

No "first" or "second" attribute

Note the logical difference between an attribute and its underlying type

From this point forward, relation means a relation in the above sense, barring explicit statements to the contrary

THE RELATIONAL MODEL DEFINED :

1. An open-ended collection of scalar types (including in particular the type boolean or truth value)

2. A relation type generator and an intended interpretation for relations of types generated thereby

3. Facilities for defining relation variables of such generated relation types

4. A relational assignment operation for assigning relation values to such relation variables

5. An open-ended collection of generic relational operators for deriving relation values from other relation values

SOME IMPLICATIONS :

1. User defined types and user defined operators

2. Users can specify individual relation types

3. Relvars are the only variables allowed inside an RDB, in accordance with Codd's Information Principle:

The entire information content of the DB is represented in one and only one way: as explicit attribute values within tuples within relations

4. INSERT / DELETE / UPDATE are just shorthand (for relational assignment)

5. System-defined operators (plus user-defined ones?), used for many purposes, including constraints in particular
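Point 4 can be illustrated by modeling a relvar as a Python name that is rebound to immutable relation values. Each update operator is then just an assignment of a value computed with UNION and MINUS. The helpers and data here are hypothetical, and real systems do considerably more (constraint checking, for one).

```python
# A relvar modeled as a name bound to an immutable relation value.
# Tuples are (PERSON, SALARY) pairs for brevity.
R = frozenset({("Joe", "50K"), ("Amy", "60K")})

def insert(r, new_rows):
    # INSERT ... is shorthand for  R := R UNION new_rows
    return r | frozenset(new_rows)

def delete(r, pred):
    # DELETE ... WHERE p is shorthand for  R := R MINUS (R WHERE p)
    return r - frozenset(t for t in r if pred(t))

def update(r, pred, change):
    # UPDATE ... WHERE p is shorthand for
    #   R := (R MINUS (R WHERE p)) UNION (changed versions of (R WHERE p))
    hit = frozenset(t for t in r if pred(t))
    return (r - hit) | frozenset(change(t) for t in hit)

R = insert(R, {("Sue", "45K")})                                   # assignment 1
R = delete(R, lambda t: t[0] == "Joe")                            # assignment 2
R = update(R, lambda t: t[0] == "Amy", lambda t: (t[0], "65K"))   # assignment 3
print(sorted(R))   # [('Amy', '65K'), ('Sue', '45K')]
```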

WHAT REMAINS TO BE DONE ???

• Proper implementation: The Third Manifesto; The TransRelational™ Model

• Further foundation issues: e.g., constraint inference; database design; "missing information"; etc.

• Higher level abstractions: e.g., PACK and UNPACK; "U_" ops, keys, etc.; etc.

• Higher level interfaces: e.g., propositions; data mining, decision support, etc.

• What about SQL ???

STRUCTURE OF PRESENTATION:

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

SOME REMARKS ON DATABASE DESIGN:

DB design theory is not part of the RM as such; rather, it builds on the RM

Obviously true for physical design! But true of logical design too, to some extent

Concepts such as further normalization, on which design theory is based, are themselves based on more fundamental concepts that are part of the RM

So I'll be brief ... Quick look at:

• Normalization
• Denormalization
• Orthogonality

FUNCTIONAL DEPENDENCIES: "Everyone knows" that 2NF, 3NF, and BCNF all depend on functional dependencies (FDs)

Let A and B be subsets of the heading of R; then R satisfies the FD

A → B

iff, whenever two tuples of R agree on A, they also agree on B

E.g., given EMP { ENO , SALARY , DNO , MNO }:

{ DNO } → { MNO }
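The FD definition translates almost word for word into code. A minimal Python sketch follows; the EMP rows are invented for illustration, and satisfies_fd is a hypothetical helper.

```python
from itertools import combinations

def satisfies_fd(rows, a, b):
    """R satisfies A -> B iff any two tuples that agree on A also agree on B."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[x] == t2[x] for x in a) and any(t1[y] != t2[y] for y in b):
            return False
    return True

# Hypothetical EMP rows (values invented for illustration).
EMP = [
    {"ENO": "E1", "SALARY": 50, "DNO": "D1", "MNO": "E7"},
    {"ENO": "E2", "SALARY": 60, "DNO": "D1", "MNO": "E7"},
    {"ENO": "E3", "SALARY": 45, "DNO": "D2", "MNO": "E8"},
]
print(satisfies_fd(EMP, {"DNO"}, {"MNO"}))   # True

bad = EMP + [{"ENO": "E4", "SALARY": 55, "DNO": "D1", "MNO": "E9"}]
print(satisfies_fd(bad, {"DNO"}, {"MNO"}))   # False: D1 now has two managers
```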

Reminder: If SK is a superkey for R and A is any subset of the heading of R, then R satisfies SK → A

The fact that a given FD holds for R is a relvar constraint on R (of course): e.g., for EMP as above,

CONSTRAINT FDX COUNT ( EMP { DNO } ) = COUNT ( EMP { DNO , MNO } ) ;

Likewise for multivalued dependencies (MVDs), which are relevant to "4NF", and join dependencies (JDs), which are relevant to "5NF" (CONSTRAINT formulations left as an exercise)
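SQLite has no CREATE ASSERTION, so this sqlite3 sketch merely evaluates an SQL analog of CONSTRAINT FDX before and after a violating insert; it doesn't enforce anything. The EMP rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMP ( ENO TEXT PRIMARY KEY, SALARY INTEGER,
                       DNO TEXT NOT NULL, MNO TEXT NOT NULL );
    INSERT INTO EMP VALUES ('E1',50,'D1','E7'), ('E2',60,'D1','E7'),
                           ('E3',45,'D2','E8');
""")

# SQL analog of FDX: the projections on {DNO} and on {DNO, MNO} (duplicates
# eliminated in both) must have the same cardinality.
FDX = """
    SELECT ( SELECT COUNT ( * ) FROM ( SELECT DISTINCT DNO FROM EMP ) ) =
           ( SELECT COUNT ( * ) FROM ( SELECT DISTINCT DNO, MNO FROM EMP ) )"""

holds_before = bool(con.execute(FDX).fetchone()[0])
con.execute("INSERT INTO EMP VALUES ('E4',55,'D1','E9')")  # D1 gets a second manager
holds_after = bool(con.execute(FDX).fetchone()[0])
print(holds_before, holds_after)   # True False
```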

NORMAL FORMS:

1NF: All relvars are in 1NF, even with relation-valued attributes (RVAs), though RVAs are usually contraindicated

2NF, 3NF: Mainly historical interest

BCNF: R is in BCNF iff, for every nontrivial FD X → A satisfied by R, X is a superkey

Loosely: Every fact is a fact about the key, the whole key, and nothing but the key

(The FD A → B is trivial iff it can't possibly be violated, i.e., iff B ⊆ A)

4NF: Mainly historical interest
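BCNF can be tested mechanically by computing attribute closures. This minimal Python sketch takes FDs as (left side, right side) pairs. If every nontrivial FD in a set has a superkey left side, so does every FD the set implies, so checking the supplied set is enough when it implies all the FDs of interest. The S example borrows the additional FD { CITY } → { STATUS } that appears later in this section. closure and is_bcnf are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(heading, fds):
    """Every nontrivial X -> A in fds must have X a superkey (closure = heading)."""
    return all(rhs <= lhs or closure(lhs, fds) >= heading for lhs, rhs in fds)

S_HEADING = {"SNO", "SNAME", "STATUS", "CITY"}
S_FDS = [({"SNO"}, {"SNAME", "STATUS", "CITY"}), ({"CITY"}, {"STATUS"})]
s_ok = is_bcnf(S_HEADING, S_FDS)                     # {CITY} is not a superkey

snc_ok = is_bcnf({"SNO", "SNAME", "CITY"}, [({"SNO"}, {"SNAME", "CITY"})])
cs_ok = is_bcnf({"CITY", "STATUS"}, [({"CITY"}, {"STATUS"})])
print(s_ok, snc_ok, cs_ok)   # False True True
```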

JOIN DEPENDENCIES :

Let R be a relvar, and let A, B, ..., C be subsets of the heading of R. Then R satisfies the join dependency

* { A , B , … , C }

if and only if every legal value of R is equal to the join of its projections on A, B, ..., C (i.e., if and only if R can be nonloss decomposed into those projections)

E.g.: Relvar S satisfies JD * { SN , SS , SC } where SN = { SNO , SNAME }, etc.

Note: UNION { A , B , … , C } must equal the heading

Every MVD is a JD, every FD is an MVD
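The JD definition can be checked directly by joining the projections and comparing the result with the original. This Python sketch assumes the usual suppliers sample (with S2's status 10) and uses hypothetical helpers. The second JD fails because STATUS doesn't determine CITY, so the join produces spurious tuples.

```python
def project(rows, attrs):
    """Projection: each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, t[a]) for a in attrs) for t in rows}

def natural_join(r1, r2):
    """Natural join of two relations given as sets of frozensets of pairs."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def satisfies_jd(rows, components):
    """R satisfies *{A, B, ..., C} iff R equals the join of its projections."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(t.items()) for t in rows}

# Assumed sample value of S (the usual one, with S2's status 10).
S = [dict(SNO=n, SNAME=m, STATUS=s, CITY=c) for n, m, s, c in [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),  ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")]]

SN, SS, SC = {"SNO", "SNAME"}, {"SNO", "STATUS"}, {"SNO", "CITY"}
jd_holds = satisfies_jd(S, [SN, SS, SC])
jd_fails = satisfies_jd(S, [{"SNO", "SNAME", "STATUS"}, {"STATUS", "CITY"}])
print(jd_holds, jd_fails)   # True False
```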

EVERY FD IS A JD (example):

Suppose relvar S satisfies the additional FD:

{ CITY } → { STATUS } /* see the sample value below */

Then S can be nonloss decomposed into its projections on:

{ SNO , SNAME , CITY } and { CITY , STATUS }

In other words, S satisfies the following JD:

* { SNC , CS }

where SNC = { SNO , SNAME , CITY } and CS = { CITY , STATUS }

SAMPLE VALUE OF RELVAR S SATISFYING { CITY } → { STATUS }:

S
SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   30       Paris     (note the change)
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens

NONLOSS DECOMPOSE:

SNC
SNO   SNAME   CITY
S1    Smith   London
S2    Jones   Paris
S3    Blake   Paris
S4    Clark   London
S5    Adams   Athens

CS
CITY     STATUS
Athens   30
London   20
Paris    30

S = SNC JOIN CS ... In other words, S satisfies

* { { SNO , SNAME , CITY } , { CITY , STATUS } }
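The same decomposition can be replayed in SQL. This sqlite3 sketch builds SNC and CS as DISTINCT projections of the sample value above. It then uses EXCEPT in both directions to check that SNC NATURAL JOIN CS gives back exactly S.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S ( SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',30,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
    CREATE TABLE SNC AS SELECT DISTINCT SNO, SNAME, CITY FROM S;
    CREATE TABLE CS  AS SELECT DISTINCT CITY, STATUS FROM S;
""")

rejoined = "SELECT SNO, SNAME, STATUS, CITY FROM SNC NATURAL JOIN CS"
lost     = con.execute(f"SELECT * FROM S EXCEPT {rejoined}").fetchall()
spurious = con.execute(f"{rejoined} EXCEPT SELECT * FROM S").fetchall()
cs_rows  = con.execute("SELECT CITY, STATUS FROM CS ORDER BY CITY").fetchall()

print(lost, spurious)   # [] []  i.e., S = SNC JOIN CS
print(cs_rows)          # [('Athens', 30), ('London', 20), ('Paris', 30)]
```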

NORMAL FORMS (cont.): 5NF: The "final" normal form!* R is in 5NF iff, for every nontrivial JD * {A,B,...,C} satisfied by R, each of A,B,...,C is a superkey [and keys can be ordered such that each adjacent pair is included in at least one of A, B, ..., C]

The JD * {A,B,...,C} is trivial iff at least one of A, B, ..., C equals the heading

Theorem (Date & Fagin 1991): 3NF and no composite keys implies 5NF

And another: BCNF and not all key implies 5NF

* Well ... except for 6NF

NORMAL FORMS (cont.): 6NF: The true final normal form. R is in 6NF iff the only JDs it satisfies are trivial ones

E.g., SP (but not S or P)

R is in 6NF iff the only JDs it satisfies are of the form * { ..., H, ... }, where H is the heading

R is in 6NF iff it’s in 5NF, is of degree n, and has no key of degree less than n-1 ... 6NF implies 5NF

E.g., PLUS{A,B,C}: 6NF (every key is of degree two)

Note: 6NF has an extended definition in the temporal DB context
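The PLUS example can be illustrated, though of course not proved, on a small finite sample: each pair of attributes identifies a tuple, while no single attribute does. The domain 0..4 and the helper is_unique are assumptions for illustration.

```python
from itertools import combinations

# PLUS{A,B,C} contains exactly the tuples with A + B = C; here A and B range
# over a small hypothetical domain 0..4.
PLUS = [{"A": a, "B": b, "C": a + b} for a in range(5) for b in range(5)]

def is_unique(rows, attrs):
    """True iff no two tuples of rows agree on all of attrs."""
    values = [tuple(t[x] for x in sorted(attrs)) for t in rows]
    return len(values) == len(set(values))

pairs_unique = all(is_unique(PLUS, set(p)) for p in combinations("ABC", 2))
single_unique = any(is_unique(PLUS, {x}) for x in "ABC")
print(pairs_unique, single_unique)   # True False
```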

OBJECTIVES OF NORMALIZATION :

Reduce redundancy

Avoid update anomalies

"Better" representation of semantics

Easier enforcement of constraints (normalization to 5NF gives us a simple way of enforcing certain important and commonly occurring constraints)

Only need to enforce KEY UNIQUENESS

All JDs (and so all MVDs and all FDs) will then be enforced automatically
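The "only need to enforce key uniqueness" point is visible in SQL. Once STATUS lives in CS with CITY as its key, the PRIMARY KEY alone enforces the FD { CITY } → { STATUS }. Here is a minimal sqlite3 sketch with hypothetical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SNC ( SNO TEXT PRIMARY KEY, SNAME TEXT NOT NULL, CITY TEXT NOT NULL );
    CREATE TABLE CS  ( CITY TEXT PRIMARY KEY, STATUS INTEGER NOT NULL );
    INSERT INTO CS  VALUES ('Athens',30), ('London',20), ('Paris',30);
    INSERT INTO SNC VALUES ('S1','Smith','London'), ('S4','Clark','London');
""")

# STATUS is stored once per city, so the key alone rejects a second,
# conflicting London status. No separate FD check is needed.
try:
    con.execute("INSERT INTO CS VALUES ('London', 25)")
    fd_violation_rejected = False
except sqlite3.IntegrityError:
    fd_violation_rejected = True

print(fd_violation_rejected)   # True
```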

SOME REASONS WHY NORMALIZATION IS NOT A PANACEA:

Enforces certain constraints very simply, but JDs etc. are not the ONLY kind of constraint

Decomposition is not unique, in general

Not all redundancies can be removed by taking projections

BCNF and "dependency preservation" objectives can conflict

In fact, normalization can cause some FDs (etc.) to cease to be FDs (etc.), since they now span relvars!

Some design issues are simply not addressed

Nevertheless ... DENORMALIZE ONLY AS A LAST RESORT !!!

DENORMALIZATION CONSIDERED HARMFUL:

Almost always, anything less than full normalization is strongly contraindicated—even in a "direct image" implementation !!! /* big topic in its own right */

Fully normalized design is a "good" representation of the real world—intuitively easy to understand, good base for future growth

Everyone knows denormalization makes update harder ... but it can make retrieval harder too (see the example that follows)

Can be bad for performance as well! Usually means improving the performance of one application at the expense of others

DENORMALIZATION BAD FOR RETRIEVAL (example):

Again suppose suppliers satisfy { CITY } → { STATUS }:

S
SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   30       Paris     (note the change)
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens

Can be regarded as a denormalization of SNC and CS /* see earlier */

"Find average city status" (i.e., 26.667, the average of the per-city statuses 30, 20, 30)

SELECT DISTINCT AVG ( STATUS ) AS REQD FROM S
-- result (incorrect): 26, i.e., 130 / 5, which counts London and Paris twice

SELECT DISTINCT AVG ( DISTINCT STATUS ) AS REQD FROM S
-- result (incorrect): 25, i.e., (20 + 30) / 2, which counts status 30 once although two cities have it

SELECT DISTINCT CITY, AVG ( STATUS ) AS REQD FROM S GROUP BY CITY
-- gives average status per city, not the overall average

SELECT DISTINCT CITY, AVG ( AVG ( STATUS ) ) AS REQD FROM S GROUP BY CITY
-- syntax error

SELECT DISTINCT AVG ( STATUS ) AS REQD
FROM ( SELECT DISTINCT CITY, STATUS FROM S ) AS POINTLESS
-- correct (at last!) ... but is it supported?
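The queries above can be run essentially as written under sqlite3, using the sample value of S shown above. SQLite rejects the nested aggregate as well, and it does support the subquery in the FROM clause.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S ( SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT );
    INSERT INTO S VALUES ('S1','Smith',20,'London'), ('S2','Jones',30,'Paris'),
                         ('S3','Blake',30,'Paris'),  ('S4','Clark',20,'London'),
                         ('S5','Adams',30,'Athens');
""")

def one(sql):
    """Run a query expected to return a single value."""
    return con.execute(sql).fetchone()[0]

avg_all      = one("SELECT DISTINCT AVG ( STATUS ) AS REQD FROM S")
avg_distinct = one("SELECT DISTINCT AVG ( DISTINCT STATUS ) AS REQD FROM S")
avg_city     = one("SELECT DISTINCT AVG ( STATUS ) AS REQD "
                   "FROM ( SELECT DISTINCT CITY, STATUS FROM S ) AS POINTLESS")

try:  # nested aggregate invocations are rejected
    one("SELECT DISTINCT CITY, AVG ( AVG ( STATUS ) ) AS REQD FROM S GROUP BY CITY")
    nested_rejected = False
except sqlite3.OperationalError:
    nested_rejected = True

print(avg_all, avg_distinct, round(avg_city, 3), nested_rejected)
# 26.0 25.0 26.667 True
```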

ORTHOGONALITY (a little more science!):

Design theory is about reducing redundancy (true fact!), but what’s redundancy ??? Well, certainly:

If the DB is such that, if tuple t appears at all, it must appear more than once, then the DB clearly involves some redundancy

Note that normalization is precisely about eliminating redundant appearances of the same tuple!

E.g., suppose once again that suppliers satisfy the FD

{ CITY } → { STATUS }

S
SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   30       Paris     (note the change)
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens

(Sub)tuples <20,London> and <30,Paris> both appear twice (and do represent redundancy) ... /* recall that every subset of a tuple is a tuple */

So normalize

SNC
SNO   SNAME   CITY
S1    Smith   London
S2    Jones   Paris
S3    Blake   Paris
S4    Clark   London
S5    Adams   Athens

CS
CITY     STATUS
Athens   30
London   20
Paris    30

Now <20,London> and <30,Paris> both appear just once
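Counting subtuple appearances makes the redundancy argument concrete. This small Python sketch runs over the sample value of S shown above. Each <STATUS, CITY> pair appears twice in S but only once in the CS projection.

```python
from collections import Counter

# Rows of S as shown above (S2's status changed to 30).
S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 30, "Paris"),
     ("S3", "Blake", 30, "Paris"),  ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]

# How often does each <STATUS, CITY> subtuple appear in S?
in_S = Counter((status, city) for _, _, status, city in S)

# CS is a relation (a set), so each <CITY, STATUS> pair appears exactly once.
CS = {(city, status) for _, _, status, city in S}
in_CS = Counter((status, city) for city, status in CS)

print(in_S[(20, "London")], in_S[(30, "Paris")])     # 2 2
print(in_CS[(20, "London")], in_CS[(30, "Paris")])   # 1 1
```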

BUT WHAT ABOUT:

LP   /* part weight ≤ 17.0 pounds */
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12.0     London
P2   Bolt    Green   17.0     Paris
P3   Screw   Blue    17.0     Oslo
P4   Screw   Red     14.0     London
P5   Cam     Blue    12.0     Paris

HP   /* part weight ≥ 17.0 pounds */
P#   PNAME   COLOR   WEIGHT   CITY
P2   Bolt    Green   17.0     Paris
P3   Screw   Blue    17.0     Oslo
P6   Cog     Red     19.0     London

Normalization doesn’t help … but problem is easy to see!

Relvar predicates for LP and HP "overlap"

I.e., they require tuples for parts with weight 17.0 pounds to appear in both relvars:

CONSTRAINT LP_AND_HP_OVERLAP
  ( LP WHERE WEIGHT = 17.0 ) = ( HP WHERE WEIGHT = 17.0 ) ;
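The overlap constraint can be written in SQL as "empty difference in both directions". This sqlite3 sketch loads the LP and HP values above, with P# renamed PNO as elsewhere in this section. It shows how updating only one copy of a 17.0-pound part breaks the constraint. The new city "Rome" is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE LP ( PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT REAL, CITY TEXT );
    CREATE TABLE HP ( PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, WEIGHT REAL, CITY TEXT );
    INSERT INTO LP VALUES ('P1','Nut','Red',12.0,'London'), ('P2','Bolt','Green',17.0,'Paris'),
                          ('P3','Screw','Blue',17.0,'Oslo'), ('P4','Screw','Red',14.0,'London'),
                          ('P5','Cam','Blue',12.0,'Paris');
    INSERT INTO HP VALUES ('P2','Bolt','Green',17.0,'Paris'), ('P3','Screw','Blue',17.0,'Oslo'),
                          ('P6','Cog','Red',19.0,'London');
""")

# ( LP WHERE WEIGHT = 17.0 ) = ( HP WHERE WEIGHT = 17.0 ), tested as
# "each side minus the other is empty".
OVERLAP_HOLDS = """
    SELECT NOT EXISTS ( SELECT * FROM LP WHERE WEIGHT = 17.0
                        EXCEPT SELECT * FROM HP WHERE WEIGHT = 17.0 )
       AND NOT EXISTS ( SELECT * FROM HP WHERE WEIGHT = 17.0
                        EXCEPT SELECT * FROM LP WHERE WEIGHT = 17.0 )"""

holds_initially = bool(con.execute(OVERLAP_HOLDS).fetchone()[0])
con.execute("UPDATE HP SET CITY = 'Rome' WHERE PNO = 'P2'")  # only one copy changed
holds_after_update = bool(con.execute(OVERLAP_HOLDS).fetchone()[0])
print(holds_initially, holds_after_update)   # True False
```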

So:

THE PRINCIPLE OF ORTHOGONAL DESIGN:

First version: No two base relvars should be such that their relvar constraints might require the same tuple to appear in both

(McGoveran & Date 1994, but somewhat revised here)

Solves the LP / HP problem

Remember that (as far as the user is concerned) all relvars in the DB are base relvars!

The orthogonality principle as stated applies to relvars of the same type … But what about:

SX
SNO   SNAME   STATUS
S1    Smith   20
S2    Jones   10
S3    Blake   30
S4    Clark   20
S5    Adams   30

SY
SNO   SNAME   CITY
S1    Smith   London
S2    Jones   Paris
S3    Blake   Paris
S4    Clark   London
S5    Adams   Athens

Second version: Let A and B be distinct relvars. Then there should not exist nonloss decompositions of A and B into projections A1, …, Am and B1, …, Bn, respectively, such that the relvar constraints for some Ai and some Bj might require the same tuple to appear in both.

Subsumes first version … But what about:

SX
SNO   SNAME   STATUS
S1    Smith   20
S2    Jones   10
S3    Blake   30
S4    Clark   20
S5    Adams   30

SY
ID    LABEL   CITY
S1    Smith   London
S2    Jones   Paris
S3    Blake   Paris
S4    Clark   London
S5    Adams   Athens

Oh, all right ...

THE PRINCIPLE OF ORTHOGONAL DESIGN (final version) :

Let A and B be distinct relvars. Replace A and B by nonloss decompositions into projections A1, …, Am and B1, …, Bn, respectively, such that every Ai (i = 1, …, m) and every Bj (j = 1, …, n) is in 6NF. Let some i and j be such that there exists a sequence of zero or more attribute renamings with the property that (a) when applied to Ai, it produces Ak, and (b) Ak and Bj are of the same type. Then there must not exist a constraint to the effect that, at all times, (Ak WHERE ax) = (Bj WHERE bx), where ax and bx are restriction conditions, neither of which is a contradiction.

Subsumes second version

ORTHOGONALITY COMPLEMENTS NORMALIZATION :

Consider again decomposition of S into SX and SY:

SX { SNO, SNAME, STATUS }
SY { SNO, SNAME, CITY }

Satisfies all normalization principles!

Both projections in 5NF

Decomposition nonloss

Dependencies preserved

Both projections needed in reconstruction

Orthogonality, not normalization, tells us the decomposition is bad

POINTS ARISING:

FORMALIZED COMMON SENSE (like normalization)

Reduces redundancy, avoids update anomalies (like normalization)

If R is decomposed via restriction, restrictions should be pairwise disjoint (and R should be reconstructable via disjoint union)

Orthogonal decomposition: Any decomposition that abides by The Principle of Orthogonal Design

No strong logical reasons for horizontal decomposition? (Contrast normalization)

Horizontal and vertical decomposition both lead to need for multi-relvar ("database") constraints

ONE FURTHER POINT: Much confusion over this topic, even though the idea is basically so simple (mea culpa) …

Example where orthogonality is NOT violated (acks Hugh Darwen) … Consider predicates:

Employee ENO is on vacation

Employee ENO is awaiting phone number allocation

Obvious design:

ON_VACATION { ENO } KEY { ENO }
NEEDS_PHONE { ENO } KEY { ENO }

Same tuple can appear in ON_VACATION and NEEDS_PHONE, but different propositions / no redundancy / no violation of orthogonality

Note difference in kind between this example and LP / HP example:

For LP / HP, there’s a formal constraint that a tuple must satisfy in order to be accepted into either relvar … and constraints "overlap"

For ON_VACATION / NEEDS_PHONE, no analogous property holds … DBMS must just trust the user!

TO SUM UP: LOGICAL DATABASE DESIGN ... is, precisely, specifying constraints !!!

DB is supposed to be "a faithful representation of the real world" ... It's constraints that represent semantics ... So:

1. Pin down relvar predicates as carefully as possible (albeit informally)

2. Map the output from the first step into relvars and corresponding constraints (some of which will involve FDs, MVDs, JDs in particular)

Note: "E/R modeling" is almost totally incapable of dealing with constraints!

Note: All of the above is highly relevant to what the commercial world calls business rules

SOME REMARKS ON PHYSICAL DESIGN: Should follow logical design; automatable ??? … Not a far-fetched idea

RM deliberately meant to give implementers freedom to implement the model any way they liked … But typically:

base relvar   →  physical table
attributes    →  fields
each tuple    →  one record

Many things wrong with direct image style … In particular, almost no data independence !!!

Hence "denormalize for performance" (etc.)

But something better is on its way:

The TransRelational™ Model

No penalty for full normalization!

MANY other advantages … including, possibly, a basis for a relational approach to missing information

Suppose just two suppliers S1 and S2, and S2’s status is unknown ...

SN
SNO   SNAME
S1    Smith
S2    Jones

SS
SNO   STATUS
S1    20

SC
SNO   CITY
S1    London
S2    Paris

If you don’t know something, better to say nothing at all! /* but be careful over relvar predicates */

Wovon man nicht reden kann, darüber muss man schweigen ("whereof one cannot speak, thereon one must remain silent"). Wittgenstein

1. Setting the scene
2. Types and domains
3. Tuples and relations, rows and tables
4. No duplicates, no nulls
5. Base relvars, base tables
6. SQL and algebra I: The original operators
7. SQL and algebra II: Additional operators
8. SQL and constraints
9. SQL and views
10. SQL and logic I: Relational calculus
11. SQL and logic II: Using logic to write SQL
12. Further SQL topics
13. Appendix: The relational model
14. Appendix: DB design

THANK YOU FOR LISTENING !!!