Database Systems – Set Theory

RELATIONS

A relational database consists of tables, each of which is assigned a unique name.

A row in a table represents a relationship among a set of values.

A table is a collection of such relationships.

Column headers are commonly referred to as attributes.

Websites-Schema = (website, organization, first-year, category)

websites relation:

website                      organization        first-year  category
www.zojjed.com               Walking Promotions  2006        Fiction
www.racewalk.com             Walking Promotions  1995        Health
www.greattreks.com           Walking Promotions  2006        Travel
www.twofeetgallery.com       Walking Promotions  2004        Photographs
www.walkinghealthy.com       Walking Promotions  2002        Health
www.cs.drexel.edu/~jsalvage  Drexel University   2005        Education

DOMAINS

A domain is the set of permitted values for a column (attribute).

The domain can be any positive number, as with first-year.

The domain can be a series of letters up to a maximum length, as with organization.

The domain can be the set of valid web addresses, whose rules might be slightly more complicated.

If

D1 denotes the set of all websites
D2 denotes the set of all organizations
D3 denotes the set of all first years
D4 denotes the set of all categories

then any row of websites must be a 4-tuple (v1, v2, v3, v4) where

v1 is a website in the domain D1
v2 is an organization in the domain D2
v3 is a year in the domain D3
v4 is a category in the domain D4

Therefore, websites is a subset of D1 × D2 × D3 × D4.
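The subset claim can be checked directly on a toy example. The sketch below assumes four deliberately tiny domains (real domains would be far larger, often infinite) and tests that a two-row websites relation is contained in their Cartesian product.

```python
from itertools import product

# Tiny illustrative domains; these values are assumptions chosen for the demo.
D1 = {"www.zojjed.com", "www.racewalk.com"}  # websites
D2 = {"Walking Promotions"}                  # organizations
D3 = {1995, 2006}                            # first years
D4 = {"Fiction", "Health"}                   # categories

# D1 x D2 x D3 x D4: every possible 4-tuple over the domains.
cartesian = set(product(D1, D2, D3, D4))

# A relation is just some subset of that product.
websites = {
    ("www.zojjed.com", "Walking Promotions", 2006, "Fiction"),
    ("www.racewalk.com", "Walking Promotions", 1995, "Health"),
}

print(len(cartesian))         # 2 * 1 * 2 * 2 = 8 possible tuples
print(websites <= cartesian)  # True: websites is a subset of the product
```

Most tuples in the product never appear in the relation; the relation records only the combinations that are actually true.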

In general, a table of n attributes must be a subset of D1 × D2 × … × Dn-1 × Dn.

Tables vs. Relations

There is a close correspondence between the language of set theory and the terminology used in databases.

Instead of numbered positions, databases use names to identify attributes.

relation -> table

tuple -> row

The websites table has 6 tuples.

TUPLE NOTATION

If t is a variable denoting the first tuple of the websites relation, then t[website] denotes the value of the website attribute of t.

t[website] = "www.zojjed.com"
t[organization] = "Walking Promotions"
t[first-year] = 2006
t[category] = "Fiction"

Alternatively, attributes can be referenced by position:

t[1] = "www.zojjed.com"
t[2] = "Walking Promotions"
t[3] = 2006
t[4] = "Fiction"

t ∈ r indicates that the tuple t is in the relation r.
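The two ways of addressing a tuple, by attribute name and by position, map naturally onto a Python namedtuple. This is only an illustrative sketch: the class name Website and the field name first_year are assumptions, since Python identifiers cannot contain hyphens.

```python
from collections import namedtuple

# Field names mirror Websites-Schema; first-year is spelled first_year here.
Website = namedtuple("Website", ["website", "organization", "first_year", "category"])

t = Website("www.zojjed.com", "Walking Promotions", 2006, "Fiction")
r = {t, Website("www.racewalk.com", "Walking Promotions", 1995, "Health")}

print(t.website)  # access by attribute name, like t[website]
print(t[0])       # access by position; Python counts from 0, the slides from 1
print(t in r)     # membership test, like "t is in r"
```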

DOMAINS

It is possible for several attributes to have the same domain.

Later we will introduce a customers relation with a customer name attribute. If there were also an employee table with an employee name field, the two attributes would technically have the same domain.

It depends on how you look at it. If the domain is the set of all possible names, this is true.

What about the domains of website and first-year? They are incompatible.

What about website and category? While both may allow the "same" values, I would consider them distinct domains.

In a relation, an attribute may contain the value null.

For now we will assume that attributes do not contain null.

DATABASE SCHEMAS

The logical design of the database.

Analogous to the type definition of a variable.

DATABASE INSTANCE

A snapshot of the database at a given time.

Analogous to an instance of a variable.

A relation schema is written with a capitalized name for the schema and a lowercase name for each attribute. An instance of a relation is represented by a lowercase name.

Websites-Schema = (website, organization, first-year, category)

A relation on the Websites-Schema is written as follows:

websites(Websites-Schema)

Side notes, very important:

A relation has no order.

A relation cannot contain duplicate tuples.
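Both side notes follow from treating a relation as a set of tuples. The sketch below uses two hypothetical row listings, in different orders and one with a repeated row, and shows that they denote the same relation once converted to Python sets.

```python
# Two listings of the same rows; the second is reordered and repeats a row.
rows_a = [
    ("www.zojjed.com", "Fiction"),
    ("www.racewalk.com", "Health"),
]
rows_b = [
    ("www.racewalk.com", "Health"),
    ("www.zojjed.com", "Fiction"),
    ("www.zojjed.com", "Fiction"),  # duplicate row
]

relation_a = set(rows_a)
relation_b = set(rows_b)

print(relation_a == relation_b)  # order does not matter
print(len(relation_b))           # the duplicate collapses into one tuple
```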

Customers-Schema = (website, first-name, last-name)

customers relation:

website                      first-name  last-name
www.zojjed.com               Derek       Jeter
www.zojjed.com               Chase       Utley
www.cs.drexel.edu/~jsalvage  Jeremy      Johnson
www.racewalk.com             Ryan        Howard
www.zojjed.com               Ryan        Howard

Notice that the website attribute appears in both the customers relation and the websites relation.

This is not a coincidence; fields are often repeated.

This allows distinct relations to be related.

If we wanted to gather website information for all websites that have customers, we would need information from both relations.

Combined information from the websites and customers relations:

website                      category   first-name  last-name
www.zojjed.com               Fiction    Derek       Jeter
www.zojjed.com               Fiction    Chase       Utley
www.cs.drexel.edu/~jsalvage  Education  Jeremy      Johnson
www.racewalk.com             Health     Ryan        Howard
www.zojjed.com               Fiction    Ryan        Howard

In real databases, unique id fields would be used to identify the customer and the website, so the website name would not be repeated.
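One way to picture how the combined table arises is to pair every websites tuple with every customers tuple and keep only the pairs whose website values match. The following is a rough sketch of that idea using plain Python sets, not the formal join operators, and the variable names are illustrative.

```python
# (website, organization, first_year, category)
websites = {
    ("www.zojjed.com", "Walking Promotions", 2006, "Fiction"),
    ("www.racewalk.com", "Walking Promotions", 1995, "Health"),
    ("www.greattreks.com", "Walking Promotions", 2006, "Travel"),
    ("www.twofeetgallery.com", "Walking Promotions", 2004, "Photographs"),
    ("www.walkinghealthy.com", "Walking Promotions", 2002, "Health"),
    ("www.cs.drexel.edu/~jsalvage", "Drexel University", 2005, "Education"),
}
# (website, first_name, last_name)
customers = {
    ("www.zojjed.com", "Derek", "Jeter"),
    ("www.zojjed.com", "Chase", "Utley"),
    ("www.cs.drexel.edu/~jsalvage", "Jeremy", "Johnson"),
    ("www.racewalk.com", "Ryan", "Howard"),
    ("www.zojjed.com", "Ryan", "Howard"),
}

# Keep only pairs that agree on website, then keep four of the attributes.
combined = {
    (w_site, category, first, last)
    for (w_site, _org, _year, category) in websites
    for (c_site, first, last) in customers
    if w_site == c_site
}

for row in sorted(combined):
    print(row)
```

Websites without customers, such as www.greattreks.com, do not appear in the result because no customers tuple matches them.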

Instead of having two schemas, it is possible to have one schema as follows:

WebsiteCustomers(website, organization, first-year, category, first-name, last-name)

What is wrong with this?

Redundant data as well as null fields.

website                      organization        first-year  category     first-name  last-name
www.zojjed.com               Walking Promotions  2006        Fiction      Derek       Jeter
www.racewalk.com             Walking Promotions  1995        Health       Ryan        Howard
www.greattreks.com           Walking Promotions  2006        Travel       Null        Null
www.twofeetgallery.com       Walking Promotions  2004        Photographs  Null        Null
www.walkinghealthy.com       Walking Promotions  2002        Health       Null        Null
www.zojjed.com               Walking Promotions  2006        Fiction      Ryan        Howard
www.zojjed.com               Walking Promotions  2006        Fiction      Chase       Utley
www.cs.drexel.edu/~jsalvage  Drexel University   2005        Education    Jeremy      Johnson

hit-counts-Schema = (website, date, hit-count)

hit-counts relation:

website                      date       hit-count
www.zojjed.com               5/20/2007  5
www.racewalk.com             5/20/2007  2019
www.greattreks.com           5/20/2007  1050
www.twofeetgallery.com       5/20/2007  32
www.walkinghealthy.com       5/20/2007  159
www.zojjed.com               5/21/2007  6
www.zojjed.com               5/22/2007  5
www.cs.drexel.edu/~jsalvage  5/20/2007  376
www.racewalk.com             5/21/2007  2099

Is there anything wrong with the above relation?

No. There is no reason why a website cannot be listed more than once; every (website, date, hit-count) tuple is still distinct.

If we did not care about the date and only cared about the hit count, could we define the hit-counts schema as follows?

hit-counts-Schema = (website, hit-count)

hit-counts relation:

website                      hit-count
www.zojjed.com               5
www.racewalk.com             2019
www.greattreks.com           1050
www.twofeetgallery.com       32
www.walkinghealthy.com       159
www.zojjed.com               6
www.zojjed.com               5
www.cs.drexel.edu/~jsalvage  376
www.racewalk.com             2099

In a real database this would cause no problem, but we said that a relation cannot contain duplicate tuples, and (www.zojjed.com, 5) would appear twice. So the answer is no.
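The duplicate can be found mechanically. The sketch below stores the nine hit-counts rows in a Python list (which, like a real database table, tolerates duplicates), drops the date, and compares the result with the corresponding set.

```python
from collections import Counter

# hit-counts rows as (website, date, hit_count); a list keeps duplicates.
hit_counts = [
    ("www.zojjed.com", "5/20/2007", 5),
    ("www.racewalk.com", "5/20/2007", 2019),
    ("www.greattreks.com", "5/20/2007", 1050),
    ("www.twofeetgallery.com", "5/20/2007", 32),
    ("www.walkinghealthy.com", "5/20/2007", 159),
    ("www.zojjed.com", "5/21/2007", 6),
    ("www.zojjed.com", "5/22/2007", 5),
    ("www.cs.drexel.edu/~jsalvage", "5/20/2007", 376),
    ("www.racewalk.com", "5/21/2007", 2099),
]

# Drop the date attribute from every row.
without_date = [(site, hits) for (site, _date, hits) in hit_counts]

# Rows that now occur more than once.
duplicates = [row for row, n in Counter(without_date).items() if n > 1]

# As a relation (a set), the repeated row is stored only once.
as_relation = set(without_date)

print(duplicates)
print(len(without_date), len(as_relation))
```

Without the date, the two days on which www.zojjed.com received 5 hits can no longer be told apart.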

QUERY LANGUAGES

A query language is a language in which the user requests information from the database.

Query languages can be procedural or non-procedural.

We will study Relational Algebra.

It is a procedural language consisting of a set of operations that take one or two relations as input and output a relation. Operations include:

select
project
union
set difference
Cartesian product
rename
intersection
aggregate functions

We will also study various forms of joining relations.

Unary: operates on one relation.

Binary: operates on a pair of relations.

The Select Operation

Unary operation.

Selects tuples that satisfy a given predicate.

σ (sigma) represents the select operation:

σ_{<selection condition>}(R)

<selection condition> = <attribute name> <comparison op> <constant value> or
<selection condition> = <attribute name> <comparison op> <attribute name>

The comparison operators are =, <>, <, <=, >, >= (equal, not equal, less than, less than or equal to, greater than, greater than or equal to).

To select those tuples of the hit-counts relation where the website is "www.zojjed.com", we write:

σ_{website = "www.zojjed.com"}(hit-counts)

This returns the relation:

website         date       hit-count
www.zojjed.com  5/20/2007  5
www.zojjed.com  5/21/2007  6
www.zojjed.com  5/22/2007  5
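Select can be sketched in Python as a filter over a set of tuples. The helper name select and the positional access t[0] are assumptions made for illustration; relational algebra itself refers to attributes by name.

```python
def select(predicate, relation):
    """Return the tuples of relation for which predicate is true."""
    return {t for t in relation if predicate(t)}

# hit-counts relation as (website, date, hit_count) tuples.
hit_counts = {
    ("www.zojjed.com", "5/20/2007", 5),
    ("www.racewalk.com", "5/20/2007", 2019),
    ("www.greattreks.com", "5/20/2007", 1050),
    ("www.twofeetgallery.com", "5/20/2007", 32),
    ("www.walkinghealthy.com", "5/20/2007", 159),
    ("www.zojjed.com", "5/21/2007", 6),
    ("www.zojjed.com", "5/22/2007", 5),
    ("www.cs.drexel.edu/~jsalvage", "5/20/2007", 376),
    ("www.racewalk.com", "5/21/2007", 2099),
}

# Select the tuples whose website attribute equals "www.zojjed.com".
zojjed = select(lambda t: t[0] == "www.zojjed.com", hit_counts)

for row in sorted(zojjed):
    print(row)
```

The result is again a relation over the same schema, so it can be fed into further operations.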

To select those tuples of the hit-counts relation where the hit-count is greater than 1000, we write:

σ_{hit-count > 1000}(hit-counts)

This returns the relation:

website             date       hit-count
www.racewalk.com    5/20/2007  2019
www.greattreks.com  5/20/2007  1050
www.racewalk.com    5/21/2007  2099

Predicates can be combined with and, or, and not.

To select those tuples of the hit-counts relation where the hit-count is greater than 5 and the website is www.zojjed.com, we write:

σ_{hit-count > 5 and website = "www.zojjed.com"}(hit-counts)

This returns the relation:

website         date       hit-count
www.zojjed.com  5/21/2007  6

It contains exactly the tuples that appear in both of the following results.

σ_{website = "www.zojjed.com"}(hit-counts):

website         date       hit-count
www.zojjed.com  5/20/2007  5
www.zojjed.com  5/21/2007  6
www.zojjed.com  5/22/2007  5

σ_{hit-count > 5}(hit-counts):

website                      date       hit-count
www.racewalk.com             5/20/2007  2019
www.greattreks.com           5/20/2007  1050
www.twofeetgallery.com       5/20/2007  32
www.walkinghealthy.com       5/20/2007  159
www.zojjed.com               5/21/2007  6
www.cs.drexel.edu/~jsalvage  5/20/2007  376
www.racewalk.com             5/21/2007  2099
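A compound predicate joined with "and" selects the same tuples as intersecting the two single-condition selections. The sketch below checks this on the hit-counts data; the variable names are illustrative.

```python
# hit-counts relation as (website, date, hit_count) tuples.
hit_counts = {
    ("www.zojjed.com", "5/20/2007", 5),
    ("www.racewalk.com", "5/20/2007", 2019),
    ("www.greattreks.com", "5/20/2007", 1050),
    ("www.twofeetgallery.com", "5/20/2007", 32),
    ("www.walkinghealthy.com", "5/20/2007", 159),
    ("www.zojjed.com", "5/21/2007", 6),
    ("www.zojjed.com", "5/22/2007", 5),
    ("www.cs.drexel.edu/~jsalvage", "5/20/2007", 376),
    ("www.racewalk.com", "5/21/2007", 2099),
}

# One selection with a combined predicate.
both = {t for t in hit_counts if t[2] > 5 and t[0] == "www.zojjed.com"}

# The two single-condition selections.
by_site = {t for t in hit_counts if t[0] == "www.zojjed.com"}
by_hits = {t for t in hit_counts if t[2] > 5}

print(sorted(both))
print(both == (by_site & by_hits))  # "and" behaves like set intersection
```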

The Project Operation

Unary operation.

Returns its argument relation with only the listed attributes; the other attributes are left out.

Duplicates are removed.

π (pi) represents the project operation:

π_{<attribute list>}(R)

For example, π_{website, category}(websites) returns:

website                      category
www.zojjed.com               Fiction
www.racewalk.com             Health
www.greattreks.com           Travel
www.twofeetgallery.com       Photographs
www.walkinghealthy.com       Health
www.cs.drexel.edu/~jsalvage  Education
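Project can be sketched as keeping selected positions of every tuple and collecting the results in a set, which removes duplicates automatically. The helper name project and the use of column indices instead of attribute names are assumptions made for brevity.

```python
def project(indices, relation):
    """Keep only the attributes at the given positions; the set drops duplicates."""
    return {tuple(t[i] for i in indices) for t in relation}

# websites relation as (website, organization, first_year, category) tuples.
websites = {
    ("www.zojjed.com", "Walking Promotions", 2006, "Fiction"),
    ("www.racewalk.com", "Walking Promotions", 1995, "Health"),
    ("www.greattreks.com", "Walking Promotions", 2006, "Travel"),
    ("www.twofeetgallery.com", "Walking Promotions", 2004, "Photographs"),
    ("www.walkinghealthy.com", "Walking Promotions", 2002, "Health"),
    ("www.cs.drexel.edu/~jsalvage", "Drexel University", 2005, "Education"),
}

site_category = project((0, 3), websites)  # website and category
categories = project((3,), websites)       # category only

print(len(site_category))  # six distinct (website, category) pairs
print(sorted(categories))  # "Health" occurs twice in the input, once here
```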

Composition of Relational Operations

Often we need to combine operations. For example, we often wish to select a set of tuples and limit the relation returned to a few attributes.

What if we want to find only the websites that have had more than 1000 hits in a given day?

First we must find which tuples have hit counts greater than 1000. We can accomplish this with the following relational query:

σ_{hit-count > 1000}(hit-counts)

By using the project operation we can remove the extra attributes, hit-count and date, and return only the values in the website column:

π_{website}(σ_{hit-count > 1000}(hit-counts))

What is the relation that is returned?

The relation returned is:

website
www.racewalk.com
www.greattreks.com

www.racewalk.com exceeded 1000 hits on two days, but it appears only once because project removes duplicates.
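Because every operation returns a relation, the output of select can be passed straight to project. Here is a minimal sketch of that composition, with assumed helper names select and project and positional attribute access.

```python
def select(predicate, relation):
    """Return the tuples of relation for which predicate is true."""
    return {t for t in relation if predicate(t)}

def project(indices, relation):
    """Keep only the attributes at the given positions; the set drops duplicates."""
    return {tuple(t[i] for i in indices) for t in relation}

# hit-counts relation as (website, date, hit_count) tuples.
hit_counts = {
    ("www.zojjed.com", "5/20/2007", 5),
    ("www.racewalk.com", "5/20/2007", 2019),
    ("www.greattreks.com", "5/20/2007", 1050),
    ("www.twofeetgallery.com", "5/20/2007", 32),
    ("www.walkinghealthy.com", "5/20/2007", 159),
    ("www.zojjed.com", "5/21/2007", 6),
    ("www.zojjed.com", "5/22/2007", 5),
    ("www.cs.drexel.edu/~jsalvage", "5/20/2007", 376),
    ("www.racewalk.com", "5/21/2007", 2099),
}

# Inner step: tuples with more than 1000 hits (three tuples).
selected = select(lambda t: t[2] > 1000, hit_counts)

# Outer step: keep only the website attribute (two distinct websites).
result = project((0,), selected)

print(len(selected))
print(sorted(result))
```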

Union Operator

Binary operation.

∪ represents the union operator.

It is often useful to combine the results of queries.

Again, remember that set theory removes duplicates.

Relation 1 ∪ Relation 2 = Result Set

What is a query that returns all websites that have customers OR a hit count greater than 1000?

We need information from both the customers relation and the hit-counts relation.

First we need the names of all websites that have customers:

π_{website}(customers)

website
www.cs.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com

Then we need the names of the websites that have a hit count greater than 1000:

π_{website}(σ_{hit-count > 1000}(hit-counts))

website
www.racewalk.com
www.greattreks.com

Combine the results using a union operation:

π_{website}(customers) ∪ π_{website}(σ_{hit-count > 1000}(hit-counts))

Remember, order is not important!

• The relations in a union MUST be of similar types
• They MUST have the same number of attributes
• The domains of the attributes MUST be the same

Result:

website
www.cs.drexel.edu/~jsalvage
www.racewalk.com
www.greattreks.com
www.zojjed.com

π_{website}(customers):

website
www.cs.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com

π_{website}(σ_{hit-count > 1000}(hit-counts)):

website
www.racewalk.com
www.greattreks.com
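The union of the two projections, together with a basic compatibility check, might look as follows. This sketch only verifies that all tuples have the same number of attributes; checking that the attribute domains match is left out, and the function name union is an assumption.

```python
def union(r, s):
    """Union of two relations that must have the same number of attributes."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s

# The two one-attribute relations produced by the projections.
customer_sites = {
    ("www.cs.drexel.edu/~jsalvage",),
    ("www.racewalk.com",),
    ("www.zojjed.com",),
}
busy_sites = {
    ("www.racewalk.com",),
    ("www.greattreks.com",),
}

result = union(customer_sites, busy_sites)
print(sorted(result))  # www.racewalk.com is in both inputs but listed once
```

Passing relations with different numbers of attributes, for example the full customers relation and busy_sites, raises ValueError in this sketch.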

Intersection Operator

Binary operation.

∩ represents the intersection operator.

Returns all tuples contained in both relations.

Relation 1 ∩ Relation 2 = Result Set

What is a query that returns all websites that have customers AND a hit count greater than 1000?

First we need the names of all websites that have customers:

π_{website}(customers)

website
www.cs.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com

Then we need the names of the websites that have a hit count greater than 1000:

π_{website}(σ_{hit-count > 1000}(hit-counts))

website
www.racewalk.com
www.greattreks.com

The query that returns all websites that have customers AND a hit count greater than 1000 is:

π_{website}(customers) ∩ π_{website}(σ_{hit-count > 1000}(hit-counts))

Result:

website
www.racewalk.com

π_{website}(customers):

website
www.cs.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com

π_{website}(σ_{hit-count > 1000}(hit-counts)):

website
www.racewalk.com
www.greattreks.com
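With relations held as Python sets, intersection is the & operator. The sketch below also checks the standard identity r ∩ s = r - (r - s), which shows that intersection can be expressed with set difference alone; the variable names are illustrative.

```python
# The two one-attribute relations produced by the projections.
customer_sites = {
    ("www.cs.drexel.edu/~jsalvage",),
    ("www.racewalk.com",),
    ("www.zojjed.com",),
}
busy_sites = {
    ("www.racewalk.com",),
    ("www.greattreks.com",),
}

# Tuples contained in both relations.
both = customer_sites & busy_sites

# The same result written with set difference only: r - (r - s).
via_difference = customer_sites - (customer_sites - busy_sites)

print(sorted(both))
print(both == via_difference)
```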

The Set Difference Operation (MINUS)

Binary operation.

- denotes set difference.

Relation 1 - Relation 2 = Result Set

Finds tuples that are in one relation but not in another.

r - s produces a relation containing those tuples in r but not in s.

Produce a list of websites that have a hit count > 1000 and do not have a customer.

We need the names of all websites that have customers:

Π_{website}(customers)

website
www.drexel.edu/~jsalvage
www.racewalk.com
www.zojjed.com

Then we need the names of the websites that have a hit count greater than 1000:

Π_{website}(σ_{hit-count > 1000}(hit-counts))

website
www.racewalk.com
www.greattreks.com

Subtracting the first result from the second gives the answer:

Π_{website}(σ_{hit-count > 1000}(hit-counts)) - Π_{website}(customers)

website
www.greattreks.com

Notice that the tuples in R1 that are not in R2 are included in the result set, but the tuples in R2 that are not in R1 are not included in the result set.
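A short Python sketch of the same difference, again assuming the relations are modelled as sets of website names (the variable names are illustrative). It also evaluates the difference with the operands reversed, to show that the order matters.

```python
# Website names copied from the two intermediate results in the example.
with_customers = {
    "www.drexel.edu/~jsalvage",
    "www.racewalk.com",
    "www.zojjed.com",
}
over_1000_hits = {"www.racewalk.com", "www.greattreks.com"}

# R1 - R2: busy websites that have no customer.
no_customers = over_1000_hits - with_customers

# R2 - R1: websites with customers whose hit count is not above 1000.
reverse = with_customers - over_1000_hits

print(no_customers)
print(reverse)
```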

Given the relation websites below:

website                      organization        first-year  category
www.zojjed.com               Walking Promotions  2006        Fiction
www.racewalk.com             Walking Promotions  1995        Health
www.greattreks.com           Walking Promotions  2006        Travel
www.twofeetgallery.com       Walking Promotions  2004        Photographs
www.walkinghealthy.com       Walking Promotions  2002        Health
www.cs.drexel.edu/~jsalvage  Drexel University   2005        Education

What is the result of the following?

Π_{website}(empty set) - Π_{website}(websites)

The result is the empty set, because no records are contained in R1, and the MINUS operator returns only the records in R1 that are not in R2.

What is the result of the following, given the same websites relation?

Π_{website}(websites) - Π_{website}(empty set)

website
www.zojjed.com
www.racewalk.com
www.greattreks.com
www.twofeetgallery.com
www.walkinghealthy.com
www.cs.drexel.edu/~jsalvage

The result of Π_{website}(websites) - Π_{website}(empty set) is every tuple of Π_{website}(websites): since the empty set contains no records, nothing is removed, and every website in the websites relation is included in the result set.

The Cartesian-Product Operation

Binary.

x, combines information in two relations.

Relation 1 x Relation 2 = Result Set

Because the same attribute name can appear in different relations, we need a notation that tells them apart: relation.attribute will be used.

Therefore, the resulting schema of r = websites x customers is:

(websites.website, websites.organization, websites.first-year, websites.category, customers.website, customers.first-name, customers.last-name)

Note that issues exist if you wish to use the same relation twice; we will address this with the rename operation shortly.

What tuples exist in r if r = websites x customers?

The combination of every tuple in websites with every tuple in customers.

Given r1 with n1 tuples and r2 with n2 tuples, r1 x r2 has n1 * n2 tuples.

Let’s look at a simplified example first.

If relation R1 contains the following:

Value1
1
2
3

and if relation R2 contains the following:

Value2
A
B
C

Then R1 x R2 contains the following:

R1.Value1  R2.Value2
1          A
1          B
1          C
2          A
2          B
2          C
3          A
3          B
3          C
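The simplified example can be reproduced with a small Python sketch. It uses `itertools.product` from the standard library as a stand-in for the Cartesian-product operator; the variable names are illustrative.

```python
from itertools import product

r1 = [1, 2, 3]          # the Value1 column of R1
r2 = ["A", "B", "C"]    # the Value2 column of R2

# Every tuple of R1 paired with every tuple of R2.
r1_x_r2 = list(product(r1, r2))

for row in r1_x_r2:
    print(row)

# n1 * n2 tuples, as stated for the general case.
print(len(r1_x_r2) == len(r1) * len(r2))
```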

Similarly, websites x customers appears as follows:

websites.website    organization        first-year  category  customers.website         first-name  last-name
www.zojjed.com      Walking Promotions  2006        Fiction   www.zojjed.com            Derek       Jeter
www.zojjed.com      Walking Promotions  2006        Fiction   www.zojjed.com            Chase       Utley
www.zojjed.com      Walking Promotions  2006        Fiction   www.drexel.edu/~jsalvage  Jeremy      Johnson
www.zojjed.com      Walking Promotions  2006        Fiction   www.racewalk.com          Ryan        Howard
www.zojjed.com      Walking Promotions  2006        Fiction   www.zojjed.com            Ryan        Howard
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Derek       Jeter
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Chase       Utley
www.racewalk.com    Walking Promotions  1995        Health    www.drexel.edu/~jsalvage  Jeremy      Johnson
www.racewalk.com    Walking Promotions  1995        Health    www.racewalk.com          Ryan        Howard
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Ryan        Howard
www.greattreks.com  Walking Promotions  2006        Travel    www.zojjed.com            Derek       Jeter
www.greattreks.com  Walking Promotions  2006        Travel    www.zojjed.com            Chase       Utley
www.greattreks.com  Walking Promotions  2006        Travel    www.drexel.edu/~jsalvage  Jeremy      Johnson
www.greattreks.com  Walking Promotions  2006        Travel    www.racewalk.com          Ryan        Howard
www.greattreks.com  Walking Promotions  2006        Travel    www.zojjed.com            Ryan        Howard
...

What if we want to find all the customers who bought from a website created before the year 2000?

We could try the following:

σ_{first-year < 2000}(websites x customers)

Note that we are deliberately not applying a projection yet, so that we can see what is really happening. In the end, you would use a projection to show only the fields requested by the question.

websites.website    organization        first-year  category  customers.website         first-name  last-name
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Derek       Jeter
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Chase       Utley
www.racewalk.com    Walking Promotions  1995        Health    www.drexel.edu/~jsalvage  Jeremy      Johnson
www.racewalk.com    Walking Promotions  1995        Health    www.racewalk.com          Ryan        Howard
www.racewalk.com    Walking Promotions  1995        Health    www.zojjed.com            Ryan        Howard

Oops, too many tuples!

The Cartesian product combines every tuple from websites with every tuple from customers. Even though only the tuples with first-year < 2000 are selected, the query still returns 5 tuples.

Of those tuples, we only want the ones where the websites relation’s website attribute equals the customers relation’s website attribute. In the table above, the only tuple we truly want is the one in which both website attributes are www.racewalk.com (customer Ryan Howard).

We can write this as follows:

σ_{websites.website = customers.website}(σ_{first-year < 2000}(websites x customers))

The query

σ_{websites.website = customers.website}(σ_{first-year < 2000}(websites x customers))

returns the following tuple, which has too many attributes:

websites.website    organization        first-year  category  customers.website         first-name  last-name
www.racewalk.com    Walking Promotions  1995        Health    www.racewalk.com          Ryan        Howard

We must also use a projection to remove the excess attributes. Applying a projection on first-name and last-name to the previous query gives us the following query:

Π_{first-name, last-name}(σ_{websites.website = customers.website}(σ_{first-year < 2000}(websites x customers)))
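The three steps (Cartesian product, two selections, projection) can be traced with the following Python sketch. It assumes the relations are lists of plain tuples whose rows are taken from the tables shown in these slides; the variable names and tuple layouts are illustrative choices, not part of the original notation.

```python
from itertools import product

# (website, organization, first_year, category)
websites = [
    ("www.zojjed.com", "Walking Promotions", 2006, "Fiction"),
    ("www.racewalk.com", "Walking Promotions", 1995, "Health"),
    ("www.greattreks.com", "Walking Promotions", 2006, "Travel"),
    ("www.twofeetgallery.com", "Walking Promotions", 2004, "Photographs"),
    ("www.walkinghealthy.com", "Walking Promotions", 2002, "Health"),
    ("www.cs.drexel.edu/~jsalvage", "Drexel University", 2005, "Education"),
]

# (website, first_name, last_name)
customers = [
    ("www.zojjed.com", "Derek", "Jeter"),
    ("www.zojjed.com", "Chase", "Utley"),
    ("www.drexel.edu/~jsalvage", "Jeremy", "Johnson"),
    ("www.racewalk.com", "Ryan", "Howard"),
    ("www.zojjed.com", "Ryan", "Howard"),
]

# websites x customers: every website row paired with every customer row.
pairs = list(product(websites, customers))

# Selection: first-year < 2000.
before_2000 = [(w, c) for w, c in pairs if w[2] < 2000]

# Selection: websites.website = customers.website.
matching = [(w, c) for w, c in before_2000 if w[0] == c[0]]

# Projection on first-name, last-name (a set, so duplicates collapse).
names = {(c[1], c[2]) for w, c in matching}

print(len(pairs), len(before_2000), len(matching))
print(names)
```

The intermediate counts make the "too many tuples" problem visible: the first selection alone still leaves five rows, and only the second selection narrows them to the one that pairs a website with its own customer.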

The Assignment Operator

Unary.

Allows an expression to be assigned to a variable.

NewRelation ← OldRelation

For example:

1200loans ← σ_{amount > 1200}(loan)

or

result ← Π_{loan-number}(1200loans)

The Rename Operation

Unary.

ρ_{x}(E) renames the expression E to x.

Relational-algebra expressions do not have a name that we can refer to them by; the rename operator gives them one.

ρ is rho.

Example: without using an aggregate function (not yet shown), find the largest hit count of any website for a single day. If the same maximum hit count exists more than once, you are only allowed to return a single tuple containing the answer.

We accomplish this in two steps:

First, compute a temporary relation consisting of every hit count that is smaller than some other hit count, that is, every hit count except the largest.

Second, take the set difference between the relation Π_{hit-count}(hit-counts) and the temporary relation.

To build the temporary relation, compare every hit count with every other hit count; in other words, compute the Cartesian product of the relation hit-counts with itself:

hit-counts x hit-counts

However, we must rename one of the hit-counts relations so that we can refer to the two hit-count attributes distinctly.

Given the projection of only the hit-count field from the relation hit-counts via

Π_{hit-count}(hit-counts)

we have:

hit-count
5
2019
1050
32
159
6
376
2099

If we rename the result of this projection

ρ_{d}(Π_{hit-count}(hit-counts))

we can then create a cross product of the two relations as

Π_{hit-count}(hit-counts) x ρ_{d}(Π_{hit-count}(hit-counts))

Π_{hit-count}(hit-counts) x ρ_{d}(Π_{hit-count}(hit-counts))

hit-count  d.hit-count
5          5
2019       5
1050       5
32         5
159        5
6          5
376        5
2099       5
5          2019
2019       2019
1050       2019
32         2019
159        2019
6          2019
376        2019
2099       2019
5          1050
2019       1050
1050       1050
32         1050
159        1050
6          1050
376        1050
2099       1050
5          32
2019       32
1050       32
32         32
159        32
6          32
376        32
2099       32
5          159
2019       159
1050       159
32         159
159        159
6          159
376        159
2099       159
5          6
2019       6
1050       6
32         6
159        6
6          6
376        6
2099       6
5          376
2019       376
1050       376
32         376
159        376
6          376
376        376
2099       376
5          2099
2019       2099
1050       2099
32         2099
159        2099
6          2099
376        2099
2099       2099

Now we select only those tuples in which the first attribute contains a value less than the second attribute. We do so with the following query:

σ_{hit-counts.hit-count < d.hit-count}(Π_{hit-count}(hit-counts) x ρ_{d}(Π_{hit-count}(hit-counts)))

Of the 64 tuples in the cross product listed above, this keeps only those whose hit-count is smaller than d.hit-count, for example (5, 2019) and (2019, 2099). No tuple with hit-count = 2099 qualifies, because no d.hit-count is larger.

This certainly gives us a lot of tuples, but if we then project just the hit-count from the first column and remove the duplicates, we are left with the following:

hit-count
5
2019
1050
32
159
6
376

This is the set containing every hit count except the largest.

To get just the largest hit count, we now simply subtract our result set from the projection of the original hit-counts relation as follows:

Π_{hit-count}(hit-counts) - Π_{hit-count}(σ_{hit-counts.hit-count < d.hit-count}(Π_{hit-count}(hit-counts) x ρ_{d}(Π_{hit-count}(hit-counts))))

hit-count
2099
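The same two-step idea can be sketched in Python, assuming the projected hit counts are held in a set. The name `d` mirrors the renamed copy in the query; the other variable names are illustrative.

```python
# The projected hit-count values from the example.
hit_counts = {5, 2019, 1050, 32, 159, 6, 376, 2099}

# The renamed copy of the same relation.
d = set(hit_counts)

# Step 1: from the cross product, keep the first value of every pair
# whose first value is smaller than the second. Collecting into a set
# removes the duplicates, as the projection does.
not_largest = {a for a in hit_counts for b in d if a < b}

# Step 2: the set difference leaves only the largest hit count.
largest = hit_counts - not_largest

print(largest)
```

Because the inputs are sets, a maximum that occurred several times in the original relation would still yield a single tuple, which is what the exercise asks for.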

We need a better way to represent certain queries, as the notation for joining two relations and selecting only the records where the attributes match is too cumbersome. Therefore we have:

The Natural Join Operation

Binary.

Result Set = R1 |x| R2

The natural join operation finds the Cartesian product of two relations, but only returns tuples where the attributes whose names are the same in both relations contain the same values.

Let’s look at a simplified example first.

If relation R1 contains the following:

Value1  Value2
1       X
2       Y
3       Z

and if relation R2 contains the following:

Value2  Value3
X       A
Z       B
A       C

Then R1 x R2 contains the following:

R1.Value1  R1.Value2  R2.Value2  R2.Value3
1          X          X          A
1          X          Z          B
1          X          A          C
2          Y          X          A
2          Y          Z          B
2          Y          A          C
3          Z          X          A
3          Z          Z          B
3          Z          A          C

Then R1 |x| R2 contains the following:

R1.Value1  R1.Value2  R2.Value2  R2.Value3
1          X          X          A
3          Z          Z          B
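The simplified natural join can be sketched in Python as a cross product filtered on the shared attribute. The tuple layouts and variable names below are illustrative assumptions; like the table in the slides, the sketch keeps both copies of Value2 in the output.

```python
r1 = [(1, "X"), (2, "Y"), (3, "Z")]          # (Value1, Value2)
r2 = [("X", "A"), ("Z", "B"), ("A", "C")]    # (Value2, Value3)

# Pair every R1 row with every R2 row, keeping only the pairs whose
# Value2 attributes agree.
joined = [
    (value1, r1_value2, r2_value2, value3)
    for (value1, r1_value2) in r1
    for (r2_value2, value3) in r2
    if r1_value2 == r2_value2
]

for row in joined:
    print(row)
```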

Example:

Find the names of all customers who have made a purchase from a health or travel website. Return the name of the customer, the website, and the category of the website.

The old way:

Form a Cartesian product of the websites and customers relations.

Select the tuples that pertain to the same website and have a category equal to “Health” or “Travel.”

Project the first-name, last-name, website, and category.

Π_{first-name, last-name, websites.website, category}(σ_{websites.website = customers.website and (category = “Health” or category = “Travel”)}(websites x customers))

Another example:

Find the names of all websites, and the dates on which they have a hit count, for websites that are in the health or travel category.

Π_{website, date}(σ_{category = “Health” or category = “Travel”}(websites |x| hit-counts))

Generalized Projections

Allows basic arithmetic operations within fields of a tuple.

Observe the Sales relation:

product                    first-name  last-name  tax   total-cost
Zojjed!                    Derek       Jeter      1.00  17.95
Zojjed!                    Chase       Utley      1.00  17.95
VB .NET Coach              Jeremy      Johnson    0     54.95
Race Walk Like A Champion  Ryan        Howard     1.25  25.95
Zojjed!                    Ryan        Howard     0     16.95

What was the cost of each product sold minus the tax paid?

Π_{product, first-name, last-name, (total-cost - tax) as net-pay}(Sales)
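A minimal Python sketch of this generalized projection follows. It assumes the Sales relation is a list of tuples and uses `decimal.Decimal` so the currency arithmetic is exact; both choices, and the variable names, are illustrative.

```python
from decimal import Decimal

# (product, first_name, last_name, tax, total_cost)
sales = [
    ("Zojjed!", "Derek", "Jeter", Decimal("1.00"), Decimal("17.95")),
    ("Zojjed!", "Chase", "Utley", Decimal("1.00"), Decimal("17.95")),
    ("VB .NET Coach", "Jeremy", "Johnson", Decimal("0"), Decimal("54.95")),
    ("Race Walk Like A Champion", "Ryan", "Howard", Decimal("1.25"), Decimal("25.95")),
    ("Zojjed!", "Ryan", "Howard", Decimal("0"), Decimal("16.95")),
]

# Keep product and name, and compute net-pay = total-cost - tax per tuple.
net_pay = [
    (prod, first, last, total - tax)
    for prod, first, last, tax, total in sales
]

for row in net_pay:
    print(row)
```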

Aggregate Functions

An aggregate function takes a collection of values and returns a single value as a result.

For example:

sum {1, 1, 3, 4, 4, 11} returns the value 24.
avg {1, 1, 3, 4, 4, 11} returns the value 4.
count {1, 1, 3, 4, 4, 11} returns the value 6.
min {1, 1, 3, 4, 4, 11} returns the value 1.
max {1, 1, 3, 4, 4, 11} returns the value 11.
count-distinct {1, 1, 3, 4, 4, 11} returns the value 4.

Aggregate functions operate on collections that may contain duplicates (multisets), so ignore the fact that we said sets can’t contain duplicate values.
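These values can be checked with a few lines of Python. The sketch assumes the collection is kept in a list, so duplicates are preserved; the dictionary keys simply reuse the function names from the slide.

```python
values = [1, 1, 3, 4, 4, 11]   # a list keeps the duplicates

results = {
    "sum": sum(values),
    "avg": sum(values) / len(values),
    "count": len(values),
    "min": min(values),
    "max": max(values),
    # Converting to a set drops duplicates before counting.
    "count-distinct": len(set(values)),
}

for name, value in results.items():
    print(name, value)
```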

Operations to Modify the Contents of Relations

Deletion

r ← r - E

Delete all sales of “Zojjed!” from the sales relation:

sales ← sales - σ_{product = “Zojjed!”}(sales)

Delete all sales with no tax collected:

sales ← sales - σ_{tax = 0}(sales)

Insertion

r ← r ∪ E

Add a record to the sales relation:

sales ← sales ∪ {(“I Walk to Eat”, “Chase”, “Utley”, 0, 15.95)}

Add a record to the websites relation:

websites ← websites ∪ {(“www.mediatitan.net”, “Walking Promotions”, 2006, “Fiction”)}

You can also insert tuples based on the result of another query.
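Deletion and insertion map directly onto set difference and set union, as this Python sketch suggests. It assumes the sales relation is a set of tuples; the intermediate variable names are illustrative, and plain floats are used only to keep the example short.

```python
# (product, first_name, last_name, tax, total_cost)
sales = {
    ("Zojjed!", "Derek", "Jeter", 1.00, 17.95),
    ("Zojjed!", "Chase", "Utley", 1.00, 17.95),
    ("VB .NET Coach", "Jeremy", "Johnson", 0, 54.95),
    ("Race Walk Like A Champion", "Ryan", "Howard", 1.25, 25.95),
    ("Zojjed!", "Ryan", "Howard", 0, 16.95),
}

# Deletion: subtract the tuples selected by product = "Zojjed!".
zojjed_sales = {row for row in sales if row[0] == "Zojjed!"}
after_delete = sales - zojjed_sales

# Insertion: union with a one-tuple relation.
after_insert = after_delete | {("I Walk to Eat", "Chase", "Utley", 0, 15.95)}

print(len(sales), len(after_delete), len(after_insert))
```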

Updating

Remove all the tax from the sales relation:

sales ← Π_{product, first-name, last-name, 0, total-cost}(sales)

Joins

There are other forms of joins. Let’s look at the following two simple relations:

cities relation

employee-name  city
Jeter          New York City
Howard         Philadelphia
Utley          Philadelphia
Schilling      Boston

teams relation

employee-name  team
Glavine        Mets
Howard         Phillies
Bonds          Giants
Schilling      Choke Sox

Natural Join -> cities |x| teams

cities.employee-name  city           teams.employee-name  team
Howard                Philadelphia   Howard               Phillies
Schilling             Boston         Schilling            Choke Sox

The natural join omits records that do not match, so we do not have records for Jeter, Utley, Glavine, or Bonds.

Left Outer Join -> cities LOJ teams

Using the same cities and teams relations, the left outer join includes all records from the left and only those records on the right that match. Left records without a match have nulls in place of the right-hand attributes.

cities.employee-name  city           teams.employee-name  team
Jeter                 New York City  Null                 Null
Howard                Philadelphia   Howard               Phillies
Utley                 Philadelphia   Null                 Null
Schilling             Boston         Schilling            Choke Sox

Right Outer Join -> cities ROJ teams

Using the same cities and teams relations, the right outer join includes all records from the right and only those records on the left that match. Right records without a match have nulls in place of the left-hand attributes.

cities.employee-name  city           teams.employee-name  team
Null                  Null           Glavine              Mets
Howard                Philadelphia   Howard               Phillies
Null                  Null           Bonds                Giants
Schilling             Boston         Schilling            Choke Sox


cities relation:

employee-name  city
Jeter          New York City
Howard         Philadelphia
Utley          Philadelphia
Schilling      Boston

teams relation:

employee-name  team
Glavine        Mets
Howard         Phillies
Bonds          Giants
Schilling      Choke Sox

Full Outer Join -> cities FOJ teams:

employee-name  city           employee-name  team
Null           Null           Glavine        Mets
Howard         Philadelphia   Howard         Phillies
Null           Null           Bonds          Giants
Schilling      Boston         Schilling      Choke Sox
Jeter          New York City  Null           Null
Utley          Philadelphia   Null           Null

Includes all records from both the left and the right; tuples that do not match have nulls in place of the missing values.
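The three outer joins differ only in which unmatched tuples they keep, which a single sketch can show. The code below assumes relations are lists of dicts, uses None for Null, and joins on one named attribute; the function outer_join and its keep_left / keep_right flags are hypothetical names chosen for illustration. As with any merged-dict representation, the shared employee-name attribute appears once rather than twice as on the slides.

```python
cities = [
    {"employee-name": "Jeter", "city": "New York City"},
    {"employee-name": "Howard", "city": "Philadelphia"},
    {"employee-name": "Utley", "city": "Philadelphia"},
    {"employee-name": "Schilling", "city": "Boston"},
]
teams = [
    {"employee-name": "Glavine", "team": "Mets"},
    {"employee-name": "Howard", "team": "Phillies"},
    {"employee-name": "Bonds", "team": "Giants"},
    {"employee-name": "Schilling", "team": "Choke Sox"},
]


def outer_join(left, right, on, keep_left=True, keep_right=True):
    """Join on one attribute; unmatched tuples are kept and padded with None."""
    left_attrs = {a for t in left for a in t}
    right_attrs = {a for t in right for a in t}
    result, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[on] == l[on]]
        matched_right.update(hits)
        for i in hits:
            result.append({**l, **right[i]})
        if not hits and keep_left:
            # No partner on the right: pad the right-hand attributes with None.
            result.append({**dict.fromkeys(right_attrs), **l})
    if keep_right:
        for i, r in enumerate(right):
            if i not in matched_right:
                # No partner on the left: pad the left-hand attributes with None.
                result.append({**dict.fromkeys(left_attrs), **r})
    return result


left_outer = outer_join(cities, teams, "employee-name", keep_right=False)
right_outer = outer_join(cities, teams, "employee-name", keep_left=False)
full_outer = outer_join(cities, teams, "employee-name")
print(len(left_outer), len(right_outer), len(full_outer))
```

Setting both flags to False would reduce this to the inner join on the chosen attribute.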


NULLS

And (true and unknown = unknown, false and unknown = false, unknown and unknown = unknown)

Or (true or unknown = true, false or unknown = unknown, unknown or unknown = unknown)

Not (not unknown = unknown)
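The truth tables above can be checked with a small sketch of three-valued logic. It assumes Python's None stands in for unknown; the function names and3, or3, and not3 are hypothetical.

```python
def and3(a, b):
    """Three-valued AND: a definite False wins, otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: a definite True wins, otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else not a


for name, value in [("true", True), ("false", False), ("unknown", None)]:
    print(name, "and unknown =", and3(value, None), "|", name, "or unknown =", or3(value, None))
```

The design point is that a result is definite only when one operand already decides it: false decides an And, true decides an Or.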


REFERENTIAL INTEGRITY

Superkey of R: A unique identifier for a tuple.

A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK].

Key of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey.

Example: The CAR relation schema

CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys:

Key1 = {State, Reg#}

Key2 = {SerialNo}

Both are superkeys. {SerialNo, Make} is a superkey but not a key.

If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.
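The superkey and key definitions can be tested against sample data. The sketch below uses made-up CAR tuples (the values are hypothetical, not from the slides) and hypothetical helper names. Note the limitation: a superkey is a property of every valid instance of the schema, so a check on one instance can refute a candidate but cannot prove it.

```python
def is_superkey(rows, attrs):
    """True if no two tuples in this instance share the same values for attrs."""
    seen = set()
    for t in rows:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey and dropping any one attribute breaks that."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, attrs[:i] + attrs[i + 1:])
        for i in range(len(attrs))
    )


# Hypothetical CAR instance, invented for illustration.
cars = [
    {"State": "TX", "Reg#": "ABC-123", "SerialNo": "S1", "Make": "Ford", "Model": "Mustang", "Year": 2002},
    {"State": "TX", "Reg#": "XYZ-789", "SerialNo": "S2", "Make": "Toyota", "Model": "Camry", "Year": 2004},
    {"State": "CA", "Reg#": "ABC-123", "SerialNo": "S3", "Make": "Ford", "Model": "Focus", "Year": 2002},
]

print(is_key(cars, ["State", "Reg#"]))          # Key1
print(is_key(cars, ["SerialNo"]))               # Key2
print(is_superkey(cars, ["SerialNo", "Make"]))  # superkey ...
print(is_key(cars, ["SerialNo", "Make"]))       # ... but not minimal
```

Checking only single-attribute removals is enough for minimality, because any superset of a superkey is itself a superkey.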


REFERENTIAL INTEGRITY

Relational Database Schema

A set S of relation schemas that belong to the same database. S is the name of the database.

S = {R1, R2, ..., Rn}

Entity Integrity:

The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples.

t[PK] ≠ null for any tuple t in r(R)

Note that other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
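An entity integrity check amounts to scanning for nulls in the primary key attributes. The sketch below assumes tuples are dicts with None for null; the function name and the sample tuples are hypothetical.

```python
def entity_integrity_violations(rows, pk):
    """Return the tuples whose primary key contains a null (None) value."""
    # t.get(a) treats a missing attribute the same as an explicit null.
    return [t for t in rows if any(t.get(a) is None for a in pk)]


# Hypothetical tuples, with {State, Reg#} chosen as the primary key.
cars = [
    {"State": "TX", "Reg#": "ABC-123", "Make": "Ford"},
    {"State": None, "Reg#": "XYZ-789", "Make": "Toyota"},  # null in PK: violation
    {"State": "CA", "Reg#": "ABC-123", "Make": None},      # null outside PK: allowed here
]

violations = entity_integrity_violations(cars, ["State", "Reg#"])
print(violations)
```

The third tuple passes because Make is not part of the primary key; a separate not-null constraint would be needed to reject it.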


Referential Integrity

A constraint involving two relations (the previous constraints involve a single relation).

Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.

Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.

A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.
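The condition t1[FK] = t2[PK] can be checked by collecting the primary key values of R2 and looking up each foreign key value from R1. The sketch below is an illustration under assumptions: the EMPLOYEE and DEPARTMENT tuples, their attribute names, and the function name are all hypothetical, and a null foreign key is treated as referencing nothing rather than as a violation, which is the usual convention but is not stated on the slide.

```python
def dangling_references(referencing, fk, referenced, pk):
    """Return tuples of R1 whose FK value matches no PK value in R2."""
    pk_values = {tuple(t[a] for a in pk) for t in referenced}
    dangling = []
    for t1 in referencing:
        value = tuple(t1[a] for a in fk)
        if any(v is None for v in value):
            continue  # assumption: a null FK references no tuple and is allowed
        if value not in pk_values:
            dangling.append(t1)
    return dangling


# Hypothetical relations: EMPLOYEE.Dno references DEPARTMENT.Dnumber.
department = [{"Dnumber": 1, "Dname": "Research"}, {"Dnumber": 2, "Dname": "Sales"}]
employee = [
    {"Name": "Smith", "Dno": 1},
    {"Name": "Wong", "Dno": 3},     # no department 3: violation
    {"Name": "Zelaya", "Dno": None},
]

bad = dangling_references(employee, ["Dno"], department, ["Dnumber"])
print(bad)
```

An empty result means every non-null foreign key value in the referencing relation has a matching tuple in the referenced relation.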