cs464_564_08

Embed Size (px)

Citation preview

  • 8/2/2019 cs464_564_08

    1/72

    CS 464 /CS 564

    INTRODUCTION TO

    DATABASE MANAGEMENTLecturer: Lydia Tapia

    Lecture Eight: Introduction to SQL

    http://www.cs.unm.edu/~tapia/cs464_cs564/

    Source: Slides by Jeffrey Ullman

  • 8/2/2019 cs464_564_08

    2/72

    INTRODUCTION TO SQL

    Select-From-Where Statements

    Subqueries

    Grouping and Aggregation

    2

  • 8/2/2019 cs464_564_08

    3/72

    Why SQL?

    SQL is a very-high-level language. Say what to do rather than how to do it. Avoid a lot of data-manipulation details needed in

    procedural languages like C++ or Java.

    Database management system figures outbest way to execute query.

    Called query optimization.

    3

  • 8/2/2019 cs464_564_08

    4/72

    Select-From-Where Statements

    SELECT desired attributes

    FROM one or more tables

    WHERE condition about tuples of

    the tables

    4

  • 8/2/2019 cs464_564_08

    5/72

    Our Running ExampleAll our SQL queries will be based on the followingdatabase schema. Underline indicates key attributes.

    Candies(name, manf)Stores(name, addr, license)

    Consumers(name, addr, phone)

    Likes(consumer, candy)

    Sells(store, candy, price)

    Frequents(consumer, store)

    5

  • 8/2/2019 cs464_564_08

    6/72

    ExampleUsing Candies(name, manf), what candies aremade by Hershey?

    SELECT nameFROM Candies

    WHERE manf = Hershey;

    6

    Notice SQL uses single-quotes for strings.SQL is case-insensitive, except inside strings.

  • 8/2/2019 cs464_564_08

    7/72

    Result of Queryname

    Twizzler

    KitkatAlmondJoy

    . . .

    7

    The answer is a relation with a single attribute,name, and tuples with the name of each candyby Hershey, such as Twizzler.

  • 8/2/2019 cs464_564_08

    8/72

    Meaning of Single-Relation Query

    Begin with the relation in the FROM clause.Apply the selection indicated by the WHEREclause.

    Apply the extended projection indicated by theSELECT clause.

    8

  • 8/2/2019 cs464_564_08

    9/72

    Operational Semantics

    To implement this algorithm think of a tuplevariable (tv) ranging over each tuple of the

    relation mentioned in FROM.Check if the current tuple satisfies theWHERE clause.

    If so, compute the attributes or expressions ofthe SELECT clause using the components ofthis tuple.

    9

  • 8/2/2019 cs464_564_08

    10/72

    Operational Semantics

    10

    Check if

    Hershey

    name manf

    Twizzler Hersheytv

    Include tv.namein the result

  • 8/2/2019 cs464_564_08

    11/72

    * In SELECT clauses

    When there is one relation in the FROM clause, *in the SELECT clause stands forall attributes of

    this relation.

    Example using Candies(name, manf):SELECT *

    FROM Candies

    WHERE manf = Hershey;

    11

  • 8/2/2019 cs464_564_08

    12/72

    Result of Query:

    name manf

    Twizzler Hershey

    Kitkat Hershey

    AlmondJoy Hershey

    . . . . . .

    12

    Now, the result has each of the attributesof Candies.

  • 8/2/2019 cs464_564_08

    13/72

    Renaming Attributes

    If you want the result to have different attributenames, use AS to rename an

    attribute.Example based on Candies(name, manf):

    SELECT name AS candy, manf

    FROM Candies

    WHERE manf = Hershey

    13

  • 8/2/2019 cs464_564_08

    14/72

    Result of Query:

    candy manf

    Twizzler Hershey

    Kitkat HersheyAlmondJoy Hershey

    . . . . . .

    14

  • 8/2/2019 cs464_564_08

    15/72

    Expressions in SELECT Clauses

    Any expression that makes sense can appear asan element of a SELECT clause.

    Example: from Sells(store, candy, price):SELECT store, candy,

    price * 79 AS priceInYen

    FROM Sells;

    15

  • 8/2/2019 cs464_564_08

    16/72

    Result of Query

    store candy priceInYen

    7-11 Twizzler 285

    Smiths Snickers 342

    16

  • 8/2/2019 cs464_564_08

    17/72

    Another Example: Constant Expressions

    From Likes(consumer, candy) :

    SELECT consumer,likes Kitkats

    AS whoLikesKitkats

    FROM Likes

    WHERE candy =Kitkat

    ;

    17

  • 8/2/2019 cs464_564_08

    18/72

    Result of Query

    consumer whoLikesKitkats

    Sally likes Kitkats

    Fred likes Kitkats

    18

  • 8/2/2019 cs464_564_08

    19/72

    Complex Conditions in WHERE Clause

    From Sells(store, candy, price), find the price that7-11 charges for Twizzlers:

    SELECT price

    FROM Sells

    WHERE store = 7-11 AND

    candy = Twizzler;

    19

  • 8/2/2019 cs464_564_08

    20/72

    Patterns

    WHERE clauses can have conditions inwhich a string is compared with a pattern,

    to see if it matches.

    General form: LIKE or NOT LIKE

    Pattern is a quoted string with % = anystring; _ = any character.

    20

  • 8/2/2019 cs464_564_08

    21/72

    Example

    From Consumers(name, addr, phone) find theconsumers with exchange 555:

    SELECT name

    FROM Consumers

    WHERE phone LIKE%555-_ _ _ _

    ;

    21

  • 8/2/2019 cs464_564_08

    22/72

    NULL Values

    Tuples in SQL relations can have NULL as avalue for one or more components.

    Meaning depends on context. Two commoncases:

    Missing value: e.g., we know 7-11 has someaddress, but we dont know what it is.

    Inapplicable : e.g., the value of attribute spouse foran unmarried person.

    22

  • 8/2/2019 cs464_564_08

    23/72

    Comparing NULLs to Values

    The logic of conditions in SQL is really 3-valuedlogic: TRUE, FALSE, UNKNOWN.

    When any value is compared with NULL, thetruth value is UNKNOWN.

    But a query only produces a tuple in the answer ifits truth value for the WHERE clause is TRUE

    (not FALSE or UNKNOWN).

    23

  • 8/2/2019 cs464_564_08

    24/72

    Three-Valued Logic

    To understand how AND, OR, and NOT work in3-valued logic, think of TRUE = 1, FALSE = 0,

    and UNKNOWN = .

    AND = MIN; OR = MAX, NOT(x) = 1-x.Example:TRUE AND (FALSE OR NOT(UNKNOWN)) =

    MIN(1, MAX(0, (1 - ))) =

    MIN(1, MAX(0, ) = MIN(1, ) = =

    UNKNOWN.

    24

  • 8/2/2019 cs464_564_08

    25/72

    25

  • 8/2/2019 cs464_564_08

    26/72

    Surprising Example

    From the following Sells relation:store candy price

    7-11 Twizzler NULL

    SELECT store

    FROM Sells

    WHERE price < 2.00 OR price >= 2.00;

    26

    UNKNOWN UNKNOWN

    UNKNOWN

  • 8/2/2019 cs464_564_08

    27/72

    Reason: 2-Valued Laws != 3-Valued Laws

    Some common laws, like commutativity of AND,hold in 3-valued logic.

    But not others, e.g., the law of the excludedmiddle:p OR NOTp = TRUE.

    Whenp = UNKNOWN, the left side is MAX( , (1 )) = != 1.

    27

  • 8/2/2019 cs464_564_08

    28/72

    Multirelation Queries

    Interesting queries often combine data from morethan one relation.

    We can address several relations in one query bylisting them all in the FROM clause.

    Distinguish attributes of the same name by.

    28

  • 8/2/2019 cs464_564_08

    29/72

    Example

    Using relations Likes(consumer, candy) andFrequents(consumer, store), find the candies likedby at least one person who frequents 7-11.

    SELECT candyFROM Likes, Frequents

    WHERE store = 7-11 AND

    Frequents.consumer =

    Likes.consumer;

    29

  • 8/2/2019 cs464_564_08

    30/72

    Formal Semantics

    Almost the same as for single-relation queries: Start with the product of all the relations in the

    FROM clause.

    Apply the selection condition from the WHEREclause.

    Project onto the list of attributes and expressions inthe SELECT clause.

    30

  • 8/2/2019 cs464_564_08

    31/72

    Operational Semantics

    Imagine one tuple-variable for each relation in theFROM clause.

    These tuple-variables visit each combination of tuples,one from each relation.

    If the tuple-variables are pointing to tuples thatsatisfy the WHERE clause, send these tuples to

    the SELECT clause.

    31

  • 8/2/2019 cs464_564_08

    32/72

    Example

    32

    consumer store consumer candy

    tv1 tv2Sally Twizzler

    Sally 7-11

    LikesFrequents

    to outputcheck theseare equal

    check

    for 7-11

  • 8/2/2019 cs464_564_08

    33/72

    Explicit Tuple-Variables

    Sometimes, a query needs to use two copiesof the same relation.

    Distinguish copies by following the relationname by the name of a tuple-variable, in the

    FROM clause.

    Its always an option to rename relations thisway, even when not essential.

    33

  • 8/2/2019 cs464_564_08

    34/72

    Example

    From Candies(name, manf), find all pairs ofcandies by the same manufacturer.

    Do not produce pairs like (Twizzler, Twizzler). Produce pairs in alphabetic order, e.g. (Kitkat,

    Twizzler), not (Twizzler, Kitkat).

    SELECT c1.name, c2.name

    FROM Candies c1, Candies c2WHERE c1.manf = c2.manf AND

    c1.name < c2.name;

    34

    tuplevariables

  • 8/2/2019 cs464_564_08

    35/72

    Subqueries

    A parenthesized SELECT-FROM-WHEREstatement (subquery) can be used as a value

    in a number of places, including FROM and

    WHERE clauses.Example: in place of a relation in the FROMclause, we can place another query, and then

    query its result.

    Can use a tuple-variable to name tuples of theresult.

    35

  • 8/2/2019 cs464_564_08

    36/72

    Subqueries That Return One Tuple

    If a subquery is guaranteed to produce onetuple, then the subquery can be used as a

    value. Usually, the tuple has one component. A run-time error occurs if there is no tuple or more

    than one tuple.

    36

  • 8/2/2019 cs464_564_08

    37/72

    Example

    From Sells(store, candy, price), find the storesthat sell Kitkats for the same price 7-11

    charges for Twizzlers. Two queries would surely work:

    Find the price 7-11 charges for Twizzlers. Find the stores that sell Kitkats at that price.

    37

  • 8/2/2019 cs464_564_08

    38/72

    Query + Subquery Solution

    SELECT store

    FROM Sells

    WHERE candy = Kitkat AND

    price = (SELECT price

    FROM Sells

    WHERE store= 7-11

    AND candy = Twizzler);

    38

    The price atwhich 7-11

    sells Twizzlers

  • 8/2/2019 cs464_564_08

    39/72

    The IN Operator

    IN is true if and only if thetuple is a member of the relation. NOT IN means the opposite.

    IN-expressions can appear in WHERE clauses.The is often a subquery.

    39

  • 8/2/2019 cs464_564_08

    40/72

    Example

    From Candies(name, manf) and Likes(consumer,candy), find the name and manufacturer of each

    candy that Fred likes.

    SELECT *

    FROM Candies

    WHERE name IN (SELECT candy

    FROM LikesWHERE consumer =Fred);

    40

    The set of

    candies Fredlikes

  • 8/2/2019 cs464_564_08

    41/72

    The Exists Operator

    EXISTS( ) is true if and only if the is not empty.

    Example: From Candies(name, manf) , findthose candies that are the unique candy bytheir manufacturer.

    41

  • 8/2/2019 cs464_564_08

    42/72

    Example Query with EXISTS

    SELECT name

    FROM Candies c1

    WHERE NOT EXISTS(

    SELECT *

    FROM Candies

    WHERE manf = c1.manf AND

    name c1.name);

    42

    Set ofcandieswith thesamemanf as

    c1, butnot thesamecandy

    Notice scope rule: manf refersto closest nested FROM witha relation having that attribute.

    Notice theSQL notequalsoperator

    From Candies(name, manf) ,find those candies that are

    the unique candy by their

    manufacturer.

  • 8/2/2019 cs464_564_08

    43/72

    The Operator ANY

    x= ANY( ) is a boolean condition trueifxequals at least one tuple in the relation.

    Similarly, = can be replaced by any of thecomparison operators.

    Example:x> ANY( ) meansxis notthe smallest tuple in the relation.

    Note tuples must have one component only.

    43

  • 8/2/2019 cs464_564_08

    44/72

    The Operator ALL

    Similarly,x ALL( ) is true if andonly if for every tuple t in the relation,xis not

    equal to t. That is,xis not a member of the relation.

    The can be replaced by any comparisonoperator.

    Example:x>= ALL( ) means thereis no tuple larger thanx in the relation.

    44

  • 8/2/2019 cs464_564_08

    45/72

    Example

    From Sells(store, candy, price), find the candiessold for the highest price.

    SELECT candy

    FROM Sells

    WHERE price >= ALL(

    SELECT price

    FROM Sells);

    45

    price from the outerSells must not beless than any price.

  • 8/2/2019 cs464_564_08

    46/72

    Union, Intersection, and Difference

    Union, intersection, and difference of relations areexpressed by the following forms, each involving

    subqueries:

    ( subquery ) UNION ( subquery ) ( subquery ) INTERSECT ( subquery ) ( subquery ) EXCEPT ( subquery )

    46

  • 8/2/2019 cs464_564_08

    47/72

    47

  • 8/2/2019 cs464_564_08

    48/72

    Example

    From relations Likes(consumer, candy), Sells(store, candy, price), and Frequents

    (consumer, store), find the consumers and

    candies such that: The consumer likes the candy, and The consumer frequents at least one

    store that sells the candy.

    48

  • 8/2/2019 cs464_564_08

    49/72

    Solution

    (SELECT * FROM Likes)

    INTERSECT

    (SELECT consumer, candyFROM Sells, Frequents

    WHERE Frequents.store = Sells.store

    );

    49

    The consumer frequentsa store that sells thecandy.

  • 8/2/2019 cs464_564_08

    50/72

    Bag Semantics

    Although the SELECT-FROM-WHEREstatement uses bag semantics, the default for

    union, intersection, and difference is set

    semantics.

    That is, duplicates are eliminated as the operation isapplied.

    50

  • 8/2/2019 cs464_564_08

    51/72

    Motivation: Efficiency

    When doing projection, it is easier to avoideliminating duplicates.

    Just work tuple-at-a-time.For intersection or difference, it is mostefficient to sort the relations first.

    At that point you may as well eliminate theduplicates anyway.

    51

  • 8/2/2019 cs464_564_08

    52/72

    Controlling Duplicate Elimination

    Force the result to be a set by SELECTDISTINCT . . .

    Force the result to be a bag (i.e., donteliminate duplicates) by ALL, as in . . .

    UNION ALL . . .

    52

  • 8/2/2019 cs464_564_08

    53/72

    Example: DISTINCT

    From Sells(store, candy, price), find all thedifferent prices charged for candies:

    SELECT DISTINCT price

    FROM Sells;Notice that without DISTINCT, each price wouldbe listed as many times as there were store/candy pairs at that price.

    53

  • 8/2/2019 cs464_564_08

    54/72

    Example: ALL

    Using relations Frequents(consumer, store) andLikes(consumer, candy):

    (SELECT consumer FROM Frequents)EXCEPT ALL

    (SELECT consumer FROM Likes);

    Lists consumers who frequent more stores thanthey like candies, and does so as many times asthe difference of those counts.

    54

  • 8/2/2019 cs464_564_08

    55/72

    Join Expressions

    SQL provides several versions of (bag) joins.These expressions can be stand-alone queries orused in place of relations in a FROM clause.

    55

  • 8/2/2019 cs464_564_08

    56/72

    Products and Natural Joins

    Natural join:R NATURAL JOIN S;

    Product:R CROSS JOIN S;

    Example:Likes NATURAL JOIN Sells;

    Relations can be parenthesized subqueries, aswell.

    56

  • 8/2/2019 cs464_564_08

    57/72

    Theta Join

    R JOIN S ON Example: using Consumers(name, addr) andFrequents(consumer, store):

    Consumers JOIN Frequents ON

    name = consumer;

    gives us all (c, a, c, s) quadruples such that

    consumerclives at address a and frequents stores.

    57

  • 8/2/2019 cs464_564_08

    58/72

    Outerjoins

    R OUTER JOIN S is the core of an outerjoinexpression. It is modified by:

    1. Optional NATURAL in front of OUTER.2. Optional ON after JOIN.3. Optional LEFT, RIGHT, or FULL before OUTER.

    LEFT = pad dangling tuples of R only. RIGHT = pad dangling tuples of S only. FULL = pad both; this choice is the default.

    58

  • 8/2/2019 cs464_564_08

    59/72

    Aggregations

    SUM, AVG, COUNT, MIN, and MAX can beapplied to a column in a SELECT clause to

    produce that aggregation on the column.

    Also, COUNT(*) counts the number of tuples.

    59

  • 8/2/2019 cs464_564_08

    60/72

    Example: Aggregation

    From Sells(store, candy, price), find the averageprice of Twizzlers:

    SELECT AVG(price)

    FROM Sells

    WHERE candy = Twizzler;

    60

  • 8/2/2019 cs464_564_08

    61/72

    Eliminating Duplicates in an Aggregation

    Use DISTINCT inside an aggregation.Example: find the number of different pricescharged for Twizzlers:

    SELECT COUNT(DISTINCT price)

    FROM Sells

    WHERE candy =Twizzler

    ;

    61

  • 8/2/2019 cs464_564_08

    62/72

    NULLs Ignored in Aggregation

    NULL never contributes to a sum, average, orcount, and can never be the minimum or

    maximum of a column.

    But if there are no non-NULL values in acolumn, then the result of the aggregation is

    NULL.

    62

    63

  • 8/2/2019 cs464_564_08

    63/72

    Example: Effect of NULLs

    SELECT count(*)

    FROM Sells

    WHERE candy = Twizzler;

    SELECT count(price)

    FROM Sells

    WHERE candy = Twizzler;

    63

    The number of storesthat sell Twizzlers.

    The number of storesthat sell Twizzlers at aknown price.

    64

  • 8/2/2019 cs464_564_08

    64/72

    Grouping

    We may follow a SELECT-FROM-WHEREexpression by GROUP BY and a list of attributes.

    The relation that results from the SELECT-FROM-WHERE is grouped according to the values of allthose attributes, and any aggregation is applied

    only within each group.

    64

    65

  • 8/2/2019 cs464_564_08

    65/72

    Example: Grouping

    From Sells(store, candy, price), find the averageprice for each candy:

    SELECT candy, AVG(price)

    FROM Sells

    GROUP BY candy;

    65

    66

  • 8/2/2019 cs464_564_08

    66/72

    Example: Grouping

    From Sells(store, candy, price) and Frequents(consumer, store), find for each consumer the

    average price of Twizzlers at the stores they

    frequent:SELECT consumer, AVG(price)

    FROM Frequents, Sells

    WHERE candy = Twizzler AND

    Frequents.store = Sells.store

    GROUP BY consumer;

    66

    Computeconsumer-store-pricefor Twiz.tuples first,then group

    by consumer.

    67

  • 8/2/2019 cs464_564_08

    67/72

    Restriction on SELECT Lists With Aggregation

    If any aggregation is used, then eachelement of the SELECT list must be either:

    1. Aggregated, or2. An attribute on the GROUP BY list.

    67

    68

  • 8/2/2019 cs464_564_08

    68/72

    Query Example Correct or Not?

    You might think you could find the store thatsells Twizzlers the cheapest by:

    SELECT store, MIN(price)FROM Sells

    WHERE candy = Twizzler;

    But this query is illegal in SQL.

    68

    69

  • 8/2/2019 cs464_564_08

    69/72

    HAVING Clauses

    HAVING may follow a GROUP BYclause.

    If so, the condition applies to each group, andgroups not satisfying the condition are eliminated.

    69

    70

  • 8/2/2019 cs464_564_08

    70/72

    Example: HAVING

    From Sells(store, candy, price) and Candies(name, manf), find the average prices of those

    candies that are either sold in at least three stores

    or are manufactured by Nestle.

    70

    71

  • 8/2/2019 cs464_564_08

    71/72

    Solution

    SELECT candy, AVG(price)

    FROM Sells

    GROUP BY candyHAVING COUNT(store) >= 3 OR

    candy IN (SELECT name

    FROM Candies

    WHERE manf = Nestle);

    71

    Candiesmanufactured

    by Nestle.

    Candy groups with at least3 non-NULL stores and alsocandy groups where themanufacturer is Nestle.

    72

  • 8/2/2019 cs464_564_08

    72/72

    Requirements on HAVING Conditions

    These conditions may refer to any relation ortuple-variable in the FROM clause.

    They may refer to attributes of those relations,as long as the attribute makes sense within agroup; i.e., it is either:

    A grouping attribute, or Aggregated.

    72