
Structured/System Query Language SQL

History
Basics of SQL
Data Definition Language
Query Language
Data Manipulation Language
Miscellany

History

Originally called SEQUEL (Structured English QUEry Language)

Designed and implemented at IBM's San Jose Research Laboratory (now the IBM Almaden Research Center) as the interface for an experimental relational database system, System R.

The language for IBM’s DB2

A number of variations of SQL have been implemented by most commercial DBMS vendors

A joint effort by ANSI (American National Standards Institute) and ISO (International Organization for Standardization) led to the first standard version (ANSI 1986), called SQL-1

Revised and expanded in SQL-92 (SQL-2) and SQL-1999 (SQL-3); further revisions have followed since

Basics

Language developed specifically to query databases

It is a comprehensive database language (data definition, query and update)

Based on set and relational operations with certain modifications and enhancements

A typical SQL query has the form:

select A1, A2, ..., An    (projection)
from r1, r2, ..., rm      (Cartesian product)
where P                   (selection)

Each Ai represents an attribute

Each ri represents a relation

P is a predicate.

The result of an SQL query is a relation

Data Definition Language (DDL)

Introduction

Allows the specification of a set of relations along with information about each relation, including:

The schema for each relation.
The domain of values associated with each attribute in every relation.
Integrity constraints.
The set of indices to be maintained for each relation.
Security and authorization information for each relation.
The physical storage structure of each relation on disk.

SQL uses the terms schema, table, row and column for database, relation, tuple and attribute, respectively

The main DDL commands are create, alter and drop

Schema and Catalog Concepts

Prior to SQL-2, there was no concept of a relational database schema

All tables (relations) were considered part of the same schema

In SQL-2, the schema concept was incorporated in order to group together tables and other constructs (e.g. views) that belong to same DB application

Formally, an SQL schema is identified by a schema name and authorization as well as descriptions for each element in the schema

Examples of elements include tables and views

There are two ways to create a schema using create schema. The one shown here assigns a name and an authorization identifier and leaves the elements to be defined later:

create schema Bank authorization xyz

A collection of schemas is known as a catalog

An SQL relation is defined using the create table command:

create table r (A1 D1, A2 D2, ..., An Dn, integrity-constraint1, ..., integrity-constraintk)

r is the name of the relation
each Ai is an attribute name in the schema of relation r
Di is the data type of values in the domain of attribute Ai

E.g. create table branch (branch-name char(15), branch-city char(30), assets integer)

DATA TYPES

char(n). Fixed-length character string, with user-specified length n.
varchar(n). Variable-length character string, with user-specified maximum length n.
int. Integer (a finite subset of the integers that is machine-dependent).
smallint. Small integer (a machine-dependent subset of the integer domain type).
numeric(p,d). Fixed-point number, with user-specified precision of p digits, d of them to the right of the decimal point.
real, double precision. Floating-point and double-precision floating-point numbers, with machine-dependent precision.
float(n). Floating-point number, with user-specified precision of at least n digits.

Null values are allowed in all domain types. Declaring an attribute not null prohibits null values for that attribute.

create domain (SQL-92) creates user-defined types: create domain name as char(20) not null

date. Dates, containing a (4-digit) year, month and day. E.g. date '2001-7-27' (on MySQL 'YYYY-MM-DD')

time. Time of day, in hours, minutes and seconds. E.g. time '09:00:30'

timestamp. Date plus time of day. E.g. timestamp '2001-7-27 09:00:30'

interval. Period of time. E.g. interval '1' day

Subtracting a date/time/timestamp value from another gives an interval value, which can be added to date/time/timestamp values

Can extract the values of individual fields from a date/time/timestamp. E.g. extract (year from r.starttime)

Can cast string types to date/time/timestamp. E.g. cast <string-valued-expression> as date

Integrity Constraints in Create Table

not null (after domain)
unique (after domain)
primary key (A1, ..., An)
foreign key
check (P), where P is a predicate

create table Table_YYY (
    A1 char(3),
    A2 int,
    A3 int UNIQUE,
    A4 float NOT NULL,
    PRIMARY KEY (A1, A2),
    FOREIGN KEY (A1) REFERENCES Table_XXX (A1) ON DELETE CASCADE ON UPDATE CASCADE);

Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative.

create table branch (
    branch-name char(15),
    branch-city char(30),
    assets integer,
    primary key (branch-name),
    check (assets >= 0))
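The effect of the primary key and check constraints can be tried out with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in DBMS, replaces the hyphens in attribute names with underscores (hyphenated names would need quoting in most dialects), and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same table as on the slide, with underscores instead of hyphens (assumption).
conn.execute("""
    create table branch (
        branch_name char(15),
        branch_city char(30),
        assets      integer,
        primary key (branch_name),
        check (assets >= 0)
    )
""")
conn.execute("insert into branch values ('Downtown', 'Fargo', 9000000)")


def try_insert(row):
    """Return 'ok', or the error text if a constraint rejects the row."""
    try:
        conn.execute("insert into branch values (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


duplicate_key = try_insert(("Downtown", "Moorhead", 100))  # repeats the primary key
negative_assets = try_insert(("Uptown", "Fargo", -5))      # violates the check
valid_row = try_insert(("Uptown", "Fargo", 5))             # satisfies both

print(duplicate_key, negative_assets, valid_row, sep="\n")
row_count = conn.execute("select count(*) from branch").fetchone()[0]
```

Only the two valid rows end up in the table; the exact wording of the error messages depends on the DBMS.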

The drop table and alter table keywords

The drop table command deletes all information about the dropped relation from the database.

The alter table command is used to add attributes to an existing relation.

alter table r add A D

where A is the name of the attribute to be added to relation r and D is the domain of A.

All tuples in the relation are assigned null as the value for the new attribute

The alter table command can also be used to drop attributes of a relation

alter table r drop A

where A is the name of an attribute of relation r

Query language, SQL, Schema Used in Examples

The select clause

The select clause lists the attributes desired in the result of a query; it corresponds to the projection operator

E.g. find the names of all branches in the loan relation

select branch-name
from loan

SQL names are case insensitive, i.e. you can use capital or small letters.

SQL allows duplicates in query results hence in general an SQL table is not a set of tuples but rather a multiset (a.k.a bag) of tuples

To force the elimination of duplicates, insert the keyword distinct after select.

Find the names of all branches in the loan relation, and remove duplicates

select distinct branch-name
from loan

An asterisk (Nathan Hale symbol) in the select clause denotes "all attributes"

select *
from loan

The select clause can contain arithmetic expressions involving +, -, *, and /, operating on constants or attributes of tuples

select loan-number, branch-name, amount * 100
from loan

would return a relation that is the same as the loan relation, except that the attribute amount is multiplied by 100.

The from clause lists the relations involved in the query; it corresponds to the Cartesian product operator

Find the Cartesian product borrower x loan

select *
from borrower, loan

Adding a where clause that equates the loan numbers keeps only the matching pairs:

select *
from borrower, loan
where borrower.loan-number = loan.loan-number

The where clause specifies the conditions that the result must satisfy; it corresponds to the selection operator

Find all loan numbers for loans made at the Fargo branch with loan amounts > $1200

select loan-number
from loan
where branch-name = 'Fargo' and amount > 1200

Comparison results can be combined using the logical connectives and, or, and not.

SQL includes a between comparison operator

E.g. Find the loan number of those loans with loan amounts between $90,000 and $100,000

(i.e., $90,000 <= amount <= $100,000; both bounds are included)

select loan-number
from loan
where amount between 90000 and 100000

SQL also includes an in operator used for set inclusion testing

E.g. Find the loan number of loans with loan amounts of $90,000, $100,000, $110,000 or $120,000

select loan-number
from loan
where amount in (90000, 100000, 110000, 120000)

Ambiguous Names and Aliasing

SQL allows renaming relations and attributes using the as clause:

old-name as new-name

Find the name, loan number and loan amount of all customers; rename loan-number as loan-id.

select customer-name, borrower.loan-number as loan-id, amount
from borrower, loan
where borrower.loan-number = loan.loan-number

Find the customer names, their loan numbers and loan amounts for all customers having a loan at some branch.

select customer-name, T.loan-number, S.amount
from borrower as T, loan as S
where T.loan-number = S.loan-number

Find the names of all branches that have greater assets than some branch located in Fargo.

select distinct T.branch-name
from branch as T, branch as S
where S.branch-city = 'Fargo' and T.assets > S.assets

String Operations

SQL includes a string-matching operator (like) for comparisons on character strings.

Patterns are described using two special characters:

percent (%). The % character matches any substring.
underscore (_). The _ character matches any single character.

Find the names of all customers whose street includes the substring "Main".

select customer-name
from customer
where customer-street like '%Main%'

Find the names of all customers whose street has length 5 and includes the substring "Main" (don't use %)

select customer-name
from customer
where customer-street like 'Main_' or customer-street like '_Main'

SQL supports a variety of string operations, such as concatenation (using "||"), converting from upper to lower case (and vice versa), finding string length, extracting substrings, etc.
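The two like patterns above can be checked against a few sample streets. A minimal sketch, assuming Python's sqlite3 module as the DBMS, underscores in place of hyphens in the attribute names, and invented sample data; note that SQLite's like is case-insensitive for ASCII letters by default, which other systems may not be.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_name text, customer_street text)")
conn.executemany(
    "insert into customer values (?, ?)",
    [("Jones", "Main"), ("Smith", "North Main Street"),
     ("Hayes", "Maine"), ("Adams", "XMain"), ("Curry", "Park")],
)


def names(condition):
    """Run the customer query with the given where condition."""
    rows = conn.execute(
        "select customer_name from customer where " + condition
        + " order by customer_name")
    return [row[0] for row in rows]


# % matches any substring, including the empty one.
contains_main = names("customer_street like '%Main%'")
# _ matches exactly one character, so both patterns force a length of 5.
length_five = names(
    "customer_street like 'Main_' or customer_street like '_Main'")

print(contains_main)
print(length_five)
```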

Soundex operation

Database developers have long struggled with the problem of matching words that might not look alike but actually sound alike. Soundex is useful for finding strings whose sound is known but whose precise spelling is not.

Soundex (argument)

Returns a 4-character code representing the sound of the words in the argument.

This result can be compared with the Soundex code of other strings.

The argument can be a character string (CHAR or VARCHAR) not exceeding 4,000 bytes.

The result of the function is CHAR(4). The result is null if the argument is null

How does it work:

The first letter of the Soundex code is the first letter of the argument

Every subsequent letter in the name is looked up in the table below and encoded as a digit between 1 and 6

A, E, I, O, U, Y, W and H are ignored

Two adjacent identically coded letters (including double letters) are coded as a single digit

Nonalphabetic characters terminate the Soundex evaluation

Everything after the 4th character of the code is ignored; if there are insufficient characters remaining in the name, the code is padded with zeros

1 = B, P, F, V
2 = C, S, G, J, K, Q, X, Z
3 = D, T
4 = L
5 = M, N
6 = R
All other letters (A, E, I, O, U, Y, W, H) are ignored

Select Soundex('Smith'), Soundex('Smythe')

S530    S530
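The coding rules listed above can be sketched in a few lines of Python. This is an illustrative implementation of the rules as stated on the slide, not the code of any particular DBMS; the function name soundex and the handling of letters separated by H or W (here they are treated like vowels) are assumptions, and real implementations differ on that point.

```python
# Letter groups from the slide; position in the list + 1 is the digit.
GROUPS = ["BPFV", "CSGJKQXZ", "DT", "L", "MN", "R"]
CODES = {letter: str(digit)
         for digit, letters in enumerate(GROUPS, start=1)
         for letter in letters}


def soundex(name):
    """Return a 4-character Soundex-style code for name ('' if unusable)."""
    name = name.upper()
    if not name or not name[0].isalpha():
        return ""
    result = name[0]                      # first letter is kept as is
    previous = CODES.get(name[0], "")
    for ch in name[1:]:
        if not ch.isalpha():
            break                         # nonalphabetic character ends the evaluation
        code = CODES.get(ch, "")          # '' for the ignored letters
        if code and code != previous:     # adjacent equal codes count once
            result += code
        previous = code
        if len(result) == 4:
            break                         # ignore everything after 4 characters
    return result.ljust(4, "0")           # pad with zeros


print(soundex("Smith"), soundex("Smythe"))
```

Both names map to the same code, which is what makes the comparison of sounds possible.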

There is also a related function called DIFFERENCE, used to compare strings based upon their Soundex values.

Difference(Argument1, Argument2)

Usually returns an integer ranging in value from 0 (least similar) to 4 (most similar).

Suppose you overheard a male employee talking in the hallway but didn't catch his name. You heard a name that sounded like "Ann", but you're positive it was a male. The DIFFERENCE function is ideal for this type of situation. Begin by specifying a threshold of 4 to limit the number of results returned.

Select firstname, difference(firstname, 'ann') as 'difference'
from employees
where difference(firstname, 'ann') = 4

FirstName   Difference
Ann         4
Anne        4

Unfortunately, none were male names. So, lower the threshold to 2

Select firstname, difference(firstname, 'ann') as 'difference'
from employees
where difference(firstname, 'ann') >= 2

FirstName   Difference
Ann         4
Andrew      2
Janet       2
Laura       2
Anne        4

Looks like Andrew might be our man!

The order by clause

List in alphabetic order the names of all customers having a loan at the Fargo branch

select distinct customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number and branch-name = 'Fargo'
order by customer-name

We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.

E.g. order by customer-name desc

Set Operations

The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations ∪, ∩ and −

Each automatically eliminates duplicates; to retain all duplicates use the corresponding versions union all, intersect all and except all

Suppose a tuple occurs m times in r and n times in s; then it occurs:

m + n times in r union all s
min(m, n) times in r intersect all s
max(0, m - n) times in r except all s
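The multiplicity rules can be illustrated without a database: Python's collections.Counter is a multiset whose +, & and - operators happen to follow the same arithmetic. This is only an analogy for the counting rules, with invented tuples "a", "b" and "c".

```python
from collections import Counter

r = Counter({"a": 3, "b": 1})   # tuple "a" occurs m = 3 times in r
s = Counter({"a": 2, "c": 1})   # tuple "a" occurs n = 2 times in s

union_all = r + s        # m + n occurrences
intersect_all = r & s    # min(m, n) occurrences
except_all = r - s       # max(0, m - n) occurrences; zero counts are dropped

print(union_all["a"], intersect_all["a"], except_all["a"])
```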

Find all customers who have a loan, an account, or both:

(select customer-name from depositor)
union
(select customer-name from borrower)

Find all customers who have both a loan and an account.

(select customer-name from depositor)
intersect
(select customer-name from borrower)

Find all customers who have an account but no loan

(select customer-name from depositor)
except
(select customer-name from borrower)

Divide operator

No explicit DIVIDE statement in SQL

For instance, suppose you need to find all students who are taking the three courses {7, 6, 5}. With E relating students (S#) to the courses they take and C holding the three courses, the answer is E/C

To implement E/C in SQL:

1. Form the Cartesian product of the dividend, projected onto its non-divisor attribute, with the divisor: E[S#] x C

S#   C
25   7
25   6
25   5
32   7
32   6
32   5
38   7
38   6
38   5

2. Subtract E: E[S#] x C - E

S#   C
32   5
38   7
38   6

3. Project off the divisor, C (giving the S#s not wanted): (E[S#] x C - E)[S#]

S#
32
38

4. Subtract from the original non-divisor projection (the S#s you started with), giving what you do want: E[S#] - (E[S#] x C - E)[S#]

{25, 32, 38} - {32, 38} = {25}
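The four steps translate into nested except queries. The sketch below assumes Python's sqlite3 module, column names s_no and c_no, and contents for E that were inferred from the intermediate results shown above (the original table for E is not reproduced in this text), so treat the data as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table E (s_no integer, c_no integer)")  # dividend
conn.execute("create table C (c_no integer)")                # divisor
conn.executemany("insert into E values (?, ?)",
                 [(25, 7), (25, 6), (25, 5), (32, 7), (32, 6), (38, 5)])
conn.executemany("insert into C values (?)", [(7,), (6,), (5,)])

division = """
select s_no from E                        -- step 4: start from E[S#] ...
except
select s_no from (                        -- step 3: project off the divisor
    select students.s_no as s_no, C.c_no as c_no
    from (select distinct s_no from E) as students, C   -- step 1: E[S#] x C
    except
    select s_no, c_no from E              -- step 2: minus E
)
"""
taking_all_courses = sorted(row[0] for row in conn.execute(division))
print(taking_all_courses)
```

An equivalent formulation uses two nested not exists sub-queries ("there is no course in C that the student does not take"); the except version is shown because it mirrors the steps one to one.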

Aggregate Functions

These functions operate on the values of a column and return a single value

avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values

Find the average account balance at the Fargo branch.

select avg (balance)
from account
where branch-name = 'Fargo'

Find the number of tuples in the customer relation.

select count (*)
from customer

Find the number of distinct depositors in the bank.

select count (distinct customer-name)
from depositor

The group by clause

Find the number of depositors for each branch.

select branch-name, count (customer-name)
from depositor, account
where depositor.account-number = account.account-number
group by branch-name

Note: Attributes in the select clause outside of aggregate functions must appear in the group by list

The having clause

Find the names of all branches where the average account balance is more than $1,200.

select branch-name, avg (balance)
from account
group by branch-name
having avg (balance) > 1200

Note: predicates in the having clause are applied after the formation of the aggregation groups, whereas predicates in the where clause are applied before forming groups
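The difference between where (applied to rows before grouping) and having (applied to groups afterwards) shows up clearly on a small table. A sketch assuming Python's sqlite3 module, underscored attribute names and invented account rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-101", "Fargo", 500), ("A-102", "Fargo", 900),
     ("A-201", "Moorhead", 1000), ("A-202", "Moorhead", 2000),
     ("A-301", "Perryridge", 1300)],
)

# having filters whole groups by their average balance.
by_average = conn.execute("""
    select branch_name, avg(balance)
    from account
    group by branch_name
    having avg(balance) > 1200
    order by branch_name
""").fetchall()

# where removes individual rows first; the groups are built from what is left.
by_row = conn.execute("""
    select branch_name, count(*)
    from account
    where balance > 1200
    group by branch_name
    order by branch_name
""").fetchall()

print(by_average)
print(by_row)
```

Moorhead passes the having test even though one of its accounts is below 1200, because the test is on the group's average rather than on each row.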

Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes

null signifies an unknown value or that a value does not exist.

The predicate is null can be used to check for null values.

E.g. Find all loan numbers which appear in the loan relation with null values for amount.

select loan-number
from loan
where amount is null

The result of any arithmetic expression involving null is null. E.g. 5 + null returns null

But aggregate functions simply ignore nulls (except for count(*))

Any comparison with null returns unknown. E.g. 5 < null, null <> null and null = null all evaluate to unknown

Three-valued logic using the truth value unknown:

OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown

AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

NOT: (not unknown) = unknown

The result of a where clause predicate is treated as false if it evaluates to unknown

Total all loan amounts

select sum (amount)
from loan

The above statement ignores null amounts; the result is null if there is no non-null amount

All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
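These rules can be observed directly. The sketch below assumes Python's sqlite3 module, where SQL null and the truth value unknown both come back as Python's None and true/false come back as 1/0; the loan rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, amount integer)")
conn.executemany("insert into loan values (?, ?)",
                 [("L-1", 1000), ("L-2", None), ("L-3", 2000)])


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


arithmetic = scalar("select 5 + null")                 # null
comparison = scalar("select null = null")              # unknown
unknown_or_true = scalar("select (null = null) or 1")  # true
unknown_and_false = scalar("select (null = null) and 0")  # false

# 'is null' finds the row; '= null' is unknown for every row, hence no result.
with_is_null = [r[0] for r in conn.execute(
    "select loan_number from loan where amount is null")]
with_equals = [r[0] for r in conn.execute(
    "select loan_number from loan where amount = null")]

# sum and count(amount) skip the null; count(*) counts every tuple.
total, all_rows, non_null = conn.execute(
    "select sum(amount), count(*), count(amount) from loan").fetchone()
empty_sum = scalar("select sum(amount) from loan where amount is null")

print(arithmetic, comparison, with_is_null, with_equals, total, empty_sum)
```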

Nested Sub-queries

Some queries require that existing values in the DB be fetched and then used in a comparison condition

SQL provides a mechanism for the nesting of sub-queries.

A sub-query is a select-from-where expression that is nested within another query (called the outer query)

A common use of sub-queries is to perform tests for set membership, set comparisons, and set cardinality.

Find all customers who have both an account and a loan at the bank.

select customer-name
from borrower
where customer-name in (select customer-name from depositor)

Find all customers who have a loan at the bank but do not have an account at the bank

select customer-name
from borrower
where customer-name not in (select customer-name from depositor)

some keyword - Existential

F <comp> some r ⇔ ∃ t ∈ r such that (F <comp> t), where <comp> can be <, ≤, >, ≥, = or ≠

(5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)
(5 < some {0, 5}) = false
(5 = some {0, 5}) = true
(5 ≠ some {0, 5}) = true (since 0 ≠ 5)

all keyword - Universal

F <comp> all r ⇔ ∀ t ∈ r (F <comp> t)

(5 < all {0, 5, 6}) = false
(5 < all {6, 10}) = true
(5 = all {4, 5}) = false
(5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)
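The two quantifiers behave like Python's built-in any and all, which gives a quick way to check the truth values listed above. The helper names some and all_ are made up for this sketch, and null values are left out of the picture.

```python
import operator as op


def some(f, comp, r):
    """F <comp> some r: true if the comparison holds for at least one t in r."""
    return any(comp(f, t) for t in r)


def all_(f, comp, r):
    """F <comp> all r: true if the comparison holds for every t in r."""
    return all(comp(f, t) for t in r)


some_results = [
    some(5, op.lt, [0, 5, 6]),   # 5 < 6
    some(5, op.lt, [0, 5]),      # nothing in r exceeds 5
    some(5, op.eq, [0, 5]),      # same as: 5 in r
    some(5, op.ne, [0, 5]),      # 0 differs from 5
]
all_results = [
    all_(5, op.lt, [0, 5, 6]),   # fails for 0 and 5
    all_(5, op.lt, [6, 10]),
    all_(5, op.eq, [4, 5]),      # fails for 4
    all_(5, op.ne, [4, 6]),      # same as: 5 not in r
]
print(some_results, all_results)
```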

Find all branches that have greater assets than some branch located in Brooklyn.

select distinct T.branch-name
from branch as T, branch as S
where S.branch-city = 'Brooklyn' and T.assets > S.assets

Same query using the > some clause

select branch-name
from branch
where assets > some (select assets from branch where branch-city = 'Brooklyn')

Find the names of all branches that have greater assets than all branches located in Brooklyn.

select branch-name
from branch
where assets > all (select assets from branch where branch-city = 'Brooklyn')

exists keyword

Used to test for empty relations: exists returns the value true if the argument sub-query is nonempty.

exists r ⇔ r ≠ Ø        Outer Query exists (inner query)
not exists r ⇔ r = Ø    Outer Query not exists (inner query)

Find branch names having greater assets than some branch located in Brooklyn

select branch-name
from branch b1
where exists (select * from branch b2 where b2.branch-city = 'Brooklyn' and b1.assets > b2.assets)

Find names of all branches that have greater assets than all branches in Brooklyn

select branch-name
from branch b1
where not exists (select * from branch b2 where b2.branch-city = 'Brooklyn' and b1.assets <= b2.assets)

Like saying: give me the branches such that no branch in Brooklyn has equal or greater assets
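Both exists queries can be run on a small branch table. A sketch assuming Python's sqlite3 module, underscored attribute names and invented branch data. To express "greater than all" with not exists, the inner query has to look for Brooklyn branches whose assets are greater than or equal to the outer branch's, so a branch that merely ties with a Brooklyn branch is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)")
conn.executemany(
    "insert into branch values (?, ?, ?)",
    [("Downtown", "Brooklyn", 9000), ("Brighton", "Brooklyn", 7000),
     ("Perryridge", "Horseneck", 8000), ("Round Hill", "Horseneck", 9000),
     ("Mianus", "Horseneck", 10000), ("Pownal", "Bennington", 3000)],
)

greater_than_some = [r[0] for r in conn.execute("""
    select branch_name from branch b1
    where exists (select * from branch b2
                  where b2.branch_city = 'Brooklyn'
                    and b1.assets > b2.assets)
    order by branch_name
""")]

greater_than_all = [r[0] for r in conn.execute("""
    select branch_name from branch b1
    where not exists (select * from branch b2
                      where b2.branch_city = 'Brooklyn'
                        and b1.assets <= b2.assets)
    order by branch_name
""")]

print(greater_than_some)
print(greater_than_all)
```

Round Hill ties with Downtown at 9000 and is therefore not greater than all Brooklyn branches.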

The unique keyword

Used to test for the absence of duplicate tuples

The unique keyword tests whether a sub-query has any duplicate tuples in its result.

Find all customers who have at most one account at the Fargo branch.

select T.customer-name
from depositor as T
where unique (select R.customer-name
              from account, depositor as R
              where T.customer-name = R.customer-name
                and R.account-number = account.account-number
                and account.branch-name = 'Fargo')

Views

Provide a mechanism to hide certain data from the view of certain users.

To create a view we use the command:

create view v as <query expression>

where <query expression> is any legal query expression and v is the view name

A view consisting of branches and their customers

create view all-customer as
(select branch-name, customer-name
 from depositor, account
 where depositor.account-number = account.account-number)
union
(select branch-name, customer-name
 from borrower, loan
 where borrower.loan-number = loan.loan-number)

Query views just like we query tables. Find all customers of the Fargo branch

select customer-name
from all-customer
where branch-name = 'Fargo'

with keyword

The with clause allows views to be defined locally to a query, rather than globally.

Find all accounts with the maximum balance

with max-balance (value) as
    (select max (balance) from account)
select account-number
from account, max-balance
where account.balance = max-balance.value
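The with query can be tried as follows. A sketch assuming Python's sqlite3 module, underscores instead of hyphens, and invented account rows, two of which share the maximum balance so that the query returns more than one account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-101", "Fargo", 500), ("A-215", "Moorhead", 900), ("A-305", "Fargo", 900)],
)

# max_balance exists only for the duration of this one query.
max_balance_accounts = [row[0] for row in conn.execute("""
    with max_balance (value) as
        (select max(balance) from account)
    select account_number
    from account, max_balance
    where account.balance = max_balance.value
    order by account_number
""")]
print(max_balance_accounts)
```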

SQL queries

Can consist of up to 6 clauses, of which the first 2 are mandatory

The order is important:

SELECT <att list>
FROM <tbl list>
[WHERE <condition>]
[GROUP BY <grouping att>]
[HAVING <group condition>]
[ORDER BY <att list>]

To evaluate the query, we apply the clauses in this order:

Evaluate FROM: produces the Cartesian product, A, of the tables in the FROM list
Evaluate WHERE: produces table B, consisting of the rows of A that satisfy the WHERE condition
Evaluate GROUP BY: partitions B into groups that agree on the attribute values in the GROUP BY list
Evaluate HAVING: eliminates groups in B that do not satisfy the HAVING condition
Evaluate SELECT: produces table C containing a row for each group. Attributes in the SELECT list are limited to those in the GROUP BY list and aggregates over the group
Evaluate ORDER BY: orders the rows of C

RESTRICTIONS

SELECT attr list, aggregates
FROM relation list
WHERE condition
GROUP BY group list
HAVING group cond
ORDER BY ordered attr list

Attr list must be a subset of group list
Ordered attr list must be a subset of attr list

Joined Relations

Join operations take two relations and return as a result another relation. The relations are typically specified in the from clause

Join condition: defines which tuples in the two relations match, and what attributes are present in the result of the join.

Join type: defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

Join Conditions

natural: no explicit condition; an implicit equi-join is used for each pair of attributes in R and S having the same name
using (A1, A2, ..., An): like natural, but restricted to the specified attributes only
on <predicate>

Join Types

inner join (only those records from the tables on both sides of the join that match the join criteria; inner joins are the most common type of join)
left outer join (retrieve all records from the table on the left side of the join and only those records that match the join criteria from the table on the right side of the join)
right outer join (retrieve only those records from the table on the left side of the join that match the join criteria, but all records from the table on the right side of the join)
full outer join (retrieve all records from the tables on both sides of the join, regardless of whether the records match the join criteria)
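An inner join and a left outer join can be compared on the deck's loan and borrower relations. This sketch assumes Python's sqlite3 module and underscored attribute names; right and full outer joins are left out because older SQLite versions may not support them. Null shows up as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.execute("create table borrower (customer_name text, loan_number text)")
conn.executemany("insert into loan values (?, ?, ?)",
                 [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000),
                  ("L-260", "Perryridge", 1700)])
conn.executemany("insert into borrower values (?, ?)",
                 [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")])

JOIN = """
    select loan.loan_number, branch_name, amount, customer_name
    from loan {join_type} join borrower
         on loan.loan_number = borrower.loan_number
    order by loan.loan_number
"""
# Only loans that have a matching borrower tuple.
inner_rows = conn.execute(JOIN.format(join_type="inner")).fetchall()
# Every loan; borrower attributes are null where nothing matches.
left_rows = conn.execute(JOIN.format(join_type="left outer")).fetchall()

print(inner_rows)
print(left_rows)
```

Loan L-260 has no borrower, so it appears only in the left outer join, padded with null; borrower Hayes (L-155) appears in neither result.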

Relation loan

loan-number   branch-name   amount
L-170         Downtown      3000
L-230         Redwood       4000
L-260         Perryridge    1700

Relation borrower

customer-name   loan-number
Jones           L-170
Smith           L-230
Hayes           L-155

loan inner join borrower on loan.loan-number = borrower.loan-number

loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230

loan left outer join borrower on loan.loan-number = borrower.loan-number

loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230
L-260         Perryridge    1700     null            null

loan natural inner join borrower

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith

loan natural right outer join borrower

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith
L-155         null          null     Hayes

loan full outer join borrower using (loan-number)

loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith
L-260         Perryridge    1700     null
L-155         null          null     Hayes

Find all customers who have either an account or a loan (but not both) at the bank. The joined relation has the attributes (customer-name, account-number, loan-number)

select customer-name
from (depositor natural full outer join borrower)
where account-number is null or loan-number is null


insert clause

Add a new tuple to account

insert into account values ('A-9732', 'Fargo', 1200)

Add a new tuple to account with balance set to null

insert into account values ('A-777', 'Fargo', null)

Equivalently, list only the attributes that receive a value:

insert into account (branch-name, account-number) values ('Fargo', 'A-777')

The SELECT INTO statement is most often used to create backup copies of tables or for archiving records

select column_name(s) into newtable [in external database] from source

E.g., make a backup copy of the "Account" table:

select * into Account_backup from Account

The IN clause can be used to copy tables into another database:

select Account.* into Account in 'backup.mdb' from Account

If you only want to copy a few fields, list them after the SELECT keyword:

select customer-name into borrower_backup from Borrower

A where clause can be added. E.g., create a "Borrower_backup" table with 1 column (customer-name) holding the customers whose loan number is '1120':

select customer-name into borrower_backup from borrower where loan-number = '1120'

Selecting data from more than one table is also possible. E.g., create a new table "Borrower_Rec_backup" that contains data from the two tables Borrower and Loan:

select customer-name, amount into borrower_rec_backup
from (borrower inner join loan on borrower.loan-number = loan.loan-number)

MySQL Server doesn't support the Sybase SQL extension SELECT ... INTO TABLE .... Instead, MySQL Server supports the standard SQL syntax INSERT INTO ... SELECT

insert into table_X (A1, A2, ..., An)
select table_Y.A1, table_Y.A2, ..., table_Y.An
from table_Y
where table_Y.Ai > 100

The select from where statement is fully evaluated before any of its results are inserted into the relation

Provide as a gift for all loan customers of the Fargo branch a $200 savings account. Let the loan number serve as the account number for the new savings account

insert into account
select loan-number, branch-name, 200
from loan
where branch-name = 'Fargo'

insert into depositor
select customer-name, loan.loan-number
from loan, borrower
where branch-name = 'Fargo' and loan.loan-number = borrower.loan-number

The LOAD DATA INFILE statement (MySQL) reads rows from a text file into a table at high speed

load data infile 'file_name.txt' [replace | ignore] into table tbl_name [fields [terminated by '\t']] [lines [terminated by '\n']] ... plus many other options

data.txt

1, William,Hock; 2, Sami,Hajjar; 4, George, Nixon;

load data infile 'data.txt' ignore into table Client fields terminated by ',' lines terminated by ';'

update clause

Increase all accounts with balances over $10,000 by 6%; all other accounts receive 5%.

Write two update statements:

update account
set balance = balance * 1.06
where balance > 10000

update account
set balance = balance * 1.05
where balance <= 10000

The order matters: if the 5% update ran first, an account just below $10,000 could cross the threshold and then receive the 6% raise as well

Can be done better using the case statement (next)

Same query as before: Increase all accounts with balances over $10,000 by 6%; all other accounts receive 5%.

update account
set balance = case
                  when balance <= 10000 then balance * 1.05
                  else balance * 1.06
              end
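The benefit of the case form is that every row is looked at exactly once. The sketch below, assuming Python's sqlite3 module, an underscored schema and three invented balances, compares it with running the two separate updates in the unfortunate order (5% first).

```python
import sqlite3


def fresh_accounts():
    """Build a new in-memory account table with three sample balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table account (account_number text, balance real)")
    conn.executemany("insert into account values (?, ?)",
                     [("A-1", 9900), ("A-2", 10000), ("A-3", 20000)])
    return conn


def balances(conn):
    rows = conn.execute("select account_number, balance from account")
    return {number: round(balance, 2) for number, balance in rows}


# Single pass with case: each account gets exactly one raise.
conn = fresh_accounts()
conn.execute("""
    update account
    set balance = case when balance <= 10000 then balance * 1.05
                       else balance * 1.06 end
""")
with_case = balances(conn)

# Two statements, 5% first: accounts pushed over 10000 are raised again.
conn = fresh_accounts()
conn.execute("update account set balance = balance * 1.05 where balance <= 10000")
conn.execute("update account set balance = balance * 1.06 where balance > 10000")
wrong_order = balances(conn)

print(with_case)
print(wrong_order)
```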

Insert into Views

Create a view of all loan data in loan relation, hiding the amount attribute

create view branch-loan as
select branch-name, loan-number
from loan

Add a new tuple to branch-loan

insert into branch-loan
values ('Perryridge', 'L-307')

This insertion must be represented by the insertion of the tuple (‘L-307’, ‘Perryridge’, null) into the loan relation

Most SQL implementations allow updates/inserts only on simple views (without aggregates) defined on a single relation

Updates on more complex views are difficult or impossible to translate, and hence disallowed.

delete clause

Delete all account records at the Fargo branch

delete from account
where branch-name = 'Fargo'

Delete all accounts at every branch located in Moorhead city.

delete from account
where branch-name in (select branch-name
                      from branch
                      where branch-city = 'Moorhead')

Miscellany

Transactions

A transaction is a sequence of queries and insert/update/delete statements executed as a single unit

Transactions are terminated by one of:

commit work: makes all updates of the transaction permanent in the database

rollback work: undoes all updates performed by the transaction.

Motivating example

Transfer of money from one account to another involves two steps: deduct from one account and credit to another

If one step succeeds and the other fails, the database is in an inconsistent state

Therefore, either both steps should succeed or neither should

If any step of a transaction fails, all work done by the transaction can be undone by rollback work.

Rollback of incomplete transactions is done automatically in case of system failures

In most database systems, each SQL statement that executes successfully is, by default, automatically committed.

Each transaction would then consist of a single statement

Further discussion is outside the scope of this presentation
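The money-transfer example can be sketched with commit and rollback. The assumptions here are Python's sqlite3 module (which, in its default mode, opens a transaction implicitly before the first update), an invented two-account table, and a check constraint that forbids negative balances so that the second step of an oversized transfer fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table account (
        account_number text primary key,
        balance integer check (balance >= 0)
    )
""")
conn.executemany("insert into account values (?, ?)",
                 [("A-101", 500), ("A-215", 700)])
conn.commit()


def transfer(src, dst, amount):
    """Credit dst and debit src as one unit; undo both if either step fails."""
    try:
        conn.execute("update account set balance = balance + ? "
                     "where account_number = ?", (amount, dst))
        conn.execute("update account set balance = balance - ? "
                     "where account_number = ?", (amount, src))
        conn.commit()          # both steps succeeded: make them permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()        # undo the credit that already happened
        return False


def balances():
    return dict(conn.execute("select account_number, balance from account"))


first = transfer("A-101", "A-215", 200)     # fits within the balance
after_first = balances()
second = transfer("A-101", "A-215", 1000)   # debit would go negative
after_second = balances()
print(first, after_first, second, after_second)
```

In the failed transfer the credit to A-215 was applied before the debit was rejected; the rollback removes it again, so the balances are unchanged.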

Embedded SQL

The SQL standard defines embeddings of SQL in a variety of programming languages, e.g., Pascal, PL/I, Fortran, C, Cobol.

A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.

The EXEC SQL statement is used to identify embedded SQL to the preprocessor:

EXEC SQL <embedded SQL statement> END-EXEC

References

http://www.w3schools.com/sql/default.asp (has tutorials and online tests)

Soundex paper: A. Richard Miller. 1990. What's in a name? An MMSFORTH implementation of the Russell-SOUNDEX method. In 1990 Rochester FORTH Conference: Embedded Systems, Thomas Hess, ed., pp. 101-3.