View
232
Download
0
Category
Preview:
Citation preview
Structured/System Query Language SQL
HistoryBasics of SQLData Definition LanguageQuery LanguageData Manipulation Language Miscellany
History Originally it is was called SEQUEL
(Structured English QUEry Language)
Designed and implemented at IBM Research
San Jose Research (now IBM Almaden Research Center)
as the interface for an experimental relational database system, SYSTEM R.
The language for IBM’s DB2
A number of variations of SQL have been implemented by most commercial DBMS vendors
A joint effort by ANSI (American National Standards Institute) and ISO (International Standards Organization) led 1st standard version (ANSI 1986) called SQL-1
Revised and expanded in SQL-92 (SQL-2) and
SQL-1999 (SQL-3) (newer?)
Basics Language developed specifically to query databases
It is a comprehensive database language (data definition, query and update)
Based on set and relational operations with certain modifications and enhancements
A typical SQL query has the form:
(PROJECTION) select A1, A2, ..., An
(CART. PROD.) from r1, r2, ..., rm(SELECTION) where P
Ais represent attributes
ris represent relations
P is a predicate.
The result of an SQL query is a relation
Data Definition Language (DDL) Introduction
Allows the specification of a set of relations along with information about each relation, including
The schema for the relations. The domain of values associated with each attribute in every
relation. Integrity constraints The set of indices to be maintained for each relation. Security and authorization information for each relation. The physical storage structure of each relation on disk.
SQL uses the terms schema, table, row and column for database, relation, tuple and attribute, respectively
The main DDL commands are create alter drop
Schema and Catalog Concepts
Prior to SQL-2, there was no concept of a relational database schema
All tables (relations) were considered part of the same schema
In SQL-2, the schema concept was incorporated in order to group together tables and other constructs (e.g. views) that belong to same DB application
Formally, an SQL schema is identified by a schema name and authorization as well as descriptions for each element in the schema
Examples of elements include tables and views Two ways to create a schema using create schema
Assigns a name and an authorization identifier but the elements are defined later
create schema Bank authorization xyz
A collection of schemas is known as a catalogue
An SQL relation is defined using the create table command:create table r (A1 D1, A2 D2,…, An Dn, integrity-constraint1,…, integrity-constraintk)
r is the name of the relationeach Ai is an attribute name in the schema of relation rDi is the data type of values in the domain of attribute Ai
E.g. create table branch (branch-name char(15), branch-city char(30), assets integer)
DATA TYPES char(n). Fixed length character string, with user-specified length n. varchar(n). Variable length character strings, with user-specified maximum length n. int. Integer (a finite subset of the integers that is machine-dependent). smallint. Small integer (a machine-dependent subset of the integer domain type). numeric(p,d). Fixed pt number, user-specified precision of p digits, with d digits to the right of dec. pt. real, double precision. Floating pt & double-precision floating pt numbers, machine-dependent precision float(n). Floating pt number, with user-specified precision of at least n digits. Null values allowed in all domain types. Declaring attribute not null prohibits null values for that attribute. create domain (SQL-92) creates user-defined types: create domain name as char(20) not null
date. Dates, containing a (4 digit) year, month and date E.g. date ‘2001-7-27’ (on MySQL ‘YYYY-MM-DD’)
time. Time of day, in hours, minutes and seconds. E.g. time ’09:00:30’ timestamp: date plus time of day E.g. timestamp ‘2001-7-27
09:00:30’ Interval: period of time E.g. Interval ‘1’ day
Subtracting a date/time/timestamp value from anothergives an interval value, which can be added todate/time/timestamp values
Can extract values ofl field() from date/time/timestamp E.g. extract (year from r.starttime)
Can cast string types to date/time/timestamp E.g. cast <string-valued-expression> as date
Integrity Constraints in Create Table not null (after domain) Unique (after domain) primary key (A1, ..., An) foreign Key
Create table Table_YYY (
A1 char(3),A2 int,A3 int UNIQUE,A4 float NOT NULL, PRIMARY KEY(A1,A2),FOREIGN KEY (A1) REFERENCES TABLE_XXX(A1) ON DELETE CASCADE ON UPDATE CASCADE);
check (P), where P is a predicate Example: Declare branch-name as the primary key for branch and
ensure that the values of assets are non-negative.
create table branch(branch-name char(15), branch-city char(30), assets integer,primary key (branch-name),check (assets >= 0))
The drop table and alter table keywords
The drop table command deletes all information about the dropped relation from the database.
The alter table command is used to add attributes to an existing relation.
alter table r add A D
where A is the name of the attribute to be added to relation r and D is the domain of A.
All tuples in the relation are assigned null as the value for the new attribute
The alter table command can also be used to drop attributes of a relation
alter table r drop A
where A is the name of an attribute of relation r
The select clause
The select clause list the attributes desired in the result of a query corresponds to the projection operator
E.g. find the names of all branches in the loan relationselect branch-namefrom loan
SQL names are case insensitive, i.e. you can use capital or small letters.
SQL allows duplicates in query results hence in general an SQL table is not a set of tuples but rather a multiset (a.k.a bag) of tuples
To force the elimination of duplicates, insert the keyword distinct after select.
Find the names of all branches in the loan relations, and remove duplicates select distinct branch-name from loan
An asterisk (Nathan Hale symbol) in the select clause denotes “all attributes”select *from loan
The select clause can contain arithmetic expressions involving +, –, , /, and operating on constants or attributes of tuples
select loan-number, branch-name, amount 100 from loan
would return a relation which is the same as the loan relations, except that the attribute amount is multiplied by a 100.
The from clause lists relations involved in the query corresponds to the Cartesian product operator
Find the Cartesian product borrower x loanselect from borrower, loanselect * from borrower, loan where borrower.loan-number= loan.loan-
number
The where clause specifies the conditions that the result must satisfy corresponds to the selection operator
Find all loan numbers for loans made at Fargo branch with loan amts > $1200select loan-numberfrom loanwhere branch-name = “Fargo” and amount > 1200
Comparison results can be combined using the logical connectives and, or, and not.
SQL includes a between comparison operator
E.g. Find the loan number of those loans with loan amounts between $90,000 and $100,000
(e.g., $90,000 Loan_amt $100,000)
select loan-numberfrom loanwhere amount between 90000 and 100000
SQL also includes an in operator used for set inclusion testing
E.g. Find loan number of loans with loan amts $90000, $100000, $ 110000 or $120000
select loan-numberfrom loanwhere amount in (90000, 100000, 110000, 120000)
Ambiguous Names and Aliasing SQL allows renaming relations and attributes using the as clause:
old-name as new-name
Find name, loan # and amt of all customers; rename loan-number as loan-id.
select customer-name, borrower.loan-number as loan-id, amountfrom borrower, loanwhere borrower.loan-number = loan.loan-number
Find the customer names, their loan numbers and loan amounts for all customers having a loan at some branch.select customer-name, T.loan-number, S.amountfrom borrower as T, loan as Swhere T.loan-number = S.loan-number
Find names of all branches that have greater assets than some branch located in Fargo.select distinct T.branch-namefrom branch as T, branch as Swhere S.branch-city =‘”Fargo” and T.assets > S.assets
String OperationsSQL includes a string-matching operator for comparisons on character strings.
(LIKE)
Patterns are described using two special characters:
percentage (%). The % character matches any substring. underscore (_). The _ character matches any character.
Find the names of all customers whose street includes the substring “Main”.
select customer-namefrom customerwhere customer-street like ‘%Main%’
Find names of all customers whose street includes the substring “Main” with length 5 (don’t use %)
select customer-namefrom customerwhere customer-street like ‘Main_’ or like ‘_Main’
SQL supports a variety of string operations such as concatenation (using “||”) converting from upper to lower case (and vice versa) finding string length, extracting substrings, etc.
Soundex operation Database developers have long struggled with the problem of matching words
that might not look alike, but actually sound alike…Useful for finding strings for which the sound is known but the precise spelling is not.
Soundex (argument)
Returns a 4 character code representing the sound of the words in the argument.
This result can be used to compare with the sound of other strings.
Argument can be character string, CHAR or VARCHAR not exceeding 4,000 bytes.
The result of the function is CHAR(4). The result is null if the argument is null
How does it work: A,E,I,O,U,Y,W,H are ignored along with double letters
Ignore everything after the 4th character
1st letter of the soundex code corresponds to the first letter of the argument
Every subsequent character in the name is looked up according to the scheme presented on the next slide and encoded as a digit between 1 and 6
if there are insufficient characters remaining in the name, the coding is padded with zeros.
Two adjacent identically coded letters Nonalphabetic characters terminate the soundex
evaluation 1 = B,P,F,V 2 = C,S,G,J,K,Q,X,Z 3 = D,T 4 = L 5 = M,N 6 = R All other letters (A,E,I,O,U,Y,W,H) ignored
Select Soundex ('Smith'), Soundex ('Smythe') S530 S530
There is also a related function called DIFFERENCE Used to compare strings based upon their Soundex values. Difference(Argument1,Argument2) Usually returns integer ranging in value from 0 (least similar) to 4 (most similar).
Suppose you overheard a male employee talking in hallway but didn‘t catch his name You might overhear a name that sounded like "Ann" but you're positive it was a male The DIFFERENCE function is ideal for this type of situation. Probably begin by specifying a threshold of 4 to limit the number of results returned.
Select firstname, difference(firstname,'ann') as 'difference' from employees where difference(firstname,'ann')=4
FirstName Difference Ann 4 Anne 4 Unfortunately, none were male names. So, lower threshold to 2
Select firstname, difference(firstname,'ann') as 'difference' from employees where difference(firstname,'ann')>=2 FirstName Difference Ann 4 Andrew 2 Janet 2 Laura 2 Anne 4
looks like Andrew might be our man!
The order by clause
List in alphabetic order the names of all customers having a loan in Fargo branch
select distinct customer-namefrom borrower, loanwhere borrower loan-number - loan.loan-
number and branch-name = “Fargo”order by customer-name
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.
E.g. order by customer-name desc
Set Operations The set operations union, intersect, and except operate on
relations and correspond to the relational algebra operations and
Each automatically eliminates duplicates; to retain all duplicates use the corresponding versions union all, intersect all and except all
Suppose a tuple occurs m times in r and n times in s, then, it occurs:m + n times in r union all smin(m,n) times in r intersect all smax(0, m – n) times in r except all s
Find all customers who have a loan, an account, or both:
(select customer-name from depositor) union(select customer-name from borrower)
Find all customers who have both a loan and an account.
(select customer-name from depositor) intersect(select customer-name from borrower)
Find all customers who have an account but no loan
(select customer-name from depositor) except(select customer-name from borrower)
Divide operator
No explicit DIVIDE statement in SQL
For instance, if you need to find all students who are taking the three courses {7,6,5}
the answer would be E/C
E/C Implement in SQL: 1. Cartesian product of dividend with divisor
projected offE[S#]C 25 7 25 6 25 5 32 7 32 6 32 5 38 7 38 6 38 5
E[S#]C - E 32 5 38 7 38 6
2. Minus E
3. Project off the divisor, C (giving the S# not wanted)
(E[S#]C – E)[S#] 32 38
4. Subtract from original non-divisor projection (S#s you started with), therefore giving what you do want!)
E[S#] - (E[S#]C – E)[S#] 25 25 32 32 = 38 38
Aggregate Functions These functions (produce numbers)
avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values
Find the average account balance at the Fargo branch.
Find the distinct number of depositors in the bank.
Find the number of tuples in the customer relation.
select avg (balance)from accountwhere branch-name = “Fargo”
select count (*)from customer
select count (distinct customer-name)from depositor
The group by clause
Find the number of depositors for each branch.
Note: Attributes in select clause outside of aggregate functions must appear in group by list
select branch-name, count (customer-name)from depositor, accountwhere depositor.account-number = account.account-numbergroup by branch-name
The having clause Find the names of all branches where the
average account balance is more than $1,200.
Note: predicates in the having clause are applied after the formation of the aggregation groups whereas predicates in
the where clause are applied before forming groups
select branch-name, avg (balance)from accountgroup by branch-namehaving avg (balance) > 1200
Null Values It is possible for tuples to have a null value, denoted by
null, for some of their attributes
null signifies an unknown value or that a value does not exist.
The predicate is null can be used to check for null values.
E.g. Find all loan numbers which appear in the loan relation with null values for amount.
select loan-numberfrom loanwhere amount is null
The result of any arithmetic expression involving null is null E.g. 5 + null returns null
But, aggregate functions simply ignore nulls (except for count)
Any comparison with null returns unknown E.g. 5 < null or null <> null or null = null
Three-valued logic using the truth value unknown: OR:
(unknown or true) = true (unknown or false) = unknown (unknown or unknown) = unknown
AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown
NOT: (not unknown) = unknown
Result of where clause predicate is treated as false if it evaluates to unknown
Total all loan amountsselect sum (amount)from loan
Above statement ignores null amounts result is null if there is no non-null amount
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
Nested Sub-queries Some queries require that existing values in the DB be
fetched and then used in a comparison condition SQL provides a mechanism for the nesting of sub-
queries. A sub-query is a select-from-where expression that is
nested within another query (called outer query) A common use of sub-queries is to perform tests for set
membership, set comparisons, and set cardinality. Find all customers who have both an account and a loan at the
bank.
Find all customers who have a loan at the bank but do not have an account at the bank select customer-name
from borrowerwhere customer-name not in (select customer-name
from depositor)
select customer-namefrom borrowerwhere customer-name in (select customer-name from depositor)
some keyword - Existential
F <comp> some r t r s.t. (F <comp> t) <comp> can be:
056
(5< some ) = true
05
0) = false 5
05(5 some ) = true(read: 5 < some tuple in relation)
(5< some ) = true(5 = some
all keyword - Universal F <comp> all r t r (F <comp> t)
056
(5< all ) = false6
10 ) = true(5< all
45 ) = false(5 = all
46(5 all ) = true (since 5 4 and 5
6)
Find all branches that have greater assets than some branch located in Brooklyn.
Same query using > some clause select branch-name
from branchwhere assets > some (select assets from branch where branch-city = ‘Brooklyn’)
select distinct T.branch-namefrom branch as T, branch as Swhere S.branch-city = ‘Brooklyn’ and T.assets >
S.assets
Find the names of all branches that have greater assets than all branches located in Brooklyn.select branch-namefrom branchwhere assets > all
(select assets from branch where branch-city = “Brooklyn”)
exists keyword
Used to test for empty relations returns the value true if the argument sub-query is nonempty. exists r r Ø
Outer Query exists (inner query) not exists r r = Ø
Outer Query not exists (inner query)
Find branch names having greater assets than some branch located in Brooklyn select branch-name
from branch b1where EXISTS
(select * from branch b2 where b2.branch-city = “Brooklyn” and b1.assets>b2.assets)
Find names of all branches that have greater assets than all branches in Brooklyn select branch-name
from branch b1where NOT EXISTS
(select * from branch b2 where branch-city = “Brooklyn” and b1.assets<b2.assets)
Like saying give me branches such that no branch in Brooklyn has greater assets
The unique keyword
Used to test for absence of duplicate tuples
The unique keyword tests whether a sub-query has any duplicate tuples in its result.
Find all customers who have at most one account at the Fargo branch.
select T.customer-name from depositor as T where unique (
select R.customer-name from account, depositor as R where account.account.branch-name = ‘”Fargo”
and R.account-number = account. account-number and
T.customer-name = R.customer-name)
Views Provide a mechanism to hide certain data from the view of certain
users. To create a view we use the command:
create view v as <query expression> where: <query expression> is any legal expression The view name is represented by v
A view consisting of branches and their customerscreate view all-customer as (select branch-name, customer-name from depositor, account
where depositor.account-number = account.account-number) union (select branch-name, customer-name from borrower, loan where borrower.loan-number = loan.loan-number)
Query views just like we query tables. Find all customers of the Fargo branch select customer-name from all-customer where branch-name = ‘Fargo’
with keyword with clause allows views to be defined locally to a query, rather than globally. Find all accounts with the maximum balance
with max-balance (value) as select max (balance) from account select account-number from account, max-balance where account.balance = max-balance.value
SQL queries Can consist of up to 6 clauses where 1st 2 are mandatory
The order is important SELECT <att list>FROM <tbl list>[WHERE <condition>][GROUP BY <grouping att>][HAVING <group condition>][ORDER By <att list>]
To evaluate the query, we applyEvaluate FROM: produces Cartesian product, A, of tables in FROM listEvaluate WHERE: produces table, B, consisting of rows of A that satisfy WHEREEvaluate GROUP BY: partitions B into groups that agree on attr values in GROUP BY
Evaluate HAVING: eliminates groups in B that do not satisfy HAVING conditionEvaluate SELECT: produces table C containing a row for each group.
Attr in SELECT list limited to those in GROUP BY list and aggregates over groupEvaluate ORDER BY: orders rows of C
RESTRICTIONS SELECT Attr list, aggregates FROM relation list WHERE conditionGROUP BY group listHAVING group condORDER BY ordered attr list
Attr list must be subset of group listOrdered attr list must be a subset of attr list
Joined Relations Join operations take two relations and return as a result another
relation. The relations are typically specified in the from clause Join condition – defines which tuples in the two relations match, and
what attributes are present in the result of the join. Join type – defines how tuples in each relation that do not match any
tuple in the other relation (based on the join condition) are treated. Join Conditions
natural no condition, an implicit equi-join is used for each pair of attributes in R and S having the same name
using (A1, A2, ..., An) like natural but restricted to specified attributes only
on <predicate>
Join Types inner join (only those records from tables on both sides of the join that match the join
criteria… Inner joins are the most common type of join ) left outer join (Retrieve all records from table on the left side of the join and only those
records that match the join criteria from the table on the right side of the join) right outer join (Retrieve only those records from table on the left side of the join
condition that match join criteria but all records from right side of the join condition) full outer join (Retrieve all records from tables on both sides of the join condition
regardless of whether records match the join criteria
Relation loan Relation borrowercustomer-name loan-number
Jones
Smith
Hayes
L-170
L-230
L-155
amount
3000
4000
1700
branch-name
Downtown
Redwood
Perryridge
loan-number
L-170
L-230
L-260
loan inner joinborrower onloan.loan-number=borrower.loan-number
branch-name amount
Downtown
Redwood
3000
4000
customer-name loan-number
Jones
Smith
L-170
L-230
loan-number
L-170
L-230
Downtown
Redwood
Perryridge
3000
4000
1700
Jones
Smith
null
L-170
L-230
null
L-170
L-230
L-260
loan left outer joinborrower onloan.loan-number=borrower.loan-number
loan natural innerjoin borrower
Downtown
Redwood
3000
4000
Jones
Smith
L-170
L-230
loan natural right
outer join borrowerDowntown
Redwood
null
3000
4000
null
Jones
Smith
Hayes
L-170
L-230
L-155
loan full outer joinborrower using(loan-number)
Find all customers who have either an account or a loan (but not both) at the bank. (customer_number, account_number, loan_number)
branch-name amount
Downtown
Redwood
Perryridge
null
3000
4000
1700
null
customer-name
Jones
Smith
null
Hayes
loan-number
L-170
L-230
L-260
L-155
select customer-namefrom (depositor natural full outer join borrower)where account-number is null or loan-number is null
Relation loan Relation borrowercustomer-name loan-number
Jones
Smith
Hayes
L-170
L-230
L-155
amount
3000
4000
1700
branch-name
Downtown
Redwood
Perryridge
loan-number
L-170
L-230
L-260
insert clause Add a new tuple to account insert into account values (‘A-9732’, ‘Fargo’,1200) Add a new tuple to account with balance set to null
insert into account values (‘A-777’,‘Fargo’, null)Equiv: insert into account (branch-name, account-number) values (‘Fargo’, ‘A-9732’)
SELECT INTO statement is most often used to create backup copies of tables or for archiving records select column_name(s) into newtable [in external database] from source
E.g., make a backup copy of “Account" table: select * into Account_backup from Account
The IN clause can be used to copy tables into another database: select Account.* into Account in 'backup.mdb' from Accountc
If you only want to copy a few fields, list them after the SELECT statement: select customer-name into borrower_backup from Borrower
Add a where clause? E.g., create a " Borrower_backup" table with 1 column (customer-name), extract those with acct#=“1120” from “Accounts" table select customer-name into borrower_backup from borrower where loan-#=‘1120'
Selecting data from more than one table is also possible, e.g., create a new table “Borrower_Rec_backup" that contains data from the two tables Borrower and Loan: select customer-name,amount into borrower_rec_backup from (borrower inner join loan on borrower.loan-number=loan.loan-number)
MySQL Server doesn't support the Sybase SQL extension: SELECT ... INTO TABLE .... Instead, MySQL Server supports the standard SQL syntax INSERT INTO ... SELECT
Insert Into table_X (A1, A2,…,An) SELECT table_Y.A1, table_Y.A2,…, table_Y.An FROM table_YWHERE table_Y.Ai > 100
The select from where statement is fully evaluated before any of its results are inserted into the relation
Provide as a gift for all loan customers of the Fargo branch, a $200 savings account. Let loan number serve as account number for new savings account insert into account
select loan-number, branch-name, 200from loanwhere branch-name = ‘Fargo’
insert into depositorselect customer-name, loan-numberfrom loan, borrowerwhere branch-name = ‘Fargo’ and loan.account-number = borrower.account-number
LOAD DATA INFILE statement reads rows from a text file into a table at high speed
Load data infile 'file_name.txt' [replace | ignore] into table tbl_name [fields [terminated by '\t']][lines [terminated by '\n']] … a lot of other options
data.txt
1, William,Hock; 2, Sami,Hajjar; 4, George, Nixon;
Load data infile ‘data.txt' ignore into table Client fields terminated by ‘,'lines terminated by ‘;'
update clauseIncrease all accounts with balances over $10,000 by 6%, all other accounts
receive 5%. Write two update statements:
update accountset balance = balance 1.06where balance > 10000
update accountset balance = balance 1.05where balance 10000
Can be done better using the case statement (next)
Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.
update accountset balance = case when balance <= 10000 then balance*1.05 else balance * 1.06end
Insert into Views
Create a view of all loan data in loan relation, hiding the amount attribute
create view branch-loan asselect branch-name, loan-numberfrom loan
Add a new tuple to branch-loan insert into branch-loan
values (‘Perryridge’, ‘L-307’)
This insertion must be represented by the insertion of the tuple (‘L-307’, ‘Perryridge’, null) into the loan relation
Most SQL implementations allow updates/inserts only on simple views (without aggregates) defined on a single relation
Updates on more complex views are difficult or impossible to translate, and hence disallowed.
delete clause
Delete all account records at the Fargo branch
delete from accountwhere branch-name = ‘Fargo’
Delete all accounts at every branch located in Moorhead city.
delete from accountwhere branch-name in (select branch-name
from branch where branch-city = ‘Moorhead’)
MiscellanyTransactions A transaction is a sequence of queries and I/U/D
statements executed as a single unit Transactions are terminated by one of
commit work: makes all updates of the transaction permanent in the database
rollback work: undoes all updates performed by the transaction.
Motivating example Transfer of money from one account to another involves two
steps: deduct from one account and credit to another
If one steps succeeds and the other fails, database is in an inconsistent state
Therefore, either both steps should succeed or neither should If any step of a transaction fails, all work done by
the transaction can be undone by rollback work. Rollback of incomplete transactions is done
automatically, in case of system failures
In most database systems, each SQL statement that executes successfully is automatically committed.
Each transaction would then consists of a single statement Further discussion is outside the scope of this presentation
Embedded SQL The SQL standard defines embeddings of SQL in a variety of
programming languages, e.g., Pascal, PL/I, Fortran, C, Cobol. A language to which SQL queries are embedded is referred
to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.
EXEC SQL statement is used to identify embedded SQL to preprocessorEXEC SQL <embedded SQL statement > END-EXEC
References http://www.w3schools.com/sql/default.as
p Has tutorials and online tests Soundex paper: A. Richard Miller.
1990. What’s in a name? An MMSFORTH implementation of the Russell-SOUNDEX method. In 1990 Rochester FORTH Conference:Embedded Systems, Thomas Hess, ed., p. 101–3.
Recommended