Introduction To SQL Unit 3 Modern Business Technology Introduction To TSQL Unit 3 Developed by Michael Hotek

Unit 3

Goals

• Nulls

• Group by

• Order by

• Distinct

• Aggregates

• Aggregates with grouping

• Having

• Compute

• Unions

Null

• There are times when data is missing or incomplete

• To handle this missing data, most DBMSs use the concept of a null

• A null does not mean zero

• A null also does not mean a blank

• A null indicates that a value is missing, unavailable, incomplete, or inapplicable

Null

• Nulls represent an unknown quantity or value

• You can't guarantee that a null does equal some other value

• You also can't guarantee that a null doesn't equal another value

• A null also might or might not equal another value

Null

• For example take the authors table

• If we were to leave out the state data for an author, this could bring up a few questions

• Is the author from CA?

• Is the author not from CA?

• Is the author from some other state?

• Any or none of these questions could be true

Null

• Any question about a null could provide three answers: yes, no, or maybe

• This could mean that using nulls gives us a very serious problem, since rows are selected based on a criterion being true

• Fortunately the DBMS manufacturers have given us some relief

Rules for Nulls

• A null designates an unknown value

• A null does not equal another distinct value

• A null does not equal another null

• WAIT A MINUTE!!!

Nulls cont.

• I can obviously test for a null and I can place a null into a column

• Since I am placing the same "value" (a null) into a column, how can a null not equal a null?

• A null represents the nonexistence of data

• Something that doesn't exist can't be compared with something else that doesn't exist.

• If it could, this would imply that the values being compared actually do exist. This violates the definition of a null

Nulls (theory aside)

• All of this appears to be rather deep and theoretical. In fact entire books have been written about nulls.

• This class is based on the practical application of SQL theory

• To that end the only things you need to remember are the following:

– You can select rows that have a null value

– A null does not equal a null

Nulls Applied

• Suppose we want to get the titles that do not have an assigned royalty

• Based on our previous experience we would probably do the following:

select * from titles where royalty = null

• Paradoxically, this works in some DBMSs, including Sybase and, depending on the ANSI_NULLS setting, MS SQL Server

• This is because some DBMS manufacturers recognize the problems with null and seek to protect you from yourself. The DBMS converts the comparison into its proper form and returns what you asked for. Under ANSI-standard behavior the comparison is never true, so the query returns no rows

Nulls Applied

• The proper way is to be explicit in what you are asking.

• We want to know where the values are null

select title, royalty from titles where royalty is null

title royalty

------------------------------------------------------------ -----------

The Psychology of Computer Cooking (null)

Net Etiquette (null)

(2 row(s) affected)
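The difference between "= null" and "is null" can be sketched outside of SQL Server. The example below is only an illustration: it uses SQLite through Python's sqlite3 module as a stand-in (not T-SQL), and apart from Net Etiquette the rows are hypothetical. SQLite follows the ANSI behavior, so "= null" matches nothing.

```python
import sqlite3

# Hypothetical miniature of the titles table (sample rows, not the real pubs data).
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title text, royalty integer)")
conn.executemany(
    "insert into titles values (?, ?)",
    [("Net Etiquette", None), ("Sample Book A", 10), ("Sample Book B", 12)],
)

# Comparing with '= null' evaluates to unknown for every row, so nothing qualifies.
rows_equals = conn.execute(
    "select title from titles where royalty = null"
).fetchall()

# 'is null' is the explicit test for a missing value.
rows_is_null = conn.execute(
    "select title from titles where royalty is null"
).fetchall()

print(rows_equals)   # []
print(rows_is_null)  # [('Net Etiquette',)]
```

On a DBMS that rewrites "= null" for you, both queries would return the same row, which is exactly why being explicit is the safer habit.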

The Basics recap

• This completes all of the basics of selecting data

• To quickly recap

• The select clause specifies what columns we want to see

• The from clause tells what table we want to see data from

• The where clause restricts the data we will see

Order by

• The order by clause is used to specify a sorting order of the result set

• The sorting can be performed by column name or by column number

select au_fname,au_lname from authors order by au_lname,au_fname

or

select au_fname,au_lname from authors order by 2,1

Order by

• Depending upon the DBMS, the column you are ordering by does not need to be specified in the select clause

select au_fname, au_lname from authors order by state

• While this does work on some DBMSs, it is generally not advisable

• The default sort order is ascending (a-z), but you can specify a descending order by using the keyword desc

• …order by au_lname desc, au_fname
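The ordering options above can be tried on a small table. This is a sketch using SQLite through Python's sqlite3 module as a stand-in for SQL Server; the authors table here is a hypothetical five-row sample, not the full pubs table.

```python
import sqlite3

# Hypothetical five-row sample of the authors table.
conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_fname text, au_lname text)")
conn.executemany(
    "insert into authors values (?, ?)",
    [
        ("Ann", "Dull"),
        ("Patti", "Smythe"),
        ("Meander", "Smith"),
        ("Anne", "Ringer"),
        ("Albert", "Ringer"),
    ],
)

# Sort by column name.
by_name = conn.execute(
    "select au_fname, au_lname from authors order by au_lname, au_fname"
).fetchall()

# Sort by column position in the select list (2 = au_lname, 1 = au_fname).
by_position = conn.execute(
    "select au_fname, au_lname from authors order by 2, 1"
).fetchall()

# Last name descending, first name ascending within each last name.
descending = conn.execute(
    "select au_fname, au_lname from authors order by au_lname desc, au_fname"
).fetchall()

print(by_name == by_position)  # True
print(descending[0])           # ('Patti', 'Smythe')
```

Note that desc applies only to the column it follows; au_fname is still sorted ascending in the last query.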

Sort Order

• If order by sorts the data, how do I know what order it is sorted in?

• The sort order is determined by a character set which is defined for a database

• In Sybase and MS SQL Server, this character map can be retrieved by executing sp_helpsort

exec sp_helpsort

Order by

• An order by is not limited to actual data columns

• We can order by a calculation if we wish

select au_fname + ' ' + au_lname name from authors order by name

name

-------------------------------------------------------------

Abraham Bennet

Akiko Yokomoto

Albert Ringer

Ann Dull

...

Meander Smith

Michael O'Leary

Michel DeFrance

Morningstar Greene

Patti Smythe

Reginald Blotchet-Halls

Sheryl Hunter

Stearns MacFeather

Sylvia Panteley

(27 row(s) affected)

Order by / Nulls

• An order by is based upon a sort order specified by a character set

• Since nulls aren't characters, where do these fit in?

• Depending on the DBMS, you will find the nulls at either the beginning or the end of the result set.

• Where they appear is decided by the DBMS manufacturer; MS SQL Server, for example, places nulls first in an ascending sort
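A quick way to find out where a given DBMS puts nulls is to sort a column that contains one. The sketch below uses SQLite through Python's sqlite3 module, which happens to place nulls first in an ascending sort; the three-row titles table is hypothetical, and other products may behave differently.

```python
import sqlite3

# Hypothetical three-row titles table with one missing type.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text)")
conn.executemany(
    "insert into titles values (?, ?)",
    [("T1", "business"), ("T2", None), ("T3", "psychology")],
)

# Ascending sort: in SQLite the null comes before every real value.
ascending = [row[0] for row in conn.execute("select type from titles order by type")]

# Descending sort: the same rule puts the null last.
descending = [row[0] for row in conn.execute("select type from titles order by type desc")]

print(ascending)   # [None, 'business', 'psychology']
print(descending)  # ['psychology', 'business', None]
```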

Distinct

• As you have seen from some of the queries we have run, you can get what appear to be duplicate rows in the result set

• From the scope of the result set, they are duplicates

• From the scope of the database they are not

• This is because the select statements we have performed up to this point returned a row of data for every row in a table that matched the specified criteria

Distinct

• Sometimes we do not want to see these duplicate rows

• We can eliminate them by use of the distinct keyword

• The distinct is placed immediately after the select

• There can also be only one distinct per SQL statement

• The distinct applies to all columns in the select list

Distinct

select au_id from titleauthor

au_id

-----------

172-32-1176

213-46-8915

213-46-8915

238-95-7766

267-41-2394

267-41-2394

...

899-46-2035

899-46-2035

998-72-3567

998-72-3567

(25 row(s) affected)

select distinct au_id from titleauthor

au_id

-----------

172-32-1176

213-46-8915

238-95-7766

267-41-2394

...

899-46-2035

998-72-3567

(19 row(s) affected)
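Because distinct applies to the whole select list, adding a column can bring the "duplicates" back. The sketch below illustrates this with SQLite through Python's sqlite3 module; the three titleauthor rows are a hypothetical sample.

```python
import sqlite3

# Hypothetical sample: one author with two titles, one author with a single title.
conn = sqlite3.connect(":memory:")
conn.execute("create table titleauthor (au_id text, title_id text)")
conn.executemany(
    "insert into titleauthor values (?, ?)",
    [
        ("172-32-1176", "PS3333"),
        ("213-46-8915", "BU1032"),
        ("213-46-8915", "BU2075"),
    ],
)

# Without distinct: one row per row in the table.
all_ids = conn.execute("select au_id from titleauthor order by au_id").fetchall()

# With distinct: each au_id appears once.
distinct_ids = conn.execute(
    "select distinct au_id from titleauthor order by au_id"
).fetchall()

# Distinct covers every column in the select list, so all three
# (au_id, title_id) pairs are different and all three come back.
distinct_pairs = conn.execute(
    "select distinct au_id, title_id from titleauthor"
).fetchall()

print(len(all_ids), len(distinct_ids), len(distinct_pairs))  # 3 2 3
```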

Aggregates

• There are times when we want to perform calculations on all of the values in a column or table

• We accomplish this through the use of aggregates

• The three we will explore are count, sum, and average

Count(*)

• Count will return exactly what its name implies

• It returns a count of the number of rows in a table that match certain criteria

select count(*) from authors will return the number of rows in the authors table

-----------

27

(1 row(s) affected)

select count(*) from authors where state = 'CA' will return the number of authors living in CA

-----------

15

(1 row(s) affected)

Sum

• The sum is used to add up all of the values in a column

select sum(advance) from titles will return the total amount advanced to all authors

--------------------------

95,400.00

(1 row(s) affected)

Avg

• Avg will return the average value in a column

select avg(price) from titles will return the average price of all books

--------------------------

14.77

(1 row(s) affected)

select avg(price) from titles where price > 10 will return the average price of the books over $10

--------------------------

17.94

(1 row(s) affected)
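The three aggregates can be checked by hand on a table small enough to add up mentally. This sketch uses SQLite through Python's sqlite3 module with a hypothetical three-row titles table; it also shows that sum and avg skip null values rather than treating them as zero.

```python
import sqlite3

# Hypothetical three-row titles table; one price and one advance are missing.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, price real, advance real)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [("T1", 10.0, 5000.0), ("T2", 20.0, None), ("T3", None, 1000.0)],
)

# count(*) counts rows; sum and avg ignore the nulls in their column.
row_count, total_advance, average_price = conn.execute(
    "select count(*), sum(advance), avg(price) from titles"
).fetchone()

# A where clause restricts the rows before the aggregate is calculated.
average_over_10 = conn.execute(
    "select avg(price) from titles where price > 10"
).fetchone()[0]

print(row_count)        # 3
print(total_advance)    # 6000.0
print(average_price)    # 15.0  (average of 10.0 and 20.0; the null is skipped)
print(average_over_10)  # 20.0
```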

Group by

• Rows in a table are not stored in any guaranteed order

• We can impose one type of order on the result set with an order by

• We can impose another type of order on a result set by using a group by clause

Group by

• The group by will order the data into groups that you specified and then return the set of rows that determine the groups

• Duplicates are removed from this result set

• In this way, a group by performs a similar operation to distinct

• Neither distinct nor group by guarantees a sorted result though

• You still need to specify an order by clause to perform sorting

Group by

select type from titles group by type

type

------------

(null)

UNDECIDED

popular_comp

business

mod_cook

trad_cook

psychology

(7 row(s) affected)

select type from titles group by type order by 1

type

------------

(null)

UNDECIDED

business

mod_cook

popular_comp

psychology

trad_cook

(7 row(s) affected)

Group by and Nulls

• Nulls are treated specially by a group by clause

• When a group by is being evaluated, all nulls are put in the same group

select type from titles group by type

type

------------

(null)

UNDECIDED

business

mod_cook

popular_comp

psychology

trad_cook

(7 row(s) affected)

Group by and where

• You can use a where clause to limit the set of data that the group by will consider

select type from titles where advance > 5000 group by type

type

------------

business

mod_cook

popular_comp

psychology

trad_cook

(5 row(s) affected)

Group by

• The true power of a group by comes from using it in conjunction with an aggregate

• Suppose we wanted a count of each type of book

• At first thought you might be tempted to do this:

select type,count(*) from titles

Msg 8118, Level 16, State 1

Column 'titles.type' is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause.

Group by

• That query fails, so it doesn't get us what we need; adding a group by does

select type,count(*) from titles group by type

type

------------ -----------

(null) 2

UNDECIDED 1

business 2

mod_cook 2

popular_comp 3

psychology 5

trad_cook 3

(7 row(s) affected)
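Counting per group, including the single group that collects all the nulls, can be reproduced on a handful of rows. The sketch below uses SQLite through Python's sqlite3 module; the five-row titles table is hypothetical.

```python
import sqlite3

# Hypothetical five-row titles table; two rows have no type assigned.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text)")
conn.executemany(
    "insert into titles values (?, ?)",
    [
        ("T1", "business"),
        ("T2", "business"),
        ("T3", None),
        ("T4", None),
        ("T5", "mod_cook"),
    ],
)

# One output row per distinct type; both nulls land in the same group.
# The order by is needed because group by alone does not promise an order.
counts = conn.execute(
    "select type, count(*) from titles group by type order by type"
).fetchall()

print(counts)  # [(None, 2), ('business', 2), ('mod_cook', 1)]
```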

Group by

• One thing to remember is that if you use a group by with an aggregate, you must specify all nonaggregate columns in the group by clause

select city,state,count(*) from authors group by state will return an error

Msg 8120, Level 16, State 1

Column 'authors.city' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

select city,state,count(*) from authors group by state,city will return a result set

city state

-------------------- ----- -----------

(null) MA 4

Ann Arbor MI 1

Berkeley CA 2

Corvallis OR 1

Covelo CA 1

Gary IN 1

...

(17 row(s) affected)

Group by

• You cannot specify an aggregate in the group by clause

select count(*) from authors group by count(*) will return a syntax error

Msg 144, Level 15, State 1

Cannot use an aggregate or a subquery in an expression used for the by-list of a GROUP BY clause.

Having

• The having clause works much like a where clause

• There is one fundamental difference

• The where clause defines the set of data the grouping is done on

• The having defines which groups are going to be returned to the user

Having

• Having clauses generally contain aggregates as part of the selection criteria

select pub_id,sum(advance) from titles group by pub_id having sum(advance) > 10000

pub_id

------ --------------------------

0736 24,400.00

0877 41,000.00

1389 30,000.00

(3 row(s) affected)

• This will return only the pub_ids whose advances total more than $10000.
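Filtering groups on an aggregate can be verified by hand with a few rows. This is a sketch in SQLite through Python's sqlite3 module; the pub_ids match the slide, but the individual advance amounts are hypothetical.

```python
import sqlite3

# Hypothetical advances: 0736 totals 13000, 0877 totals 4000, 1389 totals 20000.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (pub_id text, advance integer)")
conn.executemany(
    "insert into titles values (?, ?)",
    [("0736", 6000), ("0736", 7000), ("0877", 4000), ("1389", 20000)],
)

# Group first, then keep only the groups whose total exceeds 10000.
big_publishers = conn.execute(
    "select pub_id, sum(advance) from titles "
    "group by pub_id having sum(advance) > 10000 order by pub_id"
).fetchall()

print(big_publishers)  # [('0736', 13000), ('1389', 20000)]
```

Neither 0736 row exceeds 10000 on its own; the publisher qualifies only because having looks at the group total.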

Having/Where

select type,count(advance) from titles where advance > 10000 group by type,advance

select type,count(advance) from titles group by type,advance having advance > 10000

Having/Where

• In both queries we want to know the types of those books with an advance > 10000, so why the different results?

• This is due to the way the where and having are applied

• First the rows are selected based on the where clause

• It is then passed to the group by for grouping

• Finally it goes to the having which returns the data requested.

Having/Where

• In the first query, the where clause keeps only those rows that had an advance of > $10000

• The grouping is then applied to these rows

• This was only 1 book for each of two groups (the where criteria)

Having/Where

• The having processes the aggregates and grouping first instead of the selection like where does

• The having clause says give me the groups that have one or more books with an advance of > 10000
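The order of processing (where, then group by, then having) can be seen by running both styles against the same rows. The sketch below uses SQLite through Python's sqlite3 module with a hypothetical six-row titles table. It does not reuse the slide's two queries: as an assumption, the having version is written with max(advance) to express "groups that have one or more books with an advance > 10000".

```python
import sqlite3

# Hypothetical data: business and mod_cook each have one large advance.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text, advance integer)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [
        ("BU1", "business", 5000),
        ("BU2", "business", 15000),
        ("PS1", "psychology", 2000),
        ("PS2", "psychology", 4000),
        ("MC1", "mod_cook", 12000),
        ("MC2", "mod_cook", 0),
    ],
)

# where runs first: only rows with advance > 10000 reach the grouping,
# so each surviving group counts a single book.
filtered_rows = conn.execute(
    "select type, count(*) from titles where advance > 10000 "
    "group by type order by type"
).fetchall()

# having runs last: every row is grouped, then whole groups are kept or
# dropped, so the surviving groups count all of their books.
filtered_groups = conn.execute(
    "select type, count(*) from titles "
    "group by type having max(advance) > 10000 order by type"
).fetchall()

print(filtered_rows)    # [('business', 1), ('mod_cook', 1)]
print(filtered_groups)  # [('business', 2), ('mod_cook', 2)]
```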

Where/Having

• The concepts of where and having clauses can get confusing very quickly

• The best way to get comfortable with them is to run a few queries and observe the results

• Then draw out each of the steps on paper until you can duplicate the result set

• The book "The Practical SQL Handbook" has a good explanation on pages 180 - 185

Compute

• Now that everything is about as clear as mud, we are going to introduce another clause that can be employed (compute)

• In a nutshell, a compute is used to calculate grand summaries

select title_id,type,price from titles where type like '%cook%' compute avg(price)

title_id type price

-------- ------------ --------------------------

MC2222 mod_cook 19.99

MC3021 mod_cook 2.99

TC3218 trad_cook 20.95

TC4203 trad_cook 11.95

TC7777 trad_cook 14.99

avg

==========================

14.17

(6 row(s) affected)

Compute by

• A compute by is used to calculate subsummaries

• This construct must be used with an order by

select title_id, type, price from titles where type like '%cook%' order by type compute avg(price) by type

title_id type price

-------- ------------ --------------------------

MC2222 mod_cook 19.99

MC3021 mod_cook 2.99

avg

==========================

11.49

title_id type price

-------- ------------ --------------------------

TC3218 trad_cook 20.95

TC4203 trad_cook 11.95

TC7777 trad_cook 14.99

avg

==========================

15.96

(7 row(s) affected)

Compute/Compute by

• These can be used in the same query

select title_id,type,price from titles where type in ('business','mod_cook') order by type compute sum(price) by type compute sum(price)

title_id type price

-------- ------------ --------------------------

BU2075 business 2.99

BU7832 business 19.99

sum

==========================

22.98

title_id type price

-------- ------------ --------------------------

MC2222 mod_cook 19.99

MC3021 mod_cook 2.99

sum

==========================

22.98

sum

==========================

45.96

(7 row(s) affected)
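Compute is a Sybase and MS SQL Server extension (and, to my knowledge, no longer available in current SQL Server releases), so it cannot be run everywhere. The sketch below is not compute itself: it uses SQLite through Python's sqlite3 module and ordinary aggregate queries to reproduce the same subsummaries and grand summary from the four rows shown above.

```python
import sqlite3

# The four detail rows from the slide above.
conn = sqlite3.connect(":memory:")
conn.execute("create table titles (title_id text, type text, price real)")
conn.executemany(
    "insert into titles values (?, ?, ?)",
    [
        ("BU2075", "business", 2.99),
        ("BU7832", "business", 19.99),
        ("MC2222", "mod_cook", 19.99),
        ("MC3021", "mod_cook", 2.99),
    ],
)

# Detail rows, as the select part of the compute query would return them.
detail = conn.execute(
    "select title_id, type, price from titles order by type, title_id"
).fetchall()

# Stand-in for 'compute sum(price) by type': one subtotal per type.
# round() trims floating-point noise from adding the prices.
subtotals = conn.execute(
    "select type, round(sum(price), 2) from titles group by type order by type"
).fetchall()

# Stand-in for 'compute sum(price)': the grand total.
grand_total = conn.execute(
    "select round(sum(price), 2) from titles"
).fetchone()[0]

for type_name, subtotal in subtotals:
    print(type_name, subtotal)
print("total", grand_total)
```

The difference from compute is that the detail rows and the summaries come back from separate queries instead of being interleaved in one result.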

Compute/Compute by

Restrictions

• With a compute/compute by, you can only use columns in the select list

select title_id,type from titles…compute sum(price) would return a syntax error

• You must order by the compute by column

• You can use any aggregate except count(*)

Compute/Compute by

Restrictions

• Columns listed after the compute by must be in the identical order to or a subset of those listed after the order by

• Expressions must be in the same left - right order

• Compute by must start with the same expressions as listed after order by and not skip any expressions

Compute/Compute by

Legal

• order by a,b,c

• compute by a,b,c

• compute by a,b

• compute avg(price) by a

Illegal

• order by a,b,c

• compute by b,a,c

• compute by c,a

• compute avg(price) by b

Unions

• There are times when we want to return two or more sets of data within a single select statement

• Examples of this are combining data from two different tables when they have mutually exclusive criteria

• To do this we use a union

Unions

select au_lname, state from authors where state = 'CA' union select au_lname, state from authors where state = 'MA'

au_lname state

---------------------------------------- -----

Bennet CA

Carson CA

Dull CA

Green CA

Gringlesby CA

Hunter CA

Karsen CA

Locksley CA

MacFeather CA

McBadden CA

O'Leary CA

Straight CA

Stringer CA

White CA

Yokomoto CA

Burns MA

Johnson MA

Smithe MA

Smythe MA

(19 row(s) affected)

Unions

• The only restrictions on unions are that the same number of columns must be in each separate result set and the datatypes must match

• You cannot union a select statement that returns 2 columns with a select that returns 3 columns

• You also can't union a result set where the first column of one select is character data and the first column of another select is numeric data
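The column-count restriction is easy to trigger on purpose. This sketch uses SQLite through Python's sqlite3 module with a hypothetical three-row authors table. SQLite is loosely typed, so it only demonstrates the column-count rule, not the datatype rule, and its error text differs from SQL Server's.

```python
import sqlite3

# Hypothetical three-row authors table.
conn = sqlite3.connect(":memory:")
conn.execute("create table authors (au_lname text, state text)")
conn.executemany(
    "insert into authors values (?, ?)",
    [("Bennet", "CA"), ("Burns", "MA"), ("Ringer", "UT")],
)

# Both selects return two columns, so the union is legal.
combined = conn.execute(
    "select au_lname, state from authors where state = 'CA' "
    "union "
    "select au_lname, state from authors where state = 'MA' "
    "order by state, au_lname"
).fetchall()

# Two columns on the left, one on the right: the DBMS rejects the statement.
mismatch_error = None
try:
    conn.execute(
        "select au_lname, state from authors union select au_lname from authors"
    )
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

print(combined)                    # [('Bennet', 'CA'), ('Burns', 'MA')]
print(mismatch_error is not None)  # True
```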

Unit 3 Review

• Nulls are used to represent the nonexistence of data

• A null doesn't equal another null

• An order by can be used to sort the result set

• The sort order is determined by the database's character set

• To remove duplicate rows from a result set use distinct

• You can perform calculations using aggregates; count(*), sum, and avg are the most common

• You can group data together by using a group by

• Group by can be combined with aggregates to perform sophisticated calculations

• A having clause performs a restriction on a group by

• Having and where behave differently due to the order they process the row selection

• Compute can be used to calculate grand summaries

Unit 3 Review cont.

• Compute by can be used to calculate sub summaries

• Unions allow us to combine multiple result sets and return them to the user in a group

Unit 3 Exercises

• Time allotted for exercises is 1 hour