Mike Carey
Michael Carey/Padhraic Smyth, UC Irvine: Stats 170A/B, Winter 2018


Announcements

• Read (and live by!) the course wiki page:
  • https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018
• Also follow (and live by) the Piazza page:
  • http://piazza.com/uci/winter2018/stats170a/home
• The first HW assignment is due in one week:
  • https://grape.ics.uci.edu/wiki/asterix/attachment/wiki/stats170ab-2018/HW1.pdf
  • (We can take a look at it together at the end of lecture time.)
• Today: CS122A in a Nutshell...

What is a Database System?

• What's a database?
  • A very large, integrated collection of data
  • Usually a model of a real-world enterprise
    • Entities (e.g., students, courses, Facebook users, ...) with attributes (e.g., name, birthdate, GPA, ...)
    • Relationships (e.g., Susan is taking CS234, Susan is a friend of Lynn, ...)
• What's a database management system (DBMS)?
  • A software system designed to store, manage, and provide access to one or more databases

File Systems vs. DBMS

• Application programs must sometimes stage large datasets between main memory and secondary storage (for buffering huge data sets, getting page-oriented access, etc.)
• Special code is needed for different queries, and that code must be (and stay) correct and efficient
• Must protect data from inconsistency due to multiple concurrent users
• Crash recovery is important since data is now the currency of the day (corporate jewels)
• Security and access control are also important (!)

Why Use a DBMS?

• Data independence.
• Efficient data access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
• Can make "Big Data" much less unwieldy.
• Makes it easy to explore data "declaratively", including combinations of multiple data sets.

Data Models

• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data, using a given data model
• The relational model is (still) the most widely used data model today
  • Relation: basically a table with rows and (named) columns
  • Schema: describes the tables and their columns

Example: University DB

• Conceptual schema:
  • Students(sid: string, name: string, login: string, age: integer, gpa: real)
  • Courses(cid: string, cname: string, credits: integer)
  • Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  • Relations stored as unordered files
  • Index on first and third columns of Students
• External schema (a.k.a. view):
  • CourseInfo(cid: string, cname: string, enrollment: integer)

University DB Example (cont.)

• User query (in SQL, against the external schema):

    SELECT c.cid, c.enrollment
    FROM CourseInfo c
    WHERE c.cname = 'Computer Game Design'

• Equivalent query (in SQL, against the conceptual schema):

    SELECT e.cid, COUNT(*)
    FROM Enrolled e, Courses c
    WHERE e.cid = c.cid AND c.cname = 'Computer Game Design'
    GROUP BY e.cid

• Under the hood (against the physical schema):
  • Access Courses: use index on cname to find associated cid
  • Access Enrolled: use index on cid to count the enrollments
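The relationship between the external and conceptual schemas can be tried out with a small sketch. It uses Python's built-in sqlite3 module; the course ids and rows are made-up sample data, and the definition of the CourseInfo view is an assumption (the slides only give its columns).

```python
import sqlite3

# Hypothetical sample rows; only table and column names come from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Courses VALUES ('CS113', 'Computer Game Design', 4),
                           ('CS122A', 'Intro to Data Management', 4);
INSERT INTO Enrolled VALUES ('s1', 'CS113', 'A'), ('s2', 'CS113', 'B'),
                            ('s1', 'CS122A', 'A');
-- External schema: a view defined over the conceptual schema (assumed definition).
CREATE VIEW CourseInfo AS
  SELECT c.cid, c.cname, COUNT(e.sid) AS enrollment
  FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
  GROUP BY c.cid, c.cname;
""")

# Query against the external schema (the view).
view_rows = conn.execute(
    "SELECT c.cid, c.enrollment FROM CourseInfo c "
    "WHERE c.cname = 'Computer Game Design'").fetchall()

# Equivalent query against the conceptual schema (the base tables).
base_rows = conn.execute(
    "SELECT e.cid, COUNT(*) FROM Enrolled e, Courses c "
    "WHERE e.cid = c.cid AND c.cname = 'Computer Game Design' "
    "GROUP BY e.cid").fetchall()

print(view_rows, base_rows)
```

Both queries should return the same single row, which is the point of the view: users of CourseInfo never need to know how enrollment is derived.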

A Brief History of Databases

• Pre-relational era: 1960's, early 1970's
• Codd's seminal relational model paper: 1970
• Basic RDBMS R&D: 1970-80
• RDBMS improvements: 1980-85
• Relational goes mainstream: 1985-90
• Parallel DBMS research: 1985-95
• OLAP and warehouse research: 1990-2000
• Stream DB and XML DB research: 2000-2010
• "Big Data" R&D (also including "NoSQL"): 2005-present

Overview of Database Design

• Conceptual design: (ER Model used at this stage.)
  • What are the entities and relationships in the enterprise?
  • What information about these entities and relationships should we store in the database?
  • What are the integrity constraints or business rules that hold?
  • A database schema in the ER Model can be represented pictorially (using an ER diagram).
  • Can map an ER diagram into a relational schema (manually or using a design tool's automation).

ER Model Basics

• Entity: Real-world object, distinguishable from all other objects. An entity is described (in DB-land) using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
  • All entities in an entity set have the same set of attributes. (Until we get to ISA hierarchies...)
  • Each entity set has a key (a unique identifier); this can be either one attribute (an "atomic" key) or several attributes (called a "composite" key)
  • Each attribute has a domain (similar to a data type).

[ER diagram: entity set Employees with attributes ssn, name, lot]

ER Model Basics (Contd.)

• Relationship: Association among two or more entities. E.g., Santa Claus works in the Toy department.
• Relationship Set: Collection of similar relationships.
  • An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1:E1, ..., en:En
  • One entity set can participate in different relationship sets, or in different "roles" in the same set.
  • Participation constraints indicate whether an entity is required to participate in a given relationship

[ER diagram: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]
[ER diagram: relationship set Reports_To relating Employees (ssn, name, lot) to itself, with roles supervisor and subordinate]

Cardinality Constraints

[ER diagram: relationship set Manages (attribute since) between Employees (ssn, name, lot) on the 1 side and Departments (did, dname, budget) on the N side]
[Legend of cardinalities: Many-to-Many (M:N), 1-to-1 (1:1), 1-to-Many (1:N), Many-to-1 (N:1)]

• Consider Works_In: An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the cardinality constraint on Manages above.

(Note: A given employee can manage several departments)

ER Basics: Do Try This at Home :-)

[ER diagram: Professor (fac_id, name, rank), Dept (dno, dname, main_office), Parking Space (pid, lot_num, space_num); relationship sets In (M:N) and Head (1:N) between Professor and Dept, and Assigned (1:1) between Professor and Parking Space]

• Let's see if you can read/interpret the ER diagram above...! :-)
  • What attributes are unique (i.e., identify their associated entity instances)?
  • What are the rules about (the much coveted) parking passes?
  • What are the rules (constraints) about professors being in departments?
  • And, what are the rules about professors heading departments?

Answers to the Self Test

• Unique attributes:
  • Professor.fac_id, Dept.dno, ParkingSpace.pid
• Faculty parking:
  • 1 space/faculty, one faculty/space
  • Some faculty can bike or walk :-)
  • Some parking spaces may be unused
• Faculty in departments:
  • Faculty may have appointments in multiple departments
  • Departments can have multiple faculty in them
  • No empty departments, and no unaffiliated faculty
• Department management:
  • One head per department (exactly)
  • Not all faculty are department heads

NOTE: These things are all "rules of the universe" that are just being modeled here!

Q: Can a faculty member head a department that he or she isn't actually in?

Logical DB Design: ER to Relational

• Entity sets to tables:

    CREATE TABLE Employees (
        ssn  CHAR(11),
        name CHAR(20),
        lot  INTEGER,
        PRIMARY KEY (ssn))

[ER diagram: entity set Employees with attributes ssn, name, lot]

Relationship Sets to Tables

• In translating a relationship set to a relation, the attributes of the relation must include:
  • Keys for each participating entity set (as foreign keys).
    • This set of attributes forms a superkey for the relation.
  • All descriptive attributes.

    CREATE TABLE Works_In (
        ssn   CHAR(11),
        did   INTEGER,
        since DATE,
        PRIMARY KEY (ssn, did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments)

[ER diagram: relationship set Works_In (attribute since) between Employees (ssn, name, lot) and Departments (did, dname, budget)]

Key Constraints (Review)

[ER diagram: relationship set Manages (attribute since) between Employees (ssn, name, lot) on the 1 side and Departments (did, dname, budget) on the N side]
[Legend of cardinalities: Many-to-Many, 1-to-1, 1-to-Many, Many-to-1]

• Each dept has at most one manager, according to the key constraint on Manages.

Translation to relational model?

ER Translation with Key Constraints

• Map the relationship to a table (Manages):
  • Note that did (alone) is the key!
  • Still separate tables for Employees and Departments.

    CREATE TABLE Manages (
        ssn   CHAR(11),
        did   INTEGER,
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments)

vs.

• But, since each department has a unique manager, we could choose to fold Manages right into Departments. (Q: Why do that...?)

    CREATE TABLE Departments2 (
        did       INTEGER,
        dname     CHAR(20),
        budget    REAL,
        mgr_ssn   CHAR(11),
        mgr_since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (mgr_ssn) REFERENCES Employees)

Note: The relationship info has been pushed to the N-side's entity table
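As a rough illustration of why making did (alone) the primary key of Manages captures the "at most one manager per department" rule, here is a sketch using Python's sqlite3 module. The sample employees and departments are made up, and SQLite stands in for whichever RDBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
CREATE TABLE Manages (
  ssn CHAR(11), did INTEGER, since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees,
  FOREIGN KEY (did) REFERENCES Departments);
-- Hypothetical sample data.
INSERT INTO Employees VALUES ('111-11-1111', 'Ann', 1), ('222-22-2222', 'Bo', 2);
INSERT INTO Departments VALUES (10, 'Toys', 100.0), (20, 'Games', 200.0);
-- One employee may manage several departments (the 1 side of Manages).
INSERT INTO Manages VALUES ('111-11-1111', 10, '2018-01-01');
INSERT INTO Manages VALUES ('111-11-1111', 20, '2018-01-02');
""")

# A second manager for department 10 violates PRIMARY KEY (did).
try:
    conn.execute("INSERT INTO Manages VALUES ('222-22-2222', 10, '2018-02-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

manages_count = conn.execute("SELECT COUNT(*) FROM Manages").fetchone()[0]
print(second_manager_rejected, manages_count)
```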

Mapping Advanced ER Features

• Multi-valued (vs. single-valued) attributes
  [ER diagram: Employees with attributes ssn, name, and multi-valued attribute phone]
  • Employees(ssn, name)
    • ssn is the PK in this table
  • Employees_phones(ssn, phone)
    • ssn is an FK in this table
    • (ssn, phone) is its PK
• Composite (vs. atomic) attributes
  [ER diagram: Employees with attributes ssn, name, and composite attribute address (snum, street, city, zip)]
  • Employees(ssn, name, address_snum, address_street, address_city, address_zip)

So, Given a Relational Schema...

• How do I know if my relational schema is a "good" logical database design or not?
  • What might make it "not good"?
  • How can I fix it, if indeed it's "not good"?
  • How "good" is it, after I've fixed it?
• Note that your relational schema might have come from one of several places
  • You started from an E-R model (but maybe that model was "wrong" in some way?)
  • You went straight to relational in the first place
  • It's not your schema: you inherited it! :-)

Ex: Wisconsin Sailing Club

Proposed schema design #1:

    sid  sname   rating  age   date      bid  bname      color
    22   Dustin  7       45.0  10/10/98  101  Interlake  blue
    22   Dustin  7       45.0  10/10/98  102  Interlake  red
    22   Dustin  7       45.0  10/8/98   103  Clipper    green
    22   Dustin  7       45.0  10/7/98   104  Marine     red
    31   Lubber  8       55.5  11/10/98  102  Interlake  red
    31   Lubber  8       55.5  11/6/98   103  Clipper    green
    31   Lubber  8       55.5  11/12/98  104  Marine     red
    ...  ...     ...     ...   ...       ...  ...        ...

Q: Do you think this is a "good" design? (Why or why not?)

Ex: Wisconsin Sailing Club

Proposed schema design #2:

    sid  sname   rating  age
    22   Dustin  7       45.0
    31   Lubber  8       55.5
    ...  ...     ...     ...

    sid  bid  date
    22   101  10/10/98
    22   102  10/10/98
    22   103  10/8/98
    22   104  10/7/98
    31   102  11/10/98
    31   103  11/6/98
    31   104  11/12/98
    ...  ...  ...

    bid  bname      color
    101  Interlake  blue
    102  Interlake  red
    103  Clipper    green
    104  Marine     red

Q: What about this design?
• Is #2 "better" than #1...? Explain!
• Is it a "best" design?
• How can we go from design #1 to this one?
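One way to get a feel for the step from design #1 to design #2 is to project the wide table onto three narrower ones and check that joining them back gives the original rows. The sketch below does this with Python's sqlite3 module on the seven rows shown for design #1; the table name Design1 is made up for illustration.

```python
import sqlite3

# The seven rows shown for design #1.
DESIGN1 = [
    (22, "Dustin", 7, 45.0, "10/10/98", 101, "Interlake", "blue"),
    (22, "Dustin", 7, 45.0, "10/10/98", 102, "Interlake", "red"),
    (22, "Dustin", 7, 45.0, "10/8/98", 103, "Clipper", "green"),
    (22, "Dustin", 7, 45.0, "10/7/98", 104, "Marine", "red"),
    (31, "Lubber", 8, 55.5, "11/10/98", 102, "Interlake", "red"),
    (31, "Lubber", 8, 55.5, "11/6/98", 103, "Clipper", "green"),
    (31, "Lubber", 8, 55.5, "11/12/98", 104, "Marine", "red"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Design1 (sid INTEGER, sname TEXT, rating INTEGER, "
             "age REAL, date TEXT, bid INTEGER, bname TEXT, color TEXT)")
conn.executemany("INSERT INTO Design1 VALUES (?, ?, ?, ?, ?, ?, ?, ?)", DESIGN1)

# Design #2: three projections of design #1, with duplicates removed.
conn.executescript("""
CREATE TABLE Sailors  AS SELECT DISTINCT sid, sname, rating, age FROM Design1;
CREATE TABLE Reserves AS SELECT DISTINCT sid, bid, date FROM Design1;
CREATE TABLE Boats    AS SELECT DISTINCT bid, bname, color FROM Design1;
""")

sizes = tuple(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
              for t in ("Sailors", "Reserves", "Boats"))

# Joining the three tables back together should reproduce design #1.
rejoined = conn.execute(
    "SELECT s.sid, s.sname, s.rating, s.age, r.date, b.bid, b.bname, b.color "
    "FROM Sailors s, Reserves r, Boats b "
    "WHERE s.sid = r.sid AND r.bid = b.bid").fetchall()
lossless = sorted(rejoined) == sorted(DESIGN1)
print(sizes, lossless)
```

Each sailor and each boat is now stored once, which is what removes the repeated sname/rating/age and bname/color values of design #1.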

Normal Forms

[Diagram: nested regions, from outermost to innermost: all "relations", 1NF, 2NF, 3NF, BCNF, ...]

First Normal Form (1NF)

• Rel'n R is in 1NF if all of its attributes are atomic.
  • No set-valued attributes! (1NF = "flat" :-))
  • Usually goes w/o saying for relational model (but not for NoSQL systems, as we'll see at the end of the quarter :-)).
• Ex:

    Not in 1NF:

    bname      color
    Interlake  blue, red
    Clipper    green
    Marine     red

    In 1NF:

    bname      color
    Interlake  blue
    Interlake  red
    Clipper    green
    Marine     red
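The flattening in the example can be written as a one-line transformation. This is only a sketch in plain Python, with the set-valued color attribute modeled as a list (an assumption about representation, not something the slides prescribe).

```python
# Boats with a set-valued color attribute (not in 1NF).
nested = [
    ("Interlake", ["blue", "red"]),
    ("Clipper", ["green"]),
    ("Marine", ["red"]),
]

# Flatten: emit one (bname, color) row per color value, giving a 1NF relation.
flat = [(bname, color) for bname, colors in nested for color in colors]

for row in flat:
    print(row)
```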

On to SQL...!

SQL "SPJ" Query:

    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification

• relation-list: A list of relation names (possibly with a range-variable after each name).
• target-list: A list of attributes of relations in relation-list
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, <=, =, >, >=, <>) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! (Bags, not sets.)

Many SQL-Based DBMSs

• Commercial RDBMS choices include
  • DB2 (IBM)
  • Oracle
  • SQL Server (Microsoft)
  • Teradata
• Open source RDBMS options include
  • MySQL
  • PostgreSQL
• And for so-called "Big Data", we also have
  • Apache Hive (on Hadoop) + newer wannabes

Example Instances

• We'll use these instances of our usual Sailors and Reserves relations in our examples.

    R1:
    sid  bid  day
    22   101  10/10/96
    58   103  11/12/96

    S1:
    sid  sname   rating  age
    22   dustin  7       45.0
    31   lubber  8       55.5
    58   rusty   10      35.0

    S2:
    sid  sname   rating  age
    28   yuppy   9       35.0
    31   lubber  8       55.5
    44   guppy   5       35.0
    58   rusty   10      35.0

Conceptual Evaluation Strategy

• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:
  • Compute the cross-product of relation-list. (✕)
  • Discard resulting tuples if they fail qualifications. (σ)
  • Project out attributes that are not in target-list. (π)
  • If DISTINCT is specified, eliminate duplicate rows. (δ)
• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

Example of Conceptual Evaluation

    SELECT S.sname
    FROM Sailors S, Reserves R      ← using table S1
    WHERE S.sid=R.sid AND R.bid=103

    (sid)  sname   rating  age   (sid)  bid  day
    22     dustin  7       45.0  22     101  10/10/96
    22     dustin  7       45.0  58     103  11/12/96
    31     lubber  8       55.5  22     101  10/10/96
    31     lubber  8       55.5  58     103  11/12/96
    58     rusty   10      35.0  22     101  10/10/96
    58     rusty   10      35.0  58     103  11/12/96
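The four conceptual steps (cross-product, selection, projection, optional duplicate elimination) can be spelled out directly. The helper name spj below is made up for this sketch, which runs the example query over instances S1 and R1 in plain Python; it mirrors the conceptual strategy only and is not how a real DBMS evaluates queries.

```python
from itertools import product

# Instances S1 (Sailors) and R1 (Reserves).
sailors = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
reserves = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def spj(relations, qualification, target, distinct=False):
    """Hypothetical helper: naive evaluation of SELECT target FROM relations WHERE qualification."""
    rows = product(*relations)                        # 1. cross-product
    rows = (r for r in rows if qualification(*r))     # 2. discard rows failing the qualification
    out = [target(*r) for r in rows]                  # 3. project onto the target list
    if distinct:
        out = list(dict.fromkeys(out))                # 4. eliminate duplicates, keeping order
    return out

# SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103
names = spj([sailors, reserves],
            lambda s, r: s[0] == r[0] and r[1] == 103,
            lambda s, r: (s[1],))
print(names)
```

Of the six cross-product rows in the table above, only the last one passes both conditions, so the answer contains just rusty.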

Find sailors who've reserved at least one boat

Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)

    SELECT S.sid
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid

• Would adding DISTINCT to this query make a difference? (With our data? With possible data?)
• What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding DISTINCT to this variant of the query make a difference?

Expressions and Strings

Sailors(sid, sname, rating, age)

    SELECT S.sname, S.age, (7 * S.age) AS dogyears
    FROM Sailors S
    WHERE S.sname LIKE 'B_%B'

• Illustrates use of arithmetic expressions and string pattern matching: Find names and ages and a field defined by an expression for sailors whose names begin and end with B and contain at least three characters.
• AS provides a way to (re)name fields in result.
• LIKE is used for string matching. '_' stands for any one character and '%' stands for 0 or more arbitrary characters. (See SQL docs for more info...)
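To see which names the pattern 'B_%B' accepts, here is a small sketch using Python's sqlite3 module with made-up sailors. The names are written in upper case on purpose: SQLite's LIKE ignores ASCII case by default, whereas other systems may be case-sensitive, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
# Hypothetical rows chosen to exercise the pattern.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (1, "BOB", 3, 20.0),    # B, one char, zero more chars, B: matches
    (2, "BB", 5, 40.0),     # only two characters: no match
    (3, "BARB", 7, 30.0),   # matches
    (4, "BILL", 9, 50.0),   # does not end with B: no match
])

rows = conn.execute(
    "SELECT S.sname, S.age, (7 * S.age) AS dogyears "
    "FROM Sailors S WHERE S.sname LIKE 'B_%B' "
    "ORDER BY S.sname").fetchall()
print(rows)
```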

Some Better Example Data

[Tables: instances of Sailors, Reserves, and Boats]

Find sid's of sailors who've reserved a red or a green boat

Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)

    SELECT DISTINCT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND (B.color='red' OR B.color='green')

• If we replace OR by AND in this first version, what do we get?
• UNION: Can be used to compute the union of any two union-compatible sets of tuples (which are themselves the result of SQL queries).
• Also available: EXCEPT (What would we get if we replaced UNION by EXCEPT?)

    (SELECT S.sid
     FROM Sailors S, Boats B, Reserves R
     WHERE S.sid=R.sid AND R.bid=B.bid
       AND B.color='red')
    UNION
    (SELECT S.sid
     FROM Sailors S, Boats B, Reserves R
     WHERE S.sid=R.sid AND R.bid=B.bid
       AND B.color='green')

Find sid's of sailors who've reserved a red and a green boat

Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)

    SELECT S.sid
    FROM Sailors S, Boats B1, Reserves R1,
         Boats B2, Reserves R2
    WHERE S.sid=R1.sid AND R1.bid=B1.bid
      AND S.sid=R2.sid AND R2.bid=B2.bid
      AND (B1.color='red' AND B2.color='green')

• INTERSECT: Can be used to compute the intersection of two union-compatible sets of tuples.
  • Included in the SQL/92 standard, but not in all systems (e.g., MySQL).
• Contrast symmetry of the UNION and INTERSECT queries with how much the other versions differ.

    SELECT S.sid                          ← Key field!
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND B.color='red'
    INTERSECT
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND B.color='green'
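The two formulations of the "red and green" query can be compared side by side. The sketch below uses Python's sqlite3 module (SQLite does support INTERSECT); the Boats rows follow the sailing-club example, while the Sailors and Reserves rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Sailors VALUES (22,'dustin',7,45.0), (31,'lubber',8,55.5), (64,'horatio',7,35.0);
INSERT INTO Boats VALUES (101,'Interlake','blue'), (102,'Interlake','red'),
                         (103,'Clipper','green'), (104,'Marine','red');
-- Hypothetical reservations: 22 has red and green, 31 only red, 64 only green.
INSERT INTO Reserves VALUES (22,102,'10/10/98'), (22,103,'10/8/98'), (22,104,'10/7/98'),
                            (31,102,'11/10/98'), (64,103,'9/5/98');
""")

# Self-join version: one (red, green) reservation pair per result row.
join_rows = conn.execute("""
    SELECT S.sid
    FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
    WHERE S.sid=R1.sid AND R1.bid=B1.bid
      AND S.sid=R2.sid AND R2.bid=B2.bid
      AND (B1.color='red' AND B2.color='green')""").fetchall()

# INTERSECT version: set semantics, so each sid appears at most once.
intersect_rows = conn.execute("""
    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    INTERSECT
    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'""").fetchall()

print(sorted(join_rows), intersect_rows)
```

With this data the join version lists sailor 22 twice (two red boats paired with one green boat), while INTERSECT returns 22 once; both agree on which sailors qualify.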

Nested Queries

• A very powerful feature of SQL: a WHERE clause can itself contain an SQL query! (Actually, so can FROM and HAVING clauses!!)

Find names of sailors who've reserved boat #103:

    SELECT S.sname
    FROM Sailors S
    WHERE S.sid IN (SELECT R.sid
                    FROM Reserves R
                    WHERE R.bid=103)

• To find sailors who've not reserved #103, use NOT IN.
• To understand semantics (including cardinality) of nested queries, think nested loops evaluation:
  • For each Sailors tuple, check qualification by computing subquery.

Nested Queries with Correlation

Find names of sailors who've reserved boat #103:

    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS (SELECT *
                  FROM Reserves R
                  WHERE R.bid=103 AND S.sid=R.sid)

• EXISTS is another set comparison operator, like IN.
• Illustrates why, in general, subquery must be re-computed for each Sailors tuple (conceptually).

NOTE: Recall that there was a join way to express this query, too. Relational query optimizers will try to unnest queries into joins when possible to avoid nested loop query evaluation plans.
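The IN, EXISTS, and NOT IN forms can be checked against the small S1/R1 instances from the earlier example. This is a sketch using Python's sqlite3 module; the ORDER BY in the last query is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
-- Instances S1 and R1.
INSERT INTO Sailors VALUES (22,'dustin',7,45.0), (31,'lubber',8,55.5), (58,'rusty',10,35.0);
INSERT INTO Reserves VALUES (22,101,'10/10/96'), (58,103,'11/12/96');
""")

# Uncorrelated subquery with IN.
with_in = conn.execute(
    "SELECT S.sname FROM Sailors S "
    "WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid=103)").fetchall()

# Correlated subquery with EXISTS (the inner query mentions S.sid).
with_exists = conn.execute(
    "SELECT S.sname FROM Sailors S "
    "WHERE EXISTS (SELECT * FROM Reserves R "
    "              WHERE R.bid=103 AND S.sid=R.sid)").fetchall()

# Sailors who have NOT reserved boat #103.
with_not_in = conn.execute(
    "SELECT S.sname FROM Sailors S "
    "WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid=103) "
    "ORDER BY S.sid").fetchall()

print(with_in, with_exists, with_not_in)
```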

More on Set-Comparison Operators

• We've already seen IN and EXISTS. Can also use NOT IN and NOT EXISTS.
• Also available: op ANY, op ALL (for ops: >, <, =, ≥, ≤, ≠)
• Find sailors whose rating is greater than that of some sailor called Horatio:

    SELECT *
    FROM Sailors S
    WHERE S.rating > ANY (SELECT S2.rating
                          FROM Sailors S2
                          WHERE S2.sname='Horatio')

Rewriting INTERSECT Queries Using IN

Find sid's of sailors who've reserved both a red and a green boat:

    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
      AND S.sid IN (SELECT S2.sid
                    FROM Sailors S2, Boats B2, Reserves R2
                    WHERE S2.sid=R2.sid AND R2.bid=B2.bid
                      AND B2.color='green')

• Similarly, EXCEPT queries can be re-written using NOT IN.
• This is what you'll need to do when using systems like MySQL whose set operator collection is incomplete

Ordering and/or Limiting Query Results

Find the ratings, ids, names, and ages of the three best sailors

    SELECT S.rating, S.sid, S.sname, S.age
    FROM Sailors S
    ORDER BY S.rating DESC
    LIMIT 3

• The general syntax for this:

    SELECT [DISTINCT] expressions
    FROM tables
    [WHERE condition]
    ....
    [ORDER BY expression [ ASC | DESC ]]
    LIMIT number_rows [ OFFSET offset_value ];
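A short sketch of ORDER BY with LIMIT and OFFSET, using Python's sqlite3 module and the 11-row Sailors instance that appears in the GROUP BY example of these slides. The extra sort key S.sid is an addition for this sketch: two sailors share rating 10, and without a tie-breaker their relative order is not guaranteed.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The three best sailors (ties on rating broken by sid).
top3 = conn.execute(
    "SELECT S.rating, S.sid, S.sname, S.age FROM Sailors S "
    "ORDER BY S.rating DESC, S.sid ASC LIMIT 3").fetchall()

# The next three, skipping the first three with OFFSET.
next3 = conn.execute(
    "SELECT S.rating, S.sid, S.sname, S.age FROM Sailors S "
    "ORDER BY S.rating DESC, S.sid ASC LIMIT 3 OFFSET 3").fetchall()

print(top3)
print(next3)
```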

Aggregate Operators

• Significant extension of the relational algebra.

    COUNT (*)
    COUNT ([DISTINCT] A)
    SUM ([DISTINCT] A)
    AVG ([DISTINCT] A)
    MAX (A)
    MIN (A)

  (A is a single column.)

    SELECT COUNT (*)
    FROM Sailors S

    SELECT AVG (S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT COUNT (DISTINCT S.rating)
    FROM Sailors S
    WHERE S.sname='Bob'

    SELECT AVG (DISTINCT S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT S.sname
    FROM Sailors S
    WHERE S.rating = (SELECT MAX(S2.rating)
                      FROM Sailors S2)

Find name and age of the oldest sailor(s)

First try:

    SELECT S.sname, MAX (S.age)
    FROM Sailors S

• That first try is illegal! (We'll see why shortly, when we do GROUP BY.)

Correct version:

    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age =
          (SELECT MAX (age)
           FROM Sailors)
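The subquery version can be run as follows. This is a sketch using Python's sqlite3 module and the 11-row Sailors instance from the GROUP BY example in these slides.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The inner query computes the maximum age once; the outer query returns
# every sailor whose age equals it (there could be several).
oldest = conn.execute(
    "SELECT S.sname, S.age FROM Sailors S "
    "WHERE S.age = (SELECT MAX(age) FROM Sailors)").fetchall()
print(oldest)
```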

Motivation for Grouping

• So far, we've applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.
• Consider: Find the age of the youngest sailor for each rating level.
  • In general, we don't know how many rating levels exist, and what the rating values for these levels are!
  • Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this :-)

    For i = 1, 2, ... , 10:

    SELECT MIN (S.age)
    FROM Sailors S
    WHERE S.rating = i

Queries With GROUP BY and HAVING

    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification

• The target-list contains (i) attribute names and (ii) terms with aggregate operations (e.g., MIN (S.age)).
  • The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)

Conceptual Evaluation

• The cross-product of relation-list is computed, tuples that fail the qualification are discarded, 'unnecessary' fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.
• A group-qualification (HAVING) is then applied to eliminate some groups. Expressions in group-qualification must also have a single value per group!
  • In effect, an attribute in group-qualification that is not an argument of an aggregate op must appear in grouping-list. (Note: SQL doesn't consider primary key semantics here.)
• One answer tuple is generated per qualifying group.

Find age of the youngest sailor with age ≥ 18 for each rating with at least 2 such sailors.

    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) >= 2

Sailors instance:

    sid  sname    rating  age
    22   dustin   7       45.0
    29   brutus   1       33.0
    31   lubber   8       55.5
    32   andy     8       25.5
    58   rusty    10      35.0
    64   horatio  7       35.0
    71   zorba    10      16.0
    74   horatio  9       35.0
    85   art      3       25.5
    95   bob      3       63.5
    96   frodo    3       25.5

Answer relation:

    rating  minage
    3       25.5
    7       35.0
    8       25.5
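The query and Sailors instance from this example can be run as-is. The sketch below uses Python's sqlite3 module; the only addition is an ORDER BY so the rows come back in a predictable order.

```python
import sqlite3

# Sailors instance from this example.
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# WHERE filters rows first, GROUP BY partitions the survivors by rating,
# and HAVING keeps only the groups with at least two rows.
answer = conn.execute(
    "SELECT S.rating, MIN(S.age) AS minage "
    "FROM Sailors S "
    "WHERE S.age >= 18 "
    "GROUP BY S.rating "
    "HAVING COUNT(*) >= 2 "
    "ORDER BY S.rating").fetchall()
print(answer)
```

Rating 10 drops out because zorba (age 16.0) is removed by the WHERE clause before grouping, leaving only one sailor in that group.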

Find age of the youngest sailor with age ≥ 18 for each rating with at least 2 such sailors.

Projected onto (rating, age):

    rating  age
    7       45.0
    1       33.0
    8       55.5
    8       25.5
    10      35.0
    7       35.0
    10      16.0
    9       35.0
    3       25.5
    3       63.5
    3       25.5

After the WHERE filter, grouped by rating:

    rating  age
    1       33.0
    3       25.5
    3       63.5
    3       25.5
    7       45.0
    7       35.0
    8       55.5
    8       25.5
    9       35.0
    10      35.0

Answer relation:

    rating  minage
    3       25.5
    7       35.0
    8       25.5

For each red boat, find the number of reservations for this boat

    SELECT B.bid, COUNT(*) AS scount
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    GROUP BY B.bid

• Notice: We're grouping over a join of three relations

Find age of the youngest sailor with age > 18 for each rating with at least 2 sailors (of any age)

    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*)
                FROM Sailors S2
                WHERE S.rating = S2.rating)

• Notes: A HAVING clause can also contain a subquery.
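Running this variant on the same 11-row Sailors instance shows how it differs from the earlier HAVING COUNT(*) >= 2 query. The sketch uses Python's sqlite3 module, with an ORDER BY added only for a predictable row order.

```python
import sqlite3

# Sailors instance from the earlier GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, "
             "rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The HAVING subquery counts ALL sailors with the group's rating,
# including those that the WHERE clause filtered out of the group.
answer = conn.execute(
    "SELECT S.rating, MIN(S.age) "
    "FROM Sailors S "
    "WHERE S.age > 18 "
    "GROUP BY S.rating "
    "HAVING 1 < (SELECT COUNT(*) FROM Sailors S2 "
    "            WHERE S.rating = S2.rating) "
    "ORDER BY S.rating").fetchall()
print(answer)
```

Rating 10 now appears: it has two sailors of any age (rusty and zorba), even though only rusty is older than 18.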

Find those ratings for which the average age is the minimum age over all Sailors

• Correct solution (in SQL/92):

    SELECT Temp.rating, Temp.avgage
    FROM (SELECT S.rating, AVG(S.age) AS avgage      ← Compute the average age for each rating...
          FROM Sailors S
          GROUP BY S.rating) AS Temp
    WHERE Temp.avgage = (SELECT MIN(age) FROM Sailors)      ← Find the overall minimum age

Page 51: Mike Carey - University of California, Irvine...Data Models •A data model is a collection of concepts for describing data •Aschemais a description of a particular collection of

Null Values

• Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse's name).
• SQL provides a special value null for such situations.
• The presence of null complicates many issues. E.g.:
  • Special operators are needed to check if a value is/is not null.
  • Is rating > 8 true or false when rating is equal to null? What about the AND, OR and NOT connectives?
  • We need a 3-valued logic (true, false and unknown).
  • The meaning of constructs must be defined carefully. (The WHERE clause eliminates rows that don't evaluate to true.)
  • New operators (in particular, outer joins) are possible/needed.


Example Data with Null Values

(Slide shows three tables: Sailors, Reserves, Boats)


Nulls and SQL’s 3-Valued Logic

AND      true     false    unknown
true     true     false    unknown
false    false    false    false
unknown  unknown  false    unknown

OR       true     false    unknown
true     true     true     true
false    true     false    unknown
unknown  true     unknown  unknown

NOT
true     false
false    true
unknown  unknown

Note: SQL arithmetic expressions involving null values will yield null values (Ex: EMP.sal + EMP.bonus)
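The truth tables can be written out as three small functions. This is a sketch in plain Python, with None standing in for unknown; the names and3, or3 and not3 are made up for this illustration.

```python
from typing import Optional

Tri = Optional[bool]  # None plays the role of SQL's "unknown"

def and3(a: Tri, b: Tri) -> Tri:
    """3-valued AND: a single false decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a: Tri, b: Tri) -> Tri:
    """3-valued OR: a single true decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a: Tri) -> Tri:
    """3-valued NOT: unknown stays unknown."""
    return None if a is None else (not a)

# Print the AND table in the same row/column order as the slide.
values = [True, False, None]
for a in values:
    print(a, [and3(a, b) for b in values])
```

The pattern to notice is that unknown only survives when the other operand cannot settle the answer by itself: false AND unknown is false, and true OR unknown is true.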


Basic SQL Queries w/ Nulls

SELECT *
FROM Sailors S
WHERE age > 35.0

SELECT *
FROM Sailors S
WHERE age <= 35.0

SELECT COUNT(*)
FROM Sailors S
WHERE age > 35.0
   OR age <= 35.0

SELECT COUNT(*)
FROM Sailors S
WHERE age > 35.0
   OR age <= 35.0
   OR age IS NULL
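The effect of these predicates can be seen on a tiny table. The sketch below uses Python's sqlite3 module with four hypothetical sailors, one of whom has a null age; it is not the course's example data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, age REAL)")
# Hypothetical rows; Python's None is stored as SQL NULL.
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(1, 45.0), (2, 25.5), (3, 35.0), (4, None)])

def count(where: str) -> int:
    """Return COUNT(*) for Sailors rows satisfying the given WHERE text."""
    return conn.execute(f"SELECT COUNT(*) FROM Sailors S WHERE {where}").fetchone()[0]

older = count("age > 35.0")
not_older = count("age <= 35.0")
either = count("age > 35.0 OR age <= 35.0")
everyone = count("age > 35.0 OR age <= 35.0 OR age IS NULL")
print(older, not_older, either, everyone)
```

For the sailor with a null age, both comparisons evaluate to unknown, so the row is dropped by "age > 35.0 OR age <= 35.0" even though that condition looks like it should always hold. Only the explicit IS NULL test brings the row back.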


Nulls w/ Aggregates

SELECT COUNT(rating)
FROM Sailors
Result: (11)

SELECT COUNT(DISTINCT rating)
FROM Sailors
Result: (7)

SELECT SUM(rating), COUNT(rating), AVG(rating)
FROM Sailors
Result: (70, 11, 6.3636)

(Useful, but logically “wrong”!)
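Aggregates other than COUNT(*) skip null inputs. A small sketch with Python's sqlite3 module and four hypothetical sailors, one with a null rating, makes the difference visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, rating INTEGER)")
# Hypothetical rows; sailor 4 has no rating assigned (NULL).
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(1, 7), (2, 7), (3, 10), (4, None)])

row = conn.execute("""
SELECT COUNT(*),                 -- counts rows, nulls included
       COUNT(rating),            -- counts non-null ratings only
       COUNT(DISTINCT rating),   -- distinct non-null ratings
       SUM(rating),
       AVG(rating)               -- SUM / COUNT(rating), not SUM / COUNT(*)
FROM Sailors
""").fetchone()
print(row)
```

The average is computed over the three rated sailors only, which is convenient in practice but is not the same as treating the missing rating as a value.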


Nulls w/ Aggregates & Grouping

SELECT bid, COUNT(*)
FROM Reserves
GROUP BY bid

SELECT COUNT(DISTINCT bid)
FROM Reserves
Result: (4)
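Grouping and DISTINCT counting treat nulls differently. The sketch below, using Python's sqlite3 module and a hypothetical Reserves table with two null bid values, shows both behaviors side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER)")
# Hypothetical rows; two reservations have no boat recorded (NULL bid).
conn.executemany("INSERT INTO Reserves VALUES (?, ?)",
                 [(22, 101), (31, 101), (22, 102), (64, None), (74, None)])

# GROUP BY places all null bids together in a single group.
groups = dict(conn.execute(
    "SELECT bid, COUNT(*) FROM Reserves GROUP BY bid").fetchall())

# COUNT(DISTINCT bid) ignores nulls entirely.
distinct_bids = conn.execute(
    "SELECT COUNT(DISTINCT bid) FROM Reserves").fetchone()[0]

print(groups, distinct_bids)
```

So the grouped query returns three groups (one of them keyed by null), while the distinct count reports only the two non-null boat ids.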


Nulls w/ Joins → Inner vs. Outer Joins

Some “dangling” tuple examples


Inner vs. Outer Joins in SQL

SELECT DISTINCT s.sname, r.date
FROM Sailors s, Reserves r
WHERE s.sid = r.sid


Inner vs. Outer Joins in SQL (2)

SELECT DISTINCT s.sname, r.date
FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid


Inner vs. Outer Joins in SQL (3)

(1) SELECT DISTINCT s.sname, r.date
    FROM Sailors s LEFT OUTER JOIN Reserves r ON s.sid = r.sid

(2) SELECT DISTINCT s.sname, r.date
    FROM Reserves r RIGHT OUTER JOIN Sailors s ON s.sid = r.sid

• Variations on a theme:
  • JOIN (or INNER JOIN)
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  (Varies from RDBMS to RDBMS; see your system’s documentation for join syntax)
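To see what the outer join adds, the sketch below runs the inner join and query (1) with Python's sqlite3 module on hypothetical data in which one sailor has no reservations. Only LEFT OUTER JOIN is exercised, because RIGHT and FULL OUTER JOIN are only available in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: Brutus (sid 29) is a "dangling" sailor with no reservation.
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, date TEXT);
INSERT INTO Sailors VALUES (22, 'Dustin'), (29, 'Brutus');
INSERT INTO Reserves VALUES (22, 101, '1998-10-10');
""")

inner = conn.execute("""
SELECT DISTINCT s.sname, r.date
FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid
ORDER BY s.sname
""").fetchall()

left_outer = conn.execute("""
SELECT DISTINCT s.sname, r.date
FROM Sailors s LEFT OUTER JOIN Reserves r ON s.sid = r.sid
ORDER BY s.sname
""").fetchall()

print(inner)
print(left_outer)
```

The inner join silently drops Brutus, while the left outer join keeps him and pads the missing r.date with null (None in Python). Query (2) on the slide describes the same result with the table order reversed.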


Now Let’s Peek at HW #1

• https://grape.ics.uci.edu/wiki/asterix/attachment/wiki/stats170ab-2018/HW1.pdf
