Michael Carey/Padhraic Smyth, UC Irvine: Stats 170A/B, Winter 2018 0
Announcements
• Read (and live by!) the course wiki page:
  • https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018
• Also follow (and live by) the Piazza page:
  • http://piazza.com/uci/winter2018/stats170a/home
• The first HW assignment is due in one week:
  • https://grape.ics.uci.edu/wiki/asterix/attachment/wiki/stats170ab-2018/HW1.pdf
  • (We can take a look at it together at the end of lecture time.)
• Today: CS122A in a Nutshell...
What is a Database System?
• What's a database?
  • A very large, integrated collection of data
  • Usually a model of a real-world enterprise
    • Entities (e.g., students, courses, Facebook users, …) with attributes (e.g., name, birthdate, GPA, …)
    • Relationships (e.g., Susan is taking CS234, Susan is a friend of Lynn, …)
• What's a database management system (DBMS)?
  • A software system designed to store, manage, and provide access to one or more databases
File Systems vs. DBMS
• Application programs must sometimes stage large datasets between main memory and secondary storage (for buffering huge data sets, getting page-oriented access, etc.)
• Special code needed for different queries, and that code must be (stay) correct and efficient
• Must protect data from inconsistency due to multiple concurrent users
• Crash recovery is important since data is now the currency of the day (corporate jewels)
• Security and access control are also important (!)
Why Use a DBMS?
• Data independence.
• Efficient data access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
• Can make "Big Data" much less unwieldy.
• Makes it easy to explore data "declaratively", including combinations of multiple datasets
Data Models
• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data, using a given data model
• The relational model is (still) the most widely used data model today
  • Relation – basically a table with rows and (named) columns
  • Schema – describes the tables and their columns
Example: University DB
• Conceptual schema:
  • Students(sid: string, name: string, login: string, age: integer, gpa: real)
  • Courses(cid: string, cname: string, credits: integer)
  • Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  • Relations stored as unordered files
  • Index on first and third columns of Students
• External schema (a.k.a. view):
  • CourseInfo(cid: string, cname: string, enrollment: integer)
University DB Example (cont.)
• User query (in SQL, against the external schema):
    SELECT c.cid, c.enrollment
    FROM CourseInfo c
    WHERE c.cname = 'Computer Game Design'
• Equivalent query (in SQL, against the conceptual schema):
    SELECT e.cid, count(*)
    FROM Enrolled e, Courses c
    WHERE e.cid = c.cid AND c.cname = 'Computer Game Design'
    GROUP BY e.cid
• Under the hood (against the physical schema):
  • Access Courses – use index on cname to find associated cid
  • Access Enrolled – use index on cid to count the enrollments
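The external/conceptual split above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the view definition for CourseInfo is an assumption (the slides give its columns, not its definition), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Assumed definition of the external schema: a view over the conceptual schema.
CREATE VIEW CourseInfo AS
    SELECT c.cid AS cid, c.cname AS cname, COUNT(*) AS enrollment
    FROM Enrolled e, Courses c
    WHERE e.cid = c.cid
    GROUP BY c.cid, c.cname;

-- Made-up sample data.
INSERT INTO Courses VALUES ('C1', 'Computer Game Design', 4), ('C2', 'Databases', 4);
INSERT INTO Enrolled VALUES ('s1', 'C1', 'A'), ('s2', 'C1', 'B'), ('s1', 'C2', 'A');
""")

# User query against the external schema (the view).
view_rows = conn.execute("""
    SELECT c.cid, c.enrollment
    FROM CourseInfo c
    WHERE c.cname = 'Computer Game Design'""").fetchall()

# Equivalent query against the conceptual schema (the base tables).
base_rows = conn.execute("""
    SELECT e.cid, COUNT(*)
    FROM Enrolled e, Courses c
    WHERE e.cid = c.cid AND c.cname = 'Computer Game Design'
    GROUP BY e.cid""").fetchall()

print(view_rows, base_rows)
```

Both queries should return the same single row, which is the point of data independence: the user of CourseInfo never needs to know how enrollment is computed.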
A Brief History of Databases
• Pre-relational era: 1960's, early 1970's
• Codd's seminal relational model paper: 1970
• Basic RDBMS R&D: 1970-80
• RDBMS improvements: 1980-85
• Relational goes mainstream: 1985-90
• Parallel DBMS research: 1985-95
• OLAP and warehouse research: 1990-2000
• Stream DB and XML DB research: 2000-2010
• "Big Data" R&D (also including "NoSQL"): 2005-present
Overview of Database Design
• Conceptual design: (ER Model used at this stage.)
  • What are the entities and relationships in the enterprise?
  • What information about these entities and relationships should we store in the database?
  • What are the integrity constraints or business rules that hold?
  • A database schema in the ER Model can be represented pictorially (using an ER diagram).
  • Can map an ER diagram into a relational schema (manually or using a design tool's automation).
ER Model Basics
• Entity: Real-world object, distinguishable from all other objects. An entity is described (in DB-land) using a set of attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
  • All entities in an entity set have the same set of attributes. (Until we get to ISA hierarchies…)
  • Each entity set has a key (a unique identifier); this can be either one attribute (an "atomic" key) or several attributes (called a "composite" key)
  • Each attribute has a domain (similar to a data type).
[ER diagram: entity set Employees with attributes ssn, name, lot]
ER Model Basics (Contd.)
• Relationship: Association among two or more entities. E.g., Santa Claus works in the Toy department.
• Relationship Set: Collection of similar relationships.
  • An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1:E1, ..., en:En
  • One entity set can participate in different relationship sets – or in different "roles" in the same set.
  • Participation constraints indicate whether an entity is required to participate in a given relationship
[ER diagrams: (1) Reports_To relates Employees (ssn, name, lot) to itself in the roles supervisor and subordinate; (2) Works_In (since) relates Employees (ssn, name, lot) and Departments (did, dname, budget)]
Cardinality Constraints
[ER diagram: Manages (since) relates Employees (ssn, name, lot), on the 1 side, and Departments (did, dname, budget), on the N side]
• Consider Works_In: An employee can work in many departments; a dept can have many employees.
• In contrast, each dept has at most one manager, according to the cardinality constraint on Manages above.
[Figure: the four cardinality patterns: Many-to-Many (M:N), 1-to-1 (1:1), 1-to-Many (1:N), Many-to-1 (N:1)]
(Note: A given employee can manage several departments)
ER Basics: Do Try This at Home (☺)
[ER diagram: entity sets Professor (fac_id, name, rank), Dept (dno, dname, main_office), and Parking Space (pid, lot_num, space_num); In relates Professor and Dept (M:N), Head relates Professor and Dept (1:N), Assigned relates Professor and Parking Space (1:1)]
• Let's see if you can read/interpret the ER diagram above…! (☺)
  • What attributes are unique (i.e., identify their associated entity instances)?
  • What are the rules about (the much coveted) parking passes?
  • What are the rules (constraints) about professors being in departments?
  • And, what are the rules about professors heading departments?
Answers to the Self Test
• Unique attributes:
  • Professor.fac_id, Dept.dno, ParkingSpace.pid
• Faculty parking:
  • 1 space/faculty, one faculty/space
  • Some faculty can bike or walk (☺)
  • Some parking spaces may be unused
• Faculty in departments:
  • Faculty may have appointments in multiple departments
  • Departments can have multiple faculty in them
  • No empty departments, and no unaffiliated faculty
• Department management:
  • One head per department (exactly)
  • Not all faculty are department heads
NOTE: These things are all "rules of the universe" that are just being modeled here!
Q: Can a faculty member head a department that he or she isn't actually in?
Logical DB Design: ER to Relational
• Entity sets to tables:
[ER diagram: entity set Employees with attributes ssn, name, lot]
    CREATE TABLE Employees (
        ssn CHAR(11),
        name CHAR(20),
        lot INTEGER,
        PRIMARY KEY (ssn))
Relationship Sets to Tables
• In translating a relationship set to a relation, the attributes of the relation must include:
  • Keys for each participating entity set (as foreign keys).
    • This set of attributes forms a superkey for the relation.
  • All descriptive attributes.
[ER diagram: Works_In (since) relates Employees (ssn, name, lot) and Departments (did, dname, budget)]
    CREATE TABLE Works_In (
        ssn CHAR(11),
        did INTEGER,
        since DATE,
        PRIMARY KEY (ssn, did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments)
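To see what the keys in Works_In buy us, here is a minimal sketch using Python's built-in sqlite3 module. The Departments table definition, the sample rows, and the helper try_insert are assumptions made for illustration; note also that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Employees (ssn CHAR(11), name CHAR(20), lot INTEGER, PRIMARY KEY (ssn));
-- Assumed definition, based on the attributes shown in the ER diagram.
CREATE TABLE Departments (did INTEGER, dname CHAR(20), budget REAL, PRIMARY KEY (did));
CREATE TABLE Works_In (
    ssn CHAR(11), did INTEGER, since DATE,
    PRIMARY KEY (ssn, did),
    FOREIGN KEY (ssn) REFERENCES Employees,
    FOREIGN KEY (did) REFERENCES Departments);
-- Made-up sample rows.
INSERT INTO Employees VALUES ('123-22-3666', 'Santa Claus', 48);
INSERT INTO Departments VALUES (51, 'Toy', 100000.0);
INSERT INTO Works_In VALUES ('123-22-3666', 51, '2018-01-08');
""")

def try_insert(row):
    """Hypothetical helper: True if the Works_In insert succeeds, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Works_In VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# No such employee: rejected by FOREIGN KEY (ssn).
dangling_ok = try_insert(("999-99-9999", 51, "2018-01-08"))
# Same (ssn, did) pair again: rejected by PRIMARY KEY (ssn, did).
duplicate_ok = try_insert(("123-22-3666", 51, "2018-02-01"))
row_count = conn.execute("SELECT COUNT(*) FROM Works_In").fetchone()[0]
print(dangling_ok, duplicate_ok, row_count)
```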
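Key Constraints (Review)
• Each dept has at most one manager, according to the key constraint on Manages.
[Figure: the four cardinality patterns: Many-to-Many, 1-to-1, 1-to-Many, Many-to-1]
[ER diagram: Manages (since) relates Employees (ssn, name, lot), on the 1 side, and Departments (did, dname, budget), on the N side]
Translation to relational model?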
ER Translation with Key Constraints
• Map the relationship to a table (Manages):
  • Note that did (alone) is the key!
  • Still separate tables for Employees and Departments.
    CREATE TABLE Manages (
        ssn CHAR(11),
        did INTEGER,
        since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (ssn) REFERENCES Employees,
        FOREIGN KEY (did) REFERENCES Departments)
vs.
• But, since each department has a unique manager, we could choose to fold Manages right into Departments. (Q: Why do that...?)
    CREATE TABLE Departments2 (
        did INTEGER,
        dname CHAR(20),
        budget REAL,
        mgr_ssn CHAR(11),
        mgr_since DATE,
        PRIMARY KEY (did),
        FOREIGN KEY (mgr_ssn) REFERENCES Employees)
Note: The relationship info has been pushed to the N-side's entity table
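A quick way to convince yourself that PRIMARY KEY (did) captures "at most one manager per department" is to try to break it. The sketch below uses Python's sqlite3 module; the foreign key clauses are left out to keep it short, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Manages as on the slide, minus the FOREIGN KEY clauses (omitted for brevity).
conn.execute("""
    CREATE TABLE Manages (
        ssn CHAR(11), did INTEGER, since DATE,
        PRIMARY KEY (did))""")

conn.execute("INSERT INTO Manages VALUES ('111-11-1111', 51, '2018-01-01')")
# The same employee managing a second department is fine (1:N).
conn.execute("INSERT INTO Manages VALUES ('111-11-1111', 52, '2018-01-01')")

try:
    # A second manager for department 51 violates the key on did.
    conn.execute("INSERT INTO Manages VALUES ('222-22-2222', 51, '2018-01-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

managers = conn.execute("SELECT did, ssn FROM Manages ORDER BY did").fetchall()
print(second_manager_rejected, managers)
```

Had the key been (ssn, did), as in Works_In, the third insert would have gone through and department 51 would have two managers.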
Mapping Advanced ER Features
• Multi-valued (vs. single-valued) attributes
[ER diagram: Employees with attributes ssn, name, and multi-valued attribute phone]
  Employees(ssn, name)
    • ssn is the PK in this table
  Employees_phones(ssn, phone)
    • ssn is an FK in this table
    • (ssn, phone) is its PK
• Composite (vs. atomic) attributes
[ER diagram: Employees with attributes ssn, name, and composite attribute address (snum, street, city, zip)]
  Employees(ssn, name, address_snum, address_street, address_city, address_zip)
So, Given a Relational Schema...
• How do I know if my relational schema is a "good" logical database design or not?
  • What might make it "not good"?
  • How can I fix it, if indeed it's "not good"?
  • How "good" is it, after I've fixed it?
• Note that your relational schema might have come from one of several places
  • You started from an E-R model (but maybe that model was "wrong" in some way?)
  • You went straight to relational in the first place
  • It's not your schema – you inherited it! ☺
Ex: Wisconsin Sailing Club
Proposed schema design #1:
sid  sname   rating  age   date      bid  bname      color
22   Dustin  7       45.0  10/10/98  101  Interlake  blue
22   Dustin  7       45.0  10/10/98  102  Interlake  red
22   Dustin  7       45.0  10/8/98   103  Clipper    green
22   Dustin  7       45.0  10/7/98   104  Marine     red
31   Lubber  8       55.5  11/10/98  102  Interlake  red
31   Lubber  8       55.5  11/6/98   103  Clipper    green
31   Lubber  8       55.5  11/12/98  104  Marine     red
...  ...     ...     ...   ...       ...  ...        ...
Q: Do you think this is a "good" design? (Why or why not?)
Ex: Wisconsin Sailing Club
Proposed schema design #2:
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
...  ...     ...     ...

sid  bid  date
22   101  10/10/98
22   102  10/10/98
22   103  10/8/98
22   104  10/7/98
31   102  11/10/98
31   103  11/6/98
31   104  11/12/98
...  ...  ...

bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red
Q: What about this design?
• Is #2 "better" than #1...? Explain!
• Is it a "best" design?
• How can we go from design #1 to this one?
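One way to get from design #1 to design #2 is to project the wide table onto three column subsets and drop duplicates. The sketch below (Python's sqlite3; the table names Design1, Sailors, Reserves, and Boats are chosen here for illustration) does that with the rows shown on the slide and then checks that joining the three pieces gives back exactly the original rows.

```python
import sqlite3

# The seven rows shown for design #1.
rows = [
    (22, "Dustin", 7, 45.0, "10/10/98", 101, "Interlake", "blue"),
    (22, "Dustin", 7, 45.0, "10/10/98", 102, "Interlake", "red"),
    (22, "Dustin", 7, 45.0, "10/8/98", 103, "Clipper", "green"),
    (22, "Dustin", 7, 45.0, "10/7/98", 104, "Marine", "red"),
    (31, "Lubber", 8, 55.5, "11/10/98", 102, "Interlake", "red"),
    (31, "Lubber", 8, 55.5, "11/6/98", 103, "Clipper", "green"),
    (31, "Lubber", 8, 55.5, "11/12/98", 104, "Marine", "red"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Design1 (sid INTEGER, sname TEXT, rating INTEGER, age REAL,
                                      date TEXT, bid INTEGER, bname TEXT, color TEXT)""")
conn.executemany("INSERT INTO Design1 VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Decompose: one duplicate-free projection per table of design #2.
conn.executescript("""
CREATE TABLE Sailors  AS SELECT DISTINCT sid, sname, rating, age FROM Design1;
CREATE TABLE Reserves AS SELECT DISTINCT sid, bid, date FROM Design1;
CREATE TABLE Boats    AS SELECT DISTINCT bid, bname, color FROM Design1;
""")

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("Sailors", "Reserves", "Boats")}

# Rejoin the pieces and compare with the original rows.
rejoined = conn.execute("""
    SELECT s.sid, s.sname, s.rating, s.age, r.date, r.bid, b.bname, b.color
    FROM Sailors s, Reserves r, Boats b
    WHERE s.sid = r.sid AND r.bid = b.bid""").fetchall()
lossless = sorted(rejoined) == sorted(rows)
print(counts, lossless)
```

Each sailor and each boat is now stored once instead of once per reservation, which is the redundancy that made design #1 questionable.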
Normal Forms
[Figure: nested regions, from outermost to innermost: all "relations", 1NF, 2NF, 3NF, BCNF, ...]
First Normal Form (1NF)
• Rel'n R is in 1NF if all of its attributes are atomic.
  • No set-valued attributes! (1NF = "flat" ☺)
  • Usually goes w/o saying for relational model (but not for NoSQL systems, as we'll see at the end of the quarter ☺).
• Ex:
  Not in 1NF:
  bname      color
  Interlake  blue, red
  Clipper    green
  Marine     red

  In 1NF:
  bname      color
  Interlake  blue
  Interlake  red
  Clipper    green
  Marine     red
On to SQL...!
SQL "SPJ" Query:
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
• relation-list: A list of relation names (possibly with a range-variable after each name).
• target-list: A list of attributes of relations in relation-list
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, <=, =, >, >=, <>) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated! (Bags, not sets.)
Many SQL-Based DBMSs
• Commercial RDBMS choices include
  • DB2 (IBM)
  • Oracle
  • SQL Server (Microsoft)
  • Teradata
• Open source RDBMS options include
  • MySQL
  • PostgreSQL
• And for so-called "Big Data", we also have
  • Apache Hive (on Hadoop) + newer wannabes
Example Instances
• We'll use these instances of our usual Sailors and Reserves relations in our examples.
R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Conceptual Evaluation Strategy
• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:
  • Compute the cross-product of relation-list. (×)
  • Discard resulting tuples if they fail qualifications. (σ)
  • Project out attributes that are not in target-list. (π)
  • If DISTINCT is specified, eliminate duplicate rows. (δ)
• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.
Example of Conceptual Evaluation
    SELECT S.sname
    FROM Sailors S, Reserves R    ← using table S1
    WHERE S.sid=R.sid AND R.bid=103
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
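The conceptual strategy is simple enough to spell out by hand. The sketch below is plain Python with no database involved; the function name conceptual_eval is made up for illustration. It runs the four steps (×, σ, π, δ) for the query above on instances S1 and R1.

```python
from itertools import product

# Instances S1 (Sailors) and R1 (Reserves) from the slides.
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def conceptual_eval(sailors, reserves, distinct=False):
    """Evaluate SELECT S.sname FROM Sailors S, Reserves R
    WHERE S.sid=R.sid AND R.bid=103 the 'least efficient' way."""
    # 1. Cross-product of the relation-list.
    cross = [s + r for s, r in product(sailors, reserves)]
    # 2. Selection: column 0 is S.sid, column 4 is R.sid, column 5 is R.bid.
    selected = [t for t in cross if t[0] == t[4] and t[5] == 103]
    # 3. Projection onto the target-list (S.sname is column 1).
    projected = [(t[1],) for t in selected]
    # 4. Duplicate elimination only if DISTINCT was asked for (bags by default).
    if distinct:
        projected = list(dict.fromkeys(projected))
    return cross, projected

cross, answer = conceptual_eval(S1, R1)
print(len(cross), answer)
```

The cross-product has 3 × 2 = 6 rows, matching the table above, and only the last one survives the selection.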
Find sailors who've reserved at least one boat
Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)
    SELECT S.sid
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid
• Would adding DISTINCT to this query make a difference? (With our data? With possible data?)
• What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding DISTINCT to this variant of the query make a difference?
Expressions and Strings
Sailors(sid, sname, rating, age)
    SELECT S.sname, S.age, (7 * S.age) AS dogyears
    FROM Sailors S
    WHERE S.sname LIKE 'B_%B'
• Illustrates use of arithmetic expressions and string pattern matching: Find names and ages and a field defined by an expression for sailors whose names begin and end with B and contain at least three characters.
• AS provides a way to (re)name fields in result.
• LIKE is used for string matching. '_' stands for any one character and '%' stands for 0 or more arbitrary characters. (See SQL docs for more info...)
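Here is the dogyears query run against a few made-up sailors, using Python's sqlite3 module. The names are all upper case on purpose: whether LIKE is case sensitive differs between systems (SQLite's default is case-insensitive for ASCII letters), so this keeps the result the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
# Made-up rows chosen to exercise the pattern 'B_%B'.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (1, "BOB", 3, 63.5),     # B, one character, nothing, B: matches
    (2, "BB", 5, 20.0),      # only two characters: no match
    (3, "BARB", 8, 30.0),    # B, one character, "R", B: matches
    (4, "BRUTUS", 1, 33.0),  # does not end with B: no match
])

like_rows = conn.execute("""
    SELECT S.sname, S.age, (7 * S.age) AS dogyears
    FROM Sailors S
    WHERE S.sname LIKE 'B_%B'
    ORDER BY S.sid""").fetchall()
print(like_rows)
```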
Some Better Example Data
[Tables: example instances of Sailors, Reserves, and Boats]
Find sid's of sailors who've reserved a red or a green boat
Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)
    SELECT DISTINCT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND (B.color='red' OR B.color='green')
• If we replace OR by AND in this first version, what do we get?
• UNION: Can be used to compute the union of any two union-compatible sets of tuples (which are themselves the result of SQL queries).
    (SELECT S.sid
     FROM Sailors S, Boats B, Reserves R
     WHERE S.sid=R.sid AND R.bid=B.bid
       AND B.color='red')
    UNION
    (SELECT S.sid
     FROM Sailors S, Boats B, Reserves R
     WHERE S.sid=R.sid AND R.bid=B.bid
       AND B.color='green')
• Also available: EXCEPT (What would we get if we replaced UNION by EXCEPT?)
Find sid's of sailors who've reserved a red and a green boat
Sailors(sid, sname, rating, age)
Reserves(sid, bid, day)
Boats(bid, bname, color)
    SELECT S.sid
    FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
    WHERE S.sid=R1.sid AND R1.bid=B1.bid
      AND S.sid=R2.sid AND R2.bid=B2.bid
      AND (B1.color='red' AND B2.color='green')
• INTERSECT: Can be used to compute the intersection of two union-compatible sets of tuples.
  • Included in the SQL/92 standard, but not in all systems (e.g., MySQL before version 8.0.31).
    SELECT S.sid        ← Key field!
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND B.color='red'
    INTERSECT
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid
      AND B.color='green'
• Contrast symmetry of the UNION and INTERSECT queries with how much the other versions differ.
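The three set operators are easy to compare side by side. The sketch below uses Python's sqlite3 module with made-up data (one sailor per interesting case); sids_for and run are hypothetical helpers. The parentheses around each branch are dropped because, as far as this sketch assumes, SQLite's compound SELECT syntax does not take them. It also answers the "replace OR by AND" question from the previous slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
-- Made-up data: sailor 1 has red+green, 2 red only, 3 green only, 4 blue only.
INSERT INTO Sailors VALUES (1,'ann',7,30.0),(2,'bo',8,40.0),(3,'cy',9,50.0),(4,'di',5,25.0);
INSERT INTO Boats VALUES (101,'Interlake','blue'),(102,'Interlake','red'),(103,'Clipper','green');
INSERT INTO Reserves VALUES (1,102,'d1'),(1,103,'d2'),(2,102,'d3'),(3,103,'d4'),(4,101,'d5');
""")

def sids_for(color):
    """SQL text for: sids of sailors who reserved a boat of the given color."""
    return f"""SELECT S.sid FROM Sailors S, Boats B, Reserves R
               WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = '{color}'"""

def run(op):
    """Combine the red and green queries with UNION, INTERSECT, or EXCEPT."""
    sql = f"{sids_for('red')} {op} {sids_for('green')}"
    return sorted(row[0] for row in conn.execute(sql))

red_or_green = run("UNION")
red_and_green = run("INTERSECT")
red_not_green = run("EXCEPT")

# Replacing OR by AND in the single-join version asks for one boat that is
# both red and green, so nothing qualifies.
and_in_where = conn.execute("""
    SELECT DISTINCT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid
      AND (B.color = 'red' AND B.color = 'green')""").fetchall()
print(red_or_green, red_and_green, red_not_green, and_in_where)
```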
Nested Queries
Find names of sailors who've reserved boat #103:
    SELECT S.sname
    FROM Sailors S
    WHERE S.sid IN (SELECT R.sid
                    FROM Reserves R
                    WHERE R.bid=103)
• A very powerful feature of SQL: a WHERE clause can itself contain an SQL query! (Actually, so can FROM and HAVING clauses!!)
• To find sailors who've not reserved #103, use NOT IN.
• To understand semantics (including cardinality) of nested queries, think nested loops evaluation:
  • For each Sailors tuple, check qualification by computing subquery.
Nested Queries with Correlation
Find names of sailors who've reserved boat #103:
    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS (SELECT *
                  FROM Reserves R
                  WHERE R.bid=103 AND S.sid=R.sid)
• EXISTS is another set comparison operator, like IN.
• Illustrates why, in general, subquery must be re-computed for each Sailors tuple (conceptually).
NOTE: Recall that there was a join way to express this query, too. Relational query optimizers will try to unnest queries into joins when possible to avoid nested loop query evaluation plans.
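The IN, EXISTS, and join formulations can be checked against each other on instances S1 and R1 from earlier in the deck. This is a sketch using Python's sqlite3 module; the helper names() is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
-- Instances S1 and R1 from the slides.
INSERT INTO Sailors VALUES (22,'dustin',7,45.0),(31,'lubber',8,55.5),(58,'rusty',10,35.0);
INSERT INTO Reserves VALUES (22,101,'10/10/96'),(58,103,'11/12/96');
""")

def names(sql):
    """Run a query that returns one sname column and collect the names in sid order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY S.sid")]

with_in = names("""SELECT S.sname FROM Sailors S
                   WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)""")
with_exists = names("""SELECT S.sname FROM Sailors S
                       WHERE EXISTS (SELECT * FROM Reserves R
                                     WHERE R.bid = 103 AND S.sid = R.sid)""")
with_join = names("""SELECT S.sname FROM Sailors S, Reserves R
                     WHERE S.sid = R.sid AND R.bid = 103""")
# Sailors who have NOT reserved boat #103.
with_not_in = names("""SELECT S.sname FROM Sailors S
                       WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)""")
print(with_in, with_exists, with_join, with_not_in)
```

On this data the join version agrees with the nested ones; with a sailor who reserved #103 twice, the join would list the name twice unless DISTINCT is added, which is the cardinality point made above.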
More on Set-Comparison Operators
• We've already seen IN and EXISTS. Can also use NOT IN and NOT EXISTS.
• Also available: op ANY, op ALL (for ops: >, <, =, ≥, ≤, ≠)
• Find sailors whose rating is greater than that of some sailor called Horatio:
    SELECT *
    FROM Sailors S
    WHERE S.rating > ANY (SELECT S2.rating
                          FROM Sailors S2
                          WHERE S2.sname='Horatio')
Rewriting INTERSECT Queries Using IN
Find sid's of sailors who've reserved both a red and a green boat:
    SELECT S.sid
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
      AND S.sid IN (SELECT S2.sid
                    FROM Sailors S2, Boats B2, Reserves R2
                    WHERE S2.sid=R2.sid AND R2.bid=B2.bid
                      AND B2.color='green')
• Similarly, EXCEPT queries can be re-written using NOT IN.
• This is what you'll need to do when using systems (such as MySQL before version 8.0.31) whose set operator collection is incomplete
Ordering and/or Limiting Query Results
Find the ratings, ids, names, and ages of the three best sailors
    SELECT S.rating, S.sid, S.sname, S.age
    FROM Sailors S
    ORDER BY S.rating DESC
    LIMIT 3
• The general syntax for this:
    SELECT [DISTINCT] expressions
    FROM tables
    [WHERE condition]
    ....
    [ORDER BY expression [ ASC | DESC ]]
    LIMIT number_rows [ OFFSET offset_value ];
Aggregate Operators
• Significant extension of the relational algebra.
    COUNT (*)
    COUNT ( [DISTINCT] A)
    SUM ( [DISTINCT] A)
    AVG ( [DISTINCT] A)
    MAX (A)
    MIN (A)
  (A is a single column)

    SELECT COUNT (*)
    FROM Sailors S

    SELECT AVG (S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT AVG (DISTINCT S.age)
    FROM Sailors S
    WHERE S.rating=10

    SELECT COUNT (DISTINCT S.rating)
    FROM Sailors S
    WHERE S.sname='Bob'

    SELECT S.sname
    FROM Sailors S
    WHERE S.rating = (SELECT MAX (S2.rating)
                      FROM Sailors S2)
Find name and age of the oldest sailor(s)
    SELECT S.sname, MAX (S.age)
    FROM Sailors S
• That first try is illegal! (We'll see why shortly, when we do GROUP BY.)
    SELECT S.sname, S.age
    FROM Sailors S
    WHERE S.age = (SELECT MAX (age)
                   FROM Sailors)
Motivation for Grouping
• So far, we've applied aggregate operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.
• Consider: Find the age of the youngest sailor for each rating level.
  • In general, we don't know how many rating levels exist, and what the rating values for these levels are!
  • Suppose we know that rating values go from 1 to 10; we can write 10 queries that look like this (☺):
    For i = 1, 2, ... , 10:
        SELECT MIN (S.age)
        FROM Sailors S
        WHERE S.rating = i
Queries With GROUP BY and HAVING
    SELECT [DISTINCT] target-list
    FROM relation-list
    WHERE qualification
    GROUP BY grouping-list
    HAVING group-qualification
• The target-list contains (i) attribute names and (ii) terms with aggregate operations (e.g., MIN (S.age)).
  • The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)
Conceptual Evaluation
• The cross-product of relation-list is computed, tuples that fail the qualification are discarded, 'unnecessary' fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.
• A group-qualification (HAVING) is then applied to eliminate some groups. Expressions in group-qualification must also have a single value per group!
  • In effect, an attribute in group-qualification that is not an argument of an aggregate op must appear in grouping-list. (Note: SQL doesn't consider primary key semantics here.)
• One answer tuple is generated per qualifying group.
Find age of the youngest sailor with age ≥ 18 for each rating with at least 2 such sailors.
    SELECT S.rating, MIN (S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT (*) >= 2
Sailors instance:
sid  sname    rating  age
22   dustin   7       45.0
29   brutus   1       33.0
31   lubber   8       55.5
32   andy     8       25.5
58   rusty    10      35.0
64   horatio  7       35.0
71   zorba    10      16.0
74   horatio  9       35.0
85   art      3       25.5
95   bob      3       63.5
96   frodo    3       25.5
Answer relation:
rating  minage
3       25.5
7       35.0
8       25.5
Find age of the youngest sailor with age ≥ 18 for each rating with at least 2 such sailors.
Rating and age of each sailor:
rating  age
7       45.0
1       33.0
8       55.5
8       25.5
10      35.0
7       35.0
10      16.0
9       35.0
3       25.5
3       63.5
3       25.5
After the WHERE clause, partitioned by rating:
rating  age
1       33.0
3       25.5
3       63.5
3       25.5
7       45.0
7       35.0
8       55.5
8       25.5
9       35.0
10      35.0
Answer relation:
rating  minage
3       25.5
7       35.0
8       25.5
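The walk-through above can be reproduced both ways: by letting a DBMS run the GROUP BY/HAVING query, and by following the conceptual steps by hand. The sketch below uses Python's sqlite3 module and the Sailors instance from the slide; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sailors instance from the slide.
sailors = [
    (22, "dustin", 7, 45.0), (29, "brutus", 1, 33.0), (31, "lubber", 8, 55.5),
    (32, "andy", 8, 25.5), (58, "rusty", 10, 35.0), (64, "horatio", 7, 35.0),
    (71, "zorba", 10, 16.0), (74, "horatio", 9, 35.0), (85, "art", 3, 25.5),
    (95, "bob", 3, 63.5), (96, "frodo", 3, 25.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", sailors)

sql_answer = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) >= 2
    ORDER BY S.rating""").fetchall()

# The same steps by hand: filter (WHERE), partition by rating (GROUP BY),
# drop small groups (HAVING), then aggregate each surviving group.
groups = {}
for _sid, _sname, rating, age in sailors:
    if age >= 18:
        groups.setdefault(rating, []).append(age)
manual_answer = sorted((rating, min(ages)) for rating, ages in groups.items()
                       if len(ages) >= 2)
print(sql_answer, manual_answer)
```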
For each red boat, find the number of reservations for this boat
    SELECT B.bid, COUNT(*) AS scount
    FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
    GROUP BY B.bid
• Notice: We're grouping over a join of three relations
Find age of the youngest sailor with age > 18 for each rating with at least 2 sailors (of any age)
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age > 18
    GROUP BY S.rating
    HAVING 1 < (SELECT COUNT(*)
                FROM Sailors S2
                WHERE S.rating = S2.rating)
• Notes: A HAVING clause can also contain a subquery.
Find those ratings for which the average age is the minimum age over all Sailors
• Correct solution (in SQL/92):
    SELECT Temp.rating, Temp.avgage
    FROM (SELECT S.rating, AVG(S.age) AS avgage
          FROM Sailors S
          GROUP BY S.rating) AS Temp
    WHERE Temp.avgage = (SELECT MIN(age) FROM Sailors)
  (The FROM subquery computes the average age for each rating; the WHERE subquery finds the overall minimum age.)
Null Values
• Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or inapplicable (e.g., no spouse's name).
  • SQL provides special value null for such situations.
• The presence of null complicates many issues. E.g.:
  • Special operators needed to check if value is/is not null.
  • Is rating>8 true or false when rating is equal to null? What about AND, OR and NOT connectives?
  • We need a 3-valued logic (true, false and unknown).
  • Meaning of constructs must be defined carefully. (The WHERE clause eliminates rows that don't evaluate to true.)
  • New operators (in particular, outer joins) possible/needed.
Example Data with Null Values
[Tables: example instances of Sailors, Reserves, and Boats containing null values]
Nulls and SQL's 3-Valued Logic
AND      true     false  unknown
true     true     false  unknown
false    false    false  false
unknown  unknown  false  unknown

OR       true  false    unknown
true     true  true     true
false    true  false    unknown
unknown  true  unknown  unknown

NOT
true     false
false    true
unknown  unknown

Note: SQL arithmetic expressions involving null values will yield null values (Ex: EMP.sal + EMP.bonus)
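The truth tables above are small enough to write down as code and compare with what a real system does. In this sketch Python's None plays the role of unknown; and3, or3, not3, and the conversion helpers are made-up names, and SQLite (via the sqlite3 module) is used as the reference, with 1 and 0 standing in for true and false.

```python
import sqlite3

def and3(a, b):
    """3-valued AND: false wins, then unknown, else true."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """3-valued OR: true wins, then unknown, else false."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """3-valued NOT: unknown stays unknown."""
    return None if a is None else not a

conn = sqlite3.connect(":memory:")

def sql_value(expr):
    """Evaluate a scalar SQL expression; SQLite hands null back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]

def to_sql(v):
    return "NULL" if v is None else ("1" if v else "0")

def from_sql(v):
    return None if v is None else bool(v)

values = [True, False, None]
# Check every cell of the AND and OR tables, and every row of NOT, against SQLite.
agrees = all(
    from_sql(sql_value(f"{to_sql(a)} AND {to_sql(b)}")) == and3(a, b)
    and from_sql(sql_value(f"{to_sql(a)} OR {to_sql(b)}")) == or3(a, b)
    for a in values for b in values
) and all(from_sql(sql_value(f"NOT {to_sql(a)}")) == not3(a) for a in values)

null_arithmetic = sql_value("1 + NULL")  # arithmetic on null yields null
print(agrees, null_arithmetic)
```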
Basic SQL Queries w/ Nulls
    SELECT *
    FROM Sailors S
    WHERE age > 35.0

    SELECT *
    FROM Sailors S
    WHERE age <= 35.0

    SELECT COUNT(*)
    FROM Sailors S
    WHERE age > 35.0
       OR age <= 35.0

    SELECT COUNT(*)
    FROM Sailors S
    WHERE age > 35.0
       OR age <= 35.0
       OR age IS NULL
Nulls w/ Aggregates
    SELECT COUNT(rating)
    FROM Sailors
  Result: (11)

    SELECT COUNT (DISTINCT rating)
    FROM Sailors
  Result: (7)

    SELECT SUM(rating), COUNT(rating), AVG(rating)
    FROM Sailors
  Result: (70, 11, 6.3636)
(Useful, but logically "wrong"!)
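The null behavior on the last two slides (WHERE drops rows whose condition is unknown; aggregates over a column skip nulls) shows up even on a tiny table. The sketch below uses Python's sqlite3 module with four made-up sailors, not the deck's example data; count_where is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
# Made-up rows: one sailor has no age, another has no rating.
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", [
    (1, "ann", 7, 45.0), (2, "bo", 8, None), (3, "cy", None, 25.5), (4, "di", 5, 35.0)])

def count_where(condition):
    """Count the Sailors rows for which the condition evaluates to true."""
    return conn.execute(
        f"SELECT COUNT(*) FROM Sailors S WHERE {condition}").fetchone()[0]

older = count_where("age > 35.0")
not_older = count_where("age <= 35.0")
either = count_where("age > 35.0 OR age <= 35.0")  # the null-age row is still missing
everyone = count_where("age > 35.0 OR age <= 35.0 OR age IS NULL")

# COUNT(*) counts rows; COUNT, SUM, and AVG over a column ignore its nulls.
total, rated, rating_sum, rating_avg = conn.execute(
    "SELECT COUNT(*), COUNT(rating), SUM(rating), AVG(rating) FROM Sailors").fetchone()
print(older, not_older, either, everyone, total, rated, rating_sum, rating_avg)
```

Note that the average is the sum divided by the number of non-null ratings (3 here), not by the number of rows (4), which is the sense in which the result is useful but logically "wrong".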
Nulls w/ Aggregates & Grouping
    SELECT bid, COUNT(*)
    FROM Reserves
    GROUP BY bid

    SELECT COUNT(DISTINCT bid)
    FROM Reserves
  Result: (4)
Nulls w/ Joins → Inner vs. Outer Joins
Some "dangling" tuple examples
Inner vs. Outer Joins in SQL
    SELECT DISTINCT s.sname, r.date
    FROM Sailors s, Reserves r
    WHERE s.sid = r.sid
Inner vs. Outer Joins in SQL (2)
    SELECT DISTINCT s.sname, r.date
    FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid
Inner vs. Outer Joins in SQL (3)
(1) SELECT DISTINCT s.sname, r.date
    FROM Sailors s LEFT OUTER JOIN Reserves r ON s.sid = r.sid
(2) SELECT DISTINCT s.sname, r.date
    FROM Reserves r RIGHT OUTER JOIN Sailors s ON s.sid = r.sid
• Variations on a theme:
  • JOIN (or INNER JOIN)
  • LEFT OUTER JOIN
  • RIGHT OUTER JOIN
  • FULL OUTER JOIN
  (Varies from RDBMS to RDBMS)
  (See system's documentation for join syntax)
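The difference between the inner and left outer join is easiest to see on data with a "dangling" sailor. The sketch below uses Python's sqlite3 module with the earlier S1 and R1 rows (the Reserves column is named date here to match the join queries); it sticks to LEFT OUTER JOIN because RIGHT and FULL outer joins are only available in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, date TEXT);
-- lubber (sid 31) has no reservation, so his Sailors tuple is dangling.
INSERT INTO Sailors VALUES (22,'dustin',7,45.0),(31,'lubber',8,55.5),(58,'rusty',10,35.0);
INSERT INTO Reserves VALUES (22,101,'10/10/96'),(58,103,'11/12/96');
""")

inner_rows = conn.execute("""
    SELECT DISTINCT s.sname, r.date
    FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid
    ORDER BY s.sname""").fetchall()

left_rows = conn.execute("""
    SELECT DISTINCT s.sname, r.date
    FROM Sailors s LEFT OUTER JOIN Reserves r ON s.sid = r.sid
    ORDER BY s.sname""").fetchall()
print(inner_rows)
print(left_rows)
```

The inner join silently drops lubber; the left outer join keeps him and pads the Reserves columns with null (None in Python).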
Now Let's Peek at HW #1
• https://grape.ics.uci.edu/wiki/asterix/attachment/wiki/stats170ab-2018/HW1.pdf