9
CS 133: Databases Fall 2019 Lec 01 – 09/03 Introduction & Relational Model Prof. Beth Trushkowsky http://www.puntogeek.com/wp-content/uploads/2012/12/21168.strip_.gif Data! Data is everywhere Banking, airline reservations Social media, clicking anything on the internet “facebook friend wheel”, https://www.flickr.com/photos/antjeverena/ http://www.bigdataanalyticstoday.com/category/infographics/ Need systems to manage the data Goals for Today What is a database anyway? Important DBMS features and challenges! Course logistics Relational data model Why it’s great What it looks like (intro to SQL) So, what is a database? From the textbook: Database: a collection of data, typically describing the activities of one or more related organizations Database system, DataBase Management System (DBMS): software designed to assist in maintaining and utilizing large collections of data

CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

CS133:Databases

Fall2019

Lec01–09/03

Introduction&RelationalModel

Prof.BethTrushkowsky

http://www.puntogeek.com/wp-content/uploads/2012/12/21168.strip_.gif

Data!

•  Dataiseverywhere–  Banking,airlinereservations–  Socialmedia,clickinganythingontheinternet

“facebookfriendwheel”,https://www.flickr.com/photos/antjeverena/

http://www.bigdataanalyticstoday.com/category/infographics/

Needsystemsto

managethedata

GoalsforToday

•  Whatisadatabaseanyway?

•  ImportantDBMSfeatures

–  andchallenges!

•  Courselogistics

•  Relationaldatamodel

– Whyit’sgreat

– Whatitlookslike(introtoSQL)

So,whatisadatabase?

Fromthetextbook:

•  Database:acollectionofdata,typicallydescribingtheactivitiesofoneormorerelated

organizations

•  Databasesystem,DataBaseManagementSystem

(DBMS):softwaredesignedtoassistin

maintainingandutilizinglargecollectionsofdata

Page 2: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

DBMSdesiderata

•  Askquestions(queries)aboutdata•  Addandupdatedata•  Persistthedata(keepitaround)

•  E.g.,bankingapplication– Query:WhatisAlice’sbalance?

– Update:Alicedeposits$100– Persist:Alicehopeshermoneyisstillthereaftera

poweroutage…

Soundseasy!

•  Storedataintextfiles– Accountsseparatedbynewlines– Fieldsseparatedbycommas

•  Query:whatisAlice’sbalance?

Account,Branch,Name,Balance

45,Claremont,Alice,200

67,Claremont,Bob,100000

78,Pasadena,Carl,987654

.

.

.

Abstractingdatamanagement

•  Cancomeupwithtrickstooptimizea

particularquery/application

– Endupredoingthisworkfornewapps

RelationalDBMStotherescue

PhysicalIndependence

•  Applicationsneednotknowhowdataisphysicallystructuredandstored

•  Instead,havelogicaldatamodel•  LeavetheimplementationdetailsandoptimizationtoDBMS

EdgarF.Codd

Turingaward,1981

[Thereshouldbe]aclear

boundarybetweenthelogicalandphysicalaspectsofdatabasemanagement

1

1

http://en.wikiquote.org/wiki/E._F._Codd

Page 3: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

RelationalDBMStotherescue

•  Relationaldatamodel:dataisstoredinrelations

•  Example:Bankinginfoaccount branch name balance

45 Claremont Alice 200

67 Claremont Bob 100000

78 Pasadena Carl 987654

•  Adeclarativequerylanguage–  Specifywhatanswersaqueryshouldreturn,butnothowthequeryisexecuted

–  E.g.,SQL,Datalog(subsetofProlog)

Query:whatisAlice’sbalance?

SELECT balance FROM Banking WHERE name = “Alice”;

RelationalModel:

LevelsofAbstraction

•  Conceptual/Logicalschema

Students(sid:string,name:string,login:string,gpa:real)

Courses(cid:string,cname:string,credits:integer)

Enrolled(sid:string,cid:string,grade:string)

•  Physicalschema

–  Storetherelationsasunsortedfiles–  CreateindexesonStudents.sidandCourses.cid

•  Externalschema(“views”)

–  vieweachcourse’senrollment

CourseInfo(cid:string,enrollmnt:integer)

Entitiesand

relationships!

CREATE VIEW CourseInfo AS SELECT cid, COUNT (*) as enrollmnt FROM Enrolled GROUP BY cid;

DBAdesigns

these!

Allowcustomized

dataaccess

Describesdatain

termsofdatamodel

Specifies

storagedetails

DataIndependence

•  Logicaldataindependence

–  Protectedfromchanges

inconceptualschema

•  Physicaldataindependence

–  Protectedfromchanges

inphysicalschemaPhysicalschema

Conceptualschema

View1 View2 Viewn

ModernDBMSFeatures

•  Logicaldatamodel

– Wefocusonrelationalinthiscourse

•  Maytouchonothers,e.g.,XML,Document

– Dataindependence!

•  Declarativelanguage– Queries– Updates

•  PersistenceButwait,there’smore…

Page 4: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

ConcurrentAccess

•  Bankingexample:ATMwithdrawalpseudocodegetbalance;

ifbalance>amount

withdrawamount;

newBalance=balance-amount;

writebalance=newBalance;

•  AliceandBobshareanaccount.–  AlicegoestooneATM,withdraws$100

–  BobgoestoanotherATM,withdraws$50

•  Initialbalance=$400•  Finalbalance?

ExamplefromJunYang

ConcurrentAccess

Alicewithdraws$100

getbalance;

ifbalance>amount

withdrawamount;

newBalance=balance-amount;

writebalance=newBalance;

Bobwithdraws$50

getbalance;

ifbalance>amount

withdrawamount;

newBalance=balance-amount;

writebalance=newBalance;

$400

$400

$350

Finalbalance=$300!!

$300Whatcan

wedo?

ExamplefromJunYang

SystemFailures

•  Bankingexample:balancetransferdecrementaccountXby$100

incrementaccountYby$100

•  Whatifpowergoesoutafterfirstinstruction?

•  DBMSbuffersandupdatessomedatainmemory

beforewritingtodisk

– whatifpowergoesoutbeforewritetodisk?

•  Keepalogofupdates,undo/redouponrecovery

ExamplefromJunYang

ModernDBMSFeatures(cntd)

•  Logicaldatamodel

•  Declarativelanguage•  Persistence

•  Concurrentaccess•  Faulttolerance•  Performance!

– Lotsofqueries– Lotsofdata

Page 5: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

SimplifiedRDBMSArchitecture

Datarecords

Diskmanagement

Buffermanagement

Accessmethods

Application

Queryoptimizer

Queryexecutor

Concernedwith

concurrency

controland

recovery

Declarative

Query

Queryresults

CourseOverview

•  DesignprinciplesbehindDBMS!

•  “Bottom-up”orderoftopicstoshowroleof

abstractionandalgorithmsforefficiency/optimization–  Physicaldataorganization–  RelationalalgebraandSQL– Queryevaluationandoptimization

–  Transactions,concurrencycontrol,recovery– Databasedesign

CourseObjectives

•  Provideasolidbackgroundindatabasemanagement

systemdesignprinciples

•  Promoteunderstandingoftheseprinciplesthroughhands-

onexercisesimplementingtheinternalsofarelationaldatabasemanagementsystem

•  Furtherdevelopstudents'abilitytoreasonaboutalgorithm

andsoftwaredesign,optimization,andtradeoffsgenerallyapplicableincomputerscience

Labs:SimpleDB

•  Labs1-4:Implementkeyfeaturesofa(simplified)

DBMSinJava

–  Files,Storage–  RelationalOperators– QueryOptimizer

–  LockingwithTransactions

•  Lab5:databasedesign

Lab1:Gettingstarted“due”nextWednesday

Page 6: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

GradeComponents(seesyllabus)

•  Weeklyproblemsets 14% 70pts

•  (5)Labs 40% 200pts

•  Midterm 20% 100pts

•  Final 20% 100pts

•  Participation 6% 30pts

Administrivia

•  Coursewebsite:

https://www.cs.hmc.edu/~beth/courses/cs133/current

–  Syllabus,calendar,labdescriptions

•  Textbook:DatabaseManagementSystems3rdEdition,byRamakrishnanandGehrke

•  Piazzaforquestionsaboutlabs,problemsets,etc.:

piazza.com/hmc/fall2019/cs133/home

•  Assignmentsubmission

–  ProblemsetsàSakai

–  LabassignmentsàGradescope

•  Grutors

–  IvyLiu

TheRelationalModel

•  ManyRDBMSvendors,includingopen-source

– Oracle– MySQL

–  PostgreSQL–  SQLite– DB2–  SQLServer– …

•  We’lltouchonotherdatamodelsaswell

KeyConcepts:RelationalModel

•  Database:collectionofrelations

•  Relation:listofattributes

•  Relationshavesetsoftuples

•  Schema(metadata)

–  Specificationofhowdataistobestructuredlogically

–  Containsattributetypes–  Definedatset-up

CID Name Dept

121 SoftwareDev CS

70 DataStructures CS

Courses

StudentsSID name login gpa

45Alice alicious 3.4

67Bob bobtastic 3.9

Page 7: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

RelationalModel:Synonyms

Moreformal ……… Lessformal

Relation TableTuple Row RecordAttribute Column FieldDomain Type

StructuredQueryLanguage(SQL)

•  Datadefinitionlanguage(DDL)–  Definetheschema(create,change,deleterelations)

–  Specifyconstraints,userpermissions

•  Datamodificationlanguage(DML)

–  Finddatathatmatchescriteria

–  Add,remove,updatedata

–  TheDBMSisresponsibleforefficientevaluation!

•  Co-inventedbyDonChamberlin(HMC‘66)!

Photo:http://researcher.watson.ibm.com/

researcher/view.php?person=us-dchamber

ARelationInstance

•  Aninstanceofarelationisitscontentsatagiventime

– cardinality:#tuples– arity:#attributes

StudentsSID Name Login SSN GPA

45 Alice alicious 000-00-0000 3.4

67 Bob bobtastic 000-00-0001 3.9

78 Carl carl 000-00-0010 2.5

SQL:CreatingRelations

•  CreateStudentsrelation:

•  DomaininfoistypeofIntegrityconstraint(IC)

–  IC:aconditiononthedatabaseschema,restrictsdata

thatcanbestored

CREATE TABLE Students ( sid CHAR(20),

name CHAR(20), login CHAR(100), SSN CHAR(12), gpa FLOAT);

CREATE TABLE Enrolled ( sid CHAR(20), cid CHAR(20), grade CHAR(2));

CreateEnrolledrelation:

Page 8: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

AddingandRemovingTuples

•  Insertasingletuple

INSERT INTO Students (sid, name, login, SSN, gpa) VALUES (45, ‘Alice’, ‘alicious’, ‘000-00-0000, 3.4);

•  Deletetuplesthatsatisfycondition(predicate)

DELETE FROM Students S WHERE S.name = ‘Alice’;

IntegrityConstraints:Keys

•  Superkeyisasetoffield(s)that– Uniquelyidentifiesatuple–  Candidatekey:doessominimally

–  Primarykey:achosencandidatekey

Students

SID Name Login SSN GPA

45Alice alicious 000-00-0000 3.4

67Bob bobtastic 000-00-0001 3.9

78Carl carl 000-00-0010 2.5

Primarykey

IntegrityConstraints:ForeignKeys

•  Referentialintegrity,logical“pointer”– Setoffieldsinonerelationrefertoprimarykeyof

another

SID CID Grade

45CS133 A

45CS121 B

78CS5 A

Primarykey Foreignkey

SID Name Login SSN GPA

45Alice alicious 000-00-0000 3.4

67Bob bobtastic 000-00-0001 3.9

78Carl carl 000-00-0010 2.5

Students Enrolled

INSERT INTO Enrolled(sid,cid,grade) VALUES (43,CS133,D); ?

DefiningKeyConstraints

•  Specifiedinschemadefinition

CREATE TABLE Enrolled (

sid CHAR(20), cid CHAR(20), grade CHAR(2),

PRIMARY KEY (sid,cid),

FOREIGN KEY (sid) REFERENCES Students

);

CREATE TABLE Students ( sid CHAR(20),

name CHAR(20),

login CHAR(10),

SSN CHAR(20),

gpa FLOAT,

PRIMARY KEY(sid),

UNIQUE (SSN) );

Page 9: CS 133: Databases Data!beth/courses/cs133/current/... · • Provide a solid background in database management system design principles • Promote understanding of these principles

•  Possiblymanycandidatekeys(specifiedusingUNIQUE),oneof

whichischosenastheprimarykey.

•  Keysmustbeusedcarefully!

•  Example:

“Foragivenstudentandcourse,thereisasinglegrade.”

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))

vs.

PrimaryandCandidateKeysinSQL

“Studentscantakeonlyonecourse,andnotwostudentsinacoursereceivethesamegrade.”

SQL:SingleRelationQueries

SELECT name FROM StudentsWHERE gpa > 3.7;

name

Bob

SELECT * FROM Students SWHERE S.gpa > 3.7;

sid name login gpa

67Bob bobtastic 3.9

SID name login gpa45 Alice alicious 3.4

67 Bob bobtastic 3.9

78 Carl carl 2.5

Students

QueryExecution:Teaser

Queryoptimizertransformsadeclarativequery

intoapipelineofdataflowoperatorscalledaqueryexecutionplanSELECT name FROM StudentsWHERE gpa > 3.7;

Students

Filter

(gpa>3.7)

Project

(name)

Iterators!!

SQLiteDemo

Alsosee: “Resources”oncoursewebsiteandwww.sqlite.org

Thedatabaseisinthefileyouspecify.

Thefileiscreatedifitdoesn’texist.

SQLstatementsendwithasemicolon.

Capitalizationlooksnice,butnotrequired.

Thesetwosettingsformodeandheadermakequeryresultseasiertoread.