Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
1
OCL2 Oracle 10g:SQL & PL/SQLSession #1
Matthew P. Johnson
CISDD, CUNY
January, 2005
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
2
Personnel Instructor: Matthew P. Johnson
mattpj-at-gmail.com
TA: Ratna Priya Moganti rmoganti7-at-yahoo.com
Admin: Dawn Kleinberger dkleinberger-at-gc.cuny.edu
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
3
Communications Web page:
http://pages.stern.nyu.edu/~mjohnson/oracle/
syllabus
course policies
reading assignments
etc.
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
4
Acknowledgements Thanks to Ramesh at NYU, Ullman, et al.,
Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials
See classpage for other related, antecedent DBMS courses
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
5
What is a Database? A very large, integrated collection of data. Models real-world enterprise.
Entities students, courses, instructors, TAs
Relationships George is currently taking OCL2 Dick is currently teaching OCL2 Condi is currently TA-ing OCL2 but took it last semester
Database Management System (DBMS): large software package designed to store and manage databases
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
6
Databases are everywhere: Ordering a pizza Databases involved?1. Pizza Hut’s DB
stores previous orders by customer stores previous credit cards used
2. Credit card records huge databases of (attempted) purchases location, date, amount, parties
3. Got approved by credit-report companies4. phone company’s records
Local Usage Details (“Pull his LUDs, Lenny.”)
5. Caller ID ensures reported address matches destination
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
7
Your wallet is full of DB records Driver’s license Credit cards Medical insurance card Social security card Gym membership Individual checks Dollar bills (w/serial numbers) Maybe even photos (ids on back)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
8
Databases are everywhere Q: Websites backed by DBMSs?
retail: Amazon, etc. data-mining: “Page You Made”
search engines: Google, etc. directories: Internic, etc. searchable DBs: IMDB, tvguide.com, etc.
Q: Non-web examples of DBMSs? airline bookings criminal/terrorist: TIA NYPD’s CompStat
all serious crime stats by precinct Retailers: Wal-Mart, etc.
when to re-order, purchase patterns, data-mining Genomics!
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
9
Example of a Traditional DB App Suppose we are building a system to store
the information about: checking accounts savings accounts account holders state of each of each person’s accounts
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
10
Can we do it without a DBMS?Sure we can! Start by storing the data in files:
checking.txt savings.txt customers.txt
Now write C or Java programs to implement specific tasks
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
11
Doing it without a DBMS... Transfer $100 from George’s savings acc to
checking:
Read ‘savings.txt’Find & update the record “George”
balance -= 100
Write ‘savings.txt’Read ‘checking.txt’Find & update the record “George”
balance += 100
Write ‘checking.txt’
Read ‘savings.txt’Find & update the record “George”
balance -= 100
Write ‘savings.txt’Read ‘checking.txt’Find & update the record “George”
balance += 100
Write ‘checking.txt’
Write a C program to do the following:
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
12
Problems without a DBMS...1. System crashes:
Q: What is the problem? A: George lost his $100 Same problem even if reordered
2. Simultaneous access by many users George and Dick visit ATMs at same Lock checking.txt before each use– what is the problem?
Read ‘savings.txt’Find&update the rec “George.”Write ‘savings.txt’Read ‘checking.txt’Find&update the rec “George”Write ‘checking.txt’
Read ‘savings.txt’Find&update the rec “George.”Write ‘savings.txt’Read ‘checking.txt’Find&update the rec “George”Write ‘checking.txt’ CRASH !
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
13
Problems without an DBMS...
3.Large data sets (say 50GB) Why is this a problem?
No indices Finding “George” in huge flatfile is expensive
Modifications intractable without better data structures “George” “Georgie” is very expensive Deletions are very expensive
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
14
Problems without a DBMS...
5.Security? File system may be insecure File system security may be coarse
6.Application programming interface (API)? suppose need other apps to access DB
7.How to interact with other DBMSs?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
15
General problems to solve In building our own system, many Qs arise:
how do we store the data? (file organization, etc.) how do we query the data? (write programs…) make sure that updates don’t mess things up?
leave the DB “consistent” provide different views on the data?
e.g., ATM user’s view v. bank teller’s view how do we deal with crashes?
Too hard! Go buy Oracle! Q: How does a DBMS solve these problems? A: Long story; see other courses/books
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
16
Big issue: Transaction processing Grouping of several queries (or other
database actions) into one transaction ACID properties
Atomicity all or nothing
Consistency constraints on relationships
Isolation concurrency control Simulated solipsism
Durability Crash recovery
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
17
Atomicity & Durability Saw how George lost $100 with makeshift DBMS Real DBMSs prevent this outcome
xacts are all or nothing
One idea: Keep a log (history) of all actions in set of xacts Durability: Use log to redo or undo certain ops in
crash recovery Atomicity: don’t really commit changes until end
Then, all at once
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
18
Isolation Concurrent execution is essential for performance.
Frequent, slow disk accesses don’t waste CPU – keep running
Interleaving actions of different user programs can lead to inconsistency:
e.g., two programs simultaneously withdraw from the same account
DBMS ensures such problems don’t arise: users can pretend they are using a single-user system
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
19
Isolation Contrast with a file in two Notepads
Strategy: ignore multiple users whichever saves last wins first save is overwritten
Contrast with a file in two Words Strategy: blunt isolation One can edit To the other it’s read-only
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
20
Consistency Each xant (on a consistent DB) must leave it
in a consistent state can define integrity constraints checks the defined claims about the data remain
true
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
21
Data Models Every DBMS uses some data model:
collection of concepts for describing data Schema: description of partic set of data,
using some data model Relational data model: most widely used (by
far) data model Oracle, DB2, SQLServer, other SQL DBMSs main concept: relation ~ table of rows & columns a rel’s schema defines its fields
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
22
Example: university database Conceptual schema:
Students(ssn: string, name: string, login: string, age: int, gpa: real)
Courses(cid: string, cname: string, credits: int) Enrolled(sid:string, cid:string, grade: string)
Physical schema: Relations stored as unordered text files. Indices on first column of each rel
External Schema (View): Course_info(ssn: string, name: string) My_courses(cname: string, grade: string)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
23
How the programmer sees the DBMS Start with DDL to create tables:
Continue with DML to populate tables:
CREATE TABLE Students (Name CHAR(30)SSN CHAR(9) PRIMARY KEY NOT NULL,Category CHAR(20)
) . . .
CREATE TABLE Students (Name CHAR(30)SSN CHAR(9) PRIMARY KEY NOT NULL,Category CHAR(20)
) . . .
INSERT INTO StudentsVALUES(‘Howard, ‘123456789’, ‘undergraduate’). . . .
INSERT INTO StudentsVALUES(‘Howard, ‘123456789’, ‘undergraduate’). . . .
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
24
How the programmer sees the DBMS Tables:
Still implemented as files, but behind the scenes can be quite complex
SSN Name Category 123-45-6789 Howard undergrad 234-56-7890 Wesley grad … …
Students:
CID CName C20.0046 Databases C20.0056 Advanced Software
Courses:
“data independence” = separate logical view from physical implementation
SSN CID semester 123-45-6789 C20.0046 Spring,
2004 123-45-6789 C20.0056 Spring,
2004 234-56-7890 C20.0046 Fall, 2003 …
Takes:
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
25
Querying: Structured Query Language Find all the students who have taken OCL2:
SELECT SSNFROM TakesWHERE CID=“OCL2”
Find all the students who OCL2 last fall: SELECT SSN
FROM TakesWHERE CID=“OCL2” AND Semester=“Fall, 2003”
Find the students’ names: SELECT Name
FROM Students, TakesWHERE Students.SSN=Takes.SSN AND CID=“OCL2” AND Semester=“Fall, 2003”
Query processor does this efficiently
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
26
Database Industry Relational databases are a great success of
theoretical ideas based on most “theoretical” type of math there is: set theory
DBMS companies are among the largest software companies in the world
Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), Sybase
Also opensource: MySQL, PostgreSQL, etc. $20B+ industry XML (“semi-structured data”) also important
New lingua franca for exchanging data
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
27
The Study of DBMS Several aspects:
Modeling and design of databases DBMS programming: querying and update DBMS implementation
This course covers all three though more time on first two
Also will look at some more advanced areas XML, data-mining, LDAP?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
28
Databases are used by DB app programmers
desktop app programmers web developers
Database administrators (DBAs) design schemas security/authorization crash recovery tuning
Everyone else (perhaps indirectly) “You may not be interested in databases, but
databases are interested in you.” - Trotsky
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
29
Course outline Two biggest topics:
SQL PL/SQL
But also:
Database design: Entity/Relationship models Modeling constraints
The relational model: Relational algebra Transforming E/R models to relational schemas
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
30
Outline (Continued) SQL views and triggers Connecting to a Oracle from programming
languages Web apps XML
May change as course progresses partly in response to audience
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
31
Textbook Oracle Database 10g PL/SQL 101
by Christopher Allen Hardcover: 416 pages Publisher: McGraw-Hill/
Osborne Media ISBN: 0072255404 1 edition (August 10, 2004) Distributed in class
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
32
SQL Readings Optional reference: SQL in a Nutshell
Online (free) SQL tutorials include:
SQL for Web Nerds (http://philip.greenspun.com/sql/)
A Gentle Introduction to SQL (http://sqlzoo.net/)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
33
On-going Feedback Don’t wait until the class is over to give
feedback on improving it too late for you then!
Send mail if you have questions or concerns
“We’re in touch, so you be in touch.”
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
34
Summary DBMS used to maintain, query large datasets
Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development
Database skills are critical in financial services, marketing and other business areas!
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
35
So what is this course about, really?A bit of everything! Languages: SQL, XPath, XQuery Data modeling Some theory!
Functional dependencies, normal forms e.g., how to find most efficient schema for data
Lots of implementation and hacking in end-of-course project
Business DBMS examples/cases Most importantly: how to meet real-world needs
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
36
For right now: written survey Email to mattpj-at-gmail.com: name email previous cs/is/math/logic courses/background previous programming experience
Perl? PHP? HTML?
Job: programmer, DBA, etc. why taking class
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
37
Agenda Last time: intro, RDBMS, ACID test This time: E/R model1. Identify entity sets, relations and attributes2. One-one, one-many, many-many relations3. Simple ER diagrams to model a situation4. 3-way relationships; Converting to binary5. Entities with multiple roles6. Subclasses Design issues1. Principles of faithfulness & simplicity in ER diagrams2. Redundancy3. Whether an element should be an attribute or entity set4. Replacing a relationships with entity sets
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
38
DB development path
the
WorldE/R
design
Relational
schema
Relational
DB
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
39
Entity Relationship (E/R) Model A popular data model – useful to database
designers Graphical representation of miniworld Helps design the database, not implement it E/R design is translated to a relational design
relational design then implemented in an RDBMS Elements of model
Entities Entity Sets Attributes Relationships (!= relations!)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
40
Elements of E/R Model: Entity Sets Entity: like an object
e.g. President Bush Particular instance of a concept
Entity set: set of one sort of entities or a concept e.g. World leaders Generally, same set of attributes
Represented by a rectangle A “good” entity set – you decide
Common properties Correspond to class of phys. or bus. objects
(People, products, accounts, grades, etc.)
World Leader
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
41
Elements of E/R Model: Attributes Properties of entities in entity set
Like fields in a struct Like columns in a table/spreadsheet Like data members in an object
Values in some domain (e.g., ints, strings) Represented by ovals: Assumed atomic
But could have limited structure Ints, strings, etc.
ID Name
Student
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
42
Elements of E/R Model: Relationships Connect two or more entity sets
e.g. students enroll in courses Binary relationships: connect two entity sets –
most common Multiway relationships: connect several entity
sets Represented by diamonds
Students Enroll Courses
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
43
Elms of E/R Model: Rel’ships (cont’d) Students Enroll in courses Courses are Held in rooms The E/R data model:
Students Enroll Courses
Held
Rooms
NameID
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
44
A little set theory A mathematical set a collection of members A set is defined by its members
“Are you in or are you out?” No other structure, no order, no duplicates allowed
Sets specified by listing: {1, 2, 3, …} = N {1, 2, George Bush} (not useful in DBMS)
Or by “set-builder” notation: { x in N: 2 divides x} = ? { x in Presidents | reelected(x)} = ? {2x: x in N} = ?
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
45
A little set theory One set can be a subset of another (which is then a
superset of it) ReelectedPresidents is a subset of Presidents Also, RP is a proper subset of Pres – some lost reelection
Given two sets X and Y, the cross product or Cartesian product isX x Y = {(x,y): x in X, y in Y}= the set of all ordered pairs in which the first comes from X and the second comes from Y
Important: (x,y) != {x,y} In an order pair or tuple
Order matters Duplicates are allowed
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
46
A little set theory Mathematically, a relation(ship) between X and Y is
just a subset of X x Y= all those pairs (x,y) s.t. x is related to y
Example: owner-of O on People, Cats O(MPJ, Gödel) holds
The equals relation E on N, N: E(3,3) holds because 3 = 3 E(3,4) does not hold E is still a set: E = {(1,1), (2,2), (3,3), …}
Father of relation F on People, People: F(GHWB, GWB) holds F(GWB, GHWB) does not hold Relations aren’t necessarily symmetric
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
47
Many-many
Multiplicity of Relationships
Many-one One-one
Representation of relationships No arrow: many-to-many Sharp arrow: many-to-one Rounded arrow: “exactly one”
“key constraint” One-one:
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
48
Multiplicity of Relationships
Students Enrolls Courses
Many-to-many:
Student Live Residence hall
Many to one: a student lives in <= 1 residence hall
Many to exactly one: a student must live in a residence hall
Student Live Residence hall
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
49
Multiplicity, set-theoretically Assume: no vars below are equal Many-one means:
if (x1,y1) in R then (x1,y2) cannot be in R One-many means:
if (x1, y1) in R then (x2,y1) cannot be in R One-one means:
if (x1,y1) in R, then neither (x1,y2) nor (x2,y1) can be in R
Notice: one-one is stronger than many-one One-one implies both many-one and one-
many
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
50
E/R Diagram
Students CoursesEnrolls
ID
Name
ID
Name
Assisting
TAID
Name
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
51
E/R Diagrams Works if each TA is a TA of all students.
Connection student-TA is only through the course
But what if students were divided into sections, each section with a separate TA? Then, a student in OCL2 would be related to only
one of the TAs for OCL2. Which one?
3-way relationship is helpful here
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
52
Multiway Relationships
Students
Courses
TAs
Enrolls
Students Courses TAsAlberto OCL2 DonaldRichard OCL2 PaulBernard OCL2 Colin… … …
Enrolls entries:
NB: Enrolls determines TA:
(student, course) at most one TA
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
53
Converting multiway relships to binary Some models limit relationships to binary Multiway relationship – equivalent collection of binary,
many to one relationships Replace relationship with connecting entity set
Students
Courses
TAs
EnrollsStudent-of
Course-of
TA- ofNB: Enrolls has no attributes!
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
54
Second multiway e.g.: renting movies Scenario: a Customer Rents a Movie from a
VideoStore on a certain date
Q: Which entity does date belong to? A: To the fact of the renting Relationships can have attributes
always (implicitly) many-one
Rental
VideoStore
Customer
Movie
date
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
55
Second multiway e.g.: renting movies But they don’t have to Relationship attributes can be replaced with
(trivial) new entities
Rental
VideoStore
Customer
Movie
date
Date
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
56
Where can we draw arrows?
(store, video, customer) date ? Date is a relship att, implicitly determined
(store, video, date) customer ? (store, date, customer) video ? (video, date, customer) store ?
Second multiway e.g.: renting movies
Rental
VideoStore
Customer
Movie
date
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
57
Arrow-drawing Q: Why does it matter? Round arrow benefit:
Obvious: One item takes less space than many Less obvious: easier to access one item x than set of
one item {x} In programming: an int v. a linked list with just one int
Regular arrow benefit: Mapping to a set of either one elm or none seems bad But not implemented this way Always one element, but that value may be NULL
Lesson: it pays to identify your relship’s multiplicity
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
58
Second multiway e.g.: renting movies
Convert to binary?Rental
VideoStore
Customer
Movie
date
Rental
Customer
Store
Movie
StoreOf
MovieOf
BuyerOf
date
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
59
Roles in relationships Entity set appears more than once in a relship
Generally distinct entities Each appearance is in a different role Edges labeled by roles
Pre-req
Prereq
Successor
Course
Course (Pre-req)
Course (Successor)
Accounting Finance-I
Finance-I Derivatives
Finance-I Finance-II
Calculus Derivatives
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
60
Subclasses in the E/R model Some entities are special cases of other Conversely: some are generalizations
Humans are specialized mammals Grad students are specialized students
And, in turn, specialized mammals NB: These aren’t examples but subclasses
Subclass A isa B Represented by a triangle Always one-to-one, though arrows omitted Root is more general Multiple inheritance is allowed! A single entity may consist of all components (sets of
fields) in arbitrary ESs and their ancestors
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
61
Subclasses
Movies
Cartoons Murder-Mysteries
isa isaVoices
Weapon
stars
length title year
Lion King
Component
“Lion King”: atts of Movies; relship Voices
“Roger Rabbit”: atts of Movies; relship Voices; att weapon
Roger Rabbit
TX Chainsaw Massacre
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
62
E/R inheritance v. OO inheritance In a OOP class hierarchy, children also inherit
“attributes” from parents But an object is an instance of one class
In E/R, an entity may be composed of components from multiple, not-directly-related ESs Roger Rabbit is composed of components from Cartoons,
Murder Mysteries, and Movies We could create a Cartoon Murder Mysteries ES if there
were any atts specific to them
So the real difference: In E/R, can have implicit multiple inheritance between any set of IS-A-connected nodes (sharing a root)
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
63
Design Principles Faithfulness Avoiding redundancy Simplicity Choice of relationships Picking elements
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
64
Faithfulness Is the relationship many-many or many-one? Are the attributes appropriate? Are the relationships applicable to the
entities? Examples
Courses & instructors maybe many-one, maybe many-many
Bosses & subordinates maybe one-many, maybe many-many
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
65
Simplicity Einstein: Theories as simple as possible, but not
simpler. Use as few elements as possible
Minimum required relations No unnecessary attributes (will you be using this
attribute?) Eliminate “spinning wheels”
Example: how can we simplify this?
Movies Ownings StudiosOwned-by Owns
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
66
Avoiding redundancy Say everything exactly once
Minimize database storage requirements More important: prevent possible update errors
simplest but not only e.g.: modify data one place but not the other – more later
Example: Spot the redundancy
Studios MoviesOwn
StudioName
Name
Length
Name
Address
Matthew P. Johnson, OCL2, CISDD CUNY, January 2005
67
Next time We’ll finish E/R models and begin the
relational model Readings will be posted