View
217
Download
1
Category
Tags:
Preview:
Citation preview
Fall 2004ECE569 Lecture 02.1
ECE 569 Database System Engineering
Fall 2004
Yanyong Zhang www.ece.rutgers.edu/~yyzhang
Course URL www.ece.rutgers.edu/~yyzhang/fall04
Fall 2004ECE569 Lecture 02.2
Today’s topic
Relational data model Different data models exist: relational, network,
hierarchical, object-oriented
Powerful, simple, declarative
Relational algebra
Fall 2004ECE569 Lecture 02.3
Overview of Database Design
Conceptual design (ER model is used at this stage) What are the entities and relationships in the enterprise?
What information about these entities and relationships should we store in the database?
What are the integrity constraints or business rules that hold
A database ‘schema’ in the ER model can be represented pictorially
Can map an ER diagram into a relational schema
Fall 2004ECE569 Lecture 02.4
Entity-Relationship model
To describe the conceptual scheme.
An entity is a thing that exists and is distinguishable.
Entity sets consist of a group of all “similar” entities.
Each entity has certain attributes.
A relationship among entity sets is an ordered list of entity sets.
Relationship set: collection of similar relationships An n-ary relationship set R relates n entity sets E1, …, En; each
relationship in R involves entities e1 E1, …, en En
- Same entity set could participate in different relationship sets, or in different “roles” in same set
Fall 2004ECE569 Lecture 02.5
Example of entity-relationship model
Billed
Account Made To
Room#
Name
Address
Balance
Patient
Payment Amount
Date
Diagnosed
Disease NameTreatment
Vital Sign
Pulse
Blood
TimeFrom
Fall 2004ECE569 Lecture 02.6
Data Model
A data model is a mathematical formalism with two parts:
A notation for describing data
A set of operations used to manipulate that data
Fall 2004ECE569 Lecture 02.7
Why Study the Relational Model?
Most widely used model Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc.
“legacy systems” in older models e.g., IBM’s IMS
Recent competitor: object-oriented model ObjectStore, Versant, Ontos
A synthesis emerging: object-relational model
- Informix Universal server, UniSQL, O2, Oracle, DB2
Fall 2004ECE569 Lecture 02.8
Relational Database: Definitions
Relational database: a set of relations
Relation: made up of 2 parts Instance: a table, with rows and columns
- # Rows = cardinality, # columns = degree
Schema: specifies name of relation, plus name and type of each column
- e.g., students(sid:string, name:string, login:string,age:integer, gpa:real)
Can think of a relation as a set of rows or tuples (i.e., all rows are distinct)
Fall 2004ECE569 Lecture 02.9
Set-theoretic notion of a relation
A domain D is a set of values colors = {red, blue, green}
age = set of positive integers less than 20
last_name = {a-zA-Z}+
A relation over domains D1,D2,…,Dn is a subset of D1D2D3…Dn.
Example R(colors, age) {(red,1), (blue,1),(green,1),
(red,2), (blue,2),(green,2), …}
Fall 2004ECE569 Lecture 02.10
Set-theoretic notion of a relation (cont)
We are only interested in finite relations.
The members of a relation are called tuples.
Each relation that is a subset of the product of k domains is k-degree.
A relation can be represented as a table, where each row is a tuple and each column corresponds to one domain.
PERSON (NAME, AGE)
Name Age
Zhang 27
Merril 2
Stewart 40
Fall 2004ECE569 Lecture 02.11
Set-theoretic notion of a relation (cont)
Because a relation is a set of tuples, the order of tuples in the table is insignificant (but the order of domains does matter).
Name Age
Zhang 27
Merril 2
Stewart 40
Name Age
Zhang 27
Stewart 40
Merril 2
Age Name
27 Zhang
2 Merill
40 Stewart
Fall 2004ECE569 Lecture 02.12
An alternative definition – set of mappings
A relation schema R = {A1,A2,…,An} (no order imposed on attributes)
A relation instance of R is a finite set of mappings {1, 2, …, m} where
2: R D
where D = null dom(A1) dom(A2)… dom(An).
Each tuple i maps each attribute in tuple i to a value.
Note that attributes must be given names that are unique in a relation schema.
Under this definition, all three instances above are equivalent.
Fall 2004ECE569 Lecture 02.13
Keys
A set S of attributes is a candidate key for R if No semantically correct instance of R can include two
tuples such that [s] = [s], and
No proper subset of S satisfies (a)
A set of attributes satisfying (a) but not necessarily (b) is a superkey.
A relation may have multiple candidate keys. One can be designated as the primary key.
Fall 2004ECE569 Lecture 02.14
Example
Schema for a relation of degree four patient(name,address,balance_due,room#)
An instance of patient with three tuples that satisfies the key constraint (name is a candidate key)
patient Name Address Balance_due Room #
Zhang 1964 highland dr $5,230 518
Stewart 2342 K St. $25 203
Brinkley 2342 K St. $10,200 203
No other candidate key is possible (assuming that this is a semantically correct instance of patient).
Fall 2004ECE569 Lecture 02.15
Example – Healthcare DB
patient(name, address, balance_due, room#)
payments(name, amount, date)
vital_signs(name, pulse, blood, date, time)
diagnosis(patient_name, disease_name)
disease(disease_name, treatment)
treats(patient, doctor)
Fall 2004ECE569 Lecture 02.16
Example instance of students relation
Sid Name Login Age
Gpa
53666 Jones jones@cse 18 3.4
53688 Smith smith@eecs 18 3.2
53650 Smith smith@math 19 3.8
Fall 2004ECE569 Lecture 02.17
Relational Query Language
A major strength of the relational model: supports simple, powerful querying of data
Queries can be written intuitively, and the DBMS is responsible for efficient evaluation
Key: precise semantics for relational queries
Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change
Fall 2004ECE569 Lecture 02.18
The SQL Query Language
Developed by IBM (system R) in the 1970s
Need for a standard since it is used by many vendors
Standards SQL-86
SQL-89 (minor revision)
SQL-92 (major revision)
SQL-99 (major extensions, current standard)
Fall 2004ECE569 Lecture 02.19
The SQL Query Language
To find all 18 year old students, we can write
To find just names and logins, replace the first line:
Sid Name Login Age Gpa
53666 Jones jones@cse 18 3.4
53688 Smith smith@eecs 18 3.2
select *from student S
where s.age=18
select s.name, s.login
Fall 2004ECE569 Lecture 02.20
Logical DB design
Representing entity-relationship diagrams An entity set E can be represented by a relation whose
relation scheme consists of all the attributes of the entity set.
Each tuple of the relation represents one entity in the current instance of E.
A relationship R among E1, E2, …, Ek is represented by a relation whose relation scheme consists of the attributes in the keys for each of E1, E2, …, Ek.
Fall 2004ECE569 Lecture 02.21
Relational Query language
Query language: Allow manipulation and retrieval of data from a database
Relational model supports simple, powerful QLs: Strong formal foundation based on logic
Allows for much optimization
Query language != programming language
Fall 2004ECE569 Lecture 02.22
Formal Relational Query Languages
Two mathematical query languages form the basis for “real” languages (e.g., SQL) and for implementation:
Relational algebra: more operational, very useful for representing execution plans
Relational calculus: let users describe what they want, rather than how to compute it (Non-operational, declarative)
Fall 2004ECE569 Lecture 02.23
Relational Algebra
A set of five basic operations that map one or more relations to a new relation.
Union: R S t tR or tS The set of tuples in R or S or both
The degrees of R and S must match
A B C
a b c
a d c
e a b
D E
d e
b c
F G H
d e b
a b c
a b b
d b bRS
T
a b c
a d c
e a b
d e b
a b b
d b b
R T
Fall 2004ECE569 Lecture 02.24
Relational Algebra (cont)
Difference: R - S t tR and tS The set of tuples in R, but not in S
The degrees of R and S must match
A B C
a b c
a d c
e a b
D E
d e
b c
F G H
d e b
a b c
a b b
d b bRS
T
a d c
e a b
R - T
Fall 2004ECE569 Lecture 02.25
Relational Algebra (cont)
Cartesian Product: R X S
(a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s) S
R x S is of degree r+s where r=degree(R) and s=degree(s)
a tuple t is in R x S if its first r components match those of a tuple in R and its last s components from a tuple in S
A B C
a b c
a d c
e a b
D E
d e
b c
F G H
d e b
a b c
a b b
d b bRS
T
A B C D E
a b c d e
a b c b c
a d c d e
a d c b c
e a b d e
e a b b c
R x S
R.A R.B R.C S.D S.E
Fall 2004ECE569 Lecture 02.26
Relational Algebra (cont)
Projection: i1, i2, …, im (R) (ai1, ai2, …, aim) (a1, a2, …, an) R
i1, i2, …, im (R) is of degree m
A tuple t in the result is obtained by removing and/or rearranging the attributes of a tuple of R
A B C
a b c
a d c
e a b
D E
d e
b c
F G H
d e b
a b c
a b b
d b bRS
T
G F
e d
b a
b d
2,1 (T)
Fall 2004ECE569 Lecture 02.27
Relational Algebra (cont)
Selection: F(R) t t R and F(t)
F(R) is of the same degree as R
All tuples in R that satisfy predicate F are included in result
F is a formula involving
- Constants or components of the tuple – component i is denoted $I
- Comparison operators, <, =, >, , ,
- Logical operators and, or, not
F G H
d e b
a b c
a b b
d b b
T
F G H
d e b
a b b
d b b
$1=d or $2=$3 (T)
Fall 2004ECE569 Lecture 02.28
Examples
Find the names of all patients with a balance of more than $10,000.
Find the names of all patients that have a pulse less than 50 or have been diagnosed with hypertension.
Fall 2004ECE569 Lecture 02.29
Examples (cont)
Find all patients whose treatment includes the “application of leeches”
Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.
Fall 2004ECE569 Lecture 02.30
Examples (cont)
Print the name of the patient with the highest pulse recorded today
Find all patients that had pulse measured today
Find the set of all ‘losers’, i.e., those whose pulse is less than or equal to that of some other patient
Subtract (b) from (a) and project out patient name
Fall 2004ECE569 Lecture 02.31
Examples (cont)
List all patients that suffer from at least one illness that no other patient suffers from
Fall 2004ECE569 Lecture 02.32
Additional Operations
Intersection: R S t tR and tS The set of tuples in both R and S
The degrees of R and S must match
R S = R – (R – S) SR
R – ( R – S)
R – S
Fall 2004ECE569 Lecture 02.33
Additional Operations (cont)
Theta-join: R ij S $i $r+j(R S) where r = degree(R) and <, =, >, , ,
When is ‘=‘ operation is called equijoin.
Find patient with highest pulse
Fall 2004ECE569 Lecture 02.34
Additional Operations (cont)
Natural Join: R S An equijoin for all attributes that R and S have in
common.
The shared columns originating in S are projected out
R S i1, i2, …, im R.A1 = S. A1 and … and R.Ak = S.Ak (R x S) where i1, i2, … im is the list of all attributes of R x S, in order, except the attributes S.A1, …, S.Ak.
Example: Names of patients being treated with leeches.
Fall 2004ECE569 Lecture 02.35
Sample Queries
Consider the following set of relations
frequents (drinker, bar)
serves (bar, beer)
likes (drinker, beer)
List the drinkers that frequent at least one bar that serves a beer they like
Construct the relation should_visit(drinker,bar) consisting of all tuples <d, b> where bar b serves a beer that drinker d likes.
Fall 2004ECE569 Lecture 02.36
Sample Queries (cont)
List the drinkers that frequent only bars that serve some beer they like. (Assume every drinker frequents at least one bar.)
Print the drinkers that frequent no bar that serves a beer that they like
Fall 2004ECE569 Lecture 02.37
Questions on Healthcare DB
Retrieve all pairs of patients that shared a room at the same time
List all patients that finished paying their bill before one of their roomates began paying theirs.
Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.
Fall 2004ECE569 Lecture 02.38
Components of Data-Intensive Systems
Three separate types of functionalities Data management
Application logic
Presentation
The system architecture determines whether these three components reside on a single system (“tier”) or are distributed across several tiers
Fall 2004ECE569 Lecture 02.39
Single-tier architecture
All functionality combined into a single tier, usually on a mainframe
Users access through dumb terminals
Advantages: Easy maintenance and administration
Disadvantages: Today, users expect graphical user interface
Centralized computation of all of them is too much for a central system
Fall 2004ECE569 Lecture 02.40
Client-Server Architecture
Thin client: work division Client implements only the graphical user interface
Server implements business logic and data management
Thick client: work division
- Client implements both the graphical user interface and the business logic
- Server implements data management
Disadvantages:
- No central place to update the business logic
- Security issues: server needs to trust clients
- Scalability: does not scale more than several 100s of clients
Fall 2004ECE569 Lecture 02.41
The three-tier architecture
Client program (web browser)
Application Server(Tomcat, Apache)
Database System(DB2)
Recommended