View
223
Download
0
Category
Preview:
Citation preview
1
CS351 Introduction to Database systems
quan.wang@wsu.edu
2
About me
Quan Wang– developer at Oracle Corp.
• JDBC• Web Services• Cloud
3
Content
Introduction– Why databases?
Data Model
4
Why databases?
It used to be about boring stuff: employee records, bank records, etc.
Today, the field covers all the largest sources of data, with many new ideas. Web search. Data mining. Scientific and medical databases. Integrating information.
5
Why databases?
You may not notice it, but databases are behind almost everything you do on the Web. Google searches. Queries at Amazon, eBay, etc.
6
Why databases?
Databases often have unique concurrency-control problems. Many activities (transactions) at the
database at all times. Must not confuse actions, e.g., two
withdrawals from the same account must each debit the account.
7
Why databases?
Changing landscape System R: Oracle, MySQL, IBM, MS NoSQL: MangoDB NewSQL: VoltDB Cloud: Amazon RDS, Google Bigquery Bigdata: Hadoop, MapR
8
Why databases?
Job perspectives– Startups– Massive industry
9
Content
Introduction– Why database?– What to cover?
Data Model
10
What to cover?
Database – a set of inter-related data
pertaining to some organization– airline, university, store
Database Management System (DBMS)
– one or more databases – algorithms accessing the databases
11
12
What to cover?
Database Theory– relational model and algebra– norm forms
Design of databases.– normalization– E/R, UML
13
What to cover?
Database programming– SQL, stored procedures.
Integration?– ODBC, JDBC– O-R Mapping
14
What not covered?
Not covered– DB tuning– DB administration(DBA)– DB implementation– Programming languages
• May need to learn languages and the DB related aspects
15
Content
Introduction– Why database?– What to cover?– Logistics
Data Model
16
Logistics
Email: quan.wang@wsu.edu Office hour
– Tuesday & Thursday – 1 pm to 2pm– 201L
17
Logistics
No TA No detailed comments on
assignments Distribute answer keys when
appropriate Check discrepancies Raise issues
18
Logistics
Course page: – Http://encs.vancouver.wsu.edu/~
quwang– weekly schedule, assignments
Submission page– http://lms.wsu.edu – Use drop box for assignments and
project submission
19
Logistics Assignments 30% Team project 20% Midterms 25% Final 25%
20
Logistics
Team project– mini-ebay– Max 3 people each team– Single grade for each team
Assignments and project – Late submission 10% penalty
21
Logistics
Databases– MySQL– SQLite
• /usr/bin/sqlite3
22
Logistics
Exams– Lectures, textbook, and assigned
readings– Only areas covered in the
lectures– Not curved
23
Content
Introduction Why database? What to cover? Logistics
Data Model– Why not flat files?
24
Why not flat files?
datasavings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
operations – check balance– deposit or withdraw– change address
25
Why not flat files?
dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – code coupled with data format
• changing delimiter would require code change
26
Why not flat files?
dataSavings.data:
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
Checkings.data:
David, 789 NE Sany Blvd, 1500, 08/20/1998, 0.001!
Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0
problems – redundancy leads to
inconsistency
27
Why not flat files?
dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – concurrency
• withdraw calls for file lock– withdraw $100 from the same account at
the same time, synchronized
– withdraw $100 from different accounts at the same time, synchronized unnecessarily
28
Why not flat files?
dataDavid, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
problems – user interface
• file based algorithms
29
Why not flat files?
summary – code coupled with data– redundancy leading to
inconsistency – low concurrency– unfriendly user interface
30
Why not flat files?
database as abstraction data model hides data format SQL hides algorithms runtime handles concurrency
31
Content
Introduction Data Model
– why not flat files?– relational data model
• relations• schema• keys
32
What is a Data Model?
1. Underlying structure of a database.2. Mathematical representation of data.
Examples: relational model = tables; semistructured model = trees/graphs; key-value model=key value pairs.
3. Operations on data.4. Constraints.
33
A Relation is a Table
name manfWinterbrew Pete’sBud Lite Anheuser-Busch
Beers
Attributes(columnheaders)
Tuples(rows)
Relation name
34
Schemas
Relation schema = relation name and attribute list. Optionally: types of attributes. Example: Beers(name, manf) or
Beers(name: string, manf: string) Database = collection of relations. Database schema = set of all
relation schemas in the database.
35
Why Relations?
Very simple model. Often matches how we think
about data. Abstract model that underlies SQL,
the most important database language today.
37
Example
Sells( bar, beer, price ) Bars( name, addr )Joe’s Bud 2.50 Joe’s Maple St.Joe’s Miller 2.75 Sue’s River Rd.Sue’s Bud 2.50Sue’s Coors3.00
38
Database Schemas in SQL
SQL is primarily a query language, for getting information from a database.
But SQL also includes a data-definition component, DDL, for describing database schemas.
39
Creating (Declaring) a Relation
Simplest form is:CREATE TABLE <name> (<list of elements>);
To delete a relation:DROP TABLE <name>;
40
Elements of Table Declarations
Most basic element: an attribute and its type.
The most common types are: INT or INTEGER (synonyms). REAL or FLOAT (synonyms). CHAR(n ) = fixed-length string of n
characters. VARCHAR(n ) = variable-length string
of up to n characters.
41
Example: Create Table
CREATE TABLE Sells (
bar CHAR(20),
beer VARCHAR(20),
price REAL
);
42
SQL Values
Integers and reals are represented as you would expect.
Strings are too, except they require single quotes. Two single quotes = real quote, e.g., ’Joe’’s Bar’.
Any value can be NULL.
43
Dates and Times
DATE and TIME are types in SQL. The form of a date value is:
DATE ’yyyy-mm-dd’ Example: DATE ’2007-09-30’ for
Sept. 30, 2007.
44
Times as Values
The form of a time value is:TIME ’hh:mm:ss’
with an optional decimal point and fractions of a second following. Example: TIME ’15:30:02.5’ = two
and a half seconds after 3:30PM.
45
Declaring Keys
An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.
Either says that no two tuples of the relation may agree in all the attribute(s) on the list.
There are a few distinctions to be mentioned later.
46
Declaring Single-Attribute Keys
Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.
Example:CREATE TABLE Beers (
name CHAR(20) UNIQUE,
manf CHAR(20)
);
47
Declaring Multiattribute Keys
A key declaration can also be another element in the list of elements of a CREATE TABLE statement.
This form is essential if the key consists of more than one attribute. May be used even for one-attribute
keys.
48
Example: Multiattribute Key
The bar and beer together are the key for Sells:CREATE TABLE Sells (
bar CHAR(20),
beer VARCHAR(20),
price REAL,
PRIMARY KEY (bar, beer)
);
49
PRIMARY KEY vs. UNIQUE
1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes.
2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULL’s, and there may be several tuples with NULL.
50
Content
Introduction Data Model
– why not flat files?– relational data model
relations schema keys
– why relations?
51
Why relations?
datasavings.dat
NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005!
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006
52
Terminalogy
DDL: Data Definition Language CREATE TABLE
DML: Data Manipulation Language INSERT
SQL: Structured Query Language SELECT...FROM...WHERE
53
Why relations?
schema Savings( name CHAR(50),
addr CHAR(50),
balance REAL,
opened DATE,
interest REAL);
54
Why relations?
relation
operations – check balance– deposit or withdraw– change address
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
55
Why relations?
relation
advantage – code coupled with data format?
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
56
Why relations?
relation
advantage – concurrency
• simultaneous withdraw calls
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
57
Why relations?
relations
advantage – redundancy?
Savings(name, addr, balance, opened, interest)
Checkings(name, addr, balance, opened, interest)
58
Why relations?
relations
advantage – redundancy?
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
name addr balance opened interest
David 123123 Sandy Blvd 1500 1998 0.001
Lisa 432 Alberta St 3500 1990 0
Savings
Checkings
59
Why relations?
relations
advantage – redundancy?
Customers(cust_id, name, addr)
Savings(cust_id, balance, opened, interest)
Checkings(cust_id, balance, opened, interest)
60
Why relations?
relations
advantage – redundancy?
cust_id name addr
c1 David 123 NW Flander St
c2 Lisa 432 Alberta Stcust_id
balance opened interest
c1 1200 1998 0.001
c2 3500 1990 0
cust_id
balance opened interest
c1 500.75 2010 0.005
c2 4000.00 2001 0.006
Customers
Savings
Checkings
61
Why relations?
relation
advantage – user interface
• natural language like support desirable
name addr balance opened interest
David 123 NW Flander St 500.75 2010 0.005
Lisa 432 Alberta St 4000.00 2001 0.006
62
Why relations?
relation
advantage – user interface SELECT name, addr
FROM savingsWHERE balance>2000
NAME ADDR BALANCE OPENED INTEREST
David 123 NW Flander St
500.75 2010 0.005
Lisa 432 Alberta St
4000.00 2001 0.006
63
Why relations?
relation
advantage – user interface
SELECT name, addr, balance FROM savingsORDER BY balance
NAME ADDR BALANCE OPENED INTEREST
David 123 NW Flander St
500.75 2010 0.005
Lisa 432 Alberta St
4000.00 2001 0.006
64
Content
Introduction Data Model
– why not flat files?– relational data model– why relations?
65
Content
Introduction Data Model
– why not flat files?– relational data model– why relations?– semi-structured data models
• XML and JSON
66
<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>
XML
67
JSON{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}
68
XML and JSON, side-by-side
<Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author>Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <Publisher>Springer</Publisher></Book>
{ "Book": { "Title": "Parsing Techniques", "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ], "Date": "2007", "Publisher": "Springer" }}
69
XML Schema<xs:element name="Book"> <xs:complexType> <xs:sequence> <xs:element name="Title" type="xs:string" /> <xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Date" type="xs:gYear" /> <xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType></xs:element>
70
JSON Schema{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}
71
String type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}
<xs:element name="Title" type="xs:string" />
72
List type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}
<xs:element name="Authors"> <xs:complexType> <xs:sequence> <xs:element name="Author" type="xs:string" maxOccurs="5"/> </xs:sequence> </xs:complexType></xs:element>
73
Year type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}
<xs:element name="Date" type="xs:gYear" />
74
Enumeration type{ "$schema": "http://json-schema.org/draft-04/schema "type": "object", "properties": { "Book": { "type": "object", "properties": { "Title": {"type": "string"}, "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }}, "Date": {"type": "string", "pattern": "^[0-9]{4}$"}, "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]} }, "required": ["Title", "Authors", "Date"], "additionalProperties": false } }, "required": ["Book"], "additionalProperties": false}
<xs:element name="Publisher" minOccurs="0"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="Springer" /> <xs:enumeration value="MIT Press" /> <xs:enumeration value="Harvard Press" /> </xs:restriction> </xs:simpleType></xs:element>
75
Semi-Structured Data Models
XML Adoption– Web Services (SOAP)– XML DB
JSON Adoption– Web Services (REST)– MongoDB
76
Content
Introduction Data Model
– why not flat files?– relational data model– why relations?– semi-structured data models– history
77
History Hierarchical (IMS): late 1960’s and 1970’s Network (CODASYL): 1970’s Relational: 1970’s and early 1980’s Entity-Relationship: 1970’s Object-oriented: late 1980’s and early 1990’s
Object-relational: late 1980’s and early 1990’s Semi-structured (XML): late 1990’s to the present
Recommended