48
1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

Embed Size (px)

Citation preview

Page 1: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

1

An Introduction to DBMS Technology

Yelena YeshaOlga Streltchenko

Page 2: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

2

Presentation Overview

Database functionalityRelational Data ModelSQLWeb-Database Connectivity

Page 3: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

3

Transaction

A transaction is an exchange of: Information; Goods; Services; Currency/currencies;

Transaction properties Atomicity: a transaction must be all or

nothing. Consistency: a transaction takes the system

from one consistent state to another. Isolation. Durability.

Page 4: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

4

Databases and Information Economy

Shift from computation to information; corporate computing; personal computing and the Internet; scientific computing.

Growing importance of transaction-orientation and information retrieval.

The database field concentrates on the efficient management of large amounts of persistent, reliable shared data.

Page 5: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

5

Database versus File System

Database management systems (DBMS) is a software that provides transaction support by implementing Data independence; Data access efficiency; Concurrency control ; Data integrity; Reliability; Security; Data distribution and heterogeneity.

Page 6: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

6

Data Independence and Access Efficiency

DBMS allows to avoid rewriting all access routines every time the data format changes or data is added/modified/deleted. insulate applications from data storage details. Logical independence: protection from changes

in logical structure of data. Physical independence: protection from

changes in physical structure of data. DBMS maintains data structures and

implements algorithms allowing to avoid linear search indexing: search in O(log n); fast access even on complex data queries.

Page 7: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

7

Concurrency Control and Data Integrity

Interleaving actions of different applications boosts performance.

DBMS insures semantically correct access to the same data by concurrent applications two programs accessing the same data at the

same time can result in an inconsistent update;

implement sharing in a controlled manner. Data semantics may require certain

constraints to be satisfied. DBMS guarantees that application programs

comply with the constraints when adding/modifying the data.

Page 8: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

8

Reliability and Security

DBMS provides techniques for recovery from software and hardware failures guarantee survival of the data across

catastrophes.DBMS prevents unauthorized users

from accessing/modifying data or denying service to other users.

Page 9: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

9

Data Distribution and HeterogeneityCentralization is the enemy of scalability

a vast number of modern applications are distributed.

Data sharing in a distributed environment is a challenge.

Heterogeneity applies to networks, hardware, operating systems, programming languages, data formats, etc.

Distributed applications must mask the differences.

Need distributed data management.

Page 10: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

10

Categories of Data Models

High-level or conceptual entities, attributes, relationships.

Representational or implementation or logical relational, network hierarchical, object-

oriented, object-relational.Physical or low-level

data storage.

Page 11: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

11

Levels of Abstraction in a Database Schema versus Instance

schema = description of the data that captures data types, relationships, constraints on the

data;l meta-data (data about data), knowledge,

e.g., Employees(EmpName, EmpNo, Dept, Sal)

is independent of any application program changes infrequently

instance = set of records/tuples/rows for that schema, the actual data in the database at a given time

time-varying e.g., <Jane, 201, Shoe, 1M>,<Susan, 302, Toy,

1M>

Page 12: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

12

3-schema Architecture

Physical level description of a database: how things are stored on disk:

files, record structures, indices, data structures for disk blocks, methodology for dealing with too long records, etc.

Conceptual level description of a database The description of application data (its

schema) using one of the traditional data models.

Page 13: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

13

3-Schema Architecture (cont'd)

View-level description of a database What users of a particular application see

their own customized schema, e.g., for payroll, for the ticket agent, for a simulation program.

Multiple levels helps with data independence; helps with maintenance.

Many views, single logical and physical schema.

Levels of abstraction give data independence.

Page 14: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

14

The Entity-Relational Model

Entity: a distinguishable object.Entity set: a set of entities all of the

same type.Attribute: a single property of an entity;

simple vs composite; single-valued vs multi-valued; stored vs derived; null values.

Domain: set of values permitted for that attribute.

Page 15: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

15

The E-R Model (2)

Relationship: an association between two or more entities.

Relationship set: a set of relationships all of the same type

There is no correct schema for a batch of data. Which schema is best depends on the application.

Many basic data modelling choices depend on an understanding of the application.

Page 16: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

16

Data Model

Data model: notation for describing data, plus a set of operations used to manipulate that data. a set of primitives for defining the structure of a

DB; a set of operations for specifying the retrievals

and updates on a DB; relational, hierarchical, network, object-oriented.

Page 17: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

17

The Relational Model (Codd 1970)

The relational data model is the most important data model currently existing.

Value-oriented, i.e., allows operations on relations whose results are relations, thus enables to combine operations. As opposed to object-oriented models, in which

Operations cannot be applied to the result of other operations;

The result of an operation may be a new data type, and operations may not be available for this type.

Page 18: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

18

Definitions: Domain & Relation

A domain is a set of atomic values.A relation is a finite subset of the cartesian

product of a finite list of domains; relation is a set of tuples; order of tuples is irrelevant and no relation has 2 identical tuples; each tuple value is atomic

no composite attributes;no multi-valued attributes.

Page 19: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

19

Relational Model (cont’d)

Everything is represented by relations– Formally: Given sets D1, D2, ....Dn (not necessarily distinct), a relation R D1 X D2 X ...X Dn

– Di 's are the domains and n is the arity (degree) of R– elements of R are called tuples– number of tuples in R is the cardinality of R

relational data model helps to view a relation as a table Observe the following properties:

Page 20: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

20

Relational Model (cont’d) Everything is represented by relations;

Given sets D1, D2, ....Dn (not necessarily distinct), a relation R D1 X D2 X ...X Dn;

Di 's are the domains and n is the arity (degree) of R; elements of R are called tuples; number of tuples in R is the cardinality of R.

Relational data model helps to view a relation as a table: each row represents a tuple (record); each column represents an attribute (field).

Properties: no two rows are identical; the ordering of tuples is unimportant; the ordering of columns is important.

Page 21: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

21

Keys

Let R be a relation schema and K R. K is a superkey of R if it can uniquely identify any

tuple in any r(R). There are no tuples t and t' such that t[K] = t'[K}.

K is a candidate key if K is a minimal superkey. There is no K' K such that K' is also a superkey of r(R)

A primary key is one of the candidate keys, remaining candidate keys are alternate keys;

Every relation has a key. A key is a property of a relation schema, not a

relation.

Page 22: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

22

Integrity Constraints Relational database schema is a set of relation

schemas and a set of integrity constraints Integrity constraint: condition that must be true

for any instance of a database. Integrity constraints are expected to hold on

every database instance of the schema Integrity constraints

Structural:key constraint: uniqueness of keys;entity integrity constraint: no primary key value can be null;referential integrity constraint: reference from relation R

to relation S must refer to an existing tuple of S Semantic.

Page 23: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

23

Foreign Keys

A set of attributes (FK) of R is a foreign key if the attributes in FK have the same domain as the

primary key (PK) attributes of another relation S, and

for each instance of R, the values of FK occur as a value of PK for some instance in S, or is null.

In the relational model, the only way an entity can reference another entity is through the value of the primary key of the second entity.

Foreign keys don't have to be unique or non-null, but if one component is null, then all components must be null.

Page 24: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

24

E-R to Relations (I.e., Defining Relations)

Done using DDL (Data Definition Language)

Name whole database schemaDeclare domains for attributesDefine relations:

name attribute names and domains primary and other keys foreign keys

Page 25: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

25

Translating from E-R

Represent entity set E by a relation whose attributes are all the E-R attributes of E. Then each tuple represents one entity of E.

To represent relation R between entity sets E1, …, Ek, create relation R with key attributes of E1, …, key attributes of Ek, as attributes (rename duplicates). Each tuple of the relation represents one combination of entities that are related to one another.

You might have some redundant relations, which you can delete.

Page 26: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

26

Schema Normalization

Formal theory of database designbased on grouping attributes in a

particular way using attribute dependencies to achieve ‘good’ schemas

1NF, 2NF, 3NF, BCNF, 4NF, …Goal:

don’t store redundant information can represent everything (otherwise, the

schema is useless!)

Page 27: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

27

Query and Update Languages

DDL : data definition language used by DBA to define schemas, create views, create

indicesDML : data manipulation language

used by sophisticated casual user to query data or update data

Page 28: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

28

Relational Query Languages Query languages allow manipulation and retrieval of data from

a database. Relational model supports simple, powerful query languages;

strong formal foundation based on logic; allows for optimization.

Two mathematical languages form the basis for relational languages (e.g., SQL) and for implementation: Relational Algebra: More operational, useful for representing

execution plans; Relational Calculus: Lets users describe what they want, rather

than how to compute it (non-operational, declarative). Basic operations:

selection, projection, cross-product, set-difference, union, intersection, join, division

Page 29: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

29

SQL

SQL (Structured Query Language) is the query language for the System R developed at IBM San Jose [Astraham, Gray, Lindsay, Selinger ..]

SQL is now the query language for IBM's DB2 and the de-facto standard on most commercial RDBMS.

SQL is a comprehensive language providing statements for data definition, query and update. Hence it is both DDL and DML.

Page 30: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

30

SQL (cont’d) SQL allows to create views, it can be embedded in a

general-purpose programming language (C or PASCAL).

SQL has one basic statement for retrieving data from the database: the SELECT statement

SELECT <attribute list> FROM <table list> WHERE <condition>

Standards: SQL or SQL1 (ANSI 1986); SQL2 or SQL-92 (ANSI 1992); SQL3 underway: extends SQL with OO and other concepts.

Page 31: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

31

SQL Data Types Numeric

Integers of various ranges: INTEGER (or INT), SMALLINT; Real numbers of various precision: FLOAT, REAL, DOUBLE

PRECISION; Formatted numbers: DECIMAL(i,j) or DEC(i,j) or

NUMERIC(i,j). Character Strings

Fixed length n: CHAR(n) or CHARACTER(n); Variable length of maximum n: VARCHAR(n) or CHAR

VARYING(n) (default n =1). Bit strings:

Fixed length n: BIT(n); Varying length of maximum n: VARBIT(n) or BIT VARYING(n).

Page 32: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

32

SQL Data Types (cont’d)

Date & Time [SQL2]: DATE (10 positions): YYYY-MM-DD; TIME (8 positions): HH:MM:SS; TIMESTAMP:date, time with 6 fractions of seconds and

optional time zone TIME(i) defines i decimal fractions of seconds

(8+1+i positions): HH:MM:SS:ddd...d; TIME WITH TIME ZONE includes the displacement from

standard universal time zone [+13:00 to -12:59] (6 additional positions): HH:MM:SS+/-HH:MM

INTERVAL: Year/Month or Day/TIME

Page 33: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

33

Data Definition Language DDL is used to define the (schema of) database

to create a database schema; to create a domain; to create, drop. alter a table; to create, remove an index [defunct in SQL2]; to create or drop a view; to define integrity constraints; to define access privileges to users (Oracle: CONNECT,

RESOURCE, DBA); to GRANT or REVOKE privileges ON/TO object/user

SQL2 supports multiple schemas CREATE SCHEMA name AUTHORIZATION user;– CREATE SCHEMA EMPLOYEE AUTHORIZATION yesha;

Page 34: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

34

SQL Schema EMP(Name,SSN,DNO,BirthPlace) DEPT(DName,DNO,MGRSSN) PROJECT(PName,PNO,PLocation,DNum) WORKSON(ESSN,PNO,Hours)

CREATE SCHEMA 'COMPANY';

CREATE TABLE EMP( EName name_dom NOT NULL, SSN CHAR(9) NOT NULL, DNO INTEGER NOT NULL, BirthPlace city_dom, PRIMARY KEY(SSN), FOREIGN KEY (DNO) REFERENCES DEPT (DNO));

Page 35: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

35

Drop

DROP command can be used to remove– a schema: DROP SCHEMA Company CASCADE; DROP SCHEMA Company RESTRICT

CASCADE option removes everything: tuples, tables, domains, ...

RESTRICT option removes the schema if it has no elements in it

– a table: DROP TABLE EMP CASCADE; DROP SCHEMA EMP RESTRICT

CASCADE option removes the table and all references to it RESTRICT option removes the table if it is not referenced

Page 36: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

36

Alter

The ALTER allows to:– alter the domain of an attribute ALTER TABLE Student – ALTER GPA NUMBER(4,2);– set or drop default value of an attribute ALTER TABLE Student ALTER GPA DROP DEFAULT; ALTER TABLE Student ALTER GPA SET DEFAULT 0.00;– add a new attribute to a relation ALTER TABLE Student ALTER Admission DATE;– drop an attribute (not in SQL1) ALTER TABLE Student DROP GPA [CASCADE/RESTRICT];

Page 37: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

37

Data Manipulation Language

SELECT tuple queries aggregate queries

INSERTDELETEUPDATECREATE VIEW

Page 38: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

38

Data Query: SELECT Used for retrieval. Used to specify subqueries for retrieval and

for the other operations. Not the sigma or select operator of the

relational algebra. The general form of a SELECT statement:

SELECT <attribute list>FROM <table list>WHERE <condition>GROUP BY <attribute list>HAVING <condition>ORDER BY <attribute,{ASC/DESC} pair>

Page 39: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

39

Relational Operators in SQL

Projection: SELECT A,B FROM RSelection: SELECT * FROM R WHERE F

The WHERE condition can be a Boolean combination of conditions.

Product of two tables, A X B:SELECT R., S.FROM R, S

Page 40: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

40

Data Update

Examples: Create a new instance of an entity

explicitly with all details:INSERT table VALUES (v1, v2, ..., vn)

inserts a tuple (v1, v2, ..., vn) into table Unspecified columns get their default

value if available,NULL if allowed (and no default is defined), (fails otherwise):INSERT table(c1, c2, cm) VALUES (v1,

v2, ..., vm) inserts a tuple with m columns set to (v1, v2, ..., vm).

Page 41: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

41

Data Update (cont’d)

Examples: UPDATE table SET c1 = v1, ..., cm = vm [WHERE

expr] Update all instances matching some local

criterion: UPDATE table SET ? WHERE ? Remove instances matching some local

criterion: DELETE table WHERE ? Remove instances matching a non-local criterionDELETE table FROM table, table2 WHERE AND ?

Page 42: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

42

SQL Views

An SQL view is a table derived from other tables base (physical) tables—ultimately

depend on these defining tables other views

Views are usually virtual sometimes materialized

Page 43: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

43

View Specification

Views are specified with the paradigmCREATE VIEW viewAS SELECT …

The SELECT … part is the defining query

In a view definition, we can define column names of view use aggregation (GROUP BY)

Page 44: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

44

View Resolution

Views are computed by a sort of macro expansion

when needed—view resolution query modification: compute fresh view materialization: store always up to date: modifications to the

defining tables are automatically reflected in the view

Page 45: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

45

Updating Views

A view tuple may have as its source a combination of base tuples, because of joins aggregations

Therefore, updating a view is nontrivial to determine may potentially lead to more than one update

on the defining relations is often not allowed

Page 46: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

46

Updating Views (2)

Restricted kinds of views may be updated. The subselect must have only one defining table columns including a candidate key of

the table only column names, not expressions no joins, e.g., no correlative subqueries no DISTINCT keyword no aggregates

Page 47: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

47

CHECK OPTION

Specified for updatable views Checks whether an INSERT or UPDATE

to a view would immediately cause the tuple to disappear from the view

the new tuple should satisfy the WHERE clause of the view definition

Page 48: 1 An Introduction to DBMS Technology Yelena Yesha Olga Streltchenko

48

SQL Constraints

SQL allows declarative constraints such as

CREATE ASSERTION assertion-nameCHECK (enhanced subselect query)

The DBMS checks if any ASSERTION is ever violated