V 1.0 DBMAN 8 Data access layers Files and Indices Relational algebra Relational calculus Random...

Preview:

Citation preview

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 1

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 2

V 1.0

Data representation• What are the boundaries of the questions the database

can answer?• Models the real world: mini world with limitations• 3 levels:

– Conceptual model: A world described by the DB– Implementation/representation model: a model

understandable for the DBMS (structured records, tables, fields, etc.)

– Physical model: DBMS implemented on the computer (files, programs)

szabo.zsolt@nik.uni-obuda.hu 3

V 1.0

Structure of the DBMS• The user has permission for the smaller part of the DB:

View

Conceptual model

Implementation model

Physical model

View1 View2 View3

szabo.zsolt@nik.uni-obuda.hu 4

V 1.0

Example: university DB• View: Teachers can see info about their courses (= DQL)• Conceptual model (= tables)

– Student (sid: string, name: string, age: integer, cumulative average: real)

– Subject (subid: string, sname: string, credit: integer)– Registration (sid: string, subid: string, mark: integer)

• Implementation model (= DDL)– Create table subject (

subid varchar(10) not null primary key, sname varchar (50) not null, credit int not null )

• Physical model: files containing unsorted dataszabo.zsolt@nik.uni-obuda.hu 5

V 1.0

Structure of the DBMS• Layers

Query optimization and-execution

Relational operators

Files and permissions

Buffering

Handling storage

Handling concurrency and

recovery control is taken

into consideration

DBOS

szabo.zsolt@nik.uni-obuda.hu 6

V 1.0

Steps of a query1. User ”asks” of the DBMS (SQL query)2. DBMS checks the permission in the schema3. DBMS checks the permission in the subschema4. DBMS asks the OS to execute the I/O operation5. OS looks for the asked record6. OS imports the record into the system buffer7. OS notifies the DBMS8. Record is taken the the user workspace9. DBMS notifies the user about the recieved data

+1. How to get the actual data chunks from the HDD?szabo.zsolt@nik.uni-obuda.hu 7

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 8

V 1.0

Records in files• DBMS handles records and files• Files: collection of pages/blocks containing records• They must support

– DML (insert, update, delete)– Read records (identified by rid)– Read all the records (satisfying some conditions)

• How to store the blocks?

szabo.zsolt@nik.uni-obuda.hu 9

V 1.0

Way of storage - RAID• Redundant Array of Inexpensive/Independent Data• Connecting disks logically, storing data redundantly• Aims:

– Minimizing data loss, increase reliability– Increasing capacity by more smaller/cheaper disks– Increase data access performance– Increase flexibility (can be replaced during usage)

• Data striping: Data is partitioned into striping units and the partitions are distributed on several disks

• Redundancy: Data is strored redundantly so that reconstruction of data in case of disk failure is possible

szabo.zsolt@nik.uni-obuda.hu 10

V 1.0

Levels of RAID – Level 0 – JBOD

• Non redundant• If one of the disks fails, data is lost• Parallel reading/writing• If the capacity of the disks is different

then the performance depends on the worst disk

szabo.zsolt@nik.uni-obuda.hu 11

V 1.0

RAID Levels – Level 1 – Mirror

• Mirrored, the data is the same on every disk

• If one of the disks fails then data can be reconstructed

• Parallel reading with increased velocity• Parallel writing with normal velocity• If the capacity of the disks is different

then the performance depends on the worst disk

• Does not use data striping

szabo.zsolt@nik.uni-obuda.hu 12

V 1.0

RAID 0+1 and RAID 1+0 (RAID 10)

• RAID 0+1: speed of RAID 0 and redundancy of RAID 1

• Min 4 disks

• RAID 10: first mirroring, then connecting

• If a disk fails, only that RAID 1 is involved

szabo.zsolt@nik.uni-obuda.hu 13

V 1.0

RAID Levels – Level 2

• Uses data striping (unit=1 bit) but some of the disks are used to store error-correcting codes

• ECC: redundant bits calculated from data bits (compress)

• In the strip the corresponding strip’s error correcting code is stored.

• Not used any more (HDDs handle error correction)

szabo.zsolt@nik.uni-obuda.hu 14

V 1.0

RAID Levels – Level 3• Bit-Interleaved Parity• Cannot identify the failed disk (disk controllers do that)• One check disk with parity information• The failed disk’s data can be recovered• Can process only one I/O at a time• Strips=1 bit

szabo.zsolt@nik.uni-obuda.hu 15

V 1.0

RAID Levels – Level 4• Block-Interleaved Parity• Like RAID 3, with strips as disk blocks• Supports serving multiple users• Parity disk needs to be updated at every write, can be

bottle neck• In case of disk failure, reading speed reduces

szabo.zsolt@nik.uni-obuda.hu 16

V 1.0

RAID Levels – Level 5• Block-Interleaved Distributed Parity• Rotating parity: parity is not stored on a single check

disk, but uniformly over all disks• Parallel read and write• Similar to RAID 3 and 4 depending on the size of strips• If a disks fails, it has to be replaced inmediately

otherwise if another fails, all data will be lost

szabo.zsolt@nik.uni-obuda.hu 17

V 1.0

RAID Levels – Level 6• High possibility of the failure of a second disk during disk

recovery use an extra disk• Needs 2 check disks• Able to recover from up to two simultaneous disk

failures• Read and write speed is equal to RAID 5

szabo.zsolt@nik.uni-obuda.hu 18

V 1.0

Unordered (heap) files• Okay, now we can access blocks... How to access the

actually needed data block from the files?

• Simplest file structure: heap file• For the record-level operations DBMS must register

– pages in the file– free space in the page– records in the page

• Sloooooooooooow!

szabo.zsolt@nik.uni-obuda.hu 19

V 1.0

Heap file as a linked list

• Address of the header page and the name of the heap file must be stored in a known location

• Every page contains two pointers in addition• Typical problems of a linked list: slow!

Header

Page

DataPage

DataPage

DataPage

DataPage

DataPage

DataPage Pages with

free space

Full pages

szabo.zsolt@nik.uni-obuda.hu 20

V 1.0

Directory-based heap file• Maintain directory

of pages• DBMS stores the address

of the first pageof each heap file

• Directory=collection of pages(e.g. chained list)

• Counter for every page: amount of free space/entry• Can be faster... Structure of the directory = key factor!• Directory = index file ... We can use more than one index

files for one table!

szabo.zsolt@nik.uni-obuda.hu 21

V 1.0

Example, library

1. Search the books of Asimov2. Search for the book Foundation

szabo.zsolt@nik.uni-obuda.hu 22

V 1.0

Access methods• B trees (B+ trees, B* trees...)• Hash-based structures• We already learned the basics of those from

Programming II.

• Let’s recap: Hash vs Symmetric vs assymetric encryptionUsage in DB: table joins (equality check), partitioning index structures, encryption

• Let’s recap: B tree - basic structure, always balanced, search, insert (split node), delete (merge nodes)Usage in DB: key-based LUTs, custom ordered indices(note: char indices are SLOW – use hash indexes!)

szabo.zsolt@nik.uni-obuda.hu 23

V 1.0

B-tree variants• In the B+ tree, copies of the keys are stored in the

internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access

• The B* tree balances more neighboring internal nodes to keep the internal nodes more densely packed. This variant requires non-root nodes to be at least 2/3 full instead of ½

• Instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. Deleting nodes very complex

szabo.zsolt@nik.uni-obuda.hu 24

V 1.0

Indexes• To speed up not supported operations• Collection of data entries to speed up search• Rid=pointer to the entries

szabo.zsolt@nik.uni-obuda.hu 25

V 1.0

Index properties: Clusered vs. unclustered• Clustered

– Ordering of data is similar to ordering of indexes (data sorted by the search key on every page)

– Expensive• Unclustered

– Random ordering of data

szabo.zsolt@nik.uni-obuda.hu 26

V 1.0 szabo.zsolt@nik.uni-obuda.hu 27

V 1.0

Index properties: Dense vs. sparse• Dense:

– It contains at least one data entry for every search key that appears in the data file

– Useful optimization techniques rely on it• Sparse

– Contains one entry for each page in the data file– Much smaller

szabo.zsolt@nik.uni-obuda.hu 28

V 1.0

Example

szabo.zsolt@nik.uni-obuda.hu 29

V 1.0

Index properties: Primary vs. secondary• Primary

– Index on a set of fields that includes primary key• Secondary

– Not primary index• Unique

– Contains a key candidate

szabo.zsolt@nik.uni-obuda.hu 30

V 1.0

Index properties: Simple vs composite• Contains several fields

szabo.zsolt@nik.uni-obuda.hu 31

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 32

V 1.0

Basic operations

szabo.zsolt@nik.uni-obuda.hu 33

V 1.0

Consequent operations

a1 b1

a1 b2

a1 b3

a1 b4

a2 b1

a2 b3

a3 b2

a3 b3

a3 b4

a4 b1

a4 b2

a4 b3

b1

b2

b3

a1

a4

szabo.zsolt@nik.uni-obuda.hu 34

V 1.0

Quotient examples

sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4

pno p2

pnop2p4

pnop1p2p4

snos1s2s3s4

snos1s4

snos1

szabo.zsolt@nik.uni-obuda.hu 35

V 1.0

Joins

szabo.zsolt@nik.uni-obuda.hu 36

V 1.0

All symbols

szabo.zsolt@nik.uni-obuda.hu 37

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 38

V 1.0

Goal

szabo.zsolt@nik.uni-obuda.hu 39

V 1.0

Atomic formulas

szabo.zsolt@nik.uni-obuda.hu 40

V 1.0

Formulas

szabo.zsolt@nik.uni-obuda.hu 41

V 1.0

Example

szabo.zsolt@nik.uni-obuda.hu 42

V 1.0

Example no. 2

szabo.zsolt@nik.uni-obuda.hu 43

V 1.0

Example no. 3

szabo.zsolt@nik.uni-obuda.hu 44

V 1.0

To conclude relational algebra/calculus• Relational algebra and relational calculus can express the

same• Declarative part is convenient in the queries (user-

friendly)• The algebra (way of calculation) is the task of the DB, it is

hidden from the user

szabo.zsolt@nik.uni-obuda.hu 45

V 1.0

DBMAN8

Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras

szabo.zsolt@nik.uni-obuda.hu 46

V 1.0

SYSTEM PRIVILEGES - ORACLE

• CREATE SESSION – REQUIRED TO LOG IN• CREATE TABLE• CREATE VIEW• CREATE PROCEDURE• CREATE USER• ALTER ANY TABLE• ALTER ANY TRIGGER• SELECT ANY TABLE• DROP ANY TABLE• DROP USER• ALL PRIVILEGES

szabo.zsolt@nik.uni-obuda.hu 47

V 1.0

ROLES, USERS

• CREATE USER {name} IDENTIFIED BY {password};• DROP USER {name};

• Role ~ Group of privileges• CREATE ROLE {name};• DROP ROLE {name};

• Authentication: user+pass / user+pass+host / LDAP / PAM / Windows-based

szabo.zsolt@nik.uni-obuda.hu 48

V 1.0

SETTING SYSTEM PRIVILEGES• GRANT {privileges} TO {role/user};• "WITH GRANT OPTION"• GRANT {role} TO {user};• REVOKE {privileges / role} FROM {user};

szabo.zsolt@nik.uni-obuda.hu 49

V 1.0

OBJECT-PRIVILEGES• SELECT• INSERT• UPDATE• DELETE• ALTER• EXECUTE (for PL/SQL)• READ (for files)• REFERENCES (for constraints)• INDEX (for CREATE INDEX)

szabo.zsolt@nik.uni-obuda.hu 50

V 1.0

SETTING OBJECT-PRIVILEGES• GRANT {privileges} ON {object} TO {user} [WITH GRANT

OPTION];• REVOKE {privileges / ALL} ON {object} FROM {user};

szabo.zsolt@nik.uni-obuda.hu 51

V 1.0

SQL: Kleene’s three-valued logic

• If compared with NULL, it is usually NULL except when UNION or INTERSECT is used

A AND BB

True Unknown False

A

True True Unknown False

Unknown Unknown Unknown False

False False False False

A OR BB

True Unknown False

A

True True True True

Unknown True Unknown Unknown

False True Unknown False A NOT A

True False

Unknown Unknown

False True

szabo.zsolt@nik.uni-obuda.hu 52

V 1.0

Basic terms – should NOT be new!• Elements: , , 𝑎 𝑏 𝑐• Sets: , , 𝐴 𝐵 𝐶• Defining a set:

– enumeration: ={ , , }, thus 𝐴 𝑎 𝑏 𝑐 𝑎∈𝐴– rules: ={ | ≥100 ≤1000}𝐵 𝑥 𝑥 ∧𝑥

• Subset: , 𝐴⊂𝐵 if :∀𝑎∈𝐴 𝑎∈𝐵• Ordered set (vector): = , , 𝑞 ⟨𝑎 𝑏 𝑐⟩• Attributes: key vs secondary attributes• SQL = DML, DDL, DQL, DCL!

szabo.zsolt@nik.uni-obuda.hu 53

V 1.0

Finding the key

szabo.zsolt@nik.uni-obuda.hu 54

V 1.0 szabo.zsolt@nik.uni-obuda.hu 55

szabo.zsolt@nik.uni-obuda.hu 56

Recommended