Upload
maud-jacobs
View
216
Download
0
Embed Size (px)
Citation preview
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
Data representation• What are the boundaries of the questions the database
can answer?• Models the real world: mini world with limitations• 3 levels:
– Conceptual model: A world described by the DB– Implementation/representation model: a model
understandable for the DBMS (structured records, tables, fields, etc.)
– Physical model: DBMS implemented on the computer (files, programs)
V 1.0
Structure of the DBMS• The user has permission for the smaller part of the DB:
View
Conceptual model
Implementation model
Physical model
View1 View2 View3
V 1.0
Example: university DB• View: Teachers can see info about their courses (= DQL)• Conceptual model (= tables)
– Student (sid: string, name: string, age: integer, cumulative average: real)
– Subject (subid: string, sname: string, credit: integer)– Registration (sid: string, subid: string, mark: integer)
• Implementation model (= DDL)– Create table subject (
subid varchar(10) not null primary key, sname varchar (50) not null, credit int not null )
• Physical model: files containing unsorted [email protected] 5
V 1.0
Structure of the DBMS• Layers
Query optimization and-execution
Relational operators
Files and permissions
Buffering
Handling storage
Handling concurrency and
recovery control is taken
into consideration
DBOS
V 1.0
Steps of a query1. User ”asks” of the DBMS (SQL query)2. DBMS checks the permission in the schema3. DBMS checks the permission in the subschema4. DBMS asks the OS to execute the I/O operation5. OS looks for the asked record6. OS imports the record into the system buffer7. OS notifies the DBMS8. Record is taken the the user workspace9. DBMS notifies the user about the recieved data
+1. How to get the actual data chunks from the [email protected] 7
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
Records in files• DBMS handles records and files• Files: collection of pages/blocks containing records• They must support
– DML (insert, update, delete)– Read records (identified by rid)– Read all the records (satisfying some conditions)
• How to store the blocks?
V 1.0
Way of storage - RAID• Redundant Array of Inexpensive/Independent Data• Connecting disks logically, storing data redundantly• Aims:
– Minimizing data loss, increase reliability– Increasing capacity by more smaller/cheaper disks– Increase data access performance– Increase flexibility (can be replaced during usage)
• Data striping: Data is partitioned into striping units and the partitions are distributed on several disks
• Redundancy: Data is strored redundantly so that reconstruction of data in case of disk failure is possible
V 1.0
Levels of RAID – Level 0 – JBOD
• Non redundant• If one of the disks fails, data is lost• Parallel reading/writing• If the capacity of the disks is different
then the performance depends on the worst disk
V 1.0
RAID Levels – Level 1 – Mirror
• Mirrored, the data is the same on every disk
• If one of the disks fails then data can be reconstructed
• Parallel reading with increased velocity• Parallel writing with normal velocity• If the capacity of the disks is different
then the performance depends on the worst disk
• Does not use data striping
V 1.0
RAID 0+1 and RAID 1+0 (RAID 10)
• RAID 0+1: speed of RAID 0 and redundancy of RAID 1
• Min 4 disks
• RAID 10: first mirroring, then connecting
• If a disk fails, only that RAID 1 is involved
V 1.0
RAID Levels – Level 2
• Uses data striping (unit=1 bit) but some of the disks are used to store error-correcting codes
• ECC: redundant bits calculated from data bits (compress)
• In the strip the corresponding strip’s error correcting code is stored.
• Not used any more (HDDs handle error correction)
V 1.0
RAID Levels – Level 3• Bit-Interleaved Parity• Cannot identify the failed disk (disk controllers do that)• One check disk with parity information• The failed disk’s data can be recovered• Can process only one I/O at a time• Strips=1 bit
V 1.0
RAID Levels – Level 4• Block-Interleaved Parity• Like RAID 3, with strips as disk blocks• Supports serving multiple users• Parity disk needs to be updated at every write, can be
bottle neck• In case of disk failure, reading speed reduces
V 1.0
RAID Levels – Level 5• Block-Interleaved Distributed Parity• Rotating parity: parity is not stored on a single check
disk, but uniformly over all disks• Parallel read and write• Similar to RAID 3 and 4 depending on the size of strips• If a disks fails, it has to be replaced inmediately
otherwise if another fails, all data will be lost
V 1.0
RAID Levels – Level 6• High possibility of the failure of a second disk during disk
recovery use an extra disk• Needs 2 check disks• Able to recover from up to two simultaneous disk
failures• Read and write speed is equal to RAID 5
V 1.0
Unordered (heap) files• Okay, now we can access blocks... How to access the
actually needed data block from the files?
• Simplest file structure: heap file• For the record-level operations DBMS must register
– pages in the file– free space in the page– records in the page
• Sloooooooooooow!
V 1.0
Heap file as a linked list
• Address of the header page and the name of the heap file must be stored in a known location
• Every page contains two pointers in addition• Typical problems of a linked list: slow!
Header
Page
DataPage
DataPage
DataPage
DataPage
DataPage
DataPage Pages with
free space
Full pages
V 1.0
Directory-based heap file• Maintain directory
of pages• DBMS stores the address
of the first pageof each heap file
• Directory=collection of pages(e.g. chained list)
• Counter for every page: amount of free space/entry• Can be faster... Structure of the directory = key factor!• Directory = index file ... We can use more than one index
files for one table!
V 1.0
Example, library
1. Search the books of Asimov2. Search for the book Foundation
V 1.0
Access methods• B trees (B+ trees, B* trees...)• Hash-based structures• We already learned the basics of those from
Programming II.
• Let’s recap: Hash vs Symmetric vs assymetric encryptionUsage in DB: table joins (equality check), partitioning index structures, encryption
• Let’s recap: B tree - basic structure, always balanced, search, insert (split node), delete (merge nodes)Usage in DB: key-based LUTs, custom ordered indices(note: char indices are SLOW – use hash indexes!)
V 1.0
B-tree variants• In the B+ tree, copies of the keys are stored in the
internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access
• The B* tree balances more neighboring internal nodes to keep the internal nodes more densely packed. This variant requires non-root nodes to be at least 2/3 full instead of ½
• Instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. Deleting nodes very complex
V 1.0
Indexes• To speed up not supported operations• Collection of data entries to speed up search• Rid=pointer to the entries
V 1.0
Index properties: Clusered vs. unclustered• Clustered
– Ordering of data is similar to ordering of indexes (data sorted by the search key on every page)
– Expensive• Unclustered
– Random ordering of data
V 1.0 [email protected] 27
V 1.0
Index properties: Dense vs. sparse• Dense:
– It contains at least one data entry for every search key that appears in the data file
– Useful optimization techniques rely on it• Sparse
– Contains one entry for each page in the data file– Much smaller
V 1.0
Index properties: Primary vs. secondary• Primary
– Index on a set of fields that includes primary key• Secondary
– Not primary index• Unique
– Contains a key candidate
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
Consequent operations
a1 b1
a1 b2
a1 b3
a1 b4
a2 b1
a2 b3
a3 b2
a3 b3
a3 b4
a4 b1
a4 b2
a4 b3
b1
b2
b3
a1
a4
V 1.0
Quotient examples
sno pnos1 p1s1 p2s1 p3s1 p4s2 p1s2 p2s3 p2s4 p2s4 p4
pno p2
pnop2p4
pnop1p2p4
snos1s2s3s4
snos1s4
snos1
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
To conclude relational algebra/calculus• Relational algebra and relational calculus can express the
same• Declarative part is convenient in the queries (user-
friendly)• The algebra (way of calculation) is the task of the DB, it is
hidden from the user
V 1.0
DBMAN8
Data access layersFiles and IndicesRelational algebraRelational calculusRandom theoretical extras
V 1.0
SYSTEM PRIVILEGES - ORACLE
• CREATE SESSION – REQUIRED TO LOG IN• CREATE TABLE• CREATE VIEW• CREATE PROCEDURE• CREATE USER• ALTER ANY TABLE• ALTER ANY TRIGGER• SELECT ANY TABLE• DROP ANY TABLE• DROP USER• ALL PRIVILEGES
V 1.0
ROLES, USERS
• CREATE USER {name} IDENTIFIED BY {password};• DROP USER {name};
• Role ~ Group of privileges• CREATE ROLE {name};• DROP ROLE {name};
• Authentication: user+pass / user+pass+host / LDAP / PAM / Windows-based
V 1.0
SETTING SYSTEM PRIVILEGES• GRANT {privileges} TO {role/user};• "WITH GRANT OPTION"• GRANT {role} TO {user};• REVOKE {privileges / role} FROM {user};
V 1.0
OBJECT-PRIVILEGES• SELECT• INSERT• UPDATE• DELETE• ALTER• EXECUTE (for PL/SQL)• READ (for files)• REFERENCES (for constraints)• INDEX (for CREATE INDEX)
V 1.0
SETTING OBJECT-PRIVILEGES• GRANT {privileges} ON {object} TO {user} [WITH GRANT
OPTION];• REVOKE {privileges / ALL} ON {object} FROM {user};
V 1.0
SQL: Kleene’s three-valued logic
• If compared with NULL, it is usually NULL except when UNION or INTERSECT is used
A AND BB
True Unknown False
A
True True Unknown False
Unknown Unknown Unknown False
False False False False
A OR BB
True Unknown False
A
True True True True
Unknown True Unknown Unknown
False True Unknown False A NOT A
True False
Unknown Unknown
False True
V 1.0
Basic terms – should NOT be new!• Elements: , , 𝑎 𝑏 𝑐• Sets: , , 𝐴 𝐵 𝐶• Defining a set:
– enumeration: ={ , , }, thus 𝐴 𝑎 𝑏 𝑐 𝑎∈𝐴– rules: ={ | ≥100 ≤1000}𝐵 𝑥 𝑥 ∧𝑥
• Subset: , 𝐴⊂𝐵 if :∀𝑎∈𝐴 𝑎∈𝐵• Ordered set (vector): = , , 𝑞 ⟨𝑎 𝑏 𝑐⟩• Attributes: key vs secondary attributes• SQL = DML, DDL, DQL, DCL!
V 1.0 [email protected] 55