36
IN3020/IN4020 – Database Systems Spring 2020, Week 15.2 SUMMARY Dr. M. Naci Akkøk, Chief Architect, Oracle Nordics Based upon slides by E. Thorstensen from Spring 2019

IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

IN3020/IN4020 – Database Systems Spring 2020, Week 15.2

SUMMARY

Dr. M. Naci Akkøk, Chief Architect, Oracle NordicsBased upon slides by E. Thorstensen from Spring 2019

Page 2: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

The course

o This is a Database Systems course

o IN3020 is a bachelor level course, and this is the last lecture covering IN3020 curriculum (common with IN4020 so far), though the rest is also recommended

o IN4020 is a higher-level course. There will be approximately 5 x 45 minutes (5 lecture hours) of material. 2 hours by external lecturers, rest by me.

Page 3: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Today

o We will use 1 hour for summary and exam form etc. Relevant for both IN3020 and IN4020

o We will use the second hour to go through DBMS architecture and dependencies/interactions between layers/components.

o The second hour is a requirement for IN5020 – Distributed Systems. Relevant and required for IN4020 and recommended (but not required) for IN3020 students planning to take IN5020 any time soon.

o Do ask questions, but in chat.

Page 4: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Repetition & Highlights

Page 5: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Goals:At the end of the course, you should be able to…o Understand what Database Management Systems (DBMS) do,

how they work and how they are implementedo Have a good grasp of the theoretical & practical aspects of

database administration (which you will hear more about next week Monday in “Current and Future Role of the DBA”)

o Have a good grasp of use of SQL and how queries are optimizedo Understand the principles of transaction managemento Know about the TX management types (isolation levels) that

modern DBMS offer, and their respective strengths/weaknesses

Page 6: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

We looked at these challenges (and more)

o PERFORMANCE, OPTIMIZATION:What can one do if the queries are slow?

o CONCURRENCY:How can one read from and write to the same database at the same time without problems? How would the performance be effected?

o RECOVERY:What happens if the database crashes while updating the data?

Page 7: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

We took up some background we needed to build up the course

o DBMS architecture (relevant also for IN5020)o DBMS file systemso Various data structures and algorithmso Relational algebra (relevant for query optimization)

Page 8: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

The course is divided into several parts

o Main focus:o Queries, SQL and query optimizationo DBMS characteristics and mechanismso ACID, transaction management, concurrency, isolation

levels o Other DBMS (slightly less than previous years)

o DBMS architecture (syllabus for IN4020)o Additional bit about security, DBA´s role, what data science &

analytics require of DBMS, emerging technologies and database research

Page 9: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Syllabus: Queries, SQL and query optimization

o SQL (repetition and a bit more)o Query optimization (including relational algebra)o Indexes, index usage and the underlying searcho Query plans and optimization

Page 10: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Syllabus: ACID, DBMS characteristics and Mechanisms o Managing multiple concurrent uses/users and protecting

the datao ACID – what it stands foro Transaction Management, concurrencyo Locking, logging, buffer & cache managemento Relationships between these mechanismso Synchronization/replication challengeso Serialization, isolation

Page 11: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Syllabus: Other database systems (briefly) –NoSQL DBMS and newer DBMS technologieso Categorization of the NoSQL DBMS and the problems they

address (their “reason of being”)o A bit more about spatial & graph databases, semantic

technologies and databases with built-in intelligence (AI/ML capabilities in and outside the “black box”)

o Autonomous databaseso Multi-model databases

Page 12: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

FOR IN4020(recommended otherwise as well)

Page 13: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Lectures for IN4020 (recommended for all)

o DBMS architecture and relations between DBMS layers/components (today, next hour)

o Security (guest, coming Monday)o Current and future role of the DBA (guest, coming Monday)

o Implications of data science & analytics, modern DBMS and newer mechanisms (coming Wednesday)

Page 14: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Lectures for IN4020 (recommended for all)

o R&D themeso Performance – how and for what purpose, exactly?o Big or fast data – or both? How? How can newer architectures like the Data

Mesh help?o What is the relation doing in the table? What happens to indexes and keys

and search (queries, joins etc.) if it is not part of the data? Can it be as cleanly formalized in an “extended” relational algebra?

o Intelligence in databases – autonomy, auto-inference (also in RDF semantics) and other built in mechanisms of learning and intelligence.

o Extraction-transformation-loading challenges: Intelligence in data quality management and I/O

Page 15: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

EXAM INFO

ABSOLTELY RECOMMENDED FOR ALL J

Page 16: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Exam informationo The final exam is now a take-home examo Evaluation: Pass/Fail

(preferably pass, of course J) o

You can use books and notes and slides and Google (alle hjelpemidler tillat)

o You cannot cooperate or ask your brother/sister to do it for you! Remember that strict social distancing is required J

Page 17: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Exam informationo You will receive the exam on the 26th of May 2020 as

announcedo You have 1 week to deliver the take-home examo The work will be the equivalent of maximum 2 days.o You will be informed how you are to deliver the take-home

(most likely Inspera but you will be informed when you get the exam)

Page 18: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Exam informationo This all means that there won´t be standard exam questions

you can simply look upo It means that there will be actual “do it in practice”

exercises directing you to formulate or try things out, show results and reason about them

o It mean that there will be discussion question, questions that ask you to reason etc., showing your understanding of the syllabus

Page 19: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

In other words…GOOD LUCK!

Page 20: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

(IN3020) IN4020 – Database Systems Spring 2020, Week 15.2

DBMS ARCHITECTURE & DEPENDENCIES BETWEEN COMPONENTS

Dr. M. Naci Akkøk, Chief Architect, Oracle NordicsBased upon slides by V. Goebel & E. Munthe-Kaas

Page 21: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Example DBMS architecture

The components

Typi

cal D

BMS

Com

pone

nts

Page 22: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Same architecture as functional layers

Applications

User Interfaces / View Management

Semantic Integrity Control / Authorization

Query Processing and Optimization

Storage Structures

Buffer Management

Concurrency Control / Logging

Interface

Control

Compilation

Execution

Data Access

Consistency

Tran

sact

ion

Man

agem

ent

Database

Page 23: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Dependencies between layers/componentso Central components (system functions): components that

manage their resources directly down to the operating system interface - system buffer management, lock component, log component (with savepoint management).

o Higher-level components (system functions): components demanding/requiring the central components as prerequisites - transaction management, access path management, sorting component, etc.

Page 24: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Dependencies between central components

LOG COMPONENT(with savepoint management)

LOCK COMPONENT

SYSTEM BUFFER MANAGEMENT

1 3

2

Page 25: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction 1: Log & Buffer (pool)o WAL (write ahead log) principle: log information is written to

secondary storage before the “real” information (pages) is written to secondary storage.

o Page replacement strategy determines which protocol (logging) approaches are applicable. For instance, logical protocol approach rquires indirect replacement strategy.

Page 26: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction 2: Log & Locko Logging unit ≤ Locking unito In case of logical protocol approaches, the logging unit is the set of

all data objects changed by a DML operation. If a rollback is performed no other transactions must be affected (damaged).

o If the lock component realizes only a simple 2-phase-locking protocol - instead of a strict 2-phase-commit protocol (all locks are kept until end-of-transaction), all relationships (interferences) between transactions (also read-only) about commonly used data items must be logged to enable a recursive rollback in case of failure.

Page 27: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction 3: Lock & Buffer (pool)o The main task of the lock component is to guarantee the

“logical single user mode”.

o This requires that the lock component controlls which pages a fixed and which pages are unfixed in the system buffer by the system buffer management (influencing paging strategy) to guarantee isolation (locking).

Page 28: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Dependencies between central & higher-level components

CENTRAL COMPONENTS

ACCESS PATH MANAGEMENT

TRANSACTION MANAGEMENT

SORTINGCOMPONENT

B CA D E F

Page 29: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction A: Log & Access path mgmnt. o Physical state logging:

Requires the maintenance and logging of access paths when inserting new data items.

o Logical state logging: requires no extra effort of access path maintenance because DML operations can be rolled back (inverted) and repeated.

Page 30: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction B: Lock & Access path mgmnt.o For B* trees (special access paths) an unadjusted locking

concept can decrease parallelism. No exclusive locks should be set on the root of the tree, because this would block all access paths along that tree.

Page 31: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction C: Transaction mgmnt. & Logo The “all-or-nothing” principle of a transaction requires the

logging of UNDO information for rollback and REDO information to repeat a transaction.

o In case of very many short transactions the logging component can become the bottleneck of the DBMS.

o One solution to this problem is to group transactions and to defer their end-of-transaction in order to write the log information in a blocked way to secondary storage.

Page 32: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction D: Tx & System buffer mgmnto The system buffer management is the central component

concerning performance optimization. o The number of parallel transactions has a strong influence

on the paging rate. o Transaction management should defer the activation of a

transaction if this would cause a decrease of the paging rate.

o Transaction management should group transactions in a buffer-oriented way for performance reasons.

Page 33: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction E: Tx management & Locko The locking component guarantees the transaction-oriented isolation

of data items. o It is possible that deadlock situations (circular resource allocations)

occur that require “unjustified” rollbacks of transactions to resolve the deadlock.

o Transaction management has to determine which transaction(s) have to be rolled back in order to cause minimum work loss.

o Transaction management and locking component have to be adjusted so that deadlocks are not possible, e.g., by pre-claiming(*), or sequentialization of transactions which have overlapping data areas.

Page 34: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Pre-Claiming Lock Protocolo Pre-claiming protocols evaluate their operations and create

a list of data items on which they need locks in advance.o Before initiating an execution, the transaction requests the

system for all the locks it needs beforehand. o If all the locks are granted, the transaction executes and

releases all the locks when all its operations are over. o If all the locks are not granted, the transaction rolls back

and waits until all the locks are granted.

Page 35: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

Interaction F: Sorting component & System buffer managemento Special precautions for system buffer management

are necessary if the sorting component is active (especially relevant for relational DBMS).

o The paging strategy should not be contra-productive to the applied sorting algorithm.

Page 36: IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …

SEE YOU NEXT WEEK!STAY SAFE, STAY HEALTHY!