CS 540 Database Management Systems
Lecture 5: DBMS Architecture, storage, and access methods


Database System Implementation

[Figure: design pipeline. User Requirements → Conceptual Design (Entity Relationship (ER) Model) → Schema (Relational Model) → Physical Storage (Files and Indexes)]

The advantage of RDBMS
• It separates the logical level (schema) from the physical level (implementation).
• Physical data independence
  – Users do not worry about how their data is stored and processed on the physical devices.
  – It is all SQL!
  – Their queries work over (almost) all RDBMS deployments.

Challenges at the physical level
• Processor: 10000 – 100000 MIPS
• Main memory: around 10 Gb/sec
• Secondary storage: higher capacity and durability
• Disk random access
  – Seek time + rotational latency + transfer time
  – Seek time: 4 ms – 15 ms!
  – Rotational latency: 2 ms – 7 ms!
  – Transfer rate: at most 1000 Mb/sec
  – Read, write in blocks.

Gloomy future: Moore’s law
• Speed of processors and the cost-efficiency and maximum capacity of storage improve exponentially over time.
• But storage (main and secondary) access time improves much more slowly.

Random access versus sequential access
• Disk random access: seek time + rotational latency + transfer time.
• Disk sequential access: reading blocks next to each other
  – No seek time or rotational latency
  – Much faster than random access
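To get a feel for the gap, the cost formula above can be turned into a back-of-the-envelope calculation. This is only a sketch: the seek and rotational figures are midpoints of the ranges quoted earlier, "Mb" is read as megabits, the 8 KB block size is an assumption, and the sequential model charges one initial head positioning before streaming.

```python
# Rough cost model for reading blocks from a disk (results in milliseconds).
# Seek and rotational figures are midpoints of the ranges quoted in the slides;
# BLOCK_BYTES is an assumed block size, not a value from the lecture.
SEEK_MS = (4 + 15) / 2          # average seek time
ROTATE_MS = (2 + 7) / 2         # average rotational latency
TRANSFER_BITS_PER_SEC = 1000e6  # 1000 Mb/sec, read here as megabits per second
BLOCK_BYTES = 8 * 1024          # assumption: 8 KB blocks


def transfer_ms(num_blocks: int) -> float:
    """Time to transfer num_blocks, ignoring head positioning."""
    bits = num_blocks * BLOCK_BYTES * 8
    return bits / TRANSFER_BITS_PER_SEC * 1000


def random_read_ms(num_blocks: int) -> float:
    """Every block pays seek + rotational latency + transfer."""
    return num_blocks * (SEEK_MS + ROTATE_MS) + transfer_ms(num_blocks)


def sequential_read_ms(num_blocks: int) -> float:
    """Only the first block pays seek + rotation; the rest are read in order."""
    return SEEK_MS + ROTATE_MS + transfer_ms(num_blocks)


if __name__ == "__main__":
    n = 1000
    print(f"random:     {random_read_ms(n):.1f} ms")
    print(f"sequential: {sequential_read_ms(n):.1f} ms")
```

Under these assumptions the positioning cost dominates random reads, which is why access methods try to turn random block accesses into sequential ones.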

DBMS Architecture

[Figure: DBMS architecture. Users, web forms, applications, and the DBA issue queries and transactions. Components shown: Process manager; Query Parser, Query Rewriter, Query Optimizer, Query Executor; Files & Access Methods; Buffer Manager; Storage Manager; Storage; Transaction Manager; Logging & Recovery; Lock Manager. Main memory holds the buffers and lock tables.]

[Figure: the same DBMS architecture diagram, with the part covered here marked "This lecture".]

A Design Dilemma
• To what extent should we reuse OS services?
• Reuse as much as we can
  – Performance problem (inefficient)
  – Lack of control (incorrect crash recovery)
• Replicating some OS functions (“mini OS”)
  – Have its own buffer pool
  – Directly manage record structures with files
  – …

OS vs. DBMS: Similarities?
• What do they manage?
• What do they provide?

OS vs. DBMS: Similarities
• Purpose of an OS:
  – managing hardware
  – presenting interface abstraction to applications
• DBMS is in some sense an OS?
  – DBMS manages data
  – presenting interface abstraction to applications
• Both serve as an API for application development!

OS vs. DBMS: Related Concepts
• Process management: what DB concepts?
  – process synchronization
  – deadlock handling
• Storage management: what DB concepts?
  – virtual memory
  – file system


OS vs. DBMS: Differences?

OS vs. DBMS: Differences
• DBMS: Top-down to encapsulate high-level semantics!
  – Data
    • data with particular logical structures
  – Queries
    • query language with well defined operations
  – Transactions
    • transactions with ACID properties
• OS: Bottom-up to present low-level hardware


Problems with DBMS on top of OS

• Buffer pool management

• File system

• Process management

• Consistency control

• Paged virtual memory

Buffer Pool Management
• Performance of system calls
• LRU replacement
  – Query-aware replacement needed for performance
  – Circular access: 1, 2, …, n, 1, 2, …
• Prefetching
  – DBMS knows exactly which block is to be fetched next
• Crash recovery
  – Need “selected force out”
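The circular access pattern is the classic case where LRU does badly. The small simulation below is a sketch, not any particular system's buffer manager: the function name and the "LRU"/"MRU" policy labels are made up for illustration. It counts buffer hits when n blocks are scanned repeatedly through a pool that holds n - 1 of them.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, policy):
    """Simulate a buffer pool and return the number of hits.

    policy is "LRU" (evict least recently used) or "MRU" (evict most
    recently used). Both names are just labels for this sketch.
    """
    pool = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # mark as most recently used
            continue
        if len(pool) >= capacity:
            # last=False pops the LRU entry, last=True pops the MRU entry
            pool.popitem(last=(policy == "MRU"))
        pool[block] = True
    return hits


# Circular access 1, 2, ..., n, 1, 2, ... with n one larger than the pool.
n, capacity, rounds = 5, 4, 10
accesses = [b for _ in range(rounds) for b in range(1, n + 1)]
lru_hits = count_hits(accesses, capacity, "LRU")
mru_hits = count_hits(accesses, capacity, "MRU")
print(f"LRU hits: {lru_hits}, MRU hits: {mru_hits}, accesses: {len(accesses)}")
```

With LRU, each block is evicted just before it is needed again, so every access misses. A DBMS that knows the query is a repeated scan can pick a better policy, which is the point of query-aware replacement.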

Relations vs. File system
• Data object abstraction
  – file: array of characters
  – relation: set of tuples
• Physical contiguity:
  – large DB files want clustering of blocks
    • sol1: managing raw disks by DBMS
    • sol2: simulate by managing free spaces in DBMS
• Multiple trees (access methods)
  – file access: directory hierarchy (user access method)
  – block access: inodes
  – tuple access: DBMS indexes

Process management
• Reuse OS process management
  – One process for each user
    • Problem: DB processes are large
      – long time to switch between processes
    • Problem: critical sections
      – Processes may have to wait for a descheduled process that has locks.
  – n server processes that handle users’ requests
    • duplication of OS multi-tasking inside servers!
    • communication between processes:
      – Message passing is not efficient
• Solutions: OS implements
  – favored processes
    • not forced out, relinquish the control voluntarily.
  – faster message passing methods.

Consistency control
• OS provides some support for locking and recovery.
  – OS provides locks on files
  – DB requires locks on smaller units like tuples
• Commit point
  – Buffer manager ensures all changes are flushed to disk.
  – Buffer manager must know the internals of transactions.
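As a rough illustration of forcing changes out at a commit point, the sketch below appends log records to a file and asks the OS to write them to disk before returning. The function name, log format, and file name are hypothetical. A real DBMS manages its own buffer pool and log, so this only shows the OS-level calls involved (`flush` and `os.fsync`).

```python
import os
import tempfile


def commit(log_path: str, records: list[str]) -> None:
    """Append a transaction's log records and force them to stable storage."""
    with open(log_path, "a", encoding="utf-8") as log:
        for rec in records:
            log.write(rec + "\n")
        log.write("COMMIT\n")
        log.flush()             # push Python's user-space buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its cached pages to disk


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    commit(path, ["UPDATE Product SET price = 9 WHERE name = 'IPad-Pro'"])
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```

The call forces only this one file, and only when the transaction commits. That selectivity is what a general-purpose OS buffer cache does not offer on its own.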

State of the art
• DBMSs duplicate some OS functionalities.
• OS customized for DBMS

Access methods
• The methods that an RDBMS uses to retrieve the data.
• Attribute value(s) → Tuple(s)

Types of search queries
• Point query over Product(name, price)
    Select *
    From Product
    Where name = 'IPad-Pro';
• Range query over Product(name, price)
    Select *
    From Product
    Where price > 2 AND price < 10;

Types of access methods
• Full table scan
  – Inefficient for both point and range queries.
• Sequential access
  – Efficient for both point and range queries.
  – Should keep the file sorted.
    • Inefficient to maintain
• Middle ground?
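The trade-off can be sketched in a few lines. The rows below are made-up Product(name, price) tuples and the function names are hypothetical. A full scan touches every row, while a file kept sorted on price lets a range query binary-search its starting point and then read sequentially.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Product(name, price) rows; the data is made up for illustration.
products = [("pen", 1), ("cable", 5), ("mouse", 9), ("IPad-Pro", 800), ("case", 3)]


def scan_point(rows, name):
    """Full table scan for a point query: examine every row."""
    return [r for r in rows if r[0] == name]


def scan_range(rows, low, high):
    """Full table scan for low < price < high."""
    return [r for r in rows if low < r[1] < high]


# A file kept sorted on price supports binary search on that attribute.
by_price = sorted(products, key=lambda r: r[1])
prices = [r[1] for r in by_price]


def sorted_range(low, high):
    """Binary-search the first qualifying row, then read sequentially."""
    start = bisect_right(prices, low)  # first position with price > low
    end = bisect_left(prices, high)    # first position with price >= high
    return by_price[start:end]
```

Note that sorting on price does nothing for the point query on name, and every insert has to keep the file in order. Both problems motivate a separate index structure.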

Indexing
• An old idea

Index
• A data structure that speeds up selecting tuples in a relation based on some search keys.
• Search key
  – A subset of the attributes in a relation
  – May not be the same as the (primary) key
• Entries in an index
  – (k, r)
  – k is the search key.
  – r is the pointer to a record (record id).

Index
• Data file stores the table data.
• Index file stores the index data structure.
• Index file is smaller than the data file.
• Ideally, the index should fit in the main memory.

[Figure: a data file and an index file drawn side by side as columns of sorted keys (10, 20, 30, …).]
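A minimal sketch of (k, r) index entries, under simplifying assumptions: the data file is an in-memory list, a record id is just a position in that list, and the rows are invented for illustration.

```python
from bisect import bisect_left

# Data file: records in arrival order; the position in the list plays the
# role of the record id (rid). The rows are made up for illustration.
data_file = [("mouse", 9), ("IPad-Pro", 800), ("pen", 1), ("cable", 5)]

# Index on the search key `name`: sorted (k, r) entries.
index = sorted((name, rid) for rid, (name, _price) in enumerate(data_file))
keys = [k for k, _ in index]


def lookup(name):
    """Return all records whose search key equals `name`."""
    i = bisect_left(keys, name)
    result = []
    while i < len(index) and index[i][0] == name:
        result.append(data_file[index[i][1]])  # follow the record pointer
        i += 1
    return result
```

The index holds only keys and pointers, so it is smaller than the data file, and the data file itself does not have to stay sorted.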

Well known index structures
• B+ trees:
  – very popular
• Hash tables:
  – Not frequently used

B+ trees
• The index of a very large data file gets too large.
• How about building an index for the index file?
• A multi-level index, or a tree

B+ trees
• Degree of the tree: d
• Each node (except root) stores [d, 2d] keys:

[Figure: a non-leaf node with keys 10, 32, 94 divides the key range into [A, 10), [10, 32), [32, 94), [94, B). Its pointers lead to leaf nodes (shown with keys 12, 28, 32 and 39, 41, 65), whose entries point to the records.]
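Because a node with 2d keys has 2d + 1 children, capacity grows very quickly with the number of levels. The helper names below are hypothetical. The calculation assumes every node is completely full, so it gives the best case rather than a guarantee.

```python
def max_keys(d: int, levels: int) -> int:
    """Most keys the leaf level can hold in a B+ tree with `levels` levels."""
    fanout = 2 * d + 1  # a full non-leaf node has 2d keys and 2d + 1 children
    return 2 * d * fanout ** (levels - 1)


def levels_needed(d: int, num_keys: int) -> int:
    """Fewest levels whose leaf level can hold num_keys keys (all nodes full)."""
    levels = 1
    while max_keys(d, levels) < num_keys:
        levels += 1
    return levels


if __name__ == "__main__":
    for d in (2, 100):
        print(d, [max_keys(d, h) for h in (1, 2, 3)])
```

For d = 100, three full levels already cover about eight million keys, which is why lookups touch only a handful of blocks.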

Example (d = 2)

[Figure: B+ tree.
  Root: 60
  Internal nodes: (19, 50) (80, 90, 110)
  Leaves: (12, 13, 17) (19, 21, 30, 40) (50, 52) (60, 65, 72)
  Records: 12, 13, 17, 19, 21, 30, 40, 50, 52, 60, 65, 72]

Retrieving tuples using B+ tree
• Point queries
  – Start from the root and follow the links to the leaf.
• Range queries
  – Find the lowest point in the range.
  – Then, follow the links between the nodes.
• The top levels are kept in the buffer pool.
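Both kinds of lookup can be sketched over a small hand-built tree. The class and function names are illustrative only. The tree reuses the leaf keys of the d = 2 example but hangs them under a single root for brevity, and the range query is written with inclusive bounds.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Inner:
    keys: List[int]
    children: list = field(default_factory=list)  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf whose range contains key."""
    while isinstance(node, Inner):
        # child i covers keys in [keys[i - 1], keys[i])
        node = node.children[bisect_right(node.keys, key)]
    return node


def point_query(root, key):
    """True if key is stored in the tree."""
    return key in find_leaf(root, key).keys


def range_query(root, low, high):
    """All keys k with low <= k <= high, read by walking the leaf chain."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out


# Hand-built tree: four linked leaves under one root node.
l1, l2 = Leaf([12, 13, 17]), Leaf([19, 21, 30, 40])
l3, l4 = Leaf([50, 52]), Leaf([60, 65, 72])
l1.next, l2.next, l3.next = l2, l3, l4
root = Inner([19, 50, 60], [l1, l2, l3, l4])
```

A point query costs one root-to-leaf descent. A range query pays that descent once and then follows the sibling links, which is sequential access at the leaf level.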

Inserting a new key
• Pick the proper leaf node and insert the key.
• If the node contains more than 2d keys, split the node and insert the extra node in the parent.
  – If leaf level, add K3 to the right node

[Figure: an overfull node with keys K1 K2 K3 K4 K5 and pointers R0 … R5 splits into a left node (keys K1 K2, pointers R0 R1 R2) and a right node (keys K4 K5, pointers R3 R4 R5); the entry (K3, pointer) goes to the parent.]
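The leaf-level case of the split rule above can be sketched as follows. This is a simplified illustration with a hypothetical function name: a leaf is just a sorted list of keys, and updating the parent (and any further splits up the tree) is left to the caller.

```python
from bisect import insort


def insert_into_leaf(leaf, key, d):
    """Insert key into a sorted leaf (a list of keys), modifying it in place.

    Returns (left, right, separator). right and separator are None when no
    split was needed.
    """
    insort(leaf, key)
    if len(leaf) <= 2 * d:
        return leaf, None, None
    mid = len(leaf) // 2  # with 2d + 1 keys this is index d
    left, right = leaf[:mid], leaf[mid:]
    # Leaf level: the middle key stays in the right node and is also
    # copied up to the parent as the separator.
    return left, right, right[0]


if __name__ == "__main__":
    print(insert_into_leaf([12, 13, 17], 18, d=2))
    print(insert_into_leaf([19, 21, 30, 40], 20, d=2))
```

At a non-leaf level the middle key would move up to the parent instead of being copied, since inner nodes hold only separators.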

Example: Insert K = 18 (d = 2)

Before:
  Root: 60
  Internal nodes: (19, 50) (80, 90, 110)
  Leaves: (12, 13, 17) (19, 21, 30, 40) (50, 52) (60, 65, 72)

After insertion:
  Root: 60
  Internal nodes: (19, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 21, 30, 40) (50, 52) (60, 65, 72)

• 18 goes into the leaf (12, 13, 17), which still has room (at most 2d = 4 keys), and a record for 18 is added.

Insertion: Insert K = 20

Before:
  Root: 60
  Internal nodes: (19, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 21, 30, 40) (50, 52) (60, 65, 72)

• 20 goes into the leaf (19, 21, 30, 40), which then holds (19, 20, 21, 30, 40): more than 2d = 4 keys.
• Need to split the node.
• Split and update the parent node: the leaf becomes (19, 20) and (21, 30, 40), and 21 is added to the parent.

After the split:
  Root: 60
  Internal nodes: (19, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 20) (21, 30, 40) (50, 52) (60, 65, 72)

• What if we need to split the root?

Deletion: Delete K = 21

Before:
  Root: 60
  Internal nodes: (19, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 20) (21, 30, 40) (50, 52) (60, 65, 72)

After deletion:
  Root: 60
  Internal nodes: (19, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 20) (30, 40) (50, 52) (60, 65, 72)

• Note: K = 21 may still remain in the internal levels.

Deletion: Delete K = 20

Before:
  Root: 60
  Internal nodes: (19, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17, 18) (19, 20) (30, 40) (50, 52) (60, 65, 72)

• Removing 20 leaves the node (19) with fewer than d = 2 keys.
• We need to update the number of keys on the node: borrow from siblings (rotate).
• The left sibling (12, 13, 17, 18) gives up 18, and the separator in the parent changes from 19 to 18.

After the rotation:
  Root: 60
  Internal nodes: (18, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17) (18, 19) (30, 40) (50, 52) (60, 65, 72)

Deletion: Delete K = 30

Before:
  Root: 60
  Internal nodes: (18, 21, 50) (80, 90, 110)
  Leaves: (12, 13, 17) (18, 19) (30, 40) (50, 52) (60, 65, 72)

• What if we cannot borrow from siblings? Removing 30 leaves the node (40), and both siblings, (18, 19) and (50, 52), hold only d = 2 keys.
• Merge with a sibling: (18, 19) and (40) become (18, 19, 40).
• What to do with the dangling key and pointer? Simply remove them: 21 is removed from the parent.

Final tree:
  Root: 60
  Internal nodes: (18, 50) (80, 90, 110)
  Leaves: (12, 13, 17) (18, 19, 40) (50, 52) (60, 65, 72)
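The borrow and merge steps can be sketched at the leaf level. This is a simplified model with a hypothetical function name: one parent node is represented by its list of separator keys plus a list of leaves, and underflow of the parent itself is not handled.

```python
from bisect import bisect_right


def delete_key(parent_keys, leaves, key, d):
    """Delete key from the leaves under one parent node, in place.

    parent_keys[i] separates leaves[i] and leaves[i + 1].
    """
    i = bisect_right(parent_keys, key)
    leaves[i].remove(key)  # raises ValueError if the key is absent
    if len(leaves[i]) >= d:
        return  # still at least half full, nothing else to do
    left = leaves[i - 1] if i > 0 else None
    right = leaves[i + 1] if i + 1 < len(leaves) else None
    if left is not None and len(left) > d:
        leaves[i].insert(0, left.pop())      # rotate from the left sibling
        parent_keys[i - 1] = leaves[i][0]
    elif right is not None and len(right) > d:
        leaves[i].append(right.pop(0))       # rotate from the right sibling
        parent_keys[i] = right[0]
    elif left is not None:
        left.extend(leaves.pop(i))           # merge into the left sibling
        parent_keys.pop(i - 1)               # drop the dangling separator
    else:
        leaves[i].extend(leaves.pop(i + 1))  # merge the right sibling in
        parent_keys.pop(i)


# Replay the three deletions from the example (d = 2).
parent = [19, 21, 50]
leaves = [[12, 13, 17, 18], [19, 20], [21, 30, 40], [50, 52]]
for k in (21, 20, 30):
    delete_key(parent, leaves, k, d=2)
    print(k, parent, leaves)
```

Deleting 21 changes only the leaf, deleting 20 triggers a rotation, and deleting 30 triggers a merge that removes a separator from the parent. In a full implementation a merge can leave the parent underfull, so the same borrow-or-merge logic repeats one level up.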


What You Should Know

• What are some major limitations of services provided by an OS in supporting a DBMS?

• In response to such limitations, what does a DBMS do?

• B+ tree indexing