CS 4432 lecture #5

Data Items

Records

Blocks

Files

Memory

Next:


Goal: placing records into blocks

    records  ->  blocks ...  (a file)

• assume fixed-length blocks

• assume a single file (for now)


Options for storing records in blocks:

(1) separating records
(2) spanned vs. unspanned
(3) mixed record types - clustering
(4) split records
(5) sequencing
(6) indirection


(1) Separating records

Block: R1 | R2 | R3

(a) no need to separate if fixed-size records
(b) or, use a special marker
(c) or, give record lengths (or offsets)
    - within each record
    - in block header
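Option (c), with the length stored within each record, can be sketched as follows. This is only an illustration: the 2-byte length prefix, the 4096-byte block size, the use of a zero length as an end marker, and the names `pack_block` and `unpack_block` are all assumptions, not part of the lecture.

```python
import struct

BLOCK_SIZE = 4096            # assumed fixed block size in bytes
LEN_FMT = "<H"               # assumed 2-byte little-endian length prefix
LEN_SIZE = struct.calcsize(LEN_FMT)


def pack_block(records):
    """Pack byte-string records into one block, each preceded by its length."""
    out = bytearray()
    for rec in records:
        if not rec:
            # A zero length is reserved as the end-of-data marker below.
            raise ValueError("empty records are not supported in this sketch")
        if len(out) + LEN_SIZE + len(rec) > BLOCK_SIZE:
            raise ValueError("records do not fit in one block")
        out += struct.pack(LEN_FMT, len(rec)) + rec
    # Pad the unused tail with zero bytes.
    return bytes(out) + b"\x00" * (BLOCK_SIZE - len(out))


def unpack_block(block):
    """Recover the records by walking the length prefixes."""
    records, pos = [], 0
    while pos + LEN_SIZE <= len(block):
        (n,) = struct.unpack_from(LEN_FMT, block, pos)
        if n == 0:           # reached the zero padding
            break
        pos += LEN_SIZE
        records.append(block[pos:pos + n])
        pos += n
    return records
```

Storing the lengths in the block header instead would move the same numbers to the front of the block, which lets a reader locate record k without walking records 1 to k-1.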


(2) Spanned vs. Unspanned

• Unspanned: each record lies within one block

    block 1: R1 R2
    block 2: R3 R4 R5
    ...

• Spanned: a record may wrap across 2 blocks

    block 1: R1 R2 R3(a)
    block 2: R3(b) R4 R5 R6 R7(a)
    ...


Spanned vs. unspanned:

• Unspanned is much simpler, but may sometimes waste space…

• Spanned is essential if record size > block size
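The difference between the two placements can be made concrete with a small layout routine. This is a sketch under simplifying assumptions: block headers and the pointers that link record fragments are ignored, and the function name `layout` is hypothetical.

```python
def layout(record_sizes, block_size, spanned):
    """Return a list of blocks; each block is a list of (record_no, nbytes) fragments."""
    blocks, current, free = [], [], block_size
    for rec_no, size in enumerate(record_sizes, start=1):
        if not spanned:
            if size > block_size:
                raise ValueError("record larger than a block requires spanning")
            if size > free:
                # Record does not fit: leave the rest of this block unused.
                blocks.append(current)
                current, free = [], block_size
            current.append((rec_no, size))
            free -= size
        else:
            remaining = size
            while remaining > 0:
                if free == 0:
                    blocks.append(current)
                    current, free = [], block_size
                piece = min(remaining, free)   # fill the block, spill the rest
                current.append((rec_no, piece))
                free -= piece
                remaining -= piece
    if current:
        blocks.append(current)
    return blocks
```

With three 40-byte records and 100-byte blocks, the unspanned layout leaves 20 bytes of block 1 unused, while the spanned layout splits record 3 into two 20-byte fragments.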


Example

10^6 records, each of size 2,050 bytes (fixed)
block size = 4096 bytes

Unspanned, only one record fits per block:

    block 1: R1 (2050 bytes), 2046 bytes wasted
    block 2: R2 (2050 bytes), 2046 bytes wasted

• Utilization = 2050/4096 ≈ 50% -> ½ of space is wasted
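The arithmetic of this example can be checked with a few lines of code. The sketch below ignores block headers and fragment pointers, takes the record count as 10^6 (an assumption about the slide's figure), and uses hypothetical function names.

```python
def unspanned_utilization(record_size, block_size):
    """Fraction of each block holding record bytes when records may not span blocks."""
    per_block = block_size // record_size
    if per_block == 0:
        raise ValueError("record larger than a block requires spanning")
    return per_block * record_size / block_size


def blocks_needed(n_records, record_size, block_size, spanned):
    """Number of blocks needed to store n_records fixed-size records."""
    if spanned:
        total_bytes = n_records * record_size
        return -(-total_bytes // block_size)      # ceiling division
    per_block = block_size // record_size
    if per_block == 0:
        raise ValueError("record larger than a block requires spanning")
    return -(-n_records // per_block)


if __name__ == "__main__":
    print(unspanned_utilization(2050, 4096))               # about 0.5005
    print(blocks_needed(10**6, 2050, 4096, spanned=False)) # 1000000
    print(blocks_needed(10**6, 2050, 4096, spanned=True))  # 500489
```

Under these assumptions the spanned layout needs roughly half as many blocks, which is the space that the unspanned layout wastes.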


(3) Mixed versus uniform record types

• Mixed - records of different types (e.g., EMPLOYEE, DEPT) allowed in same block

e.g., a block: EMP e1 | DEPT d1 | DEPT d2


Why do we want to mix?

• Answer: CLUSTERING

Records that are frequently accessed together should be placed into the same block


Example Clustering

Q1: SELECT C_NAME, C_CITY, AMOUNT, …
    FROM DEPOSIT, CUSTOMER
    WHERE DEPOSIT.C_NAME = CUSTOMER.C_NAME

A block layout:

    CUSTOMER, NAME=SMITH
    DEPOSIT, NAME=SMITH
    DEPOSIT, NAME=SMITH
    CUSTOMER, NAME=JONES

Question: Good idea or bad idea?


• If Q1 is frequent, with its join on the CUSTOMER and DEPOSIT relations, then clustering is good

• But if instead Q2 is frequent, with
      Q2: SELECT * FROM CUSTOMER
  then clustering is counter-productive


Mixing of record types on one block?

Problems:

• Creates variable length records in block

• May cause us to store duplicates

• Insert/deletes are harder


Other Ideas?

Compromise: No mixing, but keep related records in same cylinder ...


Recap: Storing records in blocks

(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection



(4) Split records

• Fixed part in one block

• Variable part in another block

Typically for hybrid format


Block with fixed recs.:    R1 (a), R2 (a)

Block with variable recs.: R1 (b), R2 (b), R2 (c)


(5) Sequencing

• Ordering records in file (and thus blocks) by some key value
    - Sequential file (sequenced file)

• Why sequencing?
    - Typically makes it possible to efficiently read records in order


Sequencing Options

(a) Next record physically contiguous

    R1 | Next (R1) | ...

(b) Linked

    R1 -> Next (R1)

What about INSERT/DELETE?
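One way to see why the linked option helps with INSERT/DELETE is a toy file in which records stay where they were first written and only the "next" pointers change. This is a sketch, not the lecture's implementation: the class `LinkedFile`, its in-memory slot list, and the integer keys are assumptions made for illustration.

```python
class LinkedFile:
    """Records sit in arrival order in `slots`; key order is kept via next pointers."""

    def __init__(self):
        self.slots = []      # each live slot is [key, next_slot_index_or_None]
        self.head = None     # slot index of the record with the smallest key

    def insert(self, key):
        slot = len(self.slots)
        self.slots.append([key, None])           # written at the physical end
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] <= key:
            prev, cur = cur, self.slots[cur][1]
        self.slots[slot][1] = cur                # new record points at its successor
        if prev is None:
            self.head = slot
        else:
            self.slots[prev][1] = slot           # predecessor points at new record
        return slot

    def delete(self, key):
        prev, cur = None, self.head
        while cur is not None and self.slots[cur][0] != key:
            prev, cur = cur, self.slots[cur][1]
        if cur is None:
            return False
        nxt = self.slots[cur][1]
        if prev is None:
            self.head = nxt
        else:
            self.slots[prev][1] = nxt            # bypass the deleted record
        self.slots[cur] = None                   # leave a hole; nothing is moved
        return True

    def scan(self):
        """Yield keys in sequence order by following the next pointers."""
        cur = self.head
        while cur is not None:
            yield self.slots[cur][0]
            cur = self.slots[cur][1]
```

With physically contiguous records, the same insert would require shifting every later record; here the cost moves to the scan, which may jump between blocks.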


Sequencing Options

(c) Overflow area

    header
    Records in sequence: R1, R2, R3, R4, R5
    Overflow area: R2.1, R1.3, R4.7
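The overflow-area option can be sketched as a sorted main area plus an unsorted side area that absorbs inserts until the file is reorganized. The class name `OverflowFile`, the numeric keys, and the `reorganize` step are assumptions made for illustration.

```python
import heapq


class OverflowFile:
    """Sorted main area plus an unsorted overflow area for later inserts."""

    def __init__(self, keys):
        self.main = sorted(keys)     # records kept in sequence
        self.overflow = []           # inserts land here, in arrival order

    def insert(self, key):
        # No records in the main area are moved.
        self.overflow.append(key)

    def scan(self):
        """Return all keys in sequence order by merging both areas."""
        return list(heapq.merge(self.main, sorted(self.overflow)))

    def reorganize(self):
        """Fold the overflow records back into the main area."""
        self.main = self.scan()
        self.overflow = []
```

Inserting keys 2.1, 1.3 and 4.7 into a file holding 1 through 5 leaves the main area untouched, mirroring the R2.1, R1.3, R4.7 entries in the picture; an ordered scan has to consult both areas until the next reorganization.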


(6) Indirection Addressing

• How does one refer to records (e.g., to a record Rx)?

• Problem: Records can be on disk or in (virtual) memory. Need common address, but have different physical locations.

Many options, ranging from Physical to Indirect


Purely Physical Addressing

E.g., Record Address (ID) =

    Block ID:
        Device ID
        Cylinder #
        Track #
        Block #
    Offset in block


Fully Indirect Addressing

Solution: Record ID (Oracle: ROWID) as global address, maintain a map table.

Map Table: rec ID r -> address a

    Rec ID | Physical addr.
    r      | a
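A map table of this kind can be sketched as a dictionary from record IDs to physical addresses. The class `MapTable`, the integer record IDs, and the `(block_id, offset)` address tuples are assumptions made for illustration.

```python
class MapTable:
    """Maps stable record IDs to physical addresses that may change over time."""

    def __init__(self):
        self._next_id = 1
        self._addr = {}          # rec ID -> physical address

    def add(self, physical_addr):
        """Register a new record and return its record ID."""
        rec_id = self._next_id
        self._next_id += 1
        self._addr[rec_id] = physical_addr
        return rec_id

    def lookup(self, rec_id):
        """The extra step every access pays: rec ID -> physical address."""
        return self._addr[rec_id]

    def move(self, rec_id, new_addr):
        """Relocate a record; holders of rec_id need no update."""
        if rec_id not in self._addr:
            raise KeyError(rec_id)
        self._addr[rec_id] = new_addr

    def delete(self, rec_id):
        del self._addr[rec_id]
```

Moving a record changes one table entry, and every reference that stores the record ID stays valid. The price is the lookup on each access.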


Tradeoff

Flexibility to move records (for deletions, insertions)
    vs.
Cost of indirection (lookup)

What to do: options in between?

Physical <-> Indirect


Ex. #1: Indirection in block

A block (the header holds pointers to the records inside the block):

    Block Header | Free space | R3 | R4 | R1 | R2
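Indirection within a block is often realized as a slot directory in the block header: outside references name a slot number, and the header maps that slot to an offset inside the block. The sketch below assumes that design; the class `SlottedBlock` is hypothetical, and for simplicity the header is kept as a Python list that does not consume block space.

```python
class SlottedBlock:
    """Block whose header maps slot numbers to (offset, length) inside the block."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []              # header: slot number -> (offset, length) or None
        self.free_end = size         # records grow downward from the end of the block

    def insert(self, rec):
        if len(rec) > self.free_end:
            raise ValueError("block full")
        self.free_end -= len(rec)
        self.data[self.free_end:self.free_end + len(rec)] = rec
        self.slots.append((self.free_end, len(rec)))
        return len(self.slots) - 1   # slot number is the record's address in the block

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(slot)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None      # tombstone; the slot number is not reused

    def compact(self):
        """Slide live records together; slot numbers stay the same."""
        live = [(s, self.read(s)) for s, e in enumerate(self.slots) if e is not None]
        self.free_end = len(self.data)
        for slot, rec in live:
            self.free_end -= len(rec)
            self.data[self.free_end:self.free_end + len(rec)] = rec
            self.slots[slot] = (self.free_end, len(rec))
```

After a delete followed by `compact`, the surviving records sit at new offsets, yet a reference of the form (block, slot number) still resolves, because only the header entries were rewritten.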


Ex. #2: Use logical block #'s understood by file system instead of direct disk access

REC ID = File ID, Block #, Record # or Offset

File ID, Block #  --(File System Map)-->  Physical Block ID
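The two-level address can be sketched as a record ID made of logical parts plus a lookup in the file system's map. The `RecordId` type, the `resolve` function, and the sample map contents are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    file_id: int
    block_no: int     # logical block number within the file
    slot: int         # record number or offset within the block


def resolve(rid, fs_map):
    """Translate a record ID to (physical block ID, slot) via the file system map."""
    physical_block = fs_map[(rid.file_id, rid.block_no)]
    return physical_block, rid.slot


# Hypothetical file system map: (file ID, logical block #) -> physical block ID.
example_fs_map = {(1, 0): 7042, (1, 1): 311}
```

If the file system relocates logical block 1 of file 1, only the map entry changes; every stored `RecordId` for that block remains valid.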


Recap: Storing records in blocks

(1) Separating records
(2) Spanned vs. Unspanned
(3) Mixed record types - Clustering
(4) Split records
(5) Sequencing
(6) Indirection
