1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01


CS222: Principles of Database Management Fall 2010

Professor Chen Li

Department of Computer Science

University of California, Irvine

Notes 01


Topic 1: Data Storage and Record-Oriented File Systems

• Data Storage

– Storage hierarchy

– Disks

• Record-oriented file systems


Storage hierarchy

[Figure: storage hierarchy diagram showing the CPU, cache, memory, controller, and disk/tape.]


Storage Media

• Cache: inside/outside CPU

– CPU: becoming faster and faster (>= 3 GHz now)

• Main Memory

– costs $100/Mbyte -- reduces every year

– ‘volatile’ -- does not survive system failures

– random I/O very fast

– data can be processed by CPU directly

– capacity is limited: orders of magnitude smaller than what a database needs


Storage Media: secondary storage

• Disks (floppy disks, hard disks, CD)

– Cheap, and price reduces each year

– Non-volatile (except when disk crashes)

– Random I/O slow

– Data needs to be transferred to memory to be processed by CPU

• Tape

– Cheaper but slower than disks

– Sequential I/O devices

– Handy for backups, sometimes for archival


Databases and Storage Devices

• Due to capacity, cost, and volatility factors, databases are usually stored on disks.

• Data is brought from disk into main memory for processing.

• There are many ways to interface memory with disk-resident data.

• E.g., virtual memory:

– VM size is limited by the maximum address the CPU can generate

– Existing VM does not support durability

• A file system provides a more powerful mapping between memory and disk storage.

• A number of techniques are used to ensure that the high latency of secondary storage does not hurt application response time and system throughput:

– access disks asynchronously with active applications

– prefetch data before the application needs it

– intelligent caching techniques


Disk Storage -- Outline

• Disk mechanics

• Access times (random, sequential)

• Examples

• Optimization

• Other topics


Disk mechanics

Terms: spindle, platters, magnetic surfaces, disk head, disk controller, …


Top Views

[Figure: top view of a platter, labeling tracks, sectors, gaps, and cylinders.]


Characteristics

• Diameter: 1 inch -- 15 inches

• Cylinders: 100 -- 2000

• Surfaces: 1 (CDs) -- many

• Tracks/Cyl: 2 (floppies) -- 30

• Sector Size: 512B -- 50K

• Capacity: 360 KB (old floppy) -- >=200GB


“Block”

• Corresponds to 1 or multiple sectors

• Its address consists of:

– Physical device # (in case of multiple disks)

– Cylinder #

– Surface #

– Sector #
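
The four address components can be pictured as a small record type. The following is only a sketch: the class name `BlockAddress`, its field names, and the sample requests are illustrative assumptions, not something defined in these notes.

```python
from typing import NamedTuple


class BlockAddress(NamedTuple):
    """Physical address of a disk block (names are illustrative)."""
    device: int    # physical device number, relevant with multiple disks
    cylinder: int  # cylinder number
    surface: int   # surface number within the cylinder
    sector: int    # first sector of the block


# Hypothetical pending requests, in arrival order.
requests = [
    BlockAddress(device=0, cylinder=7, surface=1, sector=3),
    BlockAddress(device=0, cylinder=2, surface=0, sector=9),
    BlockAddress(device=0, cylinder=2, surface=0, sector=4),
]

# Tuples compare field by field, so sorting orders the requests by device,
# then cylinder, then surface, then sector.
ordered = sorted(requests)
```

Because the cylinder is compared right after the device, sorting puts blocks on the same cylinder next to each other, which is one simple way to see which requests could be served without moving the arm.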


Random disk access time

[Figure: a request "I want block X" is answered by bringing block X from disk into memory.]

Time = Seek Time (time 1) + Rotational Delay (time 2) + Transfer Time (time 3) + Other (time 4)


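Time 1: seek time

[Figure: seek time plotted against cylinders traveled (1 to N); the time rises from x for one cylinder to about 3x or 5x for N cylinders.]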


Average Random Seek Time

$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SeekTime}(\text{Track } i \to \text{Track } j)}{N(N-1)}$$

• Assumptions:

– Each track has the same probability of being accessed.

– From any track, the head is equally likely to jump to any other track.

• Typical S value: 10 ms – 50 ms
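
The average over all ordered pairs of distinct tracks can be evaluated directly for a small disk. The sketch below assumes a hypothetical linear seek model (a fixed startup cost plus a cost per cylinder crossed); the function names and the constants 5.0 ms and 0.1 ms are illustrative assumptions and do not come from the notes.

```python
def seek_time(i: int, j: int, startup_ms: float = 5.0, per_cyl_ms: float = 0.1) -> float:
    """Hypothetical seek model: startup cost plus a cost per cylinder crossed."""
    return startup_ms + per_cyl_ms * abs(i - j)


def average_seek_time(n: int) -> float:
    """Average of seek_time(i, j) over all ordered pairs with i != j."""
    total = sum(
        seek_time(i, j)
        for i in range(1, n + 1)
        for j in range(1, n + 1)
        if i != j
    )
    return total / (n * (n - 1))
```

Under this assumed model the average distance between two distinct tracks is (N+1)/3 cylinders, so `average_seek_time(128)` comes out to about 5 + 0.1 × 43 = 9.3 ms.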


Time 2: Rotational Delay

[Figure: a track showing the initial head position and the wanted block.]

• Average delay:

– R = 1/2 revolution

– If disk speed is 3600 RPM, then R = 8.33 ms
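
The arithmetic behind R can be written as a one-line helper. This is a minimal sketch; the function name `avg_rotational_delay_ms` is an assumption made for illustration.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2.0


delay_3600 = avg_rotational_delay_ms(3600)  # about 8.33 ms
```

At 3600 RPM one revolution takes 60,000 / 3600 ≈ 16.67 ms, and half of that is the 8.33 ms quoted above.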


Complication

May have to wait for the start of the track before we can read the desired block.

[Figure: a track showing the head position, the track start, and the block we want.]


Time 3: Transfer time

• Transfer time: block size/transfer rate

• Typical transfer rate: 1 to 3 MB/sec


Time 4: Other Delays

• CPU time to issue I/O

• Contention for controller

• Contention for bus, memory, etc.

Typical value: “0”


Sequential disk access

• Reading “Next” block

• Additional time = Block size/transfer rate

• Other time negligible:

– skip gaps

– once in a while, next cylinder


Random I/O vs Sequential I/O

• Average sequential I/O time is much smaller than random I/O time

– Random I/O: 20 ms (most time on the initial delay)

– Sequential I/O: 1 ms

• When designing a structure, try to use sequential I/Os.

– Data layout on disk becomes critical

– Do not just look at the number of I/Os
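
A rough cost model makes the point about counting I/Os concrete. The sketch below uses the 20 ms and 1 ms figures from this slide; the function names are assumptions, and the model (one initial random access followed by sequential reads) is a simplification.

```python
RANDOM_IO_MS = 20.0      # per random block access, from the slide
SEQUENTIAL_IO_MS = 1.0   # per additional sequential block, from the slide


def scattered_read_ms(num_blocks: int) -> float:
    """Every block is placed somewhere else, so each read is a random I/O."""
    return num_blocks * RANDOM_IO_MS


def contiguous_read_ms(num_blocks: int) -> float:
    """One random I/O to reach the first block, then sequential reads."""
    if num_blocks == 0:
        return 0.0
    return RANDOM_IO_MS + (num_blocks - 1) * SEQUENTIAL_IO_MS
```

Under this model, 50 scattered reads cost 1000 ms while 500 contiguous reads cost 519 ms, so the plan with ten times as many I/Os is still faster.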


Modify blocks

• Read block

• Modify in memory

• Write block

• Verify

– Optional

– If done, add to the access time: full rotation + block size/transfer rate


Example 1

Disk Specs:

• 3.5 in diameter

• 3600 RPM

• 1 surface

• Usable capacity: 16 MB = $2^{24}$ bytes

• # of cylinders: 128 = $2^{7}$

• 1 block = 1 sector = 1 KB

• 10% overhead between blocks (gaps)

• seek time:

– average = 25 ms

– adjacent cyl = 5 ms


Cylinder

• bytes/cyl = $2^{24} / 2^{7} = 2^{17}$ bytes = 128 KB

• blocks/cyl = 128 KB / 1 KB = 128


Track

[Figure: one track.]

• Speed:

– 3600 RPM → 60 revolutions/sec → 16.66 ms/rev

• In each revolution (the disk has one surface, so one track holds the cylinder's 128 blocks):

– Time over useful data: 16.66 * 0.9 = 14.99 ms

– Time over gaps: 16.66 * 0.1 = 1.66 ms

– Transfer time 1 block = 14.99/128 = 0.117 ms

– Transfer time 1 block + gap = 16.66/128 = 0.13 ms
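
The per-revolution figures above follow from three of the Example 1 disk specs. The sketch below recomputes them without intermediate rounding, so the values differ from the slide in the last digit; the variable names are assumptions.

```python
RPM = 3600
BLOCKS_PER_TRACK = 128  # one surface, so a track holds a full cylinder
GAP_FRACTION = 0.10     # 10% of each revolution passes over gaps

ms_per_rev = 60_000 / RPM                          # about 16.67 ms
data_ms = ms_per_rev * (1 - GAP_FRACTION)          # about 15.0 ms over useful data
gap_ms = ms_per_rev * GAP_FRACTION                 # about 1.67 ms over gaps
block_ms = data_ms / BLOCKS_PER_TRACK              # about 0.117 ms per block
block_plus_gap_ms = ms_per_rev / BLOCKS_PER_TRACK  # about 0.130 ms per block + gap
```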


Bandwidths

• Burst bandwidth:

– No time on gaps (10%)

– 1 KB in 0.117 ms

– BB = 1 KB / 0.117 ms = 8.54 KB/ms ≈ 8.33 MB/sec (with 1 MB = 1024 KB)

• Sustained bandwidth:

– Including time on gaps

– 128 KB in 16.66 ms

– SB = 128 KB / 16.66 ms = 7.68 KB/ms = 7.50 MB/sec
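
The unit conversion from KB/ms to MB/sec is easy to get wrong, so here is a small sketch of it. It assumes 1 MB = 1024 KB, which is the convention that reproduces the MB/sec figures on this slide; the function name is an assumption.

```python
def bandwidth_mb_per_s(kilobytes: float, millis: float) -> float:
    """Convert 'kilobytes transferred in millis ms' to MB/sec (1 MB = 1024 KB)."""
    kb_per_ms = kilobytes / millis
    return kb_per_ms * 1000 / 1024


ms_per_rev = 60_000 / 3600   # one revolution at 3600 RPM
data_ms = ms_per_rev * 0.9   # time over useful data, excluding the 10% gaps

burst = bandwidth_mb_per_s(128, data_ms)         # whole track, gaps excluded
sustained = bandwidth_mb_per_s(128, ms_per_rev)  # whole track, gaps included
```

Computed without rounding the per-block time, the burst rate is 128 KB / 15 ms ≈ 8.53 KB/ms, which is why the MB/sec figure is 8.33 rather than what 8.54 KB/ms would suggest.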


Time of random block access

• Time to read one random block: T1

• T1 = seek time + rotational delay + transfer time

– Assume we do not have to wait for track start

– Seek time = 25 ms

– Rotational delay = 16.66 ms / 2 = 8.33 ms

– Transfer time = 0.117 ms

– Total = 25 ms + 8.33 ms + 0.117 ms = 33.45 ms

• Most of the time is on “seek time” and “rotational delay”!
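
To see how lopsided the breakdown is, the following sketch recomputes T1 from the Example 1 specs and reports the share spent positioning the head. Variable names are assumptions.

```python
seek_ms = 25.0                            # average seek time from the disk specs
rotation_ms = 60_000 / 3600               # one revolution at 3600 RPM
rotational_delay_ms = rotation_ms / 2     # on average, half a revolution
transfer_ms = rotation_ms * 0.9 / 128     # one 1 KB block, gaps excluded

t1_ms = seek_ms + rotational_delay_ms + transfer_ms
positioning_share = (seek_ms + rotational_delay_ms) / t1_ms
```

With these numbers, more than 99% of T1 goes to seek time and rotational delay, and well under 1% to actually transferring the block.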


Larger blocks?

• Suppose OS deals with 4 KB blocks

[Figure: four consecutive 1 KB sectors (1, 2, 3, 4) forming one 4 KB block.]

• We need to include the time of reading 1 block (without gap) and 3 blocks (with gaps)

• T4 = 25 ms + (16.66 ms / 2) + (0.117 ms × 1) + (0.130 ms × 3) ≈ 33.84 ms

• Compare to T1 = 33.45 ms: not much difference

– That’s why we want to use sequential I/Os!
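
The same calculation generalizes to any number of consecutive sectors. The sketch below is an illustration under the Example 1 specs; the function name `read_time_ms` and its parameter `k` are assumptions.

```python
SEEK_MS = 25.0
REV_MS = 60_000 / 3600                 # one revolution at 3600 RPM
BLOCK_MS = REV_MS * 0.9 / 128          # one sector, gap excluded
BLOCK_PLUS_GAP_MS = REV_MS / 128       # one sector plus its gap


def read_time_ms(k: int) -> float:
    """Time to read k consecutive 1 KB sectors starting at a random position.

    k - 1 sectors are read together with their gap; the last one is not.
    """
    return SEEK_MS + REV_MS / 2 + BLOCK_MS + (k - 1) * BLOCK_PLUS_GAP_MS
```

Going from `read_time_ms(1)` to `read_time_ms(4)` adds only about 0.39 ms, roughly 1% more time for four times the data.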


Reading a track

• TT = Time to read a full track (start at any block)

• TT = 25 ms (seek time) + (0.13 ms / 2) (rotational delay, half of a block) + 16.66 ms (transfer time) = 41.73 ms

• The time could be a bit less by ignoring the last gap.

• Question: what if we need to wait for the start of a track?
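
For comparison, the sketch below puts the full-track read next to a single random block read, again under the Example 1 specs and assuming the read may start at any block. Variable names are assumptions, and the question about waiting for the track start is left open.

```python
SEEK_MS = 25.0
REV_MS = 60_000 / 3600          # one revolution at 3600 RPM
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10

block_plus_gap_ms = REV_MS / BLOCKS_PER_TRACK
block_ms = REV_MS * (1 - GAP_FRACTION) / BLOCKS_PER_TRACK

# Full track: seek, wait on average half a block for the next block boundary,
# then one full revolution of transfer.
track_read_ms = SEEK_MS + block_plus_gap_ms / 2 + REV_MS
per_block_in_track_ms = track_read_ms / BLOCKS_PER_TRACK

# Single random block: seek, half a revolution on average, one block transfer.
random_block_ms = SEEK_MS + REV_MS / 2 + block_ms
```

Reading the whole track costs about 0.33 ms per block, roughly a hundred times less than the 33.45 ms needed to fetch one block at random.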