Storage and File Structure
CS 4750 Database Systems
Fall 2020 – University of Virginia © Praphamontripong
[Silberschatz, Korth, Sudarshan, “Database System Concepts,” Ch.12, Ch.13] [https://www.staff.uni-mainz.de/neuffer/scsi/what_is_raid.html]

Storage and File Structure - cs.virginia.eduup3f/cs4750/slides/4750-storage-file-structure.… · 4750-storage-file-structure.pptx Author: Upsorn Praphamontripong Created Date: 4/3/2020




Levels of Database Architecture

• Databases are stored as files of records stored on disks

[Figure: levels of database architecture, top to bottom]
• External level: view1, view2, view3
• External/conceptual mapping
• Logical/conceptual level: conceptual schema
• Conceptual/internal mapping
• Internal level: internal schema
• Physical level: physical schema, DB files
• Side labels: DBMS, OS


Overview of Physical Storage Media

• Data in a DB must be stored on some storage medium
• DBMS software can retrieve, update, and process the data
• Storage media form a hierarchy

The hierarchy, from top to bottom:

• Registers, cache
  • Fastest and most costly
  • Temporary storage
  • Expensive, little/no application to DBMS
• Main memory
  • General purpose, holds programs and data
  • Too small, too expensive for a DB
  • More memory improves the response time of the DBMS
  • Contents are lost when power fails or the system crashes
• Flash memory (electronic disk)
• Magnetic-disk storage
  • Primary, long-term on-line, most cost-effective storage
  • Direct access: possible to read data in any order
  • Usually survives power failures and system crashes
• Optical storage
  • CDs: read-only data, used to archive and distribute
• Tape storage (magnetic tapes)
  • Sequential access
  • Backup and archive; large, cheap, and slow


Three Kinds of Storage

• Primary storage
  • Volatile storage
  • Fast access time, expensive, small
  • Cache, main memory
• Secondary storage
  • Non-volatile
  • Moderately fast access time
  • For backup
  • Also called “on-line storage”
  • Flash memory, magnetic disks
• Tertiary storage
  • Non-volatile
  • Slow access time, cheap, large
  • Also called “off-line storage”
  • Magnetic tape, optical storage



Primary Storage
Main memory (RAM)

• Impact on a DB
  • To perform queries on the DB, information is loaded from the DB off the hard drive and into RAM
  • The more information loaded into RAM, the faster the DB runs
  • Data still has to be written back to secondary memory for storage at some point
• The more RAM in a DB system, the better
  • Also, the faster the RAM, the better
• Allows more things to be run concurrently
  • Allows more DB instances and more tables to be run at the same time without having to write back to disk


Secondary Storage
SSD (Solid-State Drive/Disk)

• Non-volatile storage device that persists data in flash memory
• Big impact on DBs
• Main advantage of SSD for a DB: I/O speed
  • While having quick access to RAM is great for processing, RAM is volatile memory (content is lost when power fails or the system crashes)
  • The fastest form of non-volatile memory: quick to save to disk
  • Gets more of the running application off the disk and into memory fast
  • Improves the boot time (the initial application launch time)
  • Enables fast transfer of data back and forth


Storage Access

• A DB is mapped into a number of different files
• Files are maintained by the underlying operating system (OS)
• Files are organized into blocks, each of which contains one or more data items

Major goal of DBMS:

• Minimize the number of block transfers between the disk and memory
• Since it is not possible to keep all blocks in main memory, we need to manage the allocation of the space available for the storage of blocks
• This is similar to the problems encountered by the OS
  • OS is concerned with processes
  • DBMS is concerned with only one family of processes
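To make the "minimize block transfers" goal concrete, here is a minimal sketch of a buffer manager that keeps a fixed number of blocks in memory and counts how many times it has to go to disk. The least-recently-used (LRU) eviction policy, the class name BufferPool, and the helper count_transfers are assumptions chosen for illustration; the slides do not name a specific policy.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: holds at most `capacity` blocks in memory.

    Eviction policy is LRU, which is an assumption for this sketch.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()  # block_id -> block contents
        self.disk_reads = 0          # number of block transfers from disk

    def _read_from_disk(self, block_id):
        # Stand-in for a real disk read; the contents are made up.
        self.disk_reads += 1
        return f"contents of block {block_id}"

    def get_block(self, block_id):
        if block_id in self.frames:
            # Hit: no disk transfer, just mark as most recently used.
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            # Pool is full: evict the least recently used block.
            self.frames.popitem(last=False)
        data = self._read_from_disk(block_id)
        self.frames[block_id] = data
        return data


def count_transfers(capacity, accesses):
    """Return how many disk reads a sequence of block accesses causes."""
    pool = BufferPool(capacity)
    for block_id in accesses:
        pool.get_block(block_id)
    return pool.disk_reads
```

With the access pattern [1, 2, 3, 1, 2, 3, 1, 2, 3], a pool of three frames needs only three disk reads, while a pool of two frames misses on every access. This is why more main memory tends to mean fewer block transfers.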


Magnetic Disk Mechanism

• Access time
  • Time it takes from when a read or write request is issued to when the data transfer begins
  • Determined by seek time and rotational latency
• Data-transfer rate
  • Rate at which data can be retrieved from or stored to the disk
• Mean time to failure (MTTF)
  • The average time the disk is expected to run continuously without any failure
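As a rough worked example of how these quantities combine, the sketch below estimates access time as average seek time plus average rotational latency, where the average rotational latency is taken to be half of one full rotation. The function names and the sample figures (7,200 and 15,000 RPM, 4 ms seek, 100 MB/s) are hypothetical values chosen for illustration, not numbers from the slides.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency: half of one full rotation, in milliseconds."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def access_time_ms(avg_seek_ms, rpm):
    """Access time = seek time + rotational latency (both averages)."""
    return avg_seek_ms + average_rotational_latency_ms(rpm)


def transfer_time_ms(num_bytes, rate_mb_per_s):
    """Time to move num_bytes at the given data-transfer rate (1 MB = 10**6 bytes)."""
    return num_bytes / (rate_mb_per_s * 1_000_000) * 1000


if __name__ == "__main__":
    # Hypothetical drive: 4 ms average seek, 15,000 RPM, 100 MB/s.
    print(access_time_ms(4, 15_000))
    print(transfer_time_ms(4096, 100))
```

For these assumed figures the access time is several milliseconds while transferring one 4 KB block takes a small fraction of a millisecond, which suggests why reducing the number of separate block accesses matters more than the size of each transfer.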


Mean Time To Failure (MTTF)

• On average, the MTTF of a given hard drive is 100,000 hours of operation (~11.41 years)
• Imagine a huge data center (for Google or Apple): how many hard drives do they have in one of their data centers?
• What if some hard drives fail?
• What happens to the data on those hard drives?
• How do we get those hard drives back up and running?
• How do we avoid down-time?
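A back-of-the-envelope calculation shows why these questions matter at data-center scale. The sketch below assumes drives fail independently at a constant rate, so a fleet of N drives is expected to see one failure roughly every MTTF / N hours. The fleet size of 10,000 drives is a hypothetical figure, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def mttf_years(mttf_hours):
    """Convert an MTTF in hours to years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def hours_between_failures(mttf_hours, num_drives):
    """Expected hours between failures somewhere in the fleet.

    Assumes independent drives with a constant failure rate.
    """
    return mttf_hours / num_drives


def expected_failures_per_year(mttf_hours, num_drives):
    """Expected number of drive failures across the fleet in one year."""
    return num_drives * HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    print(mttf_years(100_000))                          # a single drive
    print(hours_between_failures(100_000, 10_000))      # hypothetical fleet
    print(expected_failures_per_year(100_000, 10_000))
```

Under these assumptions, a single drive with a 100,000-hour MTTF looks very reliable, yet a fleet of 10,000 such drives would expect a failure about every 10 hours. Disk failure becomes routine rather than exceptional.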


RAID

• Redundant Array of Independent/Inexpensive Disks
• Disk organization that takes advantage of large numbers of inexpensive, mass-market disks
  • An array of multiple disks accessed in parallel will give greater throughput than a single disk
  • Redundant data on multiple disks provides fault tolerance
• Key idea: improve reliability via redundancy
  • Store extra information that can be used to rebuild information lost in case of a disk failure
• To create an optimal cost-effective RAID configuration:
  • Maximize the number of disks being accessed in parallel
  • Minimize the amount of disk space being used for redundant data
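One common way to "store extra information" without duplicating every disk is a parity block, computed as the byte-wise XOR of the data blocks. The slides do not name a specific scheme, so treat the sketch below as an illustrative assumption: the function names xor_blocks and rebuild_lost_block are hypothetical, and each block stands in for the contents of one disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_lost_block(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # one block per data disk
    parity = xor_blocks(data)            # stored on an extra disk

    # Suppose the disk holding data[1] fails.
    recovered = rebuild_lost_block([data[0], data[2]], parity)
    print(recovered == data[1])
```

Here three data disks need only one extra disk of redundant space, which is in line with the goal of minimizing the space used for redundancy. The trade-off is that this sketch tolerates only a single disk failure.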


Benefits of RAID

• Backup: mirroring (or shadowing), which duplicates every disk to provide redundancy
  • Logical disk consists of two physical disks
  • Every write is carried out on both disks
  • If one disk in a pair fails, the data is still available on the other
• Performance: striping (or parallelism)
  • Concatenates multiple drives into one logical storage unit
  • Involves partitioning each drive’s storage space into strips, which can range from one sector (512 bytes) to megabytes
  • Strips are then interleaved round-robin; thus the combined space is composed alternately of strips from each drive
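The two ideas above can be sketched in a few lines. The function locate_strip shows the round-robin mapping from a logical strip number to a drive and a position on that drive, and MirroredDisk shows every write going to both disks of a pair. Both names are hypothetical, and Python dictionaries stand in for physical disks.

```python
def locate_strip(logical_strip, num_drives):
    """Round-robin striping: return (drive index, strip index on that drive)."""
    return logical_strip % num_drives, logical_strip // num_drives


class MirroredDisk:
    """Toy mirrored pair: one logical disk backed by two physical disks."""

    def __init__(self):
        self.primary = {}   # block_id -> data
        self.mirror = {}    # identical copy of primary

    def write(self, block_id, data):
        # Every write is carried out on both disks.
        self.primary[block_id] = data
        self.mirror[block_id] = data

    def read(self, block_id, primary_failed=False):
        # If the primary has failed, the data is still available on the mirror.
        source = self.mirror if primary_failed else self.primary
        return source[block_id]


if __name__ == "__main__":
    # Six consecutive logical strips spread across three drives.
    print([locate_strip(i, 3) for i in range(6)])

    disk = MirroredDisk()
    disk.write(0, b"payload")
    print(disk.read(0, primary_failed=True))
```

Because consecutive strips land on different drives, a large sequential read can be served by all drives in parallel. Mirroring, by contrast, costs a full extra disk per data disk.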


Video on RAID

https://www.youtube.com/watch?v=wTcxRObq738

You should take notes while watching this video! Summarize the major points in your own words.