
A Case for Flash Memory SSD in Enterprise Database Applications

Authors: Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, Sang-Woo Kim
Published in SIGMOD 2008

Presented by Jin Xiong, 11/4/2008


Outline
• Flash memory SSD
• DB storage and workload
• Experimental settings
• Transaction log
• MVCC rollback segment
• Temporary table spaces
• Conclusions


Flash memory SSD (1)

• Flash memory SSD
  – NAND-type flash memory
  – Samsung
  – Interface: IDE


Flash memory SSD (2)

• Characteristics
  – Uniform random access speed
    • Purely electronic device with no mechanically moving parts
    • Access latency is almost linearly proportional to the amount of data, irrespective of its physical location in flash memory
    • One of the key characteristics we can take advantage of
  – Erase before overwriting
    • Data on SSD cannot be updated in place
    • Erase unit is much larger than a sector: 128KB vs. 1KB
    • Erase is time-consuming, typically 1-2 ms
  – Asymmetry of read and write speed
    • Writes are about twice as slow as reads on SSD: 0.4 ms vs. 0.2 ms in this paper


Flash memory SSD (3)
• Hardware logic
  – Dual-channel architecture with 4-way interleaving
  – Hides flash programming latency and increases bandwidth
  – 128KB SRAM for program code, data, and buffer memory


Flash memory SSD (4)
• Firmware: flash translation layer (FTL)
  – Address mapping and wear leveling
    • Addresses the issue of the limited number of write cycles of each sector
    • Based on super-blocks: 1MB each, made of 8 erase units, 2 on each flash chip
    • Limits the amount of information required for mapping
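The slides only say that the FTL maps addresses at the granularity of 1MB super-blocks to limit the amount of mapping information. As a rough illustration, the toy Python sketch below (the class and function names are hypothetical, not the device's actual firmware) translates logical sector numbers through a super-block table and counts how many table entries a 32GB drive would need at 1KB-sector versus 1MB-super-block granularity. It models address translation only; out-of-place updates, erasure, and wear leveling are deliberately left out.

```python
SECTOR = 1024                  # 1KB sector, as on the slide
ERASE_UNIT = 128 * 1024        # 128KB erase unit
SUPER_BLOCK = 8 * ERASE_UNIT   # 1MB super-block = 8 erase units
CAPACITY = 32 * 1024**3        # 32GB, the SSD used in the experiments


class SuperBlockFTL:
    """Toy FTL (hypothetical): maps logical super-blocks to physical super-blocks.

    Only address translation is modeled; out-of-place updates, erasure,
    and wear leveling are omitted.
    """

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # unallocated physical super-blocks
        self.table = {}                        # logical super-block -> physical super-block

    def translate(self, lsn):
        """Map a logical sector number to a physical byte address."""
        lsb, offset = divmod(lsn * SECTOR, SUPER_BLOCK)
        if lsb not in self.table:
            self.table[lsb] = self.free.pop(0)  # allocate a physical super-block on first touch
        return self.table[lsb] * SUPER_BLOCK + offset


def mapping_entries(unit_bytes):
    """Number of mapping-table entries needed to cover CAPACITY."""
    return CAPACITY // unit_bytes


print(mapping_entries(SECTOR), mapping_entries(SUPER_BLOCK))  # 33554432 32768
```

Mapping at super-block granularity makes the table 1024 times smaller, at the cost of remapping data only in coarse 1MB units.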

• Trends
  – Two-fold annual increase in density
  – Originally used in mobile computing devices
    • PDAs, MP3 players, mobile phones, digital cameras
  – Recently used more and more in portable computers and the enterprise server market
  – Tremendous potential as a new storage medium that can replace magnetic disk and achieve much higher performance for enterprise database servers


DB Storage
• Data structures in DB systems
  – Database tables and indexes
    • Not within the scope of this paper
  – Transaction log
    • Whenever a transaction updates a data object, a log record is created
    • Must be kept on stable storage for recoverability and durability
  – Temporary tables
    • Used to store temporary data required for operations such as sorts or joins
  – Rollback segments
    • Used in multiversion concurrency control (MVCC)


DB Workload
• Typical transactional database workloads, e.g. TPC-C
  – Little locality and sequentiality
  – Many synchronous writes
    • Forced writes of log records at commit time
    • Transactions must wait until the data are written to disk
  – Prefetching and write buffering are less effective
  – Performance is limited by disk latency rather than by disk bandwidth or capacity
  – The latency-bandwidth imbalance of disks is likely to become more serious in the future
  – Low latency of SSD
    • Can improve performance significantly
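A back-of-the-envelope calculation shows why small forced writes are latency-bound. The Python sketch below (helper names are hypothetical) models one I/O as a fixed access latency plus a transfer time. The 4.17 ms latency is the average rotational latency of a 7200 rpm disk; the 60 MB/s transfer rate and the 4KB write size are assumptions chosen for illustration, not measurements from the paper.

```python
def io_time_ms(size_bytes, latency_ms, bandwidth_mb_s):
    """Service time of one I/O = fixed access latency + transfer time."""
    transfer_ms = size_bytes / (bandwidth_mb_s * 1e6) * 1000
    return latency_ms + transfer_ms


def latency_fraction(size_bytes, latency_ms, bandwidth_mb_s):
    """Share of the service time spent waiting on latency rather than transferring data."""
    return latency_ms / io_time_ms(size_bytes, latency_ms, bandwidth_mb_s)


HDD_LATENCY_MS = 4.17   # average rotational latency at 7200 rpm
ASSUMED_MB_S = 60       # assumed sustained transfer rate (illustrative only)

small = latency_fraction(4096, HDD_LATENCY_MS, ASSUMED_MB_S)     # a 4KB log write
large = latency_fraction(1 << 20, HDD_LATENCY_MS, ASSUMED_MB_S)  # a 1MB transfer
print(f"4KB: {small:.1%} latency, 1MB: {large:.1%} latency")
```

Under these assumptions roughly 98% of a 4KB write's service time is latency, so more bandwidth barely helps; only for large transfers does transfer time dominate.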


Experimental Settings
• Two machines with identical hardware except the disk
  – 1.86 GHz Intel Pentium dual-core processor
  – 2GB RAM
• OS: Linux 2.6.22
• Disk
  – SSD: Samsung Standard Type, 32GB, PATA (IDE), SLC NAND
  – HDD: Seagate Barracuda, 250GB, 7200 rpm, SATA
• DB
  – A commercial database server
  – HDD/SSD used as a raw device (not through a file system)
  – Database tables were cached in memory


Transaction log
• Synchronous writes
  – When a transaction commits, it appends a commit-type log record to the log and force-writes the log tail to stable storage
• Response time
  – $T_{response} = T_{cpu} + T_{read} + T_{write} + T_{commit}$
  – $T_{commit}$ is a significant overhead: the transaction waits for disk I/O
  – Commit-time delay is a serious bottleneck
• Append-only sequential writes
  – HDD: no seek delay, but an average rotational latency of 4.17 ms (7200 rpm)
  – SSD: no expensive merge or erase operations as long as clean blocks are available
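Since $T_{commit}$ is a forced write, it caps how fast a single stream of transactions can commit. The Python sketch below (hypothetical helper names) derives the 4.17 ms figure as half a revolution at 7200 rpm and turns a commit latency into an upper bound on serial commits per second. Using the 0.4 ms SSD write latency quoted earlier as the SSD commit time is an assumption, and the bound ignores CPU time and concurrency, so it is a back-of-the-envelope estimate rather than a model of the measured system.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: on average the platter turns half a revolution."""
    return 0.5 * 60_000 / rpm


def serial_commit_bound_tps(commit_ms):
    """Upper bound on TPS for one stream of transactions that each wait
    for their own forced log write (CPU time and concurrency ignored)."""
    return 1000.0 / commit_ms


hdd_commit = avg_rotational_latency_ms(7200)  # about 4.17 ms
ssd_commit = 0.4                              # assumed: SSD write latency used as commit time
print(round(hdd_commit, 2),
      round(serial_commit_bound_tps(hdd_commit)),
      round(serial_commit_bound_tps(ssd_commit)))  # 4.17 240 2500
```

The roughly tenfold difference in this bound comes entirely from commit latency, which is why the log device matters so much for commit-heavy workloads.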


Transaction log

• Simple SQL transactions
  – Multi-threaded concurrent transactions
  – TPS on SSD is much higher than on HDD (from 12x down to 4x)
  – The gap shrinks as the number of concurrent transactions increases
  – HDD: disk access latency is the bottleneck; CPU utilization is low
  – SSD
    • Limited by CPU rather than I/O
    • CPU utilization saturates, so TPS stops increasing


Transaction log
• TPC-B benchmark performance
  – A stress test: the transaction commit rate is higher than that of TPC-C
  – Suitable for testing log storage: a large number of small transactions cause significant forced-write activity
  – Number of concurrent users: 20
  – TPS on SSD is 3.5x that on HDD
  – Considerably lower log write latency on SSD
  – CPU is the bottleneck for SSD


Transaction log
• I/O-bound vs. CPU-bound
  – SSD: a faster CPU improves TPS
    • Dual-core: saturated at about 3000 TPS
    • Quad-core: saturated at about 4300 TPS
  – HDD: almost no difference


MVCC rollback segment
• MVCC: multiversion concurrency control
  – An alternative to traditional lock-based concurrency control
  – When a data object is updated, its before image is written to a rollback segment, and then the new data is applied to the object
  – When a data object is read, the correct version is searched for in the rollback segment
  – Two advantages
    • Minimizes the performance penalty of concurrent updates, because read consistency is supported without any locks
    • Supports snapshot isolation and time travel queries
  – Cost
    • Costly reads: a read may have to search through a long list of versions of a data object that has been updated many times
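The read path described above can be sketched as a per-object chain of versions. In the toy Python model below (names are hypothetical; commit protocol, aborted transactions, and space reclamation are omitted), each update pushes the previous version behind the new one, standing in for the before image written to the rollback segment, and a reader with snapshot timestamp `snapshot_ts` walks the chain until it finds a version committed at or before its snapshot. The number of steps grows with the number of later updates, which is the read cost noted on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: object
    commit_ts: int              # timestamp of the update that produced this version
    prev: Optional["Version"]   # older version (models a before image in the rollback segment)


class MVCCStore:
    """Toy multiversion store (hypothetical), illustrating version-chain reads."""

    def __init__(self):
        self.current = {}  # key -> newest Version
        self.clock = 0     # logical commit timestamp

    def update(self, key, value):
        self.clock += 1
        # Keep the old head reachable through prev, then install the new version.
        self.current[key] = Version(value, self.clock, self.current.get(key))
        return self.clock

    def read(self, key, snapshot_ts):
        """Return (value visible at snapshot_ts, number of chain steps taken)."""
        v, steps = self.current.get(key), 0
        while v is not None and v.commit_ts > snapshot_ts:
            v, steps = v.prev, steps + 1
        return (v.value if v is not None else None), steps
```

On disk, each step of that walk may be a read from a rollback segment, which is why the random-read latency of the storage device matters for MVCC reads.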


MVCC rollback segment
• Write pattern 1
  – Append-only, sequential writes
  – Multiple streams in parallel
  – 1MB extents
• Write pattern 2
  – In-place writes to a small logical region
• HDD is expected to perform poorly
  – Disk arm movement every 1MB
  – Excessive disk seeks
• SSD is expected to perform well
  – No additional cost when there are clean blocks
  – Reclamation cost can be amortized
    • Infrequent: once every 1MB extent
• Slight performance difference
  – SSD: avg 6.8 ms/block
  – HDD: avg 7.1 ms/block
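To see why reclamation is cheap once amortized, the Python sketch below (hypothetical function name) spreads the erase cost of one 1MB extent over the blocks written into it. It assumes the extent being reused holds no live data, so reclamation reduces to erasing its 128KB erase units at 1.5 ms each (the midpoint of the 1-2 ms range quoted earlier); the 8KB database block size is an assumption, not a figure from the paper.

```python
EXTENT_KB = 1024       # 1MB rollback-segment extent
ERASE_UNIT_KB = 128    # erase unit size from the slides
ERASE_MS = 1.5         # assumed: midpoint of the 1-2 ms erase time
BLOCK_KB = 8           # assumed database block size (illustrative only)


def amortized_erase_ms_per_block(extent_kb=EXTENT_KB, erase_unit_kb=ERASE_UNIT_KB,
                                 erase_ms=ERASE_MS, block_kb=BLOCK_KB):
    """Erase cost of reclaiming one extent, divided over the blocks it holds."""
    erases = extent_kb // erase_unit_kb  # 8 erase units per 1MB extent
    blocks = extent_kb // block_kb       # 128 blocks of 8KB per extent
    return erases * erase_ms / blocks


print(amortized_erase_ms_per_block())  # 0.09375
```

Under these assumptions the erase overhead is below 0.1 ms per block, small next to the measured averages of roughly 7 ms per block.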


MVCC rollback segment
• Read pattern
  – Clustered reads, randomly scattered across quite a large logical address space (1GB)
• Performance
  – SSD: 16x faster than HDD


Temporary table spaces
• External sort
  – Typical algorithm
    • Partitions the input data set into smaller chunks
    • Sorts the chunks separately
    • Merges them into a single sorted file
  – I/O pattern
    • Sequential writes followed by random reads
  – Performance
    • Sequential writes: small difference
    • Random reads: SSD almost 10 times faster
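The two-stage algorithm above can be sketched in Python with a hypothetical `external_sort` function that keeps at most `run_size` values in each sorted run. Stage 1 writes each sorted run sequentially to its own temporary file; stage 2 performs a k-way merge with `heapq.merge`, pulling the next value from whichever run holds the current minimum. On a real temporary tablespace those merge reads jump between runs, which is the random-read stage where the slides report the SSD advantage. The input is an in-memory list only to keep the example short.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Sort integers with bounded run size: write sorted runs, then k-way merge them."""
    run_paths = []
    # Stage 1: partition the input into chunks, sort each in memory,
    # and write it out sequentially as a run file.
    for start in range(0, len(values), run_size):
        chunk = sorted(values[start:start + run_size])
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)
    # Stage 2: merge all runs; reads alternate between run files.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `external_sort([5, 3, 9, 1, 7, 2, 8], 3)` writes three runs and returns `[1, 2, 3, 5, 7, 8, 9]`.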


Temporary table spaces

• External sort
  – Effect of cluster size on sort performance
    • HDD: sort performance improves with a larger cluster size
    • SSD: sort performance deteriorates with a larger cluster size
    • Reasons
      – A larger cluster is good for the first stage, but not for the merge stage
      – The second stage dominates the performance
  – Effect of buffer cache size on sort performance
    • Performance improves with a larger buffer size in both cases


Temporary table spaces
• Hash join
  – Similar to the sort algorithm
    • Partitions the input data set into smaller chunks and processes each chunk separately
  – Opposite I/O pattern
    • Random writes followed by sequential reads
  – Performance
    • SSD was expected to perform poorly in the first stage
    • The actual result was unexpected: the first stage used sequential, append-only writes
    • SSD is 3 times faster than HDD
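A minimal Python sketch of a partitioned (Grace-style) hash join, with a hypothetical function name, shows the two stages. Stage 1 routes every tuple of both inputs to a bucket chosen by hashing the join key; here buckets are in-memory lists, whereas a database would write each bucket to the temporary tablespace. Stage 2 reads each pair of buckets back and joins them with an in-memory hash table. Writes scatter across buckets, but each individual bucket only grows by appends.

```python
from collections import defaultdict


def grace_hash_join(left, right, num_partitions=4):
    """Equi-join two lists of (key, payload) tuples on key."""
    # Stage 1: partition both inputs by hash of the join key.
    # On disk, each bucket would be a separate temporary file.
    lparts, rparts = defaultdict(list), defaultdict(list)
    for k, v in left:
        lparts[hash(k) % num_partitions].append((k, v))
    for k, w in right:
        rparts[hash(k) % num_partitions].append((k, w))
    # Stage 2: read back each bucket pair and join it with an in-memory hash table.
    out = []
    for p in range(num_partitions):
        table = defaultdict(list)
        for k, v in lparts[p]:
            table[k].append(v)
        for k, w in rparts[p]:
            for v in table.get(k, ()):  # .get does not insert missing keys
                out.append((k, v, w))
    return out
```

Because matching keys always hash to the same bucket, each bucket pair can be joined independently of the others.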


Temporary table spaces

• Sort-merge join
  – SSD is 7 times faster than HDD
  – HDD: sort-merge join is two times slower than hash join
  – SSD: sort-merge join is as fast as hash join
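For comparison, a minimal Python sketch of a sort-merge join (hypothetical function name) sorts both inputs on the join key and then advances two cursors in step, emitting the cross product of each run of matching keys. The sorts here are in memory; in the measured system, inputs too large for memory would presumably be sorted with an external sort, which is where the temporary tablespace I/O, and the SSD's fast random reads, come in.

```python
def sort_merge_join(left, right):
    """Equi-join two lists of (key, payload) tuples by sorting both and merging."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, v in left[i:i_end]:
                for _, w in right[j:j_end]:
                    out.append((lk, v, w))
            i, j = i_end, j_end
    return out
```

The output comes out ordered by join key, a by-product of the sort phase that a hash join does not provide.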


Conclusions

• Demonstrated that processing I/O requests for transaction log, rollback and temporary data can become a serious bottleneck for transaction processing

• Showed that flash memory SSD can alleviate this bottleneck drastically

• Due attention should be paid to SSD in all aspects of DB system design to maximize the benefit of this new technology