Flash-Based Caching For Databases - Energy Efficiency and Performance Ankit Chaudhary

Page 1: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash-Based Caching For Databases -

Energy Efficiency and Performance

Ankit Chaudhary

Page 2: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Problem Statement

• How to use flash memory as a database caching device?
• What is the performance improvement?
• What is the energy efficiency?

Page 3: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash Memory

• Semiconductor-based non-volatile memory.
• Used in SSDs, flash drives, and mobile device memory.
• Types:
  • NOR: lower density, higher erase time.
  • NAND:
    • Single-level cell
    • Multi-level cell: higher latency, shorter life span.

Page 4: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash Memory – Important Properties

• Has no mechanical arm, unlike an HDD.
  • Helps increase read throughput.
  • Helps reduce power consumption (no mechanical arm or disk movement).
• Does not require frequent refreshing of capacitors to counter charge leakage, unlike DRAM.
  • Helps reduce power consumption.

Page 5: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash Memory – Operations

3 operations: Read, Write/Program, and Erase.

• Erase sets bits to 1.
• Write/Program sets bits to 0.

Page 6: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash Memory – Problems

• Erase before write: Erase sets bits to 1 and write sets bits to 0, so to update a value we need to erase the entire block.
• Write endurance: 10,000 to 100,000 program/erase cycles.
• Flash random write: throughput is lower.
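The erase-before-write constraint can be illustrated with a toy model. This is a minimal sketch, not a description of any real device: the class name `FlashBlock`, the page count, and the 8-bit page width are all assumptions made for illustration.

```python
class FlashBlock:
    """Toy model of one flash block; each page is an int used as a bit field.

    Hypothetical illustration only: sizes and names are assumptions.
    """

    def __init__(self, pages=4, bits=8):
        self.mask = (1 << bits) - 1
        self.pages = [self.mask] * pages  # erased state: every bit is 1
        self.erase_count = 0              # each erase consumes endurance

    def program(self, page, value):
        # Programming can only clear bits (1 -> 0), modelled as a bitwise AND.
        self.pages[page] &= value & self.mask
        return self.pages[page]

    def erase(self):
        # Erase works on the whole block and sets every bit back to 1.
        self.pages = [self.mask] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
first = block.program(0, 0b10101010)   # fresh page: the value is stored as-is
second = block.program(0, 0b11110000)  # in-place update: bits cannot return to 1
block.erase()                          # the whole block has to be erased first
third = block.program(0, 0b11110000)   # now the new value is stored correctly
```

The second write leaves a corrupted value because cleared bits stay cleared, which is why an update needs a block erase, and each erase counts against the limited program/erase cycles.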

Page 7: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Flash Memory – Comparison with HDD and DRAM

Data referred:[Yi12]

Page 8: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Architectures

• 2-tier Architecture
• 3-tier Architecture
• Hybrid Architecture:
  • Complexity at bottom layer and buffer management.
  • NOT USED

Page 9: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Basic 3TA Working

Case 1: Page Located in Top-tier

1. Request for page P.
2. Look for page P in the DRAM-based buffer (Tt).
3. Page P found: page request served.

Branches not taken in this case: on a page fault in Tt, look for page P in the flash-based cache (Tm); on a page fault in Tm, serve the data from the disk drive (Tb).

Page 10: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Basic 3TA Working

Case 2: Page Located in Middle-tier

1. Request for page P.
2. Look for page P in the DRAM-based buffer (Tt): not found (page fault in Tt).
3. Look for page P in the flash-based cache (Tm): page P found.
4. Page request served.

Page 11: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Basic 3TA Working

Case 3: Page Located in Bottom-tier

1. Request for page P.
2. Look for page P in the DRAM-based buffer (Tt): not found (page fault in Tt).
3. Look for page P in the flash-based cache (Tm): not found (page fault in Tm).
4. Serve the data from the disk drive (Tb).
5. Page request served.
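The three cases amount to a lookup that falls through the tiers in order. A minimal sketch, assuming each tier is a plain dictionary keyed by page id; the function name `read_page` is hypothetical, and the sketch deliberately leaves out loading the page into the upper tiers.

```python
def read_page(page_id, top, middle, bottom):
    """Return (data, tier) following the Tt -> Tm -> Tb lookup order.

    Hypothetical helper: tiers are modelled as dicts, nothing is promoted.
    """
    if page_id in top:             # Case 1: hit in the DRAM-based buffer (Tt)
        return top[page_id], "Tt"
    if page_id in middle:          # Case 2: page fault in Tt, hit in flash cache (Tm)
        return middle[page_id], "Tm"
    return bottom[page_id], "Tb"   # Case 3: page fault in Tm, served from disk (Tb)


top = {1: "A"}
middle = {2: "B"}
bottom = {1: "A", 2: "B", 3: "C"}
tiers = [read_page(p, top, middle, bottom)[1] for p in (1, 2, 3)]
```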

Page 12: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Energy & Performance Efficiency

• Two replacement algorithms for cache management:

• LOC: Local Algorithm
  • LRU-based replacement algorithm.
  • Has no information about the Top-tier.
  • Data is duplicated between Top-tier and Middle-tier.

• GLB: Global Algorithm
  • LRU-based replacement algorithm.
  • Has information about the Top-tier as well.
  • Data is not duplicated between Top-tier and Middle-tier.

Page 13: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Local Replacement Algo. (LOC)

Case 1: page P is found in Tm

H = directory, Tm = middle-tier cache, Ls = cache slot list (ordered from LRU page to MRU page)

1. Request for reading page P from Tm.
2. Look for slot c containing page P in directory H.
3. Read page P from slot c.
4. Move slot c to the MRU position of Ls.
5. Update H and return P.

Page 14: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Local Replacement Algo. (LOC)

Case 2: page P is not found in Tm

H = directory, Tm = middle-tier cache, Ls = cache slot list (ordered from LRU page to MRU page), Tb = bottom-tier disk drive

1. Request for reading page P from Tm.
2. Look for slot c containing page P in directory H.
3. Start the page eviction process.
4. Select a victim v, the LRU of Ls. If it is dirty, write it to Tb.
5. Load P from Tb into v and move it to MRU.
6. Update H and return P.
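Both LOC cases can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation from [Yi12]: a single `OrderedDict` stands in for both the directory H and the slot list Ls, `disk` is a dictionary standing in for Tb, and the `write` method is added only so that a page can become dirty.

```python
from collections import OrderedDict


class LocCache:
    """Sketch of the LOC middle tier (hypothetical class and method names)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # stands in for Tb: page id -> data
        self.slots = OrderedDict()     # plays the role of H and Ls, LRU -> MRU
        self.dirty = set()

    def read(self, page_id):
        if page_id in self.slots:                # Case 1: page found in Tm
            self.slots.move_to_end(page_id)      # move the slot to the MRU position
            return self.slots[page_id]
        if len(self.slots) >= self.capacity:     # Case 2: miss, cache is full
            victim, data = self.slots.popitem(last=False)  # victim = LRU of Ls
            if victim in self.dirty:             # dirty victims go back to Tb
                self.disk[victim] = data
                self.dirty.discard(victim)
        self.slots[page_id] = self.disk[page_id]  # load P from Tb at MRU
        return self.slots[page_id]

    def write(self, page_id, data):
        self.read(page_id)             # ensure the page is cached and at MRU
        self.slots[page_id] = data
        self.dirty.add(page_id)


disk = {1: "a", 2: "b", 3: "c"}
cache = LocCache(2, disk)
cache.read(1)
cache.write(2, "B")
cache.read(1)      # page 1 becomes MRU, page 2 becomes LRU
cache.read(3)      # evicts dirty page 2, which is written back to disk
```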

Page 15: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Global Replacement Algo.(GLB)

• In case of a page fault at Tt, GLB loads the page from Tm to Tt.
• If there is a cache miss at Tm, the page will be loaded directly to Tt from Tb.
• In both cases, there will be a page eviction from Tt to Tm.
• IMPORTANT:
  • Unlike LOC, GLB loads the page into Tt before serving the request.

Page 16: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Global Replacement (GLB) - Page Eviction Algo.

H = directory, Tm = middle-tier cache, Ls = cache slot list (ordered from LRU page to MRU page), Tb = bottom-tier disk drive

1. Request for evicting page P to Tm.
2. Start the page eviction process.
3. Select a victim v, the LRU of Ls. If it is dirty, write it to Tb.
4. Load P from Tt into v and move it to MRU.
5. Update H.
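The GLB behaviour, where a page lives in either Tt or Tm but not both, can be sketched as follows. This is a simplified illustration under assumptions: the class `GlbHierarchy`, its method names, and the use of `OrderedDict` for both tiers are hypothetical, and the directory H is folded into the dictionaries.

```python
from collections import OrderedDict


class GlbHierarchy:
    """Sketch of GLB with exclusive Tt and Tm (hypothetical names)."""

    def __init__(self, top_size, mid_size, disk):
        self.top = OrderedDict()   # Tt, ordered LRU -> MRU
        self.mid = OrderedDict()   # Tm, ordered LRU -> MRU
        self.top_size, self.mid_size = top_size, mid_size
        self.disk = disk           # stands in for Tb
        self.dirty = set()

    def _evict_to_mid(self, page_id, data):
        if len(self.mid) >= self.mid_size:
            victim, vdata = self.mid.popitem(last=False)   # LRU victim of Tm
            if victim in self.dirty:                       # dirty: write to Tb
                self.disk[victim] = vdata
                self.dirty.discard(victim)
        self.mid[page_id] = data                           # placed at MRU

    def read(self, page_id):
        if page_id in self.top:
            self.top.move_to_end(page_id)
            return self.top[page_id]
        # Page fault in Tt: take the page from Tm if present, else from Tb.
        if page_id in self.mid:
            data = self.mid.pop(page_id)   # moved, not copied: no duplicates
        else:
            data = self.disk[page_id]
        if len(self.top) >= self.top_size:
            old, old_data = self.top.popitem(last=False)
            self._evict_to_mid(old, old_data)              # eviction from Tt to Tm
        self.top[page_id] = data           # page is in Tt before it is served
        return data

    def write(self, page_id, data):
        self.read(page_id)
        self.top[page_id] = data
        self.dirty.add(page_id)


disk = {1: "a", 2: "b", 3: "c"}
h = GlbHierarchy(1, 1, disk)
h.read(1)
h.read(2)   # page 1 is evicted from Tt to Tm
h.read(1)   # page 1 moves back to Tt, page 2 is evicted to Tm
```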

Page 17: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Experiment

• Comparison between 2TA, LOC and GLB (3TA).
• Used simulation and a real-life environment for computing the results.
• Results computed for varying sizes of Tm (using the “s” parameter).
• Computed virtual execution time for 2TA, LOC and GLB.
• Computed power consumption for 2TA, LOC and GLB.
• Formulas used:
  • Virtual execution time, with terms for the access time for the middle-tier and the access time for the bottom-tier.
  • Power consumption.

Page 18: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Results: Simulation Based

[Figure: simulation results, with panels using the TPC-C, TPC-H, and TPC-E traces.]

Data referred: [Yi12]

Page 19: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Results : Simulation Based

Energy consumption of the TPC-E trace for b = 1000

Data referred:[Yi12]

Page 20: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Results : Real-life

(a) Real-life trace performance: execution time (sec) for each b ∈ {1000, …, 32000}

Real-life trace performance: for b = 32000

Data referred: [Yi12]

Page 21: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Conclusion

1. 3TA is better than 2TA in terms of both performance and energy efficiency.

2. LOC performs better for bigger sizes of the flash-based middle-tier.

3. GLB performs better for smaller sizes of the flash-based middle-tier.

Page 22: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

What about FTL?

FTL: Flash Translation Layer

• FTL lets cache management algorithms work on flash memory without modification.
• FTL provides transparent access to flash memory.
• BUT: it is proprietary and vendor-specific.

Page 23: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Small Introduction to GC

GC: Garbage Collection

1. Select the set of garbage blocks. Each garbage block consists of valid and invalid pages.
2. Move all valid pages from the garbage blocks to another set of free blocks and update the management information.
3. Erase the garbage blocks, which in turn creates free blocks.

[Diagram: blocks of valid (v) and invalid (iv) pages.]
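The three GC steps can be sketched as a short function. This is a minimal sketch under assumptions: each block is a list of `(page_id, is_valid)` tuples, the function name `collect` is hypothetical, block capacity is ignored, and "garbage block" is taken to mean any block holding at least one invalid page.

```python
def collect(blocks, free_block):
    """Run one GC pass and return the number of blocks erased.

    Hypothetical helper: blocks are lists of (page_id, is_valid) tuples.
    """
    # Step 1: select the garbage blocks (assumed: any block with an invalid page).
    garbage = [b for b in blocks if any(not valid for _, valid in b)]
    for block in garbage:
        # Step 2: move the valid pages to a free block.
        free_block.extend(entry for entry in block if entry[1])
        # Step 3: erase the garbage block, which turns it into a free block.
        block.clear()
    return len(garbage)


b1 = [(1, True), (2, False), (3, False)]
b2 = [(4, False), (5, True)]
b3 = [(6, True)]          # no invalid pages, so it is left untouched
target = []
erased = collect([b1, b2, b3], target)
```

Every valid page that has to be moved costs an extra read and write, so blocks with few valid pages are the cheapest to collect.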

Page 24: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Problems

• Proprietary FTL = difficult to standardize performance.

• No control over expensive operations, such as GC, performed by the FTL.

• Cold page migration: GC unnecessarily moves cold but valid pages, which makes the operation more expensive and less efficient.

• Inefficient GC = frequent GC = more erase operations.

• Reduced life of the flash device due to limited write endurance.

Page 25: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Solution

• Two approaches:
  • Logical Page Drop (LPD)
    • Accesses flash memory using the FTL.
    • Introduces a new operation: Delete.
    • Proactive cold-page dropping.
  • Native Flash Access (NFA)
    • Directly accesses flash memory.
    • Implements a customized GC process.
    • Block management structure (BMS) maintains the validity/cleanliness of pages.
    • Bulk GC processing.
    • Intelligently selects the victim garbage block.

Page 26: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Logical Page Dropping

Case 1: F ≠ ∅

S = set of occupied slots, F = set of free slots, d = number of victim slots (d = 4 in the example)

1. Request for free slot.
2. Provide the free slot from F.
3. Remove the slot from F.

Page 27: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Logical Page Dropping

Case 2: F = ∅

S = set of occupied slots, F = set of free slots, d = number of victim slots (d = 4 in the example)

1. Request for free slot.
2. Select a victim slot vb and evict it.
3. Evict d pages and perform the delete operation.
4. Provide vb as the free slot.
5. Perform GC on the block.
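The two LPD cases can be sketched as a slot allocator. This is a simplified illustration, not the algorithm as specified in [Yi12]: the class `LpdCache` is hypothetical, the victim is counted as the first of the d evicted pages, write-back of dirty pages is omitted, and the FTL's delete operation is modelled by recording slot numbers in a list.

```python
from collections import OrderedDict


class LpdCache:
    """Sketch of Logical Page Drop slot allocation (hypothetical names)."""

    def __init__(self, capacity, d):
        self.free = list(range(capacity))   # F: set of free slots
        self.occupied = OrderedDict()       # S: page id -> slot, LRU -> MRU
        self.d = d                          # number of victim slots
        self.deleted = []                   # slots reported to the FTL via delete

    def get_free_slot(self):
        if not self.free:                   # Case 2: F is empty
            for _ in range(min(self.d, len(self.occupied))):
                _, slot = self.occupied.popitem(last=False)  # evict coldest page
                self.deleted.append(slot)   # delete: the FTL may treat it as invalid
                self.free.append(slot)
        return self.free.pop(0)             # Case 1: provide a slot, remove it from F

    def insert(self, page_id):
        slot = self.get_free_slot()
        self.occupied[page_id] = slot
        return slot


cache = LpdCache(4, d=2)
for page in "abcd":
    cache.insert(page)          # fills all four slots
slot = cache.insert("e")        # F is empty: pages a and b are dropped
```

Dropping cold pages ahead of time means the FTL sees them as invalid and does not have to migrate them during its own GC.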

Page 28: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Native Flash Access –Allocation Algorithm

Case 1: Current block is not full

S = set of occupied slots, F = set of free slots, Wl = low watermark, Wh = high watermark, wp = write pointer (Wl = 2000 and Wh = 60000 in the example)

1. Request for free slot.
2. wp provides the address of the free slot.
3. wp increments and points to the next free slot.

Page 29: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Native Flash Access –Allocation Algorithm

Case 2: Current block is full

S = set of occupied slots, F = set of free slots, Wl = low watermark, Wh = high watermark, wp = write pointer (Wl = 2000 and Wh = 20000 in the example)

1. Request for free slot.
2. wp points to the first free slot of the next free block.
3. Compare |F| with Wl.
4. If |F| < Wl, perform GCs until |F| ≥ Wh.
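The write pointer and the two watermarks can be sketched as below. This is an illustrative sketch under assumptions: the class `NfaAllocator` is hypothetical, and the `gc` argument is a stand-in callback that simply returns the id of a freed block, whereas a real GC would also relocate valid pages.

```python
class NfaAllocator:
    """Sketch of NFA slot allocation with watermarks (hypothetical names)."""

    def __init__(self, blocks, pages_per_block, low, high, gc):
        self.pages_per_block = pages_per_block
        self.free_blocks = list(range(1, blocks))  # blocks not yet written
        self.block, self.offset = 0, 0             # write pointer wp
        self.low, self.high = low, high            # Wl and Wh
        self.gc = gc     # assumed callback: frees one block and returns its id

    def free_slots(self):
        # |F|: slots in untouched free blocks plus the rest of the current block.
        remaining = self.pages_per_block - self.offset
        return len(self.free_blocks) * self.pages_per_block + remaining

    def allocate(self):
        if self.offset == self.pages_per_block:    # Case 2: current block is full
            self.block, self.offset = self.free_blocks.pop(0), 0
            if self.free_slots() < self.low:       # compare |F| with Wl
                while self.free_slots() < self.high:   # GC until |F| >= Wh
                    self.free_blocks.append(self.gc())
        slot = (self.block, self.offset)           # Case 1: wp gives the free slot
        self.offset += 1                           # wp points to the next free slot
        return slot


reclaimable = [0, 1, 2]                 # blocks the stand-in GC is able to free
alloc = NfaAllocator(blocks=4, pages_per_block=2, low=3, high=5,
                     gc=lambda: reclaimable.pop(0))
slots = [alloc.allocate() for _ in range(7)]
```

Collecting in bulk between the two watermarks keeps GC off the path of most allocations instead of running it one block at a time.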

Page 30: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Native Flash Access –Garbage Collection Algorithm

S = set of occupied slots, F = set of free slots, t = page-dropping threshold (t = 1/1/13 01:42:53 in the example), LAT = last access time

1. Check the validity of pages in the block.
2. If all pages are valid, select the victim block.
3. Drop all valid pages where LAT ≤ t and move the others to free slots.
4. Erase the block and mark it free.
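The page-dropping step can be sketched as follows. This is a simplified illustration under assumptions: pages are dictionaries with hypothetical keys `page`, `valid`, and `lat`, LAT values are plain numbers rather than timestamps, victim selection uses the greedy fewest-valid-pages rule, and write-back of dirty dropped pages is left out.

```python
def pick_victim(blocks):
    # Greedy choice (assumed here): the block with the fewest valid pages.
    return min(blocks, key=lambda b: sum(e["valid"] for e in b))


def nfa_collect(block, threshold, relocate):
    """Collect one block; return (dropped, moved) page ids.

    Hypothetical helper: relocate is a callback that stores a surviving page.
    """
    dropped, moved = [], []
    for entry in block:
        if not entry["valid"]:
            continue                        # invalid pages vanish with the erase
        if entry["lat"] <= threshold:
            dropped.append(entry["page"])   # cold page: dropped, not migrated
        else:
            relocate(entry)                 # warm page: moved to a free slot
            moved.append(entry["page"])
    block.clear()                           # erase the block and mark it free
    return dropped, moved


def page(name, valid, lat):
    return {"page": name, "valid": valid, "lat": lat}


b0 = [page("a", True, 5), page("b", True, 6), page("c", True, 7)]
b1 = [page("d", False, 1), page("e", True, 10), page("f", True, 90)]
victim = pick_victim([b0, b1])      # b1 has only two valid pages
survivors = []
dropped, moved = nfa_collect(victim, threshold=20, relocate=survivors.append)
```

Only page f is copied, because page e is older than the threshold and is dropped, which is how this approach avoids cold page migration.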

Page 31: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Experiment

• Comparison between NFA, LPD, and Baseline (BL).

• Used a simulation environment to calculate the results.

• Use of LRU for selecting victim pages or blocks.

• Use of a greedy policy for selecting the victim block with the least number of valid pages.

• 128 pages × 512 blocks setup for all three approaches.

BL = the middle-tier cache with indirect flash access, working without the delete operation.

Page 32: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Result

[Figure (a): Throughput (IOPS).]

[Figures (b) TPC-C, (c) TPC-H, (d) TPC-E: breakdown of the trace execution time (seconds) into the fractions of GC tg, cache overhead tc, and disk accesses tb.]

Data referred: [Yi12]

Page 33: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Result

Distribution of the number of valid pages in garbage-collection blocks. A bar of height y at position x on the x-axis means that it happened y times that a block containing x valid pages got garbage collected.

Number of erases for each block. Each position on the x-axis refers to a block.

Data referred:[Yi12]

Page 34: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Conclusion

• NFA and LPD outperform BL in terms of throughput and GC efficiency.

• NFA seems to be the better option compared to both LPD and BL.

• Use of NFA and LPD also takes care of wear-levelling.

• Directly accessing flash memory without using FTL helps both performance and lifetime improvement.

Page 35: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Summary

• 3-tier architecture performs better than 2-tier architecture both in terms of energy efficiency and performance.

• Using flash memory as secondary cache improves the performance significantly.

• Native access of flash memory helps in improving performance and life of the flash device.

Page 36: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

References

• [RB09] D. Roberts, T. Kgil, et al.: Integrating NAND device onto servers. Communications of the ACM, vol. 52, no. 4, pages 98-103, 2009.

• [KM07] J.Koomey: Estimating total power consumption by servers in the US and the world. http://sites.and.com/de/Documents/svrpwrusecompletefinal.pdf, February 2007.

• [ID08] The diverse and exploding digital universe (an IDC white paper). http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf, March 2008.

• [AR02] Arie Tal, M-Systems Newark, CA: NAND vs. NOR flash technology. The designer should weigh the options when using flash memory (Article). http://www.electronicproducts.com/Digital_ICs/NAND_vs_NOR_flash_technology.aspx, January 2002.

• [BY10] Byung-Woo Nam, Gap-Joo Na and Sang-Won Lee: A Hybrid Flash Memory SSD Scheme for Enterprise Database Applications, April 2010.

• [TD12] TDK Global: SMART Storage Solution for Industrial Application (Technical Journal), January 2012.

• [GA05] Eran Gal and Sivan Toledo, School of Computer Science, Tel-Aviv University: Algorithms and Data Structures for Flash Memories, January 2005.

• [IO09] Ioannis Koltsidas and Stratis D. Viglas, School of Informatics, University of Edinburgh: Flash-Enabled Database Storage, March 2010

• [SE10] Seongcheol Hong and Dongkun Shin, School of Information and Communication Engineering Sungkyunkwan University Suwon, Korea: NAND Flash-based Disk Cache Using SLC/MLC Combined Flash Memory, May 2010.

• [TH05] Theo Harder: DBMS Architecture -- The Layer Model and its Evolution. March 2005

• [Yi12] Yi Ou: Ph.D. Thesis report, University of Kaiserslautern, Caching for flash-based databases and flash-based caching for databases. August 2012

Page 37: Flash-Based  Caching For  Databases - Energy  Efficiency and Performance

Questions?