
X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs

Lakshmi N. Bairavasundaram

Muthian Sivathanu

Andrea C. Arpaci-Dusseau

Remzi H. Arpaci-Dusseau

ADvanced Systems Laboratory

Computer Sciences Department

University of Wisconsin – Madison

Introduction
Caching in modern systems: multiple levels
Storage: 2-level hierarchy
Level 1: File system (FS) cache
  Software-managed
  Main memory of host/client
  LRU-like cache replacement
Level 2: RAID cache
  Firmware-managed
  Memory inside RAID system
  Usually LRU replacement
[Figure: Host with Application and File system cache, above the RAID with its RAID cache]

Introduction – contd.
LRU
  Replace LRU block
  Cache placement on read
2 levels of LRU
  Redundant contents
Goal: Exclusive caching
[Figure: "Read Block no. 10" places block 10 at the MRU end of both the FS Cache and the RAID Cache; both LRU lists end up holding the same blocks (10, 11, 12, ...)]
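The redundancy on this slide can be reproduced with a small simulation. This is a minimal sketch, not code from the talk: the `LRUCache` class, the `read` helper, and the cache sizes are assumptions made for illustration. Both levels place a block on read and evict the LRU block, as the slide describes.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size LRU cache of block numbers (hypothetical helper)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # LRU block first, MRU block last

    def access(self, block):
        """Return True on a hit; on a miss, place the block at the MRU end."""
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the LRU block
            self.blocks[block] = True
        return hit

def read(fs_cache, raid_cache, block):
    """A read reaches the RAID only when it misses in the FS cache."""
    if not fs_cache.access(block):
        raid_cache.access(block)

fs_cache, raid_cache = LRUCache(3), LRUCache(3)  # sizes are assumed
for block in [10, 11, 12]:
    read(fs_cache, raid_cache, block)

# Blocks held at both levels: the RAID cache adds nothing here.
duplicated = sorted(set(fs_cache.blocks) & set(raid_cache.blocks))
print(duplicated)  # [10, 11, 12]
```

With equal cache sizes, every block the RAID caches is also in the FS cache, so a later FS miss is unlikely to hit in the RAID cache.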

Improved RAID Caching
Multi-Queue (Zhou et al. 2001)
  Add frequency component to cache policy
  Not strictly exclusive!
DEMOTE (Wong and Wilkes 2002)
  Change interface to disk
  File system issues “cache place” command
  Has perfect information and hence perfectly exclusive caches
  Interface changes are difficult to deploy

Ideal RAID Cache
Exclusive caching
  File system and RAID caches should have different contents
Global LRU
  Known to work well
  RAID cache should be a victim cache
No interface changes
[Figure: FS Cache and RAID Cache drawn as a single list from MRU to LRU; a Block Read enters the FS Cache and the Victim Block moves into the RAID Cache]
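The ideal behaviour can be sketched as a global LRU split across two caches. This sketch assumes perfect knowledge of FS cache evictions, which a real RAID does not have without interface changes; the `read` function and the sizes are hypothetical.

```python
from collections import OrderedDict

def read(fs_cache, raid_cache, block, fs_size, raid_size):
    """Read one block under an idealised global LRU; return where it was found."""
    if block in fs_cache:
        fs_cache.move_to_end(block)
        return "fs"
    source = "raid" if block in raid_cache else "disk"
    raid_cache.pop(block, None)        # keep the two caches exclusive
    fs_cache[block] = True             # placed at the MRU end of the FS cache
    if len(fs_cache) > fs_size:
        victim, _ = fs_cache.popitem(last=False)
        raid_cache[victim] = True      # FS victim enters the RAID cache
        if len(raid_cache) > raid_size:
            raid_cache.popitem(last=False)
    return source

fs_cache, raid_cache = OrderedDict(), OrderedDict()
trace = [10, 11, 12, 13, 10]
sources = [read(fs_cache, raid_cache, b, fs_size=2, raid_size=2) for b in trace]
print(sources)           # ['disk', 'disk', 'disk', 'disk', 'raid']
print(list(fs_cache))    # [13, 10]
print(list(raid_cache))  # [11, 12]
```

The final read of block 10 is served from the RAID cache, and the two caches never hold the same block.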

X-RAY
Observes disk traffic
  Reads and writes to data and metadata
Builds a model of the FS cache
  Uses semantic knowledge
  Predicts size and contents of FS cache
Identifies set of exclusive blocks
  Recent victims of the FS cache
Reads blocks from disk into cache
Result
  A nearly exclusive cache without interface changes
[Figure: Host with File system cache, above the RAID; inside the RAID, X-RAY keeps a Model of FS cache alongside the RAID cache]

Talk Outline
  Introduction
  File Systems
  Information and Inferences
  X-RAY Cache Design
  Results
  Conclusion

File System Operation
Applications perform file reads and writes
File system (Unix)
  Translates file accesses to disk block requests
Metadata
  To maintain application data on disk and manage disk blocks
  Periodically written to disk
  Examples: inodes, bitmap blocks

File System Operation
Inode
  Pointers to data blocks
  File access information
[Figure: a File consists of an Inode, holding the latest access time and pointers to data blocks, and its Data Blocks]

File System Operation
File access
  Use inode to obtain pointers to disk data blocks
  Read corresponding blocks from disk if they are not in FS cache
  Update the access time information in inode
Metadata updates
  Periodically check for “dirty” inodes and write to disk

The Problem
To observe disk traffic and infer the contents of FS cache
Why difficult?
  FS cache size changes over time
    Shares main memory with virtual memory system
  Disk cannot observe all FS-level accesses
[Figure: FS Cache and the FS Cache Model kept at the RAID, each drawn from LRU to MRU. Blocks 10, 11 and 12 are read from disk; block 10 is then read again, which the disk does not see; block 13 is read next. The FS Cache and the FS Cache Model end up with different contents.]
Key observation
  We need information about accesses that hit in FS cache
  File system maintains access information in inodes
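The divergence between the FS cache and a disk-side model can be sketched as follows. This is an illustration under assumed parameters (a 3-block FS cache and a naive model that applies LRU only to the reads it sees), not the talk's own example code.

```python
from collections import OrderedDict

def lru_touch(cache, block, capacity):
    """Move block to the MRU end, evicting the LRU block on overflow."""
    if block in cache:
        cache.move_to_end(block)
    else:
        cache[block] = True
        if len(cache) > capacity:
            cache.popitem(last=False)

CAPACITY = 3                  # assumed FS cache size
fs_cache = OrderedDict()      # what the host really holds
naive_model = OrderedDict()   # what a disk-side observer would guess

for block in [10, 11, 12, 10, 13]:
    hit = block in fs_cache
    lru_touch(fs_cache, block, CAPACITY)
    if not hit:
        # Only FS cache misses produce a disk read the RAID can observe.
        lru_touch(naive_model, block, CAPACITY)

print(list(fs_cache))     # [12, 10, 13]  (LRU -> MRU)
print(list(naive_model))  # [11, 12, 13]
```

The second read of block 10 hits in the FS cache, so the model never learns that block 10 became recent and evicts it instead of block 11.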


Information
Obtain information from observing disk traffic
Knowledge of file system structures and operations
  File system maintains time of last access in inodes
  Periodic inode writes
  Assuming whole file access, all blocks are in FS cache
Assume file system cache policy is LRU

Inferences
Read for data block
  Block will be placed in file system cache (MRU block)
Read for previously read data block
  Block became victim in file system cache
  Blocks with an earlier access time should also be victims
Inode write: new access time, no disk read observed
  All blocks belonging to file are in FS cache
  Other blocks with later access time should also be present
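The three inferences can be sketched as a small event classifier. Everything named here (the `Observer` class, its fields, the returned labels, the inode-to-block map) is a hypothetical illustration of the rules above, not X-RAY's actual data structures.

```python
class Observer:
    """Classifies observed disk traffic using the three inference rules."""
    def __init__(self, inode_blocks):
        self.inode_blocks = inode_blocks  # inode number -> data blocks (semantic knowledge)
        self.seen = set()                 # data blocks read at least once
        self.last_atime = {}              # inode number -> last access time seen on disk
        self.read_since_write = set()     # blocks read since their inode was last written

    def on_read(self, block):
        self.read_since_write.add(block)
        if block in self.seen:
            return "victim"   # re-read: the FS cache must have evicted it
        self.seen.add(block)
        return "placed"       # first read: now the MRU block of the FS cache

    def on_inode_write(self, inode, atime):
        blocks = self.inode_blocks[inode]
        changed = atime != self.last_atime.get(inode)
        self.last_atime[inode] = atime
        silent = not any(b in self.read_since_write for b in blocks)
        self.read_since_write.difference_update(blocks)
        # New access time without any disk read: the file was served from the FS cache.
        return "cached" if changed and silent else "no inference"

obs = Observer({23: [4, 5]})
events = [
    obs.on_read(4),             # placed
    obs.on_read(5),             # placed
    obs.on_inode_write(23, 3),  # no inference (reads were observed)
    obs.on_inode_write(23, 6),  # cached
    obs.on_read(4),             # victim
]
print(events)
```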


Design
Recency list (R-list)
  List of data blocks ordered by access time
Cache Begin (CB) pointer
  Divides R-list into inclusive and exclusive regions
RAID Cache contents
  Subset of blocks in exclusive region
[Figure: R-list from LRU to MRU, each entry a block number and access time: A,1 B,1 C,2 D,3 E,3 F,5. CB separates the exclusive region (blocks the RAID should cache) from the inclusive region (blocks expected to be in FS cache).]

Disk Read
Read Block ‘D’; time = 6
[Figure: R-list from LRU to MRU with CB between the exclusive and inclusive regions. Before the read: A,1 B,1 C,2 D,3 E,3 F,4. After the read, block D carries access time 6.]

Inode Write – Access time change
Inode “23”: access time = 6
Semantic knowledge: Inode “23” == blocks D & E
Blocks D, E: access time = 6
[Figure: R-list from LRU to MRU with CB between the exclusive and inclusive regions. Before the inode write: A,1 B,1 C,2 D,3 E,4 F,5 G,7. After: A,1 B,1 C,2 F,5 D,6 E,6 G,7.]
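The R-list updates for a disk read and for an inode write can be sketched with a single helper. This is a minimal sketch: `set_access_time` and the list-of-tuples representation are assumptions, and movement of the CB pointer is left out because the slides do not spell it out.

```python
def set_access_time(r_list, blocks, atime):
    """Return a new R-list with `blocks` re-stamped to `atime`, ordered LRU -> MRU."""
    kept = [(b, t) for (b, t) in r_list if b not in blocks]
    stamped = [(b, atime) for b in blocks]
    # sorted() is stable, so blocks with equal access times keep their relative order.
    return sorted(kept + stamped, key=lambda entry: entry[1])

# Disk read of block D at time 6.
before_read = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 3), ("F", 4)]
after_read = set_access_time(before_read, ["D"], 6)

# Inode write: inode 23 now shows access time 6, and inode 23 owns blocks D and E.
inode_blocks = {23: ["D", "E"]}
before_write = [("A", 1), ("B", 1), ("C", 2), ("D", 3), ("E", 4), ("F", 5), ("G", 7)]
after_inode_write = set_access_time(before_write, inode_blocks[23], 6)

print(after_read)
print(after_inode_write)
```

The second result matches the order shown on the slide: A,1 B,1 C,2 F,5 D,6 E,6 G,7.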

X-RAY Cache
Keep track of additions to window in exclusive region
Read newly-added blocks from disk
Replace blocks no longer in the window
Additional disk bandwidth
  Idle time, extra internal bandwidth, freeblock scheduling
[Figure: R-list from LRU to MRU with CB between the exclusive and inclusive regions: A,1 B,1 C,2 F,5 D,6 E,6 G,7. RAID Cache (size = 2 blocks) holds a window of blocks in the exclusive region.]
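Window maintenance can be sketched as below. The slides do not give the exact rule, so this assumes the window is the run of exclusive-region blocks closest to CB, as many as the RAID cache can hold, and that CB is an index into the R-list; `window`, `plan`, and the CB positions are hypothetical.

```python
def window(r_list, cb_index, cache_size):
    """Exclusive-region blocks closest to CB, i.e. the most recent presumed victims."""
    start = max(0, cb_index - cache_size)
    return [block for block, _ in r_list[start:cb_index]]

def plan(old_window, new_window):
    """Blocks to read from disk and blocks to replace after the window moves."""
    to_fetch = [b for b in new_window if b not in old_window]
    to_drop = [b for b in old_window if b not in new_window]
    return to_fetch, to_drop

r_list = [("A", 1), ("B", 1), ("C", 2), ("F", 5), ("D", 6), ("E", 6), ("G", 7)]
CACHE_SIZE = 2  # RAID cache size in blocks, as on the slide

old = window(r_list, cb_index=3, cache_size=CACHE_SIZE)  # CB position is assumed
new = window(r_list, cb_index=4, cache_size=CACHE_SIZE)  # CB moves past F
to_fetch, to_drop = plan(old, new)

print(old, new)           # ['B', 'C'] ['C', 'F']
print(to_fetch, to_drop)  # ['F'] ['B']
```

In this sketch the fetch of block F is the extra disk read that X-RAY would schedule during idle time.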

Talk Outline
  Introduction
  File Systems
  Information and Inferences
  X-RAY Cache Design
  Results
    Tracking FS Cache Contents
    RAID Cache Performance
  Conclusion

Results – Tracking
Accurate size and content prediction
Highly responsive to FS cache size changes
Tolerates changes in inode write interval
Partial file reads
  X-RAY performs well if percentage of partially accessed files is < 40% (typical traces have less than 30%)

Results – Cache Performance
Performs better than LRU and Multi-Queue
Close to DEMOTE, in spite of imperfect information
Hit rate advantage translates to lower read latency

Additional Results
File system cache policy is not LRU
  Clock, 2Q
  X-RAY performs nearly as well as before
  It performs better than both LRU and Multi-Queue
Idle time requirements
  X-RAY reads blocks into cache only during idle time
  It performs well if idle time is greater than one-third of actual idle time observed in the trace

More in the paper …

Conclusion
Easy deployment is an important goal in developing technology
  Avoid interface changes: use non-invasive mechanisms
Higher-level systems maintain various pieces of information about data they manage
  Provide low-level systems with basic semantic knowledge
Semantic intelligence for managing RAID caches
  Use access information in metadata to track file system cache contents and cache exclusive blocks
  In spite of imperfect information, X-RAY performs nearly as well as changing the interface

Semantically-smart Disk Systems
  Availability, security and performance improvements

Questions ?

ADvanced Systems Laboratory (ADSL)

Computer Sciences, University of Wisconsin-Madison

http://www.cs.wisc.edu/adsl