Slide 1

Shared Disk File Caching that Accounts for Delays in
Space Reservation, Transfer and Processing

PI: Ekow J. Otoo, with Frank Olken, Arie Shoshani, and Donghui Guo (Postdoc)
Slide 2

Goals:
• To develop a policy advisory module (PAM) for coordinated, optimal file caching and replication in distributed data repositories
• Efficient processing of file requests on distributed datasets accessed through Storage Resource Managers (SRMs)

Areas of Application:
• Middleware components that manage storage and data access on Data Grids
• Example projects: Particle Physics Data Grid (PPDG) and Earth Science Grid (ESG)
Slide 3

Application Context

Example: Multi-tier Model of Dataset Distribution

[Diagram: a multi-tier hierarchy of dataset distribution, with Tier 0 (CERN) at the top, Tier 1 sites (FNAL, RAL, IN2P3), Tier 2 sites (Univ. A, Lab. B), departmental servers, and desktops 1 through N at the bottom.]
Slide 4

Data Accesses at a Single Site

[Diagram: multiple clients using a shared disk to access a remote Mass Storage System. Processing nodes issue file requests against a shared disk cache, which stages files from the Mass Storage System over the network.]
Slide 5

Simulation Model of an SRM

[Diagram: four servers in a pipeline, each with a message queue (MQ) and a request queue (RQ): the Admission Server (S1, MQ1/RQ1), Replacement Server (S2, MQ2/RQ2), Caching Server (S3, MQ3/RQ3), and Processing Server (S4, MQ4/RQ4), fed by an arrival process and a delayed request queue (DRQ). Requests flow forward as caching, stage/transfer, and processing requests; advisory objects and request-for-service objects (which wake up servers) circulate among the servers. Administrative functions start/stop and suspend/resume the servers.]

Backward wake-up messages:
S3 notifies S1 of a cached file; S1 is woken up by an arrival; S2 wakes up S1; S3 wakes up S2; S4 wakes up S1

Exception messages:
S1 sends exceptions to S2; S2 sends exceptions to S3; S1 sends exceptions to S4

Legend:
RQ – Request Queue; MQ – Message Queue; DRQ – Delayed Request Queue
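The bounded queues and backward wake-up messages in the model can be sketched in a few lines. This is a minimal illustration, not the project's simulator: the class shape, capacities, and message tuples are all assumptions.

```python
from collections import deque

class Server:
    """One stage (S1..S4) of the SRM pipeline sketch: a bounded request
    queue (RQ) and a message queue (MQ). When a server consumes a request,
    it posts a backward wake-up message to its upstream server, mirroring
    the "queue not full" notifications in the model."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.rq = deque()   # request queue (RQ)
        self.mq = deque()   # message queue (MQ)
        self.upstream = None

    def offer(self, request):
        """Try to enqueue a request; False means the sender must wait
        for a wake-up message (the delayed-request-queue case)."""
        if len(self.rq) < self.capacity:
            self.rq.append(request)
            return True
        return False

    def step(self):
        """Serve one queued request and wake up the upstream server."""
        if not self.rq:
            return None
        request = self.rq.popleft()
        if self.upstream is not None:
            self.upstream.mq.append(("wake_up", self.name))
        return request

# Wire S1 -> S2 as in the model, then push one caching request through.
s1, s2 = Server("S1", 2), Server("S2", 2)
s2.upstream = s1
s2.offer("cache file-A")
served = s2.step()
```

After the step, S2 has served the request and S1's message queue holds the backward wake-up, the same pattern the diagram shows between each pair of adjacent servers.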
Slide 6

Flow of Objects and Messages in the Model

[Diagram: the same four-server pipeline (S1 through S4) as on the previous slide, annotated with the flow of objects and messages: caching, stage/transfer, and processing requests move forward, while "queue not full" notifications and exception or wake-up messages move backward between adjacent servers.]
Slide 7

Role of the Policy Advisory Module

Two Principal Components:

1. A disk cache replacement policy
• Evaluates which files are to be replaced when space is needed

2. An admission policy for file requests
• Determines which request is to be processed next
• E.g., may prefer to admit requests for files already in cache

The work completed concerns disk cache replacement policies, which we focus on next.
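The admission-policy preference described above (admit requests for already-cached files first) can be sketched as a queue scan with a FIFO fallback. The (request_id, filename) shape is an assumption for illustration; the slides only state the preference.

```python
from collections import deque

def admit_next(pending, cached):
    """Admission-policy sketch: prefer the oldest pending request whose
    file is already in the disk cache; otherwise fall back to plain FIFO.
    `pending` is a deque of (request_id, filename); `cached` is a set of
    filenames currently on the shared disk (both shapes are assumptions)."""
    for i, (request_id, filename) in enumerate(pending):
        if filename in cached:
            del pending[i]           # deque supports deletion by index
            return request_id, filename
    return pending.popleft()         # nothing cached: serve in arrival order

pending = deque([(1, "run07.dat"), (2, "run03.dat")])
choice = admit_next(pending, cached={"run03.dat"})
```

Here request 2 jumps the queue because its file is already cached, avoiding a remote stage/transfer.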
Slide 8

Main Results on Caching Policies

• Popular caching algorithms, such as LRU and LRU-K, are inappropriate for disk caching over a wide-area network.
• Remote access cost, transfer cost, and the rate of requests all impact caching policies.
• We have developed an optimal replacement policy based on a cost-beneficial function computed at time t0 as

    (K / (t0 - tK(t0))) * (C(t0) / S(t0))

  where K is the number of backward references retained, tK(t0) is the time of the K-th most recent reference, C(t0) is the cost of accessing the file, and S(t0) is the size of the file at time t0.
• We proved analytically that this policy is optimal.
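A minimal sketch of evaluating that cost-beneficial function for one file follows. The handling of files with fewer than K retained references is my assumption; the slide does not specify it.

```python
def cost_beneficial(ref_times, t0, k, access_cost, size):
    """Sketch of the cost-beneficial function
        (k_eff / (t0 - tK)) * C(t0) / S(t0),
    where tK is the time of the K-th most recent reference. Under a
    least-cost-beneficial replacement rule, files with the smallest
    value are the best candidates for eviction."""
    k_eff = min(k, len(ref_times))      # assumption: degrade gracefully
    if k_eff == 0:
        return 0.0                      # never referenced: cheapest to evict
    t_k = ref_times[-k_eff]             # K-th most recent reference time
    return (k_eff / (t0 - t_k)) * access_cost / size

# A file referenced at t = 1, 3, 5; evaluated at t0 = 9 with K = 2:
value = cost_beneficial([1.0, 3.0, 5.0], t0=9.0, k=2,
                        access_cost=6.0, size=2.0)
```

Note how the function favors keeping files that are referenced often (small t0 - tK), expensive to re-fetch (large C), and small (small S), exactly the trade-off that LRU ignores.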
Slide 9

Main Results (Cont.)

Two new practical implementations were developed:
- Maximum Inter-arrival Time with K backward references (MIT-K)
- Least Cost Beneficial with K backward references (LCB-K)

• Verified the behavior of these practical algorithms using workloads of MSS file accesses from Jefferson Laboratory, as well as synthetic workloads
• The main measure we targeted is the "average cost per file reference", because it is the cost that matters in a wide-area network
• LCB-K and MIT-K are shown to give the best average cost per file reference
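The MIT-K victim selection can be sketched as picking the file with the largest estimated inter-arrival time over its last K references. This is an illustrative reading of the policy's name, not the project's implementation; the data shapes and the fewer-than-K fallback are assumptions.

```python
def mit_k_victim(ref_times_by_file, t0, k):
    """MIT-K sketch: evict the file whose estimated inter-arrival time
    over its last K references is largest, i.e. the file least likely
    to be re-read soon. `ref_times_by_file` maps filename -> sorted
    list of reference times (a hypothetical shape)."""
    def est_gap(times):
        k_eff = min(k, len(times))
        if k_eff == 0:
            return float("inf")          # never referenced: evict first
        return (t0 - times[-k_eff]) / k_eff  # mean gap over last k_eff refs

    return max(ref_times_by_file,
               key=lambda name: est_gap(ref_times_by_file[name]))

victim = mit_k_victim({"hot.dat": [8.0, 9.0], "cold.dat": [1.0, 2.0]},
                      t0=10.0, k=2)
```

The recently busy file survives; the one untouched since t = 2 is evicted. LCB-K differs by weighting this recency signal with the access cost and size, as in the function on the previous slide.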
Slide 10

Comparison of Hit Ratios

Replacement Policies:
• RND: Random
• LFU: Least Frequently Used
• LRU: Least Recently Used
• MIT-K: Maximum Inter-Arrival Time based on last K references
• LCB-K: Least Cost Beneficial based on last K references
Slide 11

Comparison of Byte Hit Ratios
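The two metrics compared on these slides can be computed from a request trace as follows. Evaluating them against a fixed cache snapshot is a simplification for illustration (in the experiments the cache contents evolve with the replacement policy), and the data shapes are assumptions.

```python
def hit_and_byte_hit_ratio(trace, cached):
    """Hit ratio = fraction of requests served from the disk cache;
    byte hit ratio = fraction of requested *bytes* served from it.
    `trace` is a list of (filename, size_bytes) requests and `cached`
    is the set of filenames on disk (hypothetical shapes)."""
    hits = sum(1 for name, _ in trace if name in cached)
    hit_bytes = sum(size for name, size in trace if name in cached)
    total_bytes = sum(size for _, size in trace)
    return hits / len(trace), hit_bytes / total_bytes

ratios = hit_and_byte_hit_ratio([("a.dat", 100), ("b.dat", 300)],
                                cached={"a.dat"})
```

The two ratios can diverge sharply when file sizes vary: here half the requests hit, but only a quarter of the bytes do, which is why the slides report both.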
Slide 12
Comparison of Average Cost Per Reference
Slide 13
Scatter Plot of Sample of Generated Workload
Slide 14
Comparison of Average Cost Per Reference
Slide 15

Future Work

• Complete the implementation of admission policy algorithms
• Evaluate performance under different combinations of admission and cache replacement policies
• Model and evaluate the performance of multiple-site SRMs under different network configurations
• Determine the impact on performance of policies that use global information, as opposed to only local information
• Extend the model to include failure detection and recovery
• Add PAM to SRM implementations
Slide 16
Comparison of Times to Evaluate Replacement