Performance Evaluation of Traditional Caching Policies on a Large System with Petabytes of Data

Description

Caching is widely known to be an effective method for improving I/O performance by storing frequently used data on higher-speed storage components. However, most existing studies of caching performance evaluate fairly small files populating a relatively small cache. Few reports are available that detail the performance of traditional cache replacement policies on extremely large caches. Do such traditional caching policies still work effectively when applied to systems with petabytes of data? In this paper, we comprehensively evaluate the performance of several cache policies, including First-In-First-Out (FIFO), Least Recently Used (LRU), and Least Frequently Used (LFU), on the global satellite imagery distribution application maintained by the U.S. Geological Survey (USGS) Earth Resources Observation and Science Center (EROS). Evidence is presented suggesting that traditional caching policies can provide performance gains on large data sets, just as they do on smaller ones. Our evaluation is based on approximately three million real-world satellite image download requests representing global user download behavior since October 2008.


Performance Evaluation of Traditional Caching Policies on a Large System with Petabytes of Data

Ribel Fares1, Brian Romoser1, Ziliang Zong1, Mais Nijim2 and Xiao Qin3

1 Texas State University, TX, USA
2 Texas A&M University-Kingsville
3 Auburn University, AL, USA

Presented at the 7th IEEE International Conference on Networking, Architecture, and Storage (NAS2012)


Motivation


• Large-scale data processing
• High-performance storage systems


High-Performance Clusters


• The Architecture of a Cluster

[Figure: cluster architecture — clients reach the head node over the Internet; a network switch connects the head node, the computing nodes, and the storage subsystems (or Storage Area Network)]


Techniques for High-Performance Storage Systems

• Caching
• Prefetching
• Active Storage
• Parallel Processing


Do traditional caching policies still work effectively?

Over 4 petabytes of satellite imagery available.

More than 3 million image requests since 2008.

Earth Resources Observation and Science (EROS) Center


EROS Data Center - System Workflow


USGS / EROS Storage System

Type  Model             Capacity  Hardware  Bus Interface
1     Sun/Oracle F5100  100 TB    SSD       SAS/FC
2     IBM DS3400        1 PB      HDD       SATA
3     Sun/Oracle T10K   10 PB     Tape      InfiniBand

The FTP server from which users download images is of type 1.

The USGS / EROS Distribution System

• Each cache miss costs 20–30 minutes of processing time.

USGS / EROS Log File

Each requested scene ID breaks down into the following fields:

• Landsat: L
• ETM+ sensor: E
• Satellite designation: 7
• WRS path: 004
• WRS row: 063
• Acquisition year: 2006
• Acquisition day of year: 247
• Capture station: ASN
• Version: 00
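Concatenating the field values above gives a scene ID such as LE70040632006247ASN00. A minimal parser sketch, assuming the fixed-width layout implied by the field list (field names follow the slide's labels, not any official USGS schema):

```python
# Sketch: split a Landsat scene ID into the fields listed above,
# assuming fixed character positions (LE + sat + path + row + year
# + day-of-year + station + version).
def parse_scene_id(scene_id):
    return {
        "sensor": scene_id[0:2],         # e.g. "LE" (Landsat, ETM+ sensor)
        "satellite": scene_id[2],        # e.g. "7"
        "wrs_path": scene_id[3:6],       # e.g. "004"
        "wrs_row": scene_id[6:9],        # e.g. "063"
        "year": scene_id[9:13],          # e.g. "2006"
        "day_of_year": scene_id[13:16],  # e.g. "247"
        "station": scene_id[16:19],      # e.g. "ASN"
        "version": scene_id[19:21],      # e.g. "00"
    }

fields = parse_scene_id("LE70040632006247ASN00")
```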

Observation 1

• Top 9 aggressive users account for 18% of all requests.

• A second log file was created by removing requests made by the top 9 aggressive users.
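The construction of the second log file can be sketched as follows; `requests` is a hypothetical list of (user_id, scene_id) log entries, and the cut-off of 9 users comes from the slide:

```python
from collections import Counter

# Sketch: drop requests made by the top-k most aggressive users,
# mirroring the second log file described above (k = 9 in the paper).
def without_top_users(requests, k=9):
    counts = Counter(user for user, _ in requests)
    top = {user for user, _ in counts.most_common(k)}
    return [(user, scene) for user, scene in requests if user not in top]
```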

Observation 2

• Duplicate images within 7 days were removed from the log file.

[Figure: duplicate request percentage (0–35%) vs. time window (0–16 days)]
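The duplicate-removal step can be sketched like this, assuming each log entry is a (day, scene_id) pair sorted by day, and assuming a repeat counts from the last *kept* request for that image (the slide does not pin down this detail); the 7-day window is the one chosen above:

```python
# Sketch: drop repeat requests for the same image arriving within
# `window` days of the previously kept request for that image.
def dedup(requests, window=7):
    last_seen = {}                # scene_id -> day of last kept request
    kept = []
    for day, scene in requests:   # requests assumed sorted by day
        if scene not in last_seen or day - last_seen[scene] > window:
            kept.append((day, scene))
            last_seen[scene] = day
    return kept
```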

Caching Algorithms

• FIFO: First entry in cache gets removed first.

• LRU: Least recently requested image removed first.

• LFU: Least popular image removed first.
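The three policies differ only in which entry they evict. A minimal sketch (illustrative, not the authors' simulator: entries are counted rather than sized in TB, unlike the simulations below):

```python
from collections import OrderedDict, Counter

# Sketch of the three eviction rules on a fixed-capacity cache of
# image IDs. FIFO/LRU both evict the front of an ordered dict; LRU
# additionally refreshes an entry's position on every hit.
class Cache:
    def __init__(self, capacity, policy):
        self.capacity, self.policy = capacity, policy
        self.entries = OrderedDict()   # insertion / recency order
        self.freq = Counter()          # request counts, used by LFU

    def request(self, image):
        self.freq[image] += 1
        if image in self.entries:
            if self.policy == "LRU":               # refresh recency
                self.entries.move_to_end(image)
            return True                            # hit
        if len(self.entries) >= self.capacity:
            if self.policy == "LFU":               # least popular first
                victim = min(self.entries, key=self.freq.__getitem__)
            else:                                  # FIFO and LRU: oldest first
                victim = next(iter(self.entries))
            del self.entries[victim]
        self.entries[image] = None
        return False                               # miss
```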

Case Studies

Simulation  Cache Policy  Cache Size (TB)  Top 9 Aggressive Users
1           FIFO          30               Included
2           FIFO          30               Not Included
3           FIFO          60               Included
4           FIFO          60               Not Included
5           LRU           30               Included
6           LRU           30               Not Included
7           LRU           60               Included
8           LRU           60               Not Included
9           LFU           30               Included
10          LFU           30               Not Included
11          LFU           60               Included
12          LFU           60               Not Included

Simulation Assumptions/Restrictions

• When the cache server reaches 90% capacity, images are removed according to the adopted cache policy until the server load is reduced to 45%.

• Images are assumed to be processed instantaneously.

• A requested image cannot be removed from the server within 7 days of the request.
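The clean-up rule can be sketched as follows (illustrative names, not the authors' code); the 90%/45% thresholds and the 7-day pin are the assumptions listed above, and sizes are in TB:

```python
# Sketch: once the cache passes 90% of capacity, evict images in the
# adopted policy's order until the load falls to 45%, skipping any
# image requested within the last 7 days (which may not be removed).
def clean_up(cache, capacity_tb, sizes, last_request_day, today,
             eviction_order, high=0.90, low=0.45, pin_days=7):
    used = sum(sizes[img] for img in cache)
    if used < high * capacity_tb:
        return                      # below the 90% trigger: do nothing
    for img in eviction_order:      # e.g. least-recently-used first
        if used <= low * capacity_tb:
            break                   # load reduced to 45%: stop evicting
        if today - last_request_day[img] < pin_days:
            continue                # requested within 7 days: pinned
        cache.remove(img)
        used -= sizes[img]
```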

Results – Hit Rates of Differing Cache Replacement Policies

[Figure: monthly hit rates (Oct-08 to Oct-11, roughly 0.15–0.65) for LRU, LFU, and FIFO on a 60 TB cache with aggressive users included; the first clean-up is marked. A companion panel shows monthly hit ratios with aggressive users excluded.]

Results – Impact of Inclusion of Aggressive Users

Hit rates with aggressive users included:

       FIFO      LRU       LFU
30 TB  0.32661   0.345919  0.339515
60 TB  0.438536  0.457727  0.454811

Results – Impact of Exclusion of Aggressive Users

FIFO LRU LFU

Hit rates with aggressive users excluded:

       FIFO      LRU       LFU
30 TB  0.319171  0.332741  0.345208
60 TB  0.430349  0.449621  0.45871
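The tables above let one quantify the gain over FIFO directly; a quick check (hypothetical helper, numbers copied from the 60 TB rows):

```python
# Relative hit-rate improvement over FIFO, using the 60 TB hit
# rates reported above.
def improvement(policy_rate, fifo_rate):
    return (policy_rate - fifo_rate) / fifo_rate

lru_gain = improvement(0.457727, 0.438536)  # aggressive users included, ~4.4%
lfu_gain = improvement(0.45871, 0.430349)   # aggressive users excluded, ~6.6%
```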

Conclusion & Future Work

• LRU and LFU initiate cache clean-up at similar points.

• Aggressive users destabilize monthly hit rates.

• LFU was least affected by the inclusion of aggressive users.

Conclusion & Future Work cont’d.

• LRU and LFU improve on FIFO, as expected.

• However, the improvements are modest.

• Global user behaviors should be further investigated to design more complex caching and/or prefetching strategies.


Summary

• Data-Intensive Processing
  – EROS (Earth Resources Observation and Science) Data Center
  – visEROS

• Improving I/O Performance
  – Prefetching
  – Active Storage
  – Parallel Processing


The VisEROS Project – Motivation

• 2M downloads from the EROS data center.
• No existing visualization tools available to utilize these data.
• Need a tool to:
  – Monitor user download behaviors
  – Show the current global download "hot spots"
  – Demonstrate the actual usage of EROS data
  – Optimize the storage system performance
  – Improve the satellite image distribution service


The VisEROS Prototype

Generated by VisEROS


This project is supported by the U.S. National Science Foundation, No. 0917137.

Download the presentation slides: http://www.slideshare.net/xqin74 (Google: slideshare Xiao Qin)


Many Thanks!

