
Storage Workload Isolation via Tier Warming: How Models Can Help

By Ji Xue, Feng Yan, Alma Riska, and Evgenia Smirni

Presented By Christian Contreras

Proposed solution

•  Storage workload prediction model to support scheduling in a multi-tier storage system.
–  Autonomic technique that learns the traffic patterns and predicts traffic changes.
–  Adjusts the storage environment using the predictions.


•  Multi-tier storage system
•  Traffic patterns
•  Problem
•  Goals
•  Model
•  Experimental results
•  Simulation
•  Conclusions and critiques

Multi-tiered Storage System: Components

•  Fast tier
•  Slow tier

Multi-tiered Storage System: Traffic

•  User traffic
–  Upload
–  Download
–  Read/write
•  System traffic
–  Replication
–  Backup
–  Restore
–  Disaster recovery

Multi-tiered Storage System: Concerns

•  Tiered storage integration
•  Traffic patterns
•  Performance
•  Availability
•  Cost

Problem: How to schedule properly?

Traffic analysis (3 days)
•  User traffic
–  Response time
–  Availability
•  System traffic
–  Time window
–  I/O intensive

Overall intensity, working data sets, warm up

Fast tier: What? When?

Multi-tiered Storage System
•  I/O hierarchical structure and traffic patterns
•  Warm up reasoning:
–  Fast tier → the user workload traffic response time requirement.
–  Slow tier → the system traffic time window requirement.
•  Challenge:
–  What portion of the voluminous data set to bring up to the smaller fast tier.
–  When to do it, so that system performance is the highest.

Proposed solution: Goals

•  Predict when drastic changes happen

•  Proactively prepare the system for heavy user workload

To optimize Multi-tier Storage System performance.

Analysis: Scenarios


CDFs of user response time of 1 day for different algorithms.

Can we predict the changes as well as the proactive scenario does?

Proposed solution

•  Markovian-based model that captures the duration of the low/high traffic intensities in user arrivals across different time scales.

•  Model that captures the changes in user performance as a function of fast tier hit rate.

•  Predict when the periods of high/low intensities arrive to schedule system work and cache warm up to optimize the system performance.
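The idea of learning how long low and high intensity periods last, and using that to anticipate the next change, can be sketched in a few lines. This is a simplified stand-in that averages observed period durations, not the Markovian model from the paper; the function names and the hourly `history` labels are hypothetical.

```python
from statistics import mean

def run_lengths(states):
    """Collapse a per-interval state sequence into (state, duration) runs."""
    runs = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [(s, d) for s, d in runs]

def learn_mean_durations(states):
    """Average run length per state, skipping the last (possibly unfinished) run."""
    durations = {}
    for s, d in run_lengths(states)[:-1]:
        durations.setdefault(s, []).append(d)
    return {s: mean(ds) for s, ds in durations.items()}

def predict_next_change(states, mean_durations):
    """Current state and the intervals left before it is expected to end."""
    current, elapsed = run_lengths(states)[-1]
    return current, max(0, mean_durations[current] - elapsed)

# Hypothetical trace: one label per hour, L = low intensity, H = high intensity.
history = list("LLLLHHLLLLHHLLLLHHLL")
durations = learn_mean_durations(history)
state, remaining = predict_next_change(history, durations)
print(durations, state, remaining)
```

With this trace the learned Low period lasts four intervals on average, so two intervals into a Low period the sketch expects the High period in two more. A scheduler can compare that remaining time with the time needed to warm the fast tier.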

Proposed solution: Algorithm for scheduling

Prediction Model Scheduling

Prediction Model: Prediction-based Scheduling Policy

Prediction Model: Traffic trace

Figure panels: daily pattern, weekly pattern; arrival intensity changes over time.

Prediction Model: Class classification

Prediction Model: State classification

Use Silhouette clustering analysis [1] and K-means to determine the states.

[1] Silhouette is used to calculate the dissimilarity value s(i) of the average arrival intensity of day i.

Prediction Model: High level Markovian model

Figure: state diagram; the state labels that survive are H2 and L2.


Cluster analysis: Silhouettes
•  A graphical aid to the interpretation and validation of cluster analysis.
•  The dissimilarity value s(i) is defined as s(i) = (b(i) − a(i)) / max{a(i), b(i)}, where:
–  i is the day index
–  a(i) is the average dissimilarity of day i to all other days within the same cluster
–  b(i) is the lowest average dissimilarity of day i to all days in a different cluster.
•  Values of s(i) are in [−1, 1] → the larger the value, the better.
•  As s(i) approaches 1, a(i) ≪ b(i), which means that day i lies much closer to the days in its own cluster than to the days in any other cluster.
•  Algorithm to determine the number of clusters → classes and states.
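As an illustration of how silhouettes can pick the number of clusters, the sketch below runs a plain one-dimensional K-means for several values of k and keeps the k with the highest mean s(i). The daily intensities, the initialization rule, and all function names are assumptions made for the example, not details from the paper.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on scalars; centers start at spread-out sorted values."""
    srt = sorted(values)
    centers = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def silhouette(values, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = set(labels)
    scores = []
    for i, v in enumerate(values):
        own = [abs(v - w) for j, w in enumerate(values)
               if labels[j] == labels[i] and j != i]
        if not own:                      # singleton cluster: s(i) taken as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(abs(v - w) for w, l in zip(values, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_k(values, k_range=range(2, 5)):
    """Number of clusters with the highest mean silhouette."""
    return max(k_range, key=lambda k: silhouette(values, kmeans_1d(values, k)))

# Hypothetical average arrival intensity for seven days (five busy, two quiet).
daily = [100, 105, 98, 102, 101, 40, 45]
k = best_k(daily)
print(k, kmeans_1d(daily, k))
```

For this made-up week the two-cluster split scores highest, which matches the intuition of one busy class of days and one quiet class.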

Estimation method for the instant fast tier hit rate
•  Goal: estimate how the hit rate changes as active user data moves from the slow tier up to the fast tier and vice versa.
•  It is necessary to warm up the fast tier cache rather than allow it to be warmed up gradually by the user accesses.
•  The fast tier hit rate is estimated by the equation on the slide, where:
–  µ(t) is the average service rate of user traffic at time t.
–  µorigi is the original average service rate of user traffic (no system workload).
–  S(t) is the service slowdown, which describes how the average service rate changes from the original one.
–  Rslow is the average slow storage tier access speed.
–  Rfast is the average fast storage tier access speed.
–  C is the capacity.
–  F is the transfer speed.
–  S(ti-START) is the service slowdown at the beginning of time window i serving the additional work.
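To give a feel for why the hit rate matters, the sketch below relates a fast tier hit rate to an effective access speed and estimates how long a warm-up takes. It is not the estimator from the paper: it assumes a simple weighted-access-time model and a warm-up time of C / F, and the function names and numbers are hypothetical.

```python
def effective_rate(hit_rate, r_fast, r_slow):
    """Average access speed when a fraction hit_rate of accesses hits the fast tier.

    Assumed model: time per unit of data is h / r_fast + (1 - h) / r_slow.
    """
    return 1.0 / (hit_rate / r_fast + (1.0 - hit_rate) / r_slow)

def relative_slowdown(hit_rate, r_fast, r_slow):
    """Effective speed as a fraction of the fully warmed-up speed (assumed metric)."""
    return effective_rate(hit_rate, r_fast, r_slow) / r_fast

def warmup_time(capacity, transfer_speed):
    """Assumed time to fill the fast tier: capacity divided by transfer speed."""
    return capacity / transfer_speed

# Hypothetical speeds in MB/s and capacity in MB.
R_FAST, R_SLOW = 1000.0, 100.0
for h in (0.0, 0.5, 0.9, 1.0):
    print(h, round(effective_rate(h, R_FAST, R_SLOW), 1))
print(warmup_time(capacity=1024.0, transfer_speed=100.0))
```

Under these assumptions even a 90% hit rate leaves the effective speed far below the fast tier speed, which is one way to see why warming the tier ahead of a High period can pay off.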

When to warm up

Actual and predicted arrival intensity state changes


Proposed solution: Algorithm for scheduling

Algorithm for scheduling system work in a multi-tier storage system

Algorithm: State-based Scheduling, Low State

Algorithm: State-based Scheduling, High State
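The state-based scheduling idea can be illustrated with a small decision function: use Low periods for system work, switch to warming the fast tier shortly before a predicted High period, and hold system work back during High periods. The algorithm details on the slides are not reproduced here, so the rule, the action names, and the `timeline` values are assumptions.

```python
def next_action(state, time_to_high, warmup_time):
    """Pick the action for one scheduling interval.

    state         -- current user traffic state, "Low" or "High"
    time_to_high  -- predicted intervals until the next High period
    warmup_time   -- intervals needed to load the user working set
    """
    if state == "High":
        return "pause_system_work"      # protect user response time
    if time_to_high <= warmup_time:
        return "warm_up_fast_tier"      # a High period is imminent
    return "run_system_work"            # use the Low period for system work

# Hypothetical timeline of (current state, predicted intervals until High).
timeline = [("Low", 5), ("Low", 2), ("Low", 1), ("High", 0)]
actions = [next_action(s, t, warmup_time=2) for s, t in timeline]
print(actions)
```

Here the warm-up starts two intervals before the predicted High period, so the user working set is already in the fast tier when the load rises.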

Testbed: Hardware & workload
•  Server memory is 12GB; the disk enclosure holds 12 SATA 7200RPM HDDs of 3TB each.
•  System memory emulates the fast tier.
•  The disk enclosure is the slow tier, used for the bulk of the data.
•  Workload
–  It uses "fio" as the I/O workload generator.
–  The working set size for the user workload is 1GB.
–  The system working set is 24GB.
–  The workload is generated and measured at the host machine.
•  The fast tier is warmed up via a sequential read of the user working set.
•  The user active working set can be determined by statistically evaluating access patterns, such as the number of accesses per storage location.
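The last two bullets can be sketched as follows: count accesses per storage location to find the hottest locations, then read the working set sequentially so it ends up in memory. The access log, the block identifiers, and the temporary file standing in for the user working set are all hypothetical; whether a read actually populates the fast tier depends on the system (on a testbed where memory plays the fast tier, the operating system's page cache would typically do it).

```python
import os
import tempfile
from collections import Counter

def hot_blocks(access_log, top_n):
    """Return the top_n most frequently accessed storage locations."""
    return [block for block, _ in Counter(access_log).most_common(top_n)]

def sequential_read(path, chunk_size=1 << 20):
    """Read a file front to back in fixed-size chunks; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total

# Hypothetical access log: one block id per user request.
log = [7, 3, 7, 9, 3, 7, 1]
hot = hot_blocks(log, top_n=2)

# Hypothetical working set: a small temporary file instead of a 1GB data set.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 2_500_000)
bytes_read = sequential_read(tmp.name)
os.remove(tmp.name)

print(hot, bytes_read)
```

Reading in large sequential chunks keeps the slow tier in its most efficient access pattern, which is presumably why a sequential read is used for the warm-up.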

Testbed: System work scheduling policies

•  user-only: used only as a baseline to evaluate the impact of the additional system work.
•  feedback-based: a reactive policy that monitors the current load intensity in the system and determines whether it is in a high or low intensity period.
•  prediction-based: a proactive policy that uses the proposed Markovian model to predict user traffic intensity, having learned from past data the duration of periods of high and low intensity.
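A reactive, feedback-based detector can be as simple as a moving average compared against a threshold. The sketch below is one possible form, with a hypothetical window, threshold, and arrival trace; it shows that such a detector only reports a state change after the load has already changed.

```python
from collections import deque

def feedback_states(arrivals, threshold, window=3):
    """Classify each interval as High or Low from a moving average of arrivals."""
    recent = deque(maxlen=window)
    states = []
    for a in arrivals:
        recent.append(a)
        avg = sum(recent) / len(recent)
        states.append("High" if avg >= threshold else "Low")
    return states

# Hypothetical arrivals per interval: the load is actually high in intervals 3-5.
arrivals = [10, 10, 10, 100, 100, 100, 10, 10, 10]
detected = feedback_states(arrivals, threshold=50)
print(detected)
```

In this trace the detector flags the High period one interval late and also leaves it one interval late. A prediction-based policy aims to act before the change instead.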

Experimental Results: user-only policy

User IOPS (throughput) and user response time over time, user-only policy.


Experimental Results

Response time with warm up and without warm up across the experiment time

Simulation results

•  Feedback method
–  Uses online detection
–  Stop and change
–  Delay to change
•  Prediction method
–  Uses fast tier hit rate prediction
–  Change warming up

Predicted state change by feedback method and prediction method.

•  Using traces containing the arrival time
•  Two-tiered storage system

Evaluation of the prediction model accuracy

CDF of user response time

Performance comparisons via simulation. Note that the throughput for system work is null in the user-only case.

Conclusions
•  The Markovian-based model
•  A prediction-based scheduling policy
•  The prediction-based policy is very close to the ideal scenario (knowledge of the future).
•  It demonstrates the effectiveness of the prediction model.
–  It detects the incoming High state.
–  It proactively warms up the fast tier.
•  The larger the fast tier, the higher the benefit of the prediction-based approach.

Conclusions and critiques
•  Simplistic approach
–  "Determining the user working set (i.e., what to bring) is outside the scope of this paper." p.3.
–  "The fast tier is warmed up via a sequential read of the user working set."
–  "The user active working set can be determined by evaluating statistically access patterns such as the number of accesses per storage location."
•  Performance improvement in the window of workload change.
•  Other factors: OS parameters (e.g., paging), data set locations, network architecture.
•  Improve
–  Model, testbed, sensitivity analysis

Reference
•  Ji Xue, Feng Yan, Alma Riska, and Evgenia Smirni. Storage Workload Isolation via Tier Warming: How Models Can Help.
•  Storage Workload Isolation via Tier Warming: How Models Can Help, presentation at ICAC 2014.

