MINERVA: an automated resource provisioning tool for large-scale storage systems
G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, A. Veitch, J. Wilkes
Large Scale Storage Systems
► Very difficult to configure and design
    10 – 100s of host computers
    10 – 100s of storage devices
    10 – 1000s of disks/logical volumes
    Terabytes of capacity
► Meet throughput demands
► Maximize capacity utilization
► Automation would be nice…
MINERVA
► Subdivide problem into three stages
    Choose correct device set
    Choose correct configuration parameters
    Map user data onto devices
► NP-hard
► Architectural elements
    Declarative descriptions of storage workload requirements
    Constraint-based problem representation
    Optimization strategies and heuristics
    Analytic performance models
MINERVA Inputs
► Workload Description
    Data type descriptions and access patterns
    Two types
        ► Stores
            Logically contiguous data (db table or filesystem)
        ► Streams
            Sequences of accesses on a store (pattern and throughput)
► Device Descriptions
    Disk information (number, size, and type)
    Array information (number of LUNs)
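The inputs above can be pictured as a small data model. This is only a sketch: the class and field names (`Stream`, `Store`, `ArrayDescription`, `request_rate`, and so on) are illustrative assumptions, not MINERVA's actual input format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stream:
    """A sequence of accesses on a store (pattern and throughput)."""
    request_rate: float     # requests per second (assumed field)
    request_size_kb: float  # average request size in KB (assumed field)
    read_fraction: float    # share of requests that are reads (assumed field)
    sequential: bool = False


@dataclass
class Store:
    """Logically contiguous data, e.g. a database table or a filesystem."""
    name: str
    capacity_gb: float
    streams: List[Stream] = field(default_factory=list)

    def bandwidth_mb_s(self) -> float:
        # Total throughput demanded by all streams on this store.
        kb_per_s = sum(s.request_rate * s.request_size_kb for s in self.streams)
        return kb_per_s / 1024.0


@dataclass
class ArrayDescription:
    """Device description: disk number, size and type, plus LUN count."""
    model: str
    num_disks: int
    disk_size_gb: float
    disk_type: str
    max_luns: int


# Hypothetical workload: one table with a random and a sequential stream.
orders = Store(
    "orders_table",
    capacity_gb=40.0,
    streams=[Stream(200.0, 8.0, 0.7), Stream(50.0, 64.0, 1.0, sequential=True)],
)
```

Keeping streams attached to their store mirrors the slide's definition of a stream as accesses "on a store".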
MINERVA Objects
MINERVA Outputs
► Assignment
    Device set taken from Device Descriptions
    Mapping of stores to devices
    2^n n^m possible configurations
        ► O((2m)^m) complexity
    Goal
        ► Minimum cost that meets performance requirements
► Effector tool
    Takes assignment as input
    Automated configuration of physical devices
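To get a feel for why exhaustive search is impractical, the configuration count can be evaluated directly. This sketch reads the count on the slide as 2^n · n^m, with n taken to be the number of candidate devices and m the number of stores; the slides do not define n or m, so that reading is an assumption.

```python
def configuration_count(n: int, m: int) -> int:
    """Assumed reading: 2**n device subsets times n**m store-to-device mappings."""
    return (2 ** n) * (n ** m)


# Even a small system is far beyond enumeration.
small = configuration_count(10, 20)
```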
Storage System Lifecycle
Architecture
► Array Allocation
    Tagger
        ► Assigns a preferred RAID level
    Allocator
        ► Determines number of arrays
► Array Configuration
    Array Designer
        ► Actually configures the arrays
► Store Assignment
    Solver
        ► Assigns stores to LUNs
    Optimizer
        ► Prunes unused resources and balances load
► Evaluator
    Verifies design with analytic models
Architecture
MINERVA Process
Analytical Device Models
► Determines feasibility
► Predicted throughput error rate = 20%
► Streams
    Modeled as ON-OFF Markov-modulated Poisson process
► Arrays
    Array controller, bus connection, disks
► Case Study
    HP SureStore Model 30/FC High Availability disk array
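An ON-OFF Markov-modulated Poisson process alternates between a silent OFF state and an ON state in which requests arrive as a Poisson process. The simulation below is a minimal sketch of that stream model; the parameter names (`rate_on`, `mean_on`, `mean_off`) and the choice to start in the ON state are assumptions, and MINERVA's analytic models work from such parameters rather than from simulation.

```python
import random


def simulate_on_off(rate_on, mean_on, mean_off, duration, seed=0):
    """Return request arrival times for an ON-OFF source.

    State sojourn times are exponential (the Markov modulation);
    arrivals while ON are Poisson with rate `rate_on`.
    """
    rng = random.Random(seed)
    t, on, arrivals = 0.0, True, []
    while t < duration:
        mean = mean_on if on else mean_off
        end = min(t + rng.expovariate(1.0 / mean), duration)
        if on:
            # Exponential gaps between requests give Poisson arrivals.
            a = t + rng.expovariate(rate_on)
            while a < end:
                arrivals.append(a)
                a += rng.expovariate(rate_on)
        t, on = end, not on
    return arrivals


arrivals = simulate_on_off(rate_on=100.0, mean_on=1.0, mean_off=1.0, duration=2000.0)
```

With equal mean ON and OFF times the long-run request rate is about half of `rate_on`.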
Tagger
► Choose storage class based on access pattern
    RAID 1/0 or RAID 5
► Rule based
    1. Determines capacity-bound stores
    2. Estimates average number of I/O operations per second (IOPS)
Capacity Rules
► Calculated per GB of storage
► Capacity bound = RAID 5
IOPS Estimation
► Chosen RAID level = the one with the fewest per-disk IOPS
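The two tagging rules can be sketched as follows. The slides give only the outline, so everything numeric here is an assumption: the capacity-bound threshold (`capacity_bound_iops_per_gb`), the four-disk group, and the textbook write costs (2 disk operations per write for RAID 1/0, 4 for a RAID 5 small write, and a parity overhead of disks/(disks-1) for a RAID 5 full-stripe write).

```python
def per_disk_iops(read_iops, write_iops, level, disks=4, full_stripe_writes=False):
    """Estimate disk operations per second per disk for one RAID level."""
    if level == "RAID 1/0":
        disk_ops = read_iops + 2 * write_iops          # write goes to both mirrors
    elif level == "RAID 5":
        if full_stripe_writes:
            # Parity cost is spread over a whole stripe.
            disk_ops = read_iops + write_iops * disks / (disks - 1)
        else:
            disk_ops = read_iops + 4 * write_iops      # read-modify-write of data and parity
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return disk_ops / disks


def tag_store(capacity_gb, read_iops, write_iops,
              full_stripe_writes=False, capacity_bound_iops_per_gb=1.0):
    """Rule 1: capacity-bound stores get RAID 5. Rule 2: fewest per-disk IOPS."""
    if (read_iops + write_iops) / capacity_gb <= capacity_bound_iops_per_gb:
        return "RAID 5"
    levels = ("RAID 5", "RAID 1/0")  # ties go to RAID 5 (less capacity overhead)
    return min(levels, key=lambda lvl: per_disk_iops(
        read_iops, write_iops, lvl, full_stripe_writes=full_stripe_writes))
```

Under these assumptions a small, write-heavy random store is tagged RAID 1/0, while the same store with large sequential writes is tagged RAID 5.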
Allocator
► “Reasonable” set of arrays
► 3 steps
    Consider type and number of arrays
    Consider array configurations
    Consider LUN divisions and RAID configurations
Allocator models
► Can only use analytic device models
► Ignores stream phasing
► Rillifier handles large resource demands
    Distribute workload among different LUNs
    Stores become shards
        ► Excessive capacity requirements
    Streams become rills
        ► Excessive throughput requirements
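The splitting performed by the Rillifier might look like the sketch below: a store whose capacity or throughput exceeds what one LUN can offer is divided into equal pieces. The function name, the per-LUN limits, and the assumption that accesses are spread uniformly over the store are all illustrative, not taken from the slides.

```python
import math


def rillify(capacity_gb, bandwidth_mb_s, max_lun_gb, max_lun_mb_s):
    """Split one store into (shard_gb, rill_mb_s) pieces that each fit a LUN."""
    pieces = max(
        math.ceil(capacity_gb / max_lun_gb),       # too big: more shards
        math.ceil(bandwidth_mb_s / max_lun_mb_s),  # too busy: more rills
        1,
    )
    # Assumes uniform access, so throughput divides evenly across pieces.
    return [(capacity_gb / pieces, bandwidth_mb_s / pieces) for _ in range(pieces)]


big_store = rillify(100.0, 10.0, max_lun_gb=30.0, max_lun_mb_s=20.0)
busy_store = rillify(10.0, 90.0, max_lun_gb=30.0, max_lun_mb_s=20.0)
```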
Allocator Search
► Uses branch-and-bound strategy
    Determines number of array types
    Chooses lowest cost that supports workload
► Searches array configurations
    Starts with mixed arrays
    Iteratively converts arrays to dedicated types
    Branch-and-bound with bias toward dedicated types
        ► Searches in reverse order starting with dedicated types
► Calls array designer with configuration
    If array designer fails, search continues
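A branch-and-bound search for the cheapest set of arrays can be sketched as below. This is a simplified stand-in: feasibility is checked against aggregate capacity and bandwidth only, whereas the Allocator relies on analytic device models and on the Array Designer succeeding. The tuple layout and the `max_each` limit are assumptions.

```python
def cheapest_arrays(types, need_gb, need_mb_s, max_each=8):
    """types: list of (name, cost, capacity_gb, bandwidth_mb_s).

    Returns (cost, {name: count}) for the cheapest feasible combination.
    """
    best = {"cost": float("inf"), "counts": None}

    def search(i, counts, cost, gb, mb_s):
        if cost >= best["cost"]:
            return  # bound: cannot beat the best design found so far
        if gb >= need_gb and mb_s >= need_mb_s:
            best["cost"], best["counts"] = cost, dict(counts)
            return
        if i == len(types):
            return
        name, unit_cost, cap, bw = types[i]
        for k in range(max_each + 1):  # branch on the number of arrays of this type
            counts[name] = k
            search(i + 1, counts, cost + k * unit_cost, gb + k * cap, mb_s + k * bw)
        del counts[name]

    search(0, {}, 0.0, 0.0, 0.0)
    return best["cost"], best["counts"]


# Hypothetical array types and workload totals.
array_types = [("small", 10.0, 100.0, 20.0), ("big", 25.0, 300.0, 40.0)]
cost, counts = cheapest_arrays(array_types, need_gb=500.0, need_mb_s=60.0)
```

Pruning on cost alone is valid here because adding arrays never lowers cost.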
Array Designer
► Determines LUN sizes and array parameters
► Starts with simple cases of equal size LUNs
    Also considers greedy configuration
► Workload description determines LUN size
► Relies on Optimizer to take care of unused capacity
► Target disk assignment done with round robin across buses
Solver
► Assigns stores to LUNs
► Multidimensional constrained bin-packing
    Uses analytic device models to evaluate objective function
    Constraints:
        ► LUN capacity
        ► LUN phased utilization
        ► Array bus bandwidth
        ► Array controller utilization
Solver Heuristics
► Simple Random
    50 random cases using first fit
► Toyoda
    Best fit using gradient function
        ► Objective function combined with economic utilization
        ► (1/penalty - lun_cost)
    Favors LUNs already in use or low cost
        ► LUNs filled in order of increasing cost
    Minimizes resource contention
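The Simple Random heuristic can be illustrated with a small first-fit packer. This is a sketch under simplifying assumptions: only two constraint dimensions are checked (capacity and a single utilization figure capped at 1.0), the analytic device models are left out, and the tuple layouts are hypothetical.

```python
import random


def first_fit(stores, luns):
    """stores: [(name, gb, util)], luns: [(name, cap_gb, cost)].

    Returns {store: lun}, or None if some store fits nowhere.
    """
    used = {name: [0.0, 0.0] for name, _, _ in luns}
    assignment = {}
    for s_name, gb, util in stores:
        for l_name, cap, _ in luns:
            if used[l_name][0] + gb <= cap and used[l_name][1] + util <= 1.0:
                used[l_name][0] += gb
                used[l_name][1] += util
                assignment[s_name] = l_name
                break
        else:
            return None  # no LUN satisfies every constraint
    return assignment


def simple_random(stores, luns, tries=50, seed=0):
    """Run first fit on `tries` random store orders and keep the cheapest result."""
    rng = random.Random(seed)
    cost = {name: c for name, _, c in luns}
    best, best_cost = None, float("inf")
    for _ in range(tries):
        order = stores[:]
        rng.shuffle(order)
        candidate = first_fit(order, luns)
        if candidate is not None:
            total = sum(cost[l] for l in set(candidate.values()))
            if total < best_cost:
                best, best_cost = candidate, total
    return best, best_cost


stores = [("a", 60.0, 0.3), ("b", 50.0, 0.3), ("c", 40.0, 0.3), ("d", 30.0, 0.3)]
luns = [("L1", 100.0, 10.0), ("L2", 100.0, 10.0), ("L3", 100.0, 10.0)]
best, best_cost = simple_random(stores, luns)
```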
Solver Heuristics 2
► ToyodaWeighted
    Maps gradients against remaining available resources
    Maps stores to LUNs such that utilization is balanced
    Objective_function * cos(α)
    Objective_function = max_lun_cost - lun_cost
        ► Minimizes cost
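One way to read the ToyodaWeighted score is sketched below: α is taken to be the angle between a store's demand vector and a LUN's remaining-resource vector, so a store is steered toward LUNs whose spare resources have the same shape as its demand. That interpretation of α, the normalization of both vectors to fractions of a LUN, and the function names are assumptions; the slides give only the formula.

```python
import math


def cos_angle(u, v):
    """Cosine of the angle between two resource vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def pick_lun(demand, luns, max_lun_cost):
    """demand: (capacity, utilization) as fractions of a LUN.
    luns: [(name, remaining_vector, lun_cost)].
    Returns the feasible LUN maximizing (max_lun_cost - lun_cost) * cos(alpha).
    """
    best_name, best_score = None, -1.0
    for name, remaining, lun_cost in luns:
        if any(d > r for d, r in zip(demand, remaining)):
            continue  # store does not fit in this LUN
        score = (max_lun_cost - lun_cost) * cos_angle(demand, remaining)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


demand = (0.1, 0.4)
same_cost = [("A", (0.9, 0.5), 10.0), ("B", (0.2, 0.8), 10.0)]
b_expensive = [("A", (0.9, 0.5), 10.0), ("B", (0.2, 0.8), 19.0)]
```

With equal costs the better-aligned LUN B wins; making B nearly as expensive as `max_lun_cost` shifts the choice to A.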
Toyoda and ToyodaWeighted
Optimizer
► Reruns Solver against configuration
    Reduces required arrays
► Runs ToyodaWeighted with new objective function
    Objective_value = 1 - lun_utilization
    Assigns stores to underutilized LUNs
► Variations
    Simple Random
        ► Randomized first fit, chooses lowest utilization variance
    Simple Balanced
        ► Round robin first fit, based on capacity and utilization constraints
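The load-balancing objective can be sketched with a greedy pass that always places the next store on the LUN with the highest value of 1 - lun_utilization. This is a simplification and not the ToyodaWeighted algorithm itself: capacity constraints are ignored and the largest-first ordering is an assumption.

```python
from statistics import pvariance


def balance(stores, lun_names):
    """stores: [(name, utilization)]. Returns (assignment, utilization per LUN)."""
    util = {n: 0.0 for n in lun_names}
    assignment = {}
    for name, u in sorted(stores, key=lambda s: -s[1]):  # largest demand first
        target = max(lun_names, key=lambda n: 1.0 - util[n])  # least utilized LUN
        util[target] += u
        assignment[name] = target
    return assignment, util


stores = [("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]
assignment, util = balance(stores, ["L1", "L2"])
# Utilization variance is the measure the Simple Random variation minimizes.
variance = pvariance(list(util.values()))
```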
Clusterer
► Addresses performance scaling issues
    With many stores runtime grew to days
► Combines multiple stores into a cluster
    Cluster is mapped instead of stores
► Cluster rules based on observation
    10 MB/s bandwidth
    2 GB size
► Increases cost ~3%
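A greedy version of the clustering step is sketched below. It treats the two figures on the slide as upper limits per cluster (10 MB/s of bandwidth and 2 GB of size), which is an assumption about how the rules are applied; the tuple layout and first-fit grouping are also illustrative.

```python
def cluster(stores, max_gb=2.0, max_mb_s=10.0):
    """stores: [(name, gb, mb_s)]. Group stores first-fit under both limits."""
    clusters = []
    for name, gb, mb_s in stores:
        for c in clusters:
            if c["gb"] + gb <= max_gb and c["mb_s"] + mb_s <= max_mb_s:
                c["members"].append(name)
                c["gb"] += gb
                c["mb_s"] += mb_s
                break
        else:
            # A store that exceeds a limit by itself stays alone in its cluster.
            clusters.append({"members": [name], "gb": gb, "mb_s": mb_s})
    return clusters


stores = [("a", 0.5, 2.0), ("b", 1.0, 3.0), ("c", 1.0, 1.0),
          ("d", 0.5, 9.0), ("e", 3.0, 1.0)]
groups = [c["members"] for c in cluster(stores)]
```

The Solver then maps three clusters instead of five stores, which is where the runtime saving comes from.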
Evaluation
► Analytic model performance predictions
► Evaluate sensitivity to workload changes
► Effect of design changes
► Measure live system
Model Validation
► Based on single FC-30
► Ran performance tests on physical system
► Compared results to model predictions
► Results showed mean error rate of +5.4%
    Range of [-11%, +19%]
Safety and Sensitivity
► Examined scaling of workload parameters
► Start with baseline workload, then modify a single parameter
► Wanted to have 3 effects
    Mixing of appropriate RAID levels
    Requiring non-trivial number of arrays (2+)
    Balanced store performance requirements
Scaling Store Size and Bandwidth
► Store size scaling
    System becomes capacity bound
        ► Creates RAID 5 LUNs
    System size scales linearly with store size
► Bandwidth scaling
    Ratio of RAID 1/0 to RAID 5 increases linearly
Scaling Number of Stores
► Number of arrays scales linearly with stores
Running time
► Quadratic increase with number of stores
Workload Variability
► Workload attributes randomly taken from log-normal distribution
    Baseline values = mean distribution values
► Capacity utilization drops with increased variability
► RAID 5 LUNs increase
► Segmentation increases
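Drawing workload attributes from a log-normal distribution whose mean equals the baseline value requires shifting the underlying normal's mean: a log-normal with parameters μ and σ has mean exp(μ + σ²/2), so μ = ln(baseline) - σ²/2. The sketch below shows this; the function name and the use of σ as the variability knob are assumptions.

```python
import math
import random


def sample_attribute(baseline, sigma, rng):
    """Draw one value from a log-normal distribution whose mean is `baseline`."""
    mu = math.log(baseline) - sigma ** 2 / 2.0
    return rng.lognormvariate(mu, sigma)


rng = random.Random(0)
# Hypothetical baseline store size of 100 (arbitrary units), moderate variability.
samples = [sample_attribute(100.0, 0.5, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```

Raising `sigma` widens the spread while leaving the mean at the baseline, which matches the slide's "baseline values = mean distribution values".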
Workload variance
Whole System Validation
► MINERVA vs. Human Expert
► 3 aspects
    Comparison of resultant system cost
    Comparison of application performance
    Low runtime and minimal human interaction
► Based on TPC-D benchmark
    Decision Support system based on DB queries
► Human designers from HP system benchmarking team
Execution Times