
[IEEE IEEE International Conference on Services Computing (SCC 2007) - Salt Lake City, UT, USA (2007.07.9-2007.07.13)] IEEE International Conference on Services Computing (SCC 2007)


BRAHMA: Planning Tool for Providing Storage Management as a Service

Sandeep Uttamchandani, Kaladhar Voruganti*, Ramani Routray, Li Yin+, Aameek Singh, Benji Yolken**

IBM Almaden Research, *Network Appliance, +UC Berkeley, **Stanford University

1 Overview

Storage management is becoming the largest component in the overall cost of storage ownership. Most organizations are trying either to consolidate their storage management operations or to outsource them to a storage service provider (SSP) in order to contain management costs. Currently, no planning tools exist that help clients and SSPs determine the best outsourcing option.

In this paper we present a planning tool, BRAHMA, that specifically addresses this problem: it can produce solutions in which the management tasks are split between the client and the SSP at a fine granularity. Our tool is unique because a) in addition to hardware/software resources, it also takes the human skill set as an input; b) it takes the planning time window as an input, because plans that are optimal for a given time period (e.g., a month) might not be optimal for a different time period (e.g., a year); c) it can be used separately by both the client and the SSP to do their respective planning; and d) it allows the client and the SSP to propose alternative solutions if certain input service level agreements can be relaxed.

We have implemented BRAHMA, and our experimental results show that a tool with the above functional properties yields cost benefits.

2 Introduction

The cost of storage management is becoming the largest portion of the overall cost of storage ownership. Most organizations are finding that they need to hire more administrators as the amount of storage they consume increases. Storage administrators have to provide support for the provisioning, disaster recovery, compliance, performance, security, and planning requirements of an application/business. Thus, there is a limit on the amount of storage that can be managed by a single administrator. It is becoming increasingly difficult for most organizations to find and retain experienced system administrators.

Most organizations are trying to address this problem by a) carefully dividing an administrator's responsibilities into high-skilled and low-skilled ones, hiring more low-skilled administrators while sharing high-skilled administrators across multiple locations, and b) outsourcing storage hosting and storage management tasks.

The tasks of a typical storage administrator can be divided among and performed by a) system planners, b) system operators, and c) system administrators. System planners perform skill-intensive tasks such as capacity planning, disaster recovery planning, security infrastructure design, and bottleneck analysis. Their services are invoked on a per-need basis. System operators install new hardware, software, and firmware. They also typically assist with wiring, cooling, and other installation-related tasks. Their services are also utilized on a per-need basis. Finally, system administrators perform daily storage management tasks such as resource monitoring, first-level problem resolution, storage provisioning, and resource access control. Thus, most organizations are trying to share the expensive system planner skill set across multiple sites while hiring relatively inexpensive system operators and administrators on a per-site basis.

Organizations are also tackling this problem by outsourcing their storage hosting and/or storage management operations to a storage service provider (SSP). The existing options for storage outsourcing can be categorized as follows:

• Data out, Management out: This is the most common outsourcing model, where the data center is hosted and managed remotely by the storage service provider [7, 1].
• Data in, Management out: Customers who are not comfortable with having their data stored remotely prefer the model where the data is hosted locally but the management is done by the SSP (either remotely, or by having SSP personnel present at the customer site).
• Data out, Management in: In this model, the data center is hosted remotely, but the management is done in-house. This is not a common scenario.


2007 IEEE International Conference on Services Computing (SCC 2007), 0-7695-2925-9/07 $25.00 © 2007


• Hybrid Management: This is a variation of the previous two approaches, in which the storage management tasks are split between the storage service provider and the customer.

Over time, as an organization's storage needs and in-house skill set evolve, different storage outsourcing models become more attractive. Therefore, there is no single "one-size-fits-all" solution that can be advocated for all customers. Similarly, SSPs want to provide different classes of service to different customers at different prices. Furthermore, both the customer and the storage service provider are trying to minimize their respective costs, and in some cases their cost optimization goals might prompt them to advocate different storage outsourcing solutions. It is not easy for either customers or storage service providers to determine their best course of action, because of the numerous application requirements (performance, availability, security), devices, business policies, and government regulations that must be satisfied. The manual planning techniques that customers and storage service providers employ today are both time consuming and error prone. Furthermore, plans have to be updated constantly to cope with resource consumption growth and with new business and government requirements.

In order to address the above set of problems, in this paper we propose a web-services-based, multi-site storage outsourcing planning tool that has the following key features:

• Temporal Planning: We present a planning infrastructure that can take different optimization time windows as input. For example, the plan that is most suitable for a one-month time window might not be the most cost-effective plan for a one-year time window.
• Integrated Human/System Planning: When proposing storage outsourcing solutions, in addition to considering storage hardware and software resources, it is very important to consider the available human skill set. In this paper, we take human skills into account when proposing storage outsourcing solutions.
• Generation of multiple plans: The proposed planning infrastructure allows the SSP to propose solutions with different cost points if certain conditions are relaxed. For example, the SSP can indicate to customers that if they are willing to downgrade their performance or availability requirements, they can get solutions with better cost points. Thus, the tool outputs multiple solutions with different cost points.
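The temporal-planning feature can be illustrated with a small calculation. The sketch below compares two hypothetical plans, one with a low setup cost and a high recurring cost and one with the opposite profile. The `Plan` class, the plan names, and all dollar figures are assumptions made for illustration; they are not taken from BRAHMA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    setup_cost: float    # one-time cost in dollars (hypothetical figure)
    monthly_cost: float  # recurring cost in dollars per month (hypothetical figure)

    def total_cost(self, months: int) -> float:
        """Total cost of the plan over a planning window of `months` months."""
        return self.setup_cost + self.monthly_cost * months


def cheapest(plans, months):
    """Return the plan with the lowest total cost for the given window."""
    return min(plans, key=lambda p: p.total_cost(months))


plans = [
    Plan("outsource", setup_cost=1_000, monthly_cost=900),
    Plan("in-house", setup_cost=6_000, monthly_cost=300),
]

print(cheapest(plans, 1).name)   # outsource (1,900 vs 6,300)
print(cheapest(plans, 12).name)  # in-house (9,600 vs 11,800)
```

With these made-up numbers, the preferred plan flips between a one-month and a twelve-month window, which is why the planning window is treated as an input.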

3 System Overview

This section provides an overview of BRAHMA. It discusses the inputs, the outputs, and the key components of the planning tool in the following subsections. The clients and SSPs can perform remote storage management tasks using a standard TCP/IP connection. The actual data flows between the clients and SSPs either over IP networks (using iSCSI) or over FCP (SCSI over Fibre Channel). BRAHMA can either reside on a separate physical server or share a physical server with a storage resource manager software package such as EMC Control Center, IBM TPC, or HP AppIQ.

Figure 1. Design of BRAHMA. Elements shown: Client Requirements (client SLO, available resources); SLO Parser; Policy Manager with a customizable Policy DB; Client Resource Description; SSP Resource Description; Feasibility Matrix; Optimizer; Counter-offer; per-client service models (Reject or DIMO / DOMO / DIMI / DOMI); resource utilization plan for accepted SLOs.

3.1 Planner Input

BRAHMA obtains the following types of input:

Client Demand: Client demand consists of SLOs, penalty agreements, and the client's willingness to relax certain SLOs. Clients specify capacity, performance, availability (including disaster recovery), future growth, and security-related SLOs. To assist clients in defining their requirements, BRAHMA provides pre-defined application templates that vary in performance and availability requirements, such as OLTP, scientific, and email applications, to capture a diverse set of requirements.

Client and SSP Resources: In this paper we deal with both hardware resources and human resources. BRAHMA can either query the resources in question directly or obtain the data indirectly from storage resource manager databases (which, in turn, query the resources directly using the SMI-S [15] and SNMP protocols). The discovered hardware resource information is used as input to BRAHMA. Currently, we do not have an automatic way of discovering and classifying human resources; therefore, we provide templates that classify human skill level into different buckets. In this paper, we classify human skill level as high, medium, or low based on the administrative tasks an administrator can perform and the number of man-hours the administrator takes to complete a particular task.

Business Policies: Clients can specify many types of organizational policies, government regulations, and plan optimization criteria. For example, clients can specify requirements such as the exclusive use of a storage controller for a particular user/department, a government regulation that requires that there be no duplicate copies of data, or a requirement that the data be physically removed from devices on a particular date. Clients can also specify an optimization time window, such as a number of months or years, as an input.

3.2 Planner Components

The framework for BRAHMA, as shown in Figure 1, consists of three key modules:

SLO Parser: Clients typically describe their demands and available resources in plain English. The SLO Parser maps these descriptions into our resource description and resource requirement formats. The output of the parser feeds into the Policy Manager and also supplies constraints to the Optimizer, namely the penalty function and the optimization window.

Policy Manager: This module maps customer SLO requirements to the list of candidate hardware and human resources that can satisfy those requirements. This module will be developed by domain experts and pre-packaged with BRAHMA as a customizable Policy Manager. This step generates a list of candidate resource locations (hardware and human) for the customer SLOs; it embeds the following domain expertise:

• Mapping of an SLO attribute to hardware resource requirements. For example, a recovery window SLO of 5 minutes will require a network bandwidth, between the customer site and the backup device, proportional to the size of the dataset.
• Mapping of an SLO attribute to human tasks, skill level, and man-hours.

This step also helps prune the search space for the optimizer by eliminating storage devices and human resources that cannot be used to service an SLO.
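The recovery-window example above amounts to a simple proportionality: for a fixed window, the required bandwidth grows linearly with the dataset size. A minimal sketch follows; the function name, the decimal unit convention, and the 300 GB dataset are assumptions for illustration, and a real policy rule would likely also account for protocol overhead and link sharing.

```python
def required_bandwidth_mb_per_s(dataset_gb: float, recovery_window_min: float) -> float:
    """Minimum sustained bandwidth (MB/s) needed to move the whole dataset
    within the recovery window. Assumes decimal units (1 GB = 1000 MB)."""
    if recovery_window_min <= 0:
        raise ValueError("recovery window must be positive")
    dataset_mb = dataset_gb * 1000
    window_s = recovery_window_min * 60
    return dataset_mb / window_s


# A 5-minute recovery window for a hypothetical 300 GB dataset.
print(required_bandwidth_mb_per_s(300, 5))  # 1000.0
```

Doubling the dataset size under the same 5-minute window doubles the required bandwidth, which is the proportionality the policy rule encodes.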

Constraint-based Optimizer: This module takes the candidate resource list and generates an optimal allocation of resources for a given customer SLO. We formulate the optimization as a 0/1 multi-knapsack problem [14], which is a well-known NP-hard problem. Figure 1 shows the overall design of BRAHMA.

3.3 Planner Output

The output of BRAHMA can contain either a single plan or multiple plans. When the planner outputs multiple plans, it typically uses the additional plans to inform the user of the cost benefit if one or more SLO requirements were relaxed. Each plan contains the following information:

• Data Placement Information: This information describes at which site and storage location a particular type of user data (such as log data or temporary data) should be placed.
• Administrator Placement Information: This information describes what type of system administrator (with the appropriate skill level) should be placed at which location.
• Solution Cost: The solution cost is the overall setup cost in dollars. In this paper we use a simple cost model in which we assign a cost to each type of storage as well as to each administrator task. We obtained these costs by surveying the cost models of various storage service providers.

Next, we delve deeper into the different components of BRAHMA.

4 SLO Parser

The input to the SLO Parser is a combination of customer requirements, resource information, and business policies. The SLOs are typically in plain English or some other human-readable format. The hardware resource information typically comes from a collection of data sheets or from management tools. This section gives details of the internal representation of this information as generated by the SLO Parser.

4.1 Client Requirements

The SLO requirements are translated into corresponding measurable parameters. Table 1 shows the parameters used by the BRAHMA prototype implementation.

In BRAHMA, a given customer SLO is internally represented as three sub-SLOs corresponding to the level of service required for the customer's regular data, archival/backup data, and compliance data.

For each of the SLO attributes in Table 1, the customer can specify whether the attribute is flexible; this allows BRAHMA to relax the corresponding constraint and generate multiple allocation plans.


| Category | SLO parameters |
|---|---|
| Capacity | Gigabytes |
| Performance per gigabyte | throughput, latency |
| Availability | 9s, MTTF, MTTR |
| Disaster recovery | RTO, RPO, Distance, Application Impact, Backup Window |
| Compliance | HIPAA, SOX |
| Future Growth Per Month | Capacity Growth, Performance Growth, Port Count Growth |
| Time to provision | YY:MM:DD:HH:MM:SS |
| Provisioning granularity | GBytes |
| Security | access control, on disk, authorization, on wire, physical security, separate setup |
| Others | customer support (24x7), problem turnaround time |
| Reliability | CRC, Checksum, LRC |

Table 1. Example SLO parameters

4.2 Hardware resource description

The information about the hardware attributes can either be generated manually or obtained using SAN management tools such as EMC Control Center [2] and IBM Total Productivity Center [4]. In the BRAHMA prototype, a storage device is configured with attributes such as Hot Code Upgrade, Maximum Capacity, and Encryption Support.

4.3 Human resource description

The human resource model consists of the following parameters:

• Skill Level: This is categorized into high, medium, and low.
• Expertise: This is based on the task groups supported by the administrator. In BRAHMA, the task groups are planning (performance, disaster recovery, security), monitoring (performance and system status), and operations (firmware and system installation, provisioning, compliance).
• Yearly man-hours: The number of man-hours for which the administrator will be available.
• Hourly rate: The dollar cost associated with the administrator.
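The four parameters above map naturally onto a small record type. The following sketch is one possible encoding, not the prototype's actual XML schema; the class name, field names, helper methods, and the sample administrator are all assumptions made for illustration.

```python
from dataclasses import dataclass

SKILL_LEVELS = ("low", "medium", "high")                # ordered from lowest to highest
TASK_GROUPS = ("planning", "monitoring", "operations")  # task groups named in the text


@dataclass(frozen=True)
class Administrator:
    name: str
    skill_level: str         # one of SKILL_LEVELS
    expertise: frozenset     # subset of TASK_GROUPS
    yearly_man_hours: float  # hours the administrator is available per year
    hourly_rate: float       # dollars per hour

    def __post_init__(self):
        if self.skill_level not in SKILL_LEVELS:
            raise ValueError(f"unknown skill level: {self.skill_level}")
        unknown = set(self.expertise) - set(TASK_GROUPS)
        if unknown:
            raise ValueError(f"unknown task groups: {sorted(unknown)}")

    def can_perform(self, task_group: str, min_skill: str = "low") -> bool:
        """True if the task group is supported at or above the required skill level."""
        has_skill = SKILL_LEVELS.index(self.skill_level) >= SKILL_LEVELS.index(min_skill)
        return task_group in self.expertise and has_skill

    def yearly_cost(self) -> float:
        return self.yearly_man_hours * self.hourly_rate


admin = Administrator("admin-1", "high", frozenset({"planning"}), 1800, 60.0)
print(admin.can_perform("planning", min_skill="high"))  # True
print(admin.can_perform("monitoring"))                  # False
print(admin.yearly_cost())                              # 108000.0
```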

5 Policy Manager

As shown in Figure 1, the Policy Manager takes as input the customer requirements and the hardware and human resources of the customer and the SSP, and generates the following:

• A feasibility matrix of the candidate hardware resources that can satisfy the customer SLO. The rows of the matrix are the data types (regular, backup, compliance) of all customer SLOs, and the columns are the hardware devices.
• A feasibility matrix of the candidate human resources that can perform the administrative tasks associated with the customer SLO. The rows of the matrix represent the six task groups for each SLO (performance, capacity, availability, disaster recovery, compliance, security), while the columns represent the available human administrators at the customer and SSP sites.
• An enumeration of the human man-hours required for each administrative task group associated with the SLO.

In data centers today, matrix generation is done informally and manually. Automating this process with the Policy Manager reduces errors and planning time, and, combined with optimization (described later), it will significantly improve resource usage and SSP costs.

Internally, the Policy Manager uses two sets of configurable policies to generate the output:

Policies for mapping SLOs to hardware attribute constraints: These are rules that establish whether a particular SLO can be fulfilled by a specific resource. They can be thought of as a boolean function feasible(SLO, Stg_Resource), which indicates whether Stg_Resource is feasible for the SLO. For example, for compliance data only WORM storage devices are candidates, so the function returns true only for them. Additionally, the customer may specify the SLO as a level of service such as bronze, gold, or platinum. For example, a policy rule may map the SLO requirement of platinum availability to a storage controller that has all of the following attributes: 1) fault tolerant and dual redundant; 2) hot-swap RAID controller cards; 3) battery backup units; 4) non-disruptive hardware and software code load updates (Hot Code Upgrade); 5) multi-pathing device driver.
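The two hardware rules just described (WORM-only for compliance data, and the five attributes required for platinum availability) can be sketched as a boolean function. This is an illustrative encoding only: the dictionary layout, key names, and attribute identifiers are assumptions, and the prototype's real policies are configurable rather than hard-coded.

```python
# Attribute identifiers are hypothetical names for the five attributes listed in the text.
PLATINUM_AVAILABILITY = frozenset({
    "fault_tolerant_dual_redundant",
    "hotswap_raid_controller_cards",
    "battery_backup_units",
    "hot_code_upgrade",
    "multipathing_device_driver",
})


def feasible(slo: dict, device: dict) -> bool:
    """Return True if `device` may serve `slo` under the two example rules."""
    # Rule 1: compliance data may only be placed on WORM devices.
    if slo["data_type"] == "compliance" and not device["worm"]:
        return False
    # Rule 2: platinum availability requires every listed attribute.
    if slo.get("availability") == "platinum":
        return PLATINUM_AVAILABILITY <= device["attributes"]
    return True


worm_tape = {"worm": True, "attributes": frozenset()}
controller = {"worm": False, "attributes": PLATINUM_AVAILABILITY}
basic_array = {"worm": False, "attributes": frozenset({"battery_backup_units"})}

print(feasible({"data_type": "compliance"}, worm_tape))                            # True
print(feasible({"data_type": "compliance"}, controller))                           # False
print(feasible({"data_type": "regular", "availability": "platinum"}, controller))  # True
print(feasible({"data_type": "regular", "availability": "platinum"}, basic_array)) # False
```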

Policies for mapping SLOs to human tasks, skill level, and man-hours: These rules identify the administrative tasks required to meet an SLO. For example, servicing a 10 TB SLO with security enabled and full disaster recovery capability would require planning administrators, performance and security analysts, and DR experts. This identification of tasks is based on a customizable set of best practices and rules of thumb that are commonly used in data centers. Next, based on the task identification, all administrators who have the necessary skills to service an SLO are considered feasible for that SLO and are added to the candidate list.

Generating the feasibility matrix is intuitively a matching problem, in which the set of SLO requirements needs to be matched to appropriate hardware and human resources. Here, we briefly illustrate an example workflow of the Policy Manager, starting with a customer SLO and ending with a feasibility matrix. Consider an SSP with two sites, A and B, with high-end storage controllers of 1 PB capacity at site A, and low-end storage devices with 500 TB capacity and backup tapes of 100 TB at site B. Additionally, site A has high-skill performance and availability analysts and medium-skill monitoring administrators, while site B has high-skill planning and security analysts. Now, assume a client SLO that requires 50 TB of regular data, of which 10 TB needs to be backed up on tapes. The client requires high throughput and full security with at-rest and on-wire encryption.

The policy manager first looks at the storage resource requirements of the client. Based on the capacity requirements, the storage controllers at both site A and site B are candidates. However, under the storage resource policies (feasible()), the client's high throughput requirement might eliminate the devices at site B, as they are low-end devices. For the 10 TB of backup data, only the backup devices available at site B are candidates. Next, the policy manager uses the task identification rules to break down the management requirement into individual tasks and the number of man-hours required for each task. Then, it selects candidates for fulfilling those tasks. For example, using the rule of thumb of 1 high-skill man-hour of planning for every 10 TB of provisioned capacity, the policy manager chooses the administrators at site B as candidates and assigns a requirement of 5 man-hours. The other task groups are handled similarly.
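The man-hour arithmetic in this example is a direct application of the rule of thumb: 50 TB at 1 hour per 10 TB gives 5 hours. A minimal sketch, in which the function name is hypothetical and rounding partial hours up is an assumption the text does not state:

```python
import math


def planning_man_hours(capacity_tb: float, tb_per_man_hour: float = 10.0) -> int:
    """High-skill planning hours under the rule of 1 man-hour per `tb_per_man_hour` TB.
    Rounding partial hours up is an assumption made for this sketch."""
    return math.ceil(capacity_tb / tb_per_man_hour)


print(planning_man_hours(50))  # 5, matching the 50 TB example
```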

This process is repeated for all client SLOs. As a result of this analysis, a comprehensive feasibility matrix can be created that has all storage devices and human resources as columns and the client SLOs as rows. The cell (i, j) is marked yes if resource j is feasible for SLO i. Based on this feasibility matrix, the optimizer tries to find the best allocation of storage and administrator resources to satisfy the client SLOs. The optimizer is discussed in the next section.
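The site A / site B walkthrough can be reproduced as a small hardware feasibility matrix. The sketch below uses three simplified rules (media kind, capacity, and a high-end requirement for high throughput); the device names, field names, and the decision to treat low-end devices as strictly infeasible for high-throughput data are assumptions, since the text only says such devices "might" be eliminated.

```python
# Rows: sub-SLOs from the example (50 TB regular data, 10 TB backup on tape).
slos = {
    "regular-50TB": {"capacity_tb": 50, "kind": "disk", "high_throughput": True},
    "backup-10TB": {"capacity_tb": 10, "kind": "tape", "high_throughput": False},
}

# Columns: the SSP devices at sites A and B from the example.
devices = {
    "A-highend-controller": {"capacity_tb": 1000, "kind": "disk", "high_end": True},
    "B-lowend-storage": {"capacity_tb": 500, "kind": "disk", "high_end": False},
    "B-backup-tape": {"capacity_tb": 100, "kind": "tape", "high_end": False},
}


def feasible(slo: dict, dev: dict) -> bool:
    """Simplified policy: media kind, capacity, and throughput class must all match."""
    if dev["kind"] != slo["kind"]:
        return False
    if dev["capacity_tb"] < slo["capacity_tb"]:
        return False
    if slo["high_throughput"] and not dev["high_end"]:
        return False
    return True


# matrix[i][j] is True when device j is feasible for SLO i.
matrix = {
    s: {d: feasible(slo, dev) for d, dev in devices.items()}
    for s, slo in slos.items()
}

for s, row in matrix.items():
    print(s, [d for d, ok in row.items() if ok])
```

Under these assumptions the regular data is feasible only on the site A controller and the backup data only on the site B tapes, mirroring the narrative above.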

6 Optimizer

BRAHMA formulates the decision-making as a constraint optimization problem. The objective function is to maximize the sum of the cost differences (savings) that the SSP can deliver to all customers; the variables are the allocation decisions for all data (i.e., where to store the data) and for the management tasks (i.e., where the management tasks will be performed); and the constraints are based on the input information. In addition, to account for future growth potential, BRAHMA introduces a lookahead time window and estimates the clients' possible requirements using the specified growth probability and growth percentage. The total cost savings are the sum of the cost savings over the entire lookahead window. Run-time provisioning is performed by appropriately balancing the benefit (saved penalty cost) against the cost (hardware purchasing and maintenance cost).
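The text does not spell out how the growth probability and growth percentage are combined, so the following is only one plausible reading. It assumes that in each month the requirement grows by the given percentage with the given probability, independently of other months, which makes the expected requirement after m months equal to the initial value times (1 + p * g)^m. The function names and the per-TB saving are hypothetical.

```python
def expected_capacity(initial_tb, growth_prob, growth_pct, months):
    """Expected capacity for each month of the lookahead window (month 0 first).
    Assumes independent monthly growth events; this is an illustrative model."""
    factor = 1.0 + growth_prob * growth_pct
    return [initial_tb * factor ** m for m in range(months)]


def window_savings(initial_tb, growth_prob, growth_pct, months, saving_per_tb_month):
    """Total savings: the sum of the per-month savings over the lookahead window."""
    capacities = expected_capacity(initial_tb, growth_prob, growth_pct, months)
    return sum(c * saving_per_tb_month for c in capacities)


# 100 TB today, 50% chance of 10% growth each month, three-month window.
caps = expected_capacity(100, 0.5, 0.10, 3)
print([round(c, 2) for c in caps])                       # [100.0, 105.0, 110.25]
print(round(window_savings(100, 0.5, 0.10, 3, 2.0), 2))  # 630.5
```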

For a given SLO, the allocation of hardware resources is done in units of the data types (i.e., regular data, backup data, and compliance data). One straightforward solution is to plan all three together in a single optimization problem (e.g., determine the location for 3N data sets, where N is the number of customers). However, this formulation fails to capture many real-world constraints that exist in such environments; for example, the backup data often cannot reside with its regular data. Such constraints introduce correlations between planning decisions and grow the solution space exponentially. BRAHMA assumes that the regular, backup, and compliance data need to be placed on different devices and breaks the resource optimization problem into three independent sub-problems.
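Each of the three sub-problems has the shape of a 0/1 multi-knapsack: every data set goes to exactly one feasible device, and no device may exceed its capacity. The sketch below solves a tiny instance by exhaustive search and minimizes placement cost as a stand-in for maximizing savings. It is not BRAHMA's approximation algorithm, and the data set names, capacities, and per-TB costs are made up for illustration.

```python
from itertools import product


def best_allocation(datasets, devices, feasible):
    """Exhaustively search assignments of data sets to devices.

    datasets: {name: size_tb}
    devices:  {name: (capacity_tb, cost_per_tb)}
    feasible: set of allowed (dataset, device) pairs (the feasibility matrix)
    Returns (total_cost, {dataset: device}) or None if nothing fits.
    Exponential in the number of data sets; only suitable for tiny instances.
    """
    names = list(datasets)
    best = None
    for choice in product(devices, repeat=len(names)):
        if any((ds, dev) not in feasible for ds, dev in zip(names, choice)):
            continue  # violates the feasibility matrix
        used = {dev: 0.0 for dev in devices}
        cost = 0.0
        for ds, dev in zip(names, choice):
            used[dev] += datasets[ds]
            cost += datasets[ds] * devices[dev][1]
        if any(used[dev] > devices[dev][0] for dev in devices):
            continue  # violates a capacity constraint
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, choice)))
    return best


datasets = {"c1": 50, "c2": 40}
devices = {"tier1": (60, 4), "tier2": (100, 3)}
# c1 needs high throughput, so only tier1 is feasible for it.
feasible = {("c1", "tier1"), ("c2", "tier1"), ("c2", "tier2")}

print(best_allocation(datasets, devices, feasible))
# (320.0, {'c1': 'tier1', 'c2': 'tier2'})
```

Here c2 cannot join c1 on tier1 because 90 TB exceeds the 60 TB capacity, so the only valid assignment places c2 on tier2.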

The allocation of human resources is expressed in terms of the available man-hours of each administrator and the number of man-hours required for the different administrative tasks associated with the SLO. Finally, the optimization for allocating hardware and human resources in BRAHMA can be done independently. This is possible because the management of hardware can be done remotely and is not required to be local. In our prototype, we choose to plan for management tasks independently.

6.1 Generating Multiple Offers

The primary design objective of BRAHMA is to create an extensible framework that can capture most of the intricacies of today's human-driven SLO pricing process. An important component of this process is the counter-offer: after considering a client's SLO, the SSP generates additional plans with different costs for certain relaxations of the client SLO. For example, the SSP might offer the client a better price if the client is willing to accept a lower security requirement (such as at-rest encryption instead of on-wire encryption). Such counter-offers are motivated by a lack of available resources to handle the task (e.g., the SSP has no high-skill security administrators available at that time). Offering such alternatives to the client is extremely common and, in fact, expected.

While SLO relaxations introduce significant complexity, the performance of our approximation algorithm (see Section 7) makes it a little easier to account for these counter-offers. We allow the client to specify SLO parameters that are flexible; internally, the optimizer is run several times, changing one or more parameters at a time. This allows us to retain the core optimization algorithm while creating counter-offers. Clearly, our solution limits the number of parameters that can be relaxed. Additionally, we have tried several techniques that preserve state across multiple optimization cycles; these provide marginal run-time benefits.
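The "run the optimizer several times, changing one or more parameters at a time" loop can be sketched with `itertools.combinations`. The `optimize` callable stands in for the real optimizer, and `toy_optimize`, the parameter names, and the dollar amounts are all hypothetical. The number of runs grows combinatorially with the number of parameters relaxed together, which is one way to see the limitation mentioned above.

```python
from itertools import combinations


def counter_offers(slo, flexible, optimize, max_relaxed=1):
    """Re-run `optimize` for every combination of up to `max_relaxed` relaxations.

    slo:      {parameter: required value}
    flexible: {parameter: relaxed value the client is willing to accept}
    optimize: callable taking an SLO dict and returning a plan cost
    Returns a list of (relaxed_parameter_names, cost) pairs.
    """
    offers = []
    for k in range(1, max_relaxed + 1):
        for names in combinations(sorted(flexible), k):
            relaxed = dict(slo)
            for name in names:
                relaxed[name] = flexible[name]
            offers.append((names, optimize(relaxed)))
    return offers


def toy_optimize(slo):
    """Hypothetical cost function standing in for the real optimizer."""
    cost = 100
    if slo["encryption"] == "on-wire":
        cost += 30
    if slo["availability"] == "platinum":
        cost += 50
    return cost


slo = {"encryption": "on-wire", "availability": "platinum"}
flexible = {"encryption": "at-rest", "availability": "gold"}

print(toy_optimize(slo))  # 180, the cost of the original SLO
for names, cost in counter_offers(slo, flexible, toy_optimize, max_relaxed=2):
    print(names, cost)
```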

7 Experimental Evaluation

The experimental evaluation of BRAHMA consists of the following tests:

• First, Sanity Check. The goal of this test is to examine BRAHMA's ability to adapt to simple changes in system parameters such as cost functions, penalty functions, the optimization time window, and SLO requirements.
• Second, Counter-offers. In this test, we examine BRAHMA's ability to generate flexible allocation plans by relaxing SLO requirements.
• Third, Performance Comparison. In this test, we compare the performance of BRAHMA with two commonly used, simplistic heuristics: all in-house and all outsourcing.

In the rest of this section, we describe our configuration for the BRAHMA prototype and present experimental results for the tests listed above.

Configuration of the BRAHMA Prototype

The BRAHMA prototype is implemented in Java. The SLO Parser implements XML-based parsers for the client requirements, business policies, and human resource model. The hardware resource information is imported using JDBC calls to the monitoring database of a commercial storage management suite. The Policy Manager implements the feasible function for the hardware and human resources as described in Section 5. To generate counter-offers, the optimizer modifies the SLO requirements of the clients (according to the clients' willingness to relax them) and feeds them to the policy manager to start the process of finding alternative allocation plans.

Because of the large number of parameters required to specify a testing scenario, we implemented a configuration generator that assists in the creation of realistic testing scenarios. The configuration generator maintains the standard configuration for a number of different client applications, SSP storage devices, and SSP administrators. Each application type, for instance, is associated with a number of measurable SLO parameters (Table 1) as well as a capacity and throughput range. Due to space limitations and the large number of parameters involved, we omit most of the details. As an example, Table 2 shows the capacity and throughput ranges for the regular data of each application type, as well as the feasible device tiers that result when the associated application parameters are passed through the policy manager.

Application Type      Capacity       Throughput        Feasible Devices
Archival              100GB-100TB    10MB/s-10GB/s     Tier 1-5
Compliance            3TB-1PB        100MB/s-32GB/s    WORM
OLTP Standard         100GB-100TB    10MB/s-10GB/s     Tier 1
OLTP High             100GB-100TB    25MB/s-25GB/s     Tier 1
Scientific Computing  100TB-10PB     1GB/s-100GB/s     Tier 1-4
SMB                   100GB-100TB    100MB/s-10GB/s    Tier 1-3

Table 2. Application SLO Requirements

The physical characteristics of the storage devices created by the generator are derived using the data sheets provided by hardware vendors. In addition, the device cost functions are set based on a cost/benefit analysis from a major storage service provider, which estimates that the burdened cost (including the hardware and software cost, floor space, etc.) of Tier 1 storage is roughly $4.20/GB/month. The cost functions of the other tiers are estimated by scaling this number (e.g., Tier 2 0.8X, Tier 3 0.6X, Tier 4 0.2X, Tier 5 0.1X and WORM 0.5X). Table 3 lists the types of storage devices and their cost ranges as implemented by the BRAHMA prototype.

Tier     Products                                                     Cost [$/GB/Month]
Tier 1   EMC Symmetrix, Hitachi Lightning, IBM DS8000                 4.2 ± 0.4
Tier 2   3PAR InServ S800, Hitachi Thunder, IBM DS6000                3.3 ± 0.3
Tier 3   3PAR InServ S400, IBM DS4000                                 2.5 ± 0.3
Tier 4   Adaptec JBOD Systems, IBM DS400                              0.8 ± 0.2
Tier 5   ADIC Scalar, HP StorageWorks VLS, IBM TS1120 Rewritable      0.4 ± 0.1
WORM     EMC CENTERA, HP StorageWorks Ultrium, IBM TS1120 WORM Tape   2.1 ± 0.2

Table 3. Storage Devices Configurations
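The tier cost scaling described above can be checked with a few lines of Python. The Tier 1 cost and the scaling factors are taken from the text; the helper names `tier_cost` and `monthly_cost`, and the convention 1 TB = 1000 GB, are assumptions made for this sketch.

```python
TIER1_COST = 4.20  # burdened cost of Tier 1 storage in $/GB/month (from the text)

# Scaling factors relative to Tier 1, as quoted in the text.
SCALE = {"Tier 1": 1.0, "Tier 2": 0.8, "Tier 3": 0.6,
         "Tier 4": 0.2, "Tier 5": 0.1, "WORM": 0.5}


def tier_cost(tier):
    """Nominal cost of a tier in $/GB/month, rounded to cents."""
    return round(TIER1_COST * SCALE[tier], 2)


def monthly_cost(tier, capacity_tb):
    """Monthly cost in dollars for capacity_tb of storage (assumes 1 TB = 1000 GB)."""
    return tier_cost(tier) * capacity_tb * 1000


for tier in SCALE:
    print(f"{tier}: ${tier_cost(tier):.2f}/GB/month")
```

The scaled values (3.36, 2.52, 0.84, 0.42 and 2.10 $/GB/month) all fall within the corresponding cost ranges listed in Table 3.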

For the cost of administrators, we assume that a high skilled administrator in the United States has an hourly rate of $55-$70, medium skilled administrators are in the range of $40-$55 per hour and low skilled administrators cost $30 to $40 per hour. For administrators in developing countries (e.g., China and India), we apply a scaling factor of 15% to account for the lower pay scales in those locations.

Using the default configuration as the base, testing scenarios are easily generated by specifying different levels of detail for the device types, application types, the number of SSP sites, administrators, and clients.

7.1 Sanity Check

In this test, we evaluate BRAHMA's ability to adapt to changes to various input parameters. For this test, the configuration generator is used to first create a "default" configuration, and then to modify this configuration one parameter at a time so that we can observe the results generated by BRAHMA.

The details of the default configuration are as follows:
• SSP resources and sites. There are four SSP sites (China, India, US1, US2) in the system. In total, we have 500TB Tier 1, 255TB Tier 2, 380TB Tier 3, 10TB Tier 4, 45TB Tier 5 and 250TB WORM storage, and 18 high skilled, 36 medium skilled and 70 low skilled administrators with different skill sets. The locations and exact cost functions (e.g., administrators' hourly rates) of these resources are randomly determined according to the ranges specified in the generator. We skip a more detailed discussion of the SSP resources at each site because of space constraints.
• Client Configuration. We have six clients in the system. Their application types and specifications are given in Table 4.
• Other Constraints. The optimization time window is set to be six months unless otherwise specified. Additionally, a "penalty" represents the dollar amount that the SSP will be charged for every GB of capacity it has promised to the client but cannot provide. The optimizer is set to run for 20 iterations and the best plan is selected.
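The penalty definition above amounts to a charge on the capacity shortfall. A minimal sketch, assuming the charge is linear in the missing capacity and that 1 TB = 1000 GB (the function name `penalty_charge` is hypothetical):

```python
def penalty_charge(promised_gb, provided_gb, penalty_per_gb):
    """Dollar penalty for capacity that was promised but not provided."""
    shortfall_gb = max(0, promised_gb - provided_gb)  # no credit for over-provisioning
    return shortfall_gb * penalty_per_gb


# Illustrative case: 15TB promised at a $3/GB penalty, only 14TB delivered.
print(penalty_charge(15_000, 14_000, 3))
```

Under this reading, a higher per-GB penalty makes any shortfall more expensive, which gives the optimizer an incentive to provision more capacity up front.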

For the default configuration, BRAHMA generates the data allocation plan shown in Table 5.

Client ID   Regular Data     Backup Data      Compliance Data
Client 1    N/A              Tier 3 (US1)     N/A
Client 2    Tier 3 (China)   Tier 3 (China)   N/A
Client 3    Tier 1 (US1)     Tier 2 (India)   WORM (China)
Client 4    Tier 2 (India)   Tier 3 (US1)     WORM (US2)
Client 5    Tier 1 (India)   Tier 1 (US2)     N/A
Client 6    Tier 1 (India)   Tier 2 (US1)     WORM (China)

Table 5. Data Allocation Plan For the Default Configuration.

Compared to a complete in-house solution with an estimated cost of $7,670,420, the above allocation decision saves $4,827,968. This savings figure includes a $166,550 initial provisioning cost on the SSP side. On the human administration side, BRAHMA saves 49.1% of the cost by outsourcing a part of the management tasks. The details of the recommended management plan are omitted for brevity.
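As a quick check of these figures, the cost of the recommended plan and the relative savings follow directly from the two numbers quoted above. The variable names are illustrative only.

```python
in_house_cost = 7_670_420  # estimated cost of the complete in-house solution ($)
savings = 4_827_968        # savings reported for BRAHMA's default plan ($)

plan_cost = in_house_cost - savings           # cost of the recommended plan ($)
savings_pct = 100 * savings / in_house_cost   # savings relative to in-house (%)

print(f"plan cost: ${plan_cost:,}")
print(f"savings:   {savings_pct:.1f}% of the in-house cost")
```

The recommended plan therefore costs $2,842,452, a reduction of roughly 63% relative to the in-house estimate.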

In the remainder of this section, we modify parameters in the above default configuration and discuss the resulting changes in BRAHMA's plan.

7.1.1 Test 1: Impact of cost functions

In this test, the cost of the Tier 1 device in India is increased from $3.9/GB/month to $4.2/GB/month. As a result, the jobs originally on these devices (namely the regular data of Clients 5 and 6) are now placed on the US1 site. The total cost savings drops to $4,783,253 (from $4,827,968 in the default setup). Similar tests done on the human costs yield similar results.

7.1.2 Test 2: Impact of penalty functions

In this test, the penalty functions for the regular and backup data for Client 2 are increased from $3/GB to $10/GB. In both the default and modified cases, these jobs are placed on Tier 3 storage in China. As a result of increasing the penalty, however, the amount of provisioning at that location increases from 47.52TB to 53.44TB.

7.1.3 Test 3: Impact of client's capacity and throughput

In this test, we change a client's SLO parameters and examine BRAHMA's ability to adapt. Specifically, we reduce the capacity requirement of Client 4 from 100TB to 10TB and its throughput from 10GB/s to 1GB/s. Due to the reduced capacity and throughput requirements, Client 4 can now be satisfied by lower tier (Tier 5) storage devices. BRAHMA correctly captures this change and puts the client's regular and backup data on Tier 5 devices in China and the US, respectively.

7.1.4 Test 4: Impact of the optimization time window

The goal of this test is to check BRAHMA's ability to account for temporal behavior. This is done by changing the optimization time window from six months to one month, while keeping all other settings the same. As a result, the regular data of Client 4 are placed on the Tier 3 storage in China, as opposed to the original Tier 2 storage in India.


Client ID   Application Type   Capacity   Throughput   Variability   Future Growth [per month]   Penalty [$/GB]
Client 1    Archival           50TB       2GB/s        10%           5%                          2
Client 2    SMB                15TB       1GB/s        30%           20%                         3
Client 3    OLTP Standard      50TB       5GB/s        20%           10%                         4
Client 4    Scientific         100TB      10GB/s       10%           10%                         3
Client 5    OLTP High          40TB       10GB/s       10%           5%                          4
Client 6    OLTP Standard      20TB       10GB/s       30%           15%                         4

Table 4. Client Configurations

7.2 Multiple Offers

In this test, we demonstrate BRAHMA's ability to generate multiple allocation plans when a client is willing to be flexible on one or more of its SLO requirements. Specifically, we randomly generate a testing scenario where Client 7 (an SMB application) is willing to relax the reliability requirement on its regular data (e.g., the client is a non-profit with a low budget, and short outages are an annoyance but will not fundamentally hurt its operations). In this situation, BRAHMA gradually moves the reliability requirement from "PLATINUM" to "SILVER" and then to "BRONZE". As a result, the regular data of Client 7 is moved from Tier 1 to Tier 2, and then finally to Tier 3 storage. At the same time, the cost savings grow from $121,520 to $283,549 to $445,578. BRAHMA returns all of these options to the client so it can study the trade-offs between reliability and the corresponding cost.

7.3 Comparison of BRAHMA output with Cookie Cutter techniques

In this test, we compare BRAHMA with two simple, commonly used strategies: storing and managing all operations in-house ("all in") and, at the other extreme, outsourcing everything ("all out"). In the latter case, each job is placed on the least loaded device among those which are feasible. For each strategy, we measure the final cost, defined as the sum of the cost of jobs served by the client site and the cost of jobs on the SSP sites, and the decision time. In addition, we focus on scenarios that do not require provisioning because the provisioning decision distorts the final cost and makes the relative advantages of each allocation strategy unclear.
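The "all out" baseline can be sketched as a greedy placement loop. This is an illustration under stated assumptions rather than the code used in the experiments: "load" is taken to be the used fraction of a device's capacity, feasibility is reduced to a tier check plus sufficient free space, and `Device` and `place_all_out` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    tier: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def load(self):
        """Used fraction of capacity (assumed definition of 'load')."""
        return self.used_gb / self.capacity_gb


def place_all_out(jobs, devices):
    """Place each job on the least loaded feasible device.

    jobs: list of (job_name, size_gb, feasible_tiers).
    Returns a dict mapping job_name to device name.
    """
    placement = {}
    for name, size_gb, feasible_tiers in jobs:
        candidates = [d for d in devices
                      if d.tier in feasible_tiers
                      and d.capacity_gb - d.used_gb >= size_gb]
        if not candidates:
            raise ValueError(f"no feasible device for {name}")
        target = min(candidates, key=lambda d: d.load)
        target.used_gb += size_gb
        placement[name] = target.name
    return placement


devices = [Device("A", "Tier 1", 1000), Device("B", "Tier 1", 2000),
           Device("C", "Tier 3", 1000)]
jobs = [("job1", 500, {"Tier 1"}),
        ("job2", 500, {"Tier 1"}),
        ("job3", 100, {"Tier 1", "Tier 3"})]
placement = place_all_out(jobs, devices)
print(placement)
```

Because the heuristic balances load and ignores price, it can place a job on an expensive device even when a cheaper feasible one exists, which is the gap a cost-based optimizer is meant to close.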

We start with three representative scenarios. In scenario one, the clients all have cheaper costs than the SSP (e.g., all clients are overseas). In this case, the optimal allocation plan should be all in-house. In scenario two, the hardware and operating costs on the clients are more expensive than the SSP costs (due to economies of scale) and the SSP sites have homogeneous resources. As such, the best result is clearly to outsource all jobs. In scenario three, both the clients and the SSP have a heterogeneous mixture of devices, applications, and costs. In this case, our tool should perform better than either of the simple heuristics discussed above because it allows for a hybrid of in-house and outsourcing based on cost optimization. Table 6 lists the total cost and the decision time for each option. Note that, for the all in-house strategy, the decision is fixed (always on the client site) and thus the decision making time is effectively zero.

Scenario         Strategy   SSP Cost   Client Cost   Total Cost   Decision Time (s)
Scenario one     BRAHMA     0          483965        483965       0.31
                 All In     0          483965        483965       0
                 All Out    2723839    0             2723839      0.28
Scenario two     BRAHMA     4285205    0             4285205      0.31
                 All In     0          6373392       6373392      0
                 All Out    4285205    0             4285205      0.30
Scenario three   BRAHMA     1433615    568658        2002273      0.28
                 All In     0          2803096       2803096      0
                 All Out    30990033   0             30990033     0.33

Table 6. Comparison of BRAHMA, All In-House and All Outsourcing approaches
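The comparison in Table 6 can be reproduced from the per-site costs. The sketch below copies the SSP and client costs from the table, applies the final cost definition (SSP cost plus client cost), and reports how much BRAHMA saves over the better of the two simple strategies; the names `TABLE6`, `final_cost` and `savings_vs_best_heuristic` are illustrative.

```python
# (SSP cost, client cost) in dollars, copied from Table 6.
TABLE6 = {
    "one":   {"BRAHMA": (0, 483965),       "All In": (0, 483965),  "All Out": (2723839, 0)},
    "two":   {"BRAHMA": (4285205, 0),      "All In": (0, 6373392), "All Out": (4285205, 0)},
    "three": {"BRAHMA": (1433615, 568658), "All In": (0, 2803096), "All Out": (30990033, 0)},
}


def final_cost(ssp_cost, client_cost):
    """Final cost: cost of jobs on the SSP sites plus cost of jobs at the client site."""
    return ssp_cost + client_cost


def savings_vs_best_heuristic(scenario):
    """Dollars saved by BRAHMA relative to the cheaper of All In and All Out."""
    rows = TABLE6[scenario]
    totals = {strategy: final_cost(*costs) for strategy, costs in rows.items()}
    return min(totals["All In"], totals["All Out"]) - totals["BRAHMA"]


for scenario in TABLE6:
    print(f"scenario {scenario}: BRAHMA saves ${savings_vs_best_heuristic(scenario):,}")
```

In scenarios one and two BRAHMA matches the better simple strategy, and in the heterogeneous scenario three its hybrid plan is $800,823 cheaper than the all in-house option.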

8 Related Work

BRAHMA handles both the outsourcing decision making and the capacity planning. In this section, we summarize the existing efforts for both problems.

A core component of our work is the optimized utilization of resources for Storage Service Providers (SSPs) and is thus related to various capacity planning tools. Tools like Minerva [8], Hippodrome [9], and Ergastulum [10] plan for storage environment setup and disaster recovery based on analysis of the workloads. However, their optimization is within the scope of a single SAN and they do not account for any management costs. In contrast, BRAHMA is an end-to-end framework mapping clients and their SLO requirements to SSP resources (both storage and human) optimized for maximum cost savings.

Figure 2. Decision Time of BRAHMA and All outsourcing (x-axis: Number of Clients, 0 to 100; y-axis: Decision Time in seconds, 0 to 9; series: Our Tool, All Outsource)

With the growing total cost of ownership, many organizations are outsourcing their storage hosting and management tasks. The first question to ask is whether to outsource or not. Various studies articulate critical elements that need to be considered when answering this question. The study in [12] examines 5000 firms and lists the risks involved in outsourcing. It concludes that outsourcing can be dangerous for a company due to its dependence on a service provider over time. To address this problem, it suggests alternatives such as having multiple service providers or selective outsourcing. Our tool can fulfill these requirements because it allows the clients to selectively outsource jobs and choose among multiple SSPs. Another empirical study [13] of application service providers finds that the basic reason for companies to outsource is to save cost and maximize ROI. It identifies three main reasons for clients to adopt the application service provider concept: core competence, lack of skilled personnel and the organization's overall strategy. Our tool provides a framework to account for these factors. It eases the decision process and helps clients perform what-if analysis.

Currently, many organizations hire a consulting firm specializing in outsourcing IT operations [3] to make the outsourcing decision. Such firms rely heavily on experienced consultants who make decisions based on industry best practices and their experiences. There also exist several automatic planning techniques for making outsourcing decisions. The Return-On-Investment (ROI) calculators [6, 5] use static rules of thumb to estimate whether outsourcing will yield significant benefits or not. Other works are analytical model based solutions. One interesting work applies an Analytic Hierarchy Process (AHP) [11] to develop a model for making outsourcing decisions. The model identifies activities to outsource to maximize ROI and also helps to choose an appropriate outsourcing methodology (e.g., vendors and contracts to adopt). However, this work is not specific to storage services and thus cannot leverage domain expertise, a critical element in our design. In addition, it cannot account for human and administrative resources while planning.

9 Conclusion

In this paper, we propose a multi-site storage outsourcing planning tool, BRAHMA. It takes into account the clients' SLOs, available hardware/software resources, management resources and the optimization time window, and outputs the data and management allocation plan that leads to maximum cost savings for the clients and SSPs. BRAHMA has a policy manager that prunes the candidate search space and formulates the problem as a constraint optimization problem. BRAHMA is implemented in Java and evaluated using this prototype. Our evaluation results show that BRAHMA's plan can account for system parameters such as cost functions, penalty functions, optimization time window and clients' SLOs, and that it also allows the clients and SSPs to explore alternate options. Finally, by comparing the plans generated by this planning tool with commonly used outsourcing strategies, we show that BRAHMA can adapt to system changes and always find the optimal plan with a decision overhead similar to the simple outsourcing strategies.

References

[1] Amazon Web Storage Services. http://aws.amazon.com/s3.

[2] EMC ControlCenter family of storage resource management (SRM). http://www.emc.com/products/storage management/controlcenter.jsp.

[3] Enterprise portfolio analysis and collaborative decision support. http://www.expertchoice.com.

[4] IBM TotalStorage. http://www-1.ibm.com/servers/storage.

[5] Insight builder. http://www.insightbuilder.net/whaticcm/roicalculator/index.shtml.

[6] Outsourcing calculator. http://www.info-sourcing.com.

[7] Sun Storage Services. http://www.sun.com/service/storage/index.html.

[8] G. Alvarez, J. Wilkes, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, and A. Veitch. Minerva: An automated resource provisioning tool for large-scale storage systems. ACM Transactions on Computer Systems, 19(4):483–518, November 2001.

[9] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running circles around storage administration. In Proceedings of USENIX Conference on File and Storage Technologies, pages 175–188, 2002.

[10] E. Anderson, M. Kallahalla, S. Spence, R. Swaminathan, and Q. Wang. Ergastulum: quickly finding near-optimal storage system designs. HP Laboratories SSP technical report HPL-SSP-2001-05, June 2002.


[11] V. Bansal and V. Pandey. A decision-making framework for IT outsourcing using the analytic hierarchy process, 2006.

[12] R. Gonzalez, J. Gasco, and J. Llopis. A study of information systems outsourcing risks. http://csrc.lse.ac.uk/asp/aspecis/20040054.pdf, 2004.

[13] B. Johansson. Exploring application service provision: adoption of the ASP concept for provision of ICTs in SMEs, 2004.

[14] D. Pisinger. A minimal algorithm for the 0-1 knapsack problem. Journal of Operations Research, 45:758–767, 1997.

[15] Storage Networking Industry Association. SMI Specification version 1.0. http://www.snia.org, 2003.
