[IEEE Distributed Processing, Workshops and Phd Forum (IPDPSW) - Atlanta, GA, USA (2010.04.19-2010.04.23)] 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops

Simplifying solution deployment on a Cloud through composite appliances

Trieu Chieu, Alexei Karve, Ajay Mohindra, Alla Segal IBM T. J. Watson Research Center

19 Skyline Drive Hawthorne, NY, USA

e-mail: { tchieu, karve, ajaym, segal}@us.ibm.com

Abstract— Containing runaway IT costs is one of the top priorities for enterprises. Cloud Computing, with its on-demand provisioning capability on shared resources, has emerged as a new paradigm for managing IT costs. In this paper, we describe a framework to simplify deployment of complex solutions on a Cloud infrastructure. We discuss the concept of a composite appliance and show how it can be used to reduce management costs. We illustrate the benefits of our approach with a complex three-tiered solution that can be deployed and configured on a set of virtual machine instances without any manual intervention.

Keywords- Provisioning; Composite Appliance

I. INTRODUCTION

The phrase “Cloud Computing” [1-3] has been described as a means to contain and manage IT costs for enterprises. Cloud Computing is a paradigm in which compute, storage, and network capacity is made available to users on demand through a shared physical infrastructure. CIOs expect that sharing hardware, software, network resources, and management personnel will reduce the per-unit IT cost for their enterprises. Several vendors, such as Amazon EC2, Google, and Rackspace, have commercial Cloud offerings. Though not yet enterprise grade, Cloud Computing has attracted the interest of several large enterprises, which have started deploying and experimenting with the technology in their test and development environments.

The Cloud Computing model has evolved from the traditional dedicated resource model to a shared resource model so as to contain the costs associated with building and managing a delivery center. Traditional customer hosting services require the IT department to install and configure the OS, middleware, and applications, and perform configuration of the solution on top of the software stack. With the advancement of virtualization technologies, it is now possible to build and deploy standardized images of software stacks in virtual machines. For example, several images with different software stacks are now available in the Amazon EC2 catalog, while the IBM Developer Cloud offers images with preinstalled DB2 and WebSphere software.

However, users have started demanding the capability to add and provision complex multi-tier solutions in a Cloud environment. The main reason is that building multi-tiered topologies from single-virtual-machine deployments is both difficult and labor intensive. For example, to enable a complex solution, users either colocate all the components of the solution in a single image, or provision individual virtual machines and manually install and configure the software on each one. This approach can adversely affect the cost-benefit analysis of Cloud Computing for enterprises because of the manual intervention required to deploy and configure the solutions.

In this paper, we will describe a framework to simplify deployment of solutions using composite appliances on a Cloud. We will first describe the current state-of-the-art in Cloud Computing and the various approaches to configure and deploy solutions. We then introduce the concept of a Composite Appliance, and illustrate the concept through an example implementation.

The rest of the paper is organized as follows. Section II gives background on Cloud Computing. Section III describes the Vega framework for composite appliance provisioning and an example scenario. Section IV discusses related work. Finally, Section V concludes the paper.

II. CLOUD COMPUTING AND IMAGES

Cloud Computing promises lower costs due to better economies of scale. To achieve low cost, it is critical to eliminate manual processes associated with systems management and provisioning. In this section we briefly describe a high-level architecture of a Cloud Computing model and discuss the role of images in the paradigm.

A. Cloud Computing

Cloud Computing provides a self-service environment for requesting compute resources. Figure 1 shows the logical architecture of a Cloud Computing deployment. Typically, service providers create a pool of networked hardware resources. Each hardware resource runs virtualization software such as VMWare [4], Xen [5], or KVM [6]. Virtualization enables each hardware resource to host and run multiple virtual machines. The Cloud resources are made available to users through a User Portal or Web Services APIs. User requests are forwarded to a Provisioning Component that performs the following tasks:

1) Refers to a Resource Manager to locate a hardware resource that has available capacity to run the virtual machine that the user requested.

2) Copies the image for the virtual machine from the Image Library to the target hardware resource.

3) Creates the configuration for the virtual machine on the target hardware resource and creates the virtual machine.

4) Notifies the user after the virtual machine has been successfully created.
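The four provisioning tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Host, ResourceManager, provision, and the dictionary used as an Image Library are hypothetical and are not part of the system described in this paper, and capacity is reduced to memory for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical hardware resource; capacity is reduced to memory here."""
    name: str
    free_memory_gb: int
    vms: list = field(default_factory=list)


class ResourceManager:
    """Hypothetical stand-in for the Resource Manager."""

    def __init__(self, hosts):
        self.hosts = hosts

    def find_host(self, memory_gb):
        # Task 1: locate a host with enough available capacity.
        for host in self.hosts:
            if host.free_memory_gb >= memory_gb:
                return host
        raise RuntimeError("no host has enough free capacity")


def provision(resource_manager, image_library, image_id, memory_gb, notify):
    """Walk through the four tasks for one virtual machine request."""
    host = resource_manager.find_host(memory_gb)        # task 1
    image_file = image_library[image_id]                # task 2: fetch image
    vm = {"image": image_file, "memory_gb": memory_gb,  # task 3: configure
          "host": host.name}
    host.vms.append(vm)                                 # task 3: create VM
    host.free_memory_gb -= memory_gb
    notify(f"VM from {image_id} created on {host.name}")  # task 4
    return vm
```

A caller would pass any callable as notify (for example, a function that sends e-mail); the copy in task 2 is modeled as a dictionary lookup rather than a real file transfer.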

978-1-4244-6534-7/10/$26.00 ©2010 IEEE

Figure 1 Logical Architecture of a Cloud Computing deployment

B. Images

An image is nothing more than the disk representation of a virtual machine pre-installed with an operating system. A virtual machine image preloaded with application software is further referred to as an appliance. An image or an appliance consists of two files: the configuration file and the actual disk image. The configuration file holds metadata such as the location of the disk image file, the display name, and the attached network and peripheral devices.

Cloud Computing uses images as the building blocks for provisioning. When a user requests a compute resource, the Provisioning Component locates and retrieves the appropriate image from the Image Library, and uses the image to create the new virtual machine.

C. Building solutions

Users create complex multitier deployment topologies for their solutions by requesting multiple virtual resources for each tier in the topology, installing the required software components, and wiring the virtual resources together. Though workable, this approach is hard to manage because the solution must be handled as a collection of individual virtual resources. Further, the need to install and configure each component separately introduces steps that are error-prone and costly in terms of management tasks. A successful Cloud Computing deployment should offer a mechanism by which system administrators or users can create complete solutions and make them available in the Image Catalog for provisioning. A solution-based approach would simplify system administration by allowing system administrators to test and capture working topologies that can be deployed at the click of a button.

III. VEGA – A FRAMEWORK FOR SPECIFYING SOLUTIONS

In the following sections, we present the design and architecture of Vega, a framework that enables system administrators and developers to specify and deploy complete solutions in a Cloud environment. We discuss the framework with an example of a three-tier solution. We also demonstrate how the framework simplifies the management of the environment.

A. Composite Appliances

The intent of the Vega framework is to enable system administrators to specify a solution’s requirements and deployment topology using an XML file. (We are in the process of converting the XML to an OVF-compatible format.) To achieve this goal, Vega uses the concept of a Composite Appliance. A Composite Appliance is a collection of individual appliance images that are preconfigured to work together. Although building a Composite Appliance for a given solution requires detailed knowledge of its configuration points, the requirement analysis and the determination of configuration points are beyond the scope of this paper.

A Composite Appliance is specified using an XML file. Figure 2 shows the UML diagram for specifying the solution requirements and topology of a Composite Appliance. The top-level class of the specification is an Appliance section. Each Appliance section consists of one or more Node sections. Further, each Node section consists of an Image Requirement section and an Image Specification section. The Node section corresponds to one logical node of the solution. The Image Requirement section specifies the memory, CPU, disk, and network requirements of the image associated with the node. This approach allows a system administrator to quickly configure and deploy differently sized solutions by merely changing the values in the specification.

The Image Specification section contains information about the image that is part of the solution. The information is used by the Provisioning Component when the solution is deployed. As mentioned in Section II.B, each image may contain a preconfigured software stack. The section carries the following attributes:

• OSType – Descriptive name of the Operating System name and version installed on the image

• ImageId – Identifier of the image in the image library

• Filename – File name for the image

• VirtualizationType – Type of virtualization technology needed for this image

• RunOnceScript – A semi-colon (;) separated string of commands that are executed when the newly-created virtual machine from the image is first started. We discuss this attribute in the following sections.

• Userid – Specifies the “Userid” with root privileges for the new virtual machine

• Password – Specifies the password that needs to be set for the “Userid” with root privileges

Each Image Specification section also contains multiple Software Attributes. The Software Attributes contain the name-value pairs describing the attributes associated with the software stack that is configured on the specific image. For example, in an image that has the DB2 Server software installed, the Software Attributes section would contain the db2port, db2InstallationPath, db2UserId, and db2Password information. Since an image may be authored by another person, this information tells system administrators enough about the image that they can write scripts to change the configuration as needed.

Figure 2 UML Schema of a Composite Appliance
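As a rough illustration of the schema, the sections can be modeled as nested records. The sketch below uses Python dataclasses; the class and field names mirror the sections and attributes described above but are our own choice for illustration, and the resized helper is hypothetical. It shows how a differently sized solution is obtained by changing only the requirement values.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class ImageRequirement:
    memory_gb: int
    cpu: int
    disk_gb: int
    network: str


@dataclass
class ImageSpecification:
    os_type: str
    image_id: str
    filename: str
    virtualization_type: str
    run_once_script: str
    userid: str
    password: str
    # Name-value pairs describing the preconfigured software stack.
    software_attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    requirement: ImageRequirement
    specification: ImageSpecification


@dataclass
class Appliance:
    name: str
    nodes: List[Node]


def resized(appliance: Appliance, memory_gb: int, cpu: int) -> Appliance:
    """Return a copy of the appliance with new memory and CPU values on every node."""
    nodes = [
        replace(n, requirement=replace(n.requirement, memory_gb=memory_gb, cpu=cpu))
        for n in appliance.nodes
    ]
    return replace(appliance, nodes=nodes)
```

The image specifications are shared between the original and the resized copy, which reflects the idea that the same images serve solutions of different sizes.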

B. Linking Image Specification into a Solution

System administrators specify a solution by creating an XML description with a section for each node in the solution. The Provisioning Component processes the XML specification and provisions the individual virtual machines. Because the hostname and ipaddress of each virtual machine are assigned via late binding, the Vega framework lets system administrators use predefined variables for the hostname and ipaddress assigned to each node in the solution. This information is passed as parameters to the RunOnceScript section and used to configure any dependencies between nodes in the solution.

We further illustrate through an example of a three-tier solution that uses WebSphere Portal Server, WebSphere Process Server, and DB2 Database Server. The WebSphere Portal Server is configured to communicate with the WebSphere Process Server and the DB2 Server, while the WebSphere Process Server is configured to communicate with the DB2 Server. The XML specification consists of three Node sections.

<appliance>
  <name>My First Solution</name>
  <description>Three tier Solution</description>
  <node>
    <name>WebSphere Portal Server</name>
    <description>WPS for tier1</description>
    <image-requirement>
      <req_memory>8GB</req_memory>
      <req-cpu>2</req-cpu>
      <req-disksize>40GB</req-disksize>
      <req-network>1GB</req-network>
    </image-requirement>
    <image-specification>
      <ostype>Redhat 5u3</ostype>
      <imageid>img-1000</imageid>
      <filename>WPimage.tgz</filename>
      <virtualization-type>KVM</virtualization-type>
      <runoncescript>"/root/fixup.sh $hostname $domain;/opt/ibm/WebSphere/AppServer/setWAS.sh $HOST1;/opt/ibm/WebSphere/AppServer/setDB2Server.sh $HOST2;"</runoncescript>
      <userid>root</userid>
      <password>passw0rd</password>
      <software-attributes>
        <software-attribute name="userid" value="admin"/>
        <software-attribute name="passwd" value="admin"/>
        …..
        <software-attribute name="port" value="9080"/>
      </software-attributes>
    </image-specification>
  </node>
  <node>
    <name>WebSphere Application Server</name>
    <description>WAS for tier2</description>
    …
    <image-requirement> …. </image-requirement>
    <image-specification>
      <runoncescript>"/root/fixup.sh $hostname $domain;/opt/ibm/WebSphere/AppServer/setDB2Server.sh $HOST2;"</runoncescript>
      ....
    </image-specification>
  </node>
  <node>
    <name>DB2 Server</name>
    <description>DB2 for tier3</description>
    …
    <image-requirement> …. </image-requirement>
    <image-specification>
      <runoncescript>"/root/fixup.sh $hostname $domain;"</runoncescript>
      ....
    </image-specification>
  </node>
</appliance>
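To make the processing of such a specification concrete, the following sketch reads a trimmed version of the example with Python's standard xml.etree.ElementTree module and extracts, for each Node section, its position, name, RunOnceScript string, and Software Attributes. The function load_nodes and the shortened SPEC string are ours for illustration; how the Provisioning Component actually parses the file is not described here.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the three-node example; most elements are omitted.
SPEC = """
<appliance>
  <name>My First Solution</name>
  <node>
    <name>WebSphere Portal Server</name>
    <image-specification>
      <imageid>img-1000</imageid>
      <runoncescript>"/root/fixup.sh $hostname $domain;/opt/ibm/WebSphere/AppServer/setDB2Server.sh $HOST2;"</runoncescript>
      <software-attributes>
        <software-attribute name="port" value="9080"/>
      </software-attributes>
    </image-specification>
  </node>
  <node>
    <name>WebSphere Application Server</name>
    <image-specification>
      <runoncescript>"/root/fixup.sh $hostname $domain;"</runoncescript>
    </image-specification>
  </node>
  <node>
    <name>DB2 Server</name>
    <image-specification>
      <runoncescript>"/root/fixup.sh $hostname $domain;"</runoncescript>
    </image-specification>
  </node>
</appliance>
"""


def load_nodes(xml_text):
    """Return one dictionary per Node section, in document order."""
    root = ET.fromstring(xml_text)
    nodes = []
    for index, node in enumerate(root.findall("node")):
        spec = node.find("image-specification")
        # The example wraps the script in double quotes; strip them.
        script = spec.findtext("runoncescript", default="").strip().strip('"')
        attributes = {a.get("name"): a.get("value")
                      for a in spec.iter("software-attribute")}
        nodes.append({"index": index,
                      "name": node.findtext("name").strip(),
                      "script": script,
                      "attributes": attributes})
    return nodes
```

The index recorded for each node follows document order, which is the order the hostname keywords rely on.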

Vega uses the following convention for identifying the hostnames specified in the solution. The identifier assigned to the first Node section is 0, the second is 1, and so on. Vega provides the following keywords: HOST, IPADDRESS, and DOMAIN that can be used in the XML specification. So HOST0 refers to the hostname assigned to the first Node in the XML, HOST1 is the hostname assigned to the second Node, and so on. Keywords HOSTNAME and DOMAIN refer to the hostname and domain assigned to the current Node.
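The keyword convention can be illustrated with a small substitution routine. This is a sketch, not Vega's implementation: the function resolve_keywords, the shape of the assignments list, and the sample hostnames are hypothetical, and treating keywords as case-insensitive is our assumption (the example XML writes $hostname and $domain in lower case, while the text names the keywords in upper case).

```python
import re

# HOSTNAME must be tried before HOST so "$hostname" is not read as HOST + "name".
_KEYWORD = re.compile(r"\$(HOSTNAME|HOST|IPADDRESS|DOMAIN)(\d*)", re.IGNORECASE)


def resolve_keywords(script, node_index, assignments):
    """Replace keywords in a RunOnceScript string with late-bound values.

    assignments holds one dict per Node section, in XML order, with the
    keys "hostname", "ipaddress", and "domain".
    """
    current = assignments[node_index]

    def lookup(match):
        keyword, index = match.group(1).upper(), match.group(2)
        if keyword in ("HOST", "IPADDRESS") and index:
            # Indexed keywords refer to the n-th Node section (0-based).
            target = assignments[int(index)]
            return target["hostname" if keyword == "HOST" else "ipaddress"]
        if keyword in ("HOSTNAME", "DOMAIN") and not index:
            # Unindexed keywords refer to the node being configured.
            return current[keyword.lower()]
        return match.group(0)  # leave anything else untouched

    return _KEYWORD.sub(lookup, script)
```

For the first node of the three-tier example, $HOST2 would resolve to whatever hostname was assigned to the third Node section, and $hostname to the first node's own hostname.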

In the specification, system administrators and developers can use the RunOnceScript section to define scripts and pass the appropriate information about the other nodes in the solution. In the example above, the DB2 Server is the third node. It is identified by the HOST2 and IPADDRESS2 keywords. Both the WebSphere Portal Server and the WAS Server nodes refer to the actual hostname of the DB2 Server as HOST2. This information is passed in the RunOnceScript section, where the scripts use it to configure the datasources for the two servers. The Vega framework invokes the scripts listed in the RunOnceScript section when the individual servers are first started. After the scripts have run, subsequent restarts of the virtual machines do not invoke them again.

In the example above, the <runoncescript> section for the WebSphere Portal Server consists of invoking script fixup.sh to set the new hostname and domain name for the virtual server and scripts (setWAS.sh and setDB2Server.sh) to configure the WebSphere Application Server and the DB2 Server. The information about the new hostnames assigned to the WebSphere and the DB2 Server is specified by HOST1 and HOST2 variables.
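The run-once behavior can be sketched as follows. The mechanism Vega uses to remember that the scripts have already run is not described in this paper; the marker file below is purely an assumption for illustration, as are the function name run_once and its parameters.

```python
import subprocess
from pathlib import Path


def run_once(script, marker, runner=None):
    """Run each semicolon-separated command once per virtual machine.

    marker is the path of a file created after a successful run
    (a hypothetical mechanism). Returns the list of commands executed,
    which is empty on later boots.
    """
    marker = Path(marker)
    if marker.exists():
        return []  # scripts already ran on an earlier start
    if runner is None:
        # Default: run through the shell and stop on the first failure.
        def runner(command):
            subprocess.run(command, shell=True, check=True)
    commands = [c.strip() for c in script.split(";") if c.strip()]
    for command in commands:
        runner(command)
    marker.write_text("done\n")
    return commands
```

Because the marker is written only after every command succeeds, a failed first boot would retry the whole sequence on the next start; whether that is desirable depends on how idempotent the scripts are.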

C. Resource Management

The Vega framework provides a Resource Manager to manage the hardware and network resources in the system. The resources are organized in groups and placed in different resource pools. For instance, the available host machines, with information about their disk storage, memory, and CPUs, are placed in a system-wide host-pool, while the free hostnames and ipaddresses for guest virtual machine instances are listed in a free-guest-pool. In response to a provisioning request, the Resource Manager provides both a free ipaddress from the free-guest-pool and a host machine from the host-pool to the Provisioning Component. Multiple host-pools allow users to select the subnet on which a VM should be created. The pools are tagged so that VM requirements can be matched with the capabilities of the hosts within a pool.
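A minimal sketch of the two kinds of pools follows. The class names, the allocate and release methods, and the representation of tags as Python sets are assumptions made for illustration; host selection within a pool is deliberately naive here.

```python
from dataclasses import dataclass


@dataclass
class HostPool:
    name: str
    tags: set    # capabilities shared by the hosts in this pool
    hosts: list  # names of the host machines in the pool


class ResourceManager:
    """Hypothetical manager holding tagged host-pools and a free-guest-pool."""

    def __init__(self, host_pools, free_guests):
        self.host_pools = host_pools
        # Each entry is a (hostname, ipaddress) pair not yet in use.
        self.free_guests = list(free_guests)

    def allocate(self, required_tags):
        """Return (host, (hostname, ipaddress)) for one provisioning request."""
        for pool in self.host_pools:
            # A pool qualifies when it offers every required capability.
            if required_tags <= pool.tags and pool.hosts:
                if not self.free_guests:
                    raise LookupError("free-guest-pool is exhausted")
                return pool.hosts[0], self.free_guests.pop(0)
        raise LookupError("no host-pool matches the required tags")

    def release(self, guest):
        """Return a (hostname, ipaddress) pair to the free-guest-pool."""
        self.free_guests.append(guest)
```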

D. Placement of Virtual Machine

The process of selecting the most suitable host machine for deploying a virtual machine is known as virtual machine placement. During placement, the available host machines in a given host-pool are rated based on the virtual machine’s resource requirements and its anticipated resource usage, taking into consideration the overall placement goal.

In particular, the placement goal can be either load balancing among hosts or resource maximization on individual hosts. For load balancing, each host is rated with the intent of minimizing the processing load on any one host; a simple round-robin placement algorithm may be used over the available hosts that satisfy the minimum requirements. For resource maximization, each host is rated with the intent of consolidating multiple low-utilization workloads on a single host; the placement algorithm must determine the capacity limits of a particular host and favor that host for placement until the limits are reached. The Vega placement advisor also uses tags to match the requirements of images with the capabilities of hosts. For example, a hypervisor may expose its capability of hosting 64-bit images, 32-bit images, or both. The image requirement is then matched against the hosts providing that capability.
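The two placement goals can be contrasted in a short sketch. It is not Vega's placement advisor: capacity is reduced to memory, the class names are hypothetical, and the consolidating strategy simply prefers the most heavily used host that still fits, which is one simple way to fill a host until its limit is reached.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity_gb: int
    used_gb: int = 0
    tags: frozenset = frozenset()

    def fits(self, memory_gb, required_tags=frozenset()):
        # A host qualifies if it has the capabilities and the spare capacity.
        return (required_tags <= self.tags
                and self.used_gb + memory_gb <= self.capacity_gb)


class RoundRobinPlacement:
    """Load balancing: rotate over the hosts that satisfy the request."""

    def __init__(self):
        self._next = 0

    def choose(self, hosts, memory_gb, required_tags=frozenset()):
        for offset in range(len(hosts)):
            position = (self._next + offset) % len(hosts)
            if hosts[position].fits(memory_gb, required_tags):
                self._next = (position + 1) % len(hosts)
                return hosts[position]
        return None


class ConsolidatingPlacement:
    """Resource maximization: keep filling the busiest host that still fits."""

    def choose(self, hosts, memory_gb, required_tags=frozenset()):
        candidates = [h for h in hosts if h.fits(memory_gb, required_tags)]
        return max(candidates, key=lambda h: h.used_gb, default=None)


def place(strategy, hosts, memory_gb, required_tags=frozenset()):
    """Pick a host with the given strategy and record the new usage."""
    host = strategy.choose(hosts, memory_gb, required_tags)
    if host is None:
        raise RuntimeError("no suitable host in the pool")
    host.used_gb += memory_gb
    return host.name
```

Because both strategies expose the same choose method, either can be passed to place, which loosely mirrors the idea of interchangeable placement algorithms.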

Vega provides a default placement algorithm, based on load balancing, that identifies a physical host in a given host-pool for virtual machine placement. For flexibility, Vega uses a plug-in mechanism that allows the execution of any placement algorithm that implements its Java interface. Additional details can be found in [7].

E. Systems Management Simplified

One of the main challenges that system administrators face is the need for a replicable process that can configure and deploy solutions without any manual intervention. The approach described in this paper simplifies systems management in several respects. First, system administrators can develop and test pre-configured images for specific software components. The images need only expose some system administration scripts that can configure specific aspects of the image. Second, the pre-configured images can be used to create complex deployment topologies using the approach described in this paper. The prepared solutions can then be made available to users for deployment in a Cloud environment. Third, subsequent changes to the solutions can be made by encoding additional commands in the RunOnceScript section of the Composite Appliance XML file.

The work described in this paper has been implemented as part of the IBM Research Compute Cloud environment for use by the worldwide IBM Research community. For single-VM solutions, the mechanism is primarily used for configuring the software stack and enforcing security policies (e.g., changing the root password). We have also used the mechanism to create several multitier solution definitions. For example, a retail analytic solution containing a data integration server, a database server, and a business intelligence server is being used in a customer first-of-a-kind project. Its RunOnceScript section contains commands to configure the data sources, schedule jobs, and create reports for the user.

The approach described in this paper has served well for specifying solution topologies where the configuration steps can be expressed as a set of commands that can be executed independently on each virtual machine. The approach does not work for complex scenarios where the set of commands needs to be coordinated among different virtual machines. For simple cases, when the execution of a command on one virtual machine depends on the result of a command on another virtual machine, we poll the status of an application from the RunOnceScript and wait until it has started before continuing. For the example given in Section III.B, before starting the WebSphere Process Server on one virtual machine, we wait for the DB2 database server to be started on another virtual machine by polling the DB2 port.
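Polling a port in this way can be sketched with Python's standard socket module. The function name wait_for_port and the timeout and interval defaults are assumptions for illustration; the scripts used in our deployment are not shown in this paper.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout elapses.

    Returns True once the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A RunOnceScript command on the dependent node could call such a helper with the resolved hostname of the database node and the port recorded in its Software Attributes, and start the application server only when it returns True. Note that an open port shows only that the server is listening, not that it is ready to serve requests.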

IV. RELATED WORK

Current work in Cloud Computing has centered on providing single-VM compute resources to end users. Both Amazon EC2 [8] and IBM Developer Cloud [9] let users provision single images in the Cloud. Dynamic scalability of web applications deployed through appliances in a cloud has also been addressed in [10]. No support is available to define and request multi-server solutions; a user must manually configure solutions after the VMs are available. Other vendors focus on providing application runtimes (Google App Engine [11]) or on delivering specific business functions through the software-as-a-service model (Salesforce [12], GoogleDocs [13]).

Solution-based deployments have not been discussed in the context of Cloud Computing. In [14], the authors describe a model-driven approach to deploying solutions in virtualized environments. Our approach is more extensible, as it allows system administrators to specify and execute system administration tasks as part of the solution deployment.

V. CONCLUSION

We have presented a framework that system administrators can use to specify and deploy composite solutions in a Cloud Computing environment. The approach uses an XML schema that allows administrators to specify the attributes of, and inter-dependencies among, solution components. The XML specification is then used by the Provisioning Component to deploy and configure complex solutions automatically, without any manual intervention. Our work has demonstrated the advantages of a simple text-based approach that builds on skills system administrators already have. As future work, we plan to extend the framework so that system administrators can specify operational characteristics of the solution, such as deployment zones and availability requirements.

REFERENCES

[1] G. Gruman, “What cloud computing really means”, InfoWorld, Jan. 2009.
[2] R. Buyya, Y. S. Chee, and V. Srikumar, “Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities”, Department of Computer Science and Software Engineering, University of Melbourne, Australia, July 2008, pp. 9.
[3] D. Chappell, “A Short Introduction to Cloud Platforms”, David Chappell & Associates, August 2008.
[4] VMware, http://www.vmware.com/
[5] Xen, http://wiki.xensource.com/xenwiki/
[6] KVM, http://www.linux-kvm.org/
[7] T. Kwok and A. Mohindra, “Resource Calculations with Constraints, and Placement of Tenants and Instances for Multi-tenant SaaS Applications”, Services-Oriented Computing – ICSOC, 2008, 5364.
[8] Amazon EC2, http://aws.amazon.com/ec2
[9] IBM Developer Cloud, http://www.ibm.com/cloud/developer
[10] T. C. Chieu, A. Mohindra, A. A. Karve and A. Segal, “Dynamic Scaling of Web Applications in a Virtualized Cloud Computing Environment”, Proceedings of the IEEE International Conference on e-Business Engineering (ICEBE 2009), Macau, China, Oct. 2009, pp. 281-286.
[11] Google App Engine, http://code.google.com/appengine/
[12] Salesforce, http://www.salesforce.com
[13] GoogleDocs, http://docs.google.com
[14] A. V. Konstantinou, T. Eilam, M. Kalantar, A. A. Totok, W. Arnold and E. Snible, “An architecture for virtual solution composition and deployment in infrastructure clouds”, International Conference on Autonomic Computing, Proceedings of the 3rd international workshop on Virtualization technologies in distributed computing.
