10
Creating Self-healing Service Compositions with Feature Models and Microrebooting Abstract: Service-oriented architectures (SOAs) provide loose coupling and software reuse in enterprise applications. SOAs expose individual reusable software applications or components as remotely accessible services that communicate using standardized message-oriented protocols, such as the Simple Object Access Protocol (SOAP). SOAs enable applications to heal themselves by failing over to alternate services when a critical application component or service reference fails. The numerous intricate details of identifying errors, releasing resources used to access services, and planning a recovery strategy, however, makes it hard to develop applications that can heal by swapping services. Model-driven engineering (MDE) offers a potential solution to handling the complexity of building applications that can heal by swapping services. Existing MDE solutions for building adaptive applications require developers to explicitly model each potential error state and recov- ery action, which can be extremely complex. Moreover, developers must then implement the complex recovery actions modeled, which adds significant development complexity. This paper presents an MDE technique called Refresh that is based on microrebooting and uses (1) feature models to derive a new and correct service composition when a failure occurs, (2) an application’s component container to shutdown the reference to the failed service, and (3) the application con- tainer to reboot the subsystem with the new service composition. We also present the results from a case study that shows Refresh significantly reduces both modeling and healing implementation effort. 1 Introduction Organizations are rapidly deploying service-oriented architec- tures (SOAs) that create loosely coupled and highly reusable ap- plication components through the use of standardized message- oriented protocols, such as the Simple Object Access Protocol (SOAP). Often, within a single organization or group of collab- orating organizations, multiple services are available that can accomplish a particular task. The redundancy in services pro- vides the potential to create applications that can heal them- selves by failing over to leverage similar services when a service in their service composition (i.e., the services used by the appli- cation) fails. Failing over to another equivalent—but not neces- sarily identical—service can create robust applications that can adapt to service failures and remain functional. Designing and implementing a mechanism to build self- healing service compositions is complex. Since software de- velopment projects already have low success rates and high costs, building a service capable of healing is hard [4]. More- over, building adaptive mechanisms greatly increases applica- tion complexity and can be hard to decouple from application code if the development of the adaptive mechanism is not suc- cessful. Model-driven engineering (MDE) provides a potential solu- tion to managing the complexity of developing adaptive ser- vices. In an MDE approach, high-level adaptive models are used to generate the complex adaptive code required to heal the application when services fail. This approach allows MDE tools to generate much of the complex healing code, and in many cases, remove the healing code if it does not function properly. Although numerous approaches [8, 16, 12, 7] have been devised to build MDE models and platforms for enterprise applications, these approaches tend to suffer from one or more of the follow- ing problems: 1. They require significant development effort to explicitly model the numerous potential error states and recovery paths from an error state to a correct state and 2. They require extensive effort to develop the adaptation ac- tion implementations for a realistic application. This paper presents an MDE approach and toolset called Re- fresh, for designing and implementing self-healing service com- positions that addresses the limitations outlined above. Refresh is specifically designed for healing a service composition when (1) the application is implemented with a component-based technology, such as Enterprise Java Beans or the CORBA Com- ponent Model, (2) catastrophic failure is imminent, (3) the ap- plication and any redundant instances in an application cluster cannot continue functioning correctly in their current configura- tion, and (4) the application has alternate composable services that could potentially be exploited to avoid failure. For each potential error state that an application’s service composition could enter, conventional MDE adaptation tech- niques [8, 16, 12, 7] require explicitly modeling both the er- ror state and the numerous actions to transition from the error state to a correct state. For large enterprise applications, more- over, there are usually a significant number of potential error Copyright c 200x Inderscience Enterprises Ltd. 1

Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

Creating Self-healing ServiceCompositions with FeatureModels and MicrorebootingAbstract:

Service-oriented architectures (SOAs) provide loose coupling and software reuse in enterpriseapplications. SOAs expose individual reusable software applications or components as remotelyaccessible services that communicate using standardized message-oriented protocols, such as theSimple Object Access Protocol (SOAP). SOAs enable applications to heal themselves by failingover to alternate services when a critical application component or service reference fails. Thenumerous intricate details of identifying errors, releasing resources used to access services, andplanning a recovery strategy, however, makes it hard to develop applications that can heal byswapping services.

Model-driven engineering (MDE) offers a potential solution to handling the complexity ofbuilding applications that can heal by swapping services. Existing MDE solutions for buildingadaptive applications require developers to explicitly model each potential error state and recov-ery action, which can be extremely complex. Moreover, developers must then implement thecomplex recovery actions modeled, which adds significant development complexity. This paperpresents an MDE technique called Refresh that is based on microrebooting and uses (1) featuremodels to derive a new and correct service composition when afailure occurs, (2) an application’scomponent container to shutdown the reference to the failedservice, and (3) the application con-tainer to reboot the subsystem with the new service composition. We also present the results froma case study that shows Refresh significantly reduces both modeling and healing implementationeffort.

1 Introduction

Organizations are rapidly deploying service-oriented architec-tures (SOAs) that create loosely coupled and highly reusable ap-plication components through the use of standardized message-oriented protocols, such as the Simple Object Access Protocol(SOAP). Often, within a single organization or group of collab-orating organizations, multiple services are available that canaccomplish a particular task. The redundancy in services pro-vides the potential to create applications that can heal them-selves by failing over to leverage similar services when a servicein their service composition (i.e., the services used by the appli-cation) fails. Failing over to another equivalent—but not neces-sarily identical—service can create robust applications that canadapt to service failures and remain functional.

Designing and implementing a mechanism to build self-healing service compositions is complex. Since software de-velopment projects already have low success rates and highcosts, building a service capable of healing is hard [4]. More-over, building adaptive mechanisms greatly increases applica-tion complexity and can be hard to decouple from applicationcode if the development of the adaptive mechanism is not suc-cessful.

Model-driven engineering (MDE) provides a potential solu-tion to managing the complexity of developing adaptive ser-vices. In an MDE approach, high-level adaptive models areused to generate the complex adaptive code required to heal theapplication when services fail. This approach allows MDE toolsto generate much of the complex healing code, and in manycases, remove the healing code if it does not function properly.

Although numerous approaches [8, 16, 12, 7] have been devisedto build MDE models and platforms for enterprise applications,these approaches tend to suffer from one or more of the follow-ing problems:

1. They require significant development effort to explicitlymodel the numerous potential error states and recoverypaths from an error state to a correct state and

2. They require extensive effort to develop the adaptation ac-tion implementations for a realistic application.

This paper presents an MDE approach and toolset calledRe-fresh, for designing and implementing self-healing service com-positions that addresses the limitations outlined above. Refreshis specifically designed for healing a service composition when(1) the application is implemented with a component-basedtechnology, such as Enterprise Java Beans or the CORBA Com-ponent Model, (2) catastrophic failure is imminent, (3) theap-plication and any redundant instances in an application clustercannot continue functioning correctly in their current configura-tion, and (4) the application has alternate composable servicesthat could potentially be exploited to avoid failure.

For each potential error state that an application’s servicecomposition could enter, conventional MDE adaptation tech-niques [8, 16, 12, 7] require explicitly modeling both the er-ror state and the numerous actions to transition from the errorstate to a correct state. For large enterprise applications, more-over, there are usually a significant number of potential error

Copyright c© 200x Inderscience Enterprises Ltd.

1

Page 2: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

AccountDAO OrderDAO ProductDAO ItemDAO

DAOs

Single

JTAPresentRef

Multiple

Datasources

JTAPresent JTANotPresent

JTA

PetStore

PetStoreServiceComposition

Figure 1: Pet Store Service Composition Feature Model

states and complex nuanced considerations, such as availabil-ity of other services, database locks held, and transactionstates.These considerations make it hard to create a model for ser-vice composition healing. Rather than explicitly modelingerrorstates and recovery actions, Refresh usesFeature Models[17]to capture the rules for determining what is or is not a correctconfiguration/error state.

Feature models describe an application in terms of points ofvariability and their affect on each other. For example, in an e-commerce application, a feature might be a service for access-ing an order database. The order feature can have different sub-features, such as different potential services that can serve as theorder database access service. If one particular order databaseaccess service is chosen, it excludes the other potential orderservices from being used (it constrains the other features). Ifthe chosen service fails, a new feature selection can be derivedthat does not include the failed service’s feature.

This paper provides the following contributions to the studyand development of self-healing service compositions:

• It shows how when a failure occurs (such as the inabilityto communicate with a dependent service) Refresh uses theapplication’s feature models to derive a new and valid ser-vice composition from the currently available services andcomponents, which eliminates the need to model every po-tential error state and recovery action.

• It describes Refresh’s use of an approach based onmi-crorebooting [9], which is a technique for rebooting asmall set of failed components rather than an entire appli-cation server, to shutdown the failed service compositionand launch the newly derived composition, eliminating theneed for developers to implement recovery actions.

• It presents empirical results from a case study applyingRefresh to an e-commerce application that shows Refreshprovides a∼55% decrease in modeling complexity and∼60% decrease in implementation cost versus other MDEapproaches for building self-healing service compositions.

The remainder of this paper is organized as follows: Section2presents the e-commerce application that we will use as a casestudy throughout the paper; Section 3 enumerates current chal-lenges in applying existing MDE techniques for building adap-tive applications to our case study; Section 4 describes Refresh’sapproach to using feature models and microrebooting to reducethe complexity of modeling and implementing an applicationthat can heal; Section 5 analyzes empirical results obtained from

applying Refresh to our case study; Section 6 compares Refreshwith related work; and Section 7 presents concluding remarks.

2 Case Study: The Java Pet Store

To show the complexity of applying conventional MDE tech-niques to creating healing applications, we present a case studybased on Sun’s Java Pet Store e-commerce application [20]. ThePet Store provides a web-based storefront for selling pets.Thestore includes multiple categories of pets, products (e.g., Bull-dog and Iguana), and individual product items (e.g., FemaleBulldog Puppy). Customers browse for pets and purchase dif-ferent items.

Sun and other parties use the Pet Store as a reference appli-cation to showcase various enterprise Java technologies. Sincethe Pet Store application is widely known and can serve as areference for comparing different technologies, the Pet Storehas been re-implemented in different programming languagesand with different frameworks. For example, the Java SpringFramework [15] has created the Spring Pet Store. The SpringFramework’s version of the Pet Store includes support for inte-grating web services and is the implementation we have chosenfor the case study.

Figure 1, presents a high-level feature model of the featuresrelated to the Pet Store’s data tier. Features are denoted bythevarious boxes in the diagram. The levels of hierarchy representsubfeatures. For example, all PetStore instances haveDAOs,Datasources, andJTA as subfeatures (the filled circles at thetop of the child features denote required features). The PetStore Java Transaction API (JTA) feature can either be present,denoted when the childJTAPresentfeature is selected, or notpresent.

A Feature can also specify rules restricting the selection ofother features if the feature is selected. For example, the selec-tion of theDatasources/Multiple features requires thatJTAPre-sentalso be selected. This requirement is denoted by theJTAP-resentRefrequired feature reference underMultiple.

HessianOrderServiceSOAPOrderService LocalOrderDAO BurlapOrderService

OrderDAO

Figure 2: Feature Model of the J2EE Pet Store’s OrderDAO

The SpringFramework allows the swapping of individualcomponents in the Pet Store with proxies to remote services.

2

Page 3: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

Figure 1 lists the various DAOs that are available in the Pet-Store. Each DAO can potentially be swapped for a remote ser-vice. Figure 2 shows the various options for the OrderDAO.Either the OrderDAO can be implemented by a local compo-nent or it can be implemented as a dynamically created Javaproxy to a SOAP, Burlap, Hessian, or RMI order service. Thecase study focuses on failing over from the middle-tier DAOsto different remote services to demonstrate the complexities ofapplying existing MDE techniques.

3 Challenges of Creating Self-healing Service Compositions

A common approach [8, 16, 12, 7, 3, 18, 13] to modeling appli-cation healing is to model the individual error states that the ap-plication can enter and a recovery path (a sequence of recoveryactions) to return the application to a correct state. For exam-ple multiple MDE approaches [3, 18, 13] useState Charts[14]to capture the various error states of an application and these-quences of recovery actions to return to a correct state. Enu-merating each potential error state and each recovery path canrequire significant modeling complexity. This section showshow even when an MDE tool can generate the majority of theself-healing code for a service composition, the requirement tomodel and implement recovery actions places a heavy burdenon developers.

3.1 Challenge 1: Significant Modeling Complexity toSpecify a Recovery Path from an Arbitrary ErrorState to a Correct State

A healing model must use different error states for each im-plementation of a service type or component type. The fail-ure of the OrderDAO seems like a fairly simple error conditionto model and specify a recovery path for, but it is not. Theproblem with modeling each potential error state and recoverypath is that the series of recovery actions that must be invokedis different for the local OrderDAO and remote service imple-mentation.

For example, if the local OrderDAO fails, it may be swappedfor another implementation. If a remote service fails, it may benecessary to free resources, such as memory used by caches ornetwork ports, that were used by a connection to it. Servicesconnected through different protocols also need separate errorstates to associate their unique recovery actions with.

If the Pet Store’s service composition healing is modeled us-ing State Charts, as shown in Figure 3, there are 4 differentstates for each DAO. To increase readability, Figure 3 does notinclude events and guards on transitions, which further compli-cate the model. There are 20 different states needed to representthe potential services and components that can serve as the PetStore’s DAOs.

For every error state that the system needs to recover from,the model must explicitly specify a recovery path. For ex-ample not only should the failure of a Hessian and SOAP-basedorder service be modeled separately, but the series of recov-ery actions attached to each also should be modeled separately.

Figure 3: Pet Store Service Composition State Chart

As with error states, the number of recovery path specificationsproduced for healing each component of an enterprise applica-tion can be large.

The Pet Store requires a number of recovery actions to takeplace to swap the service used for a DAO. The various ac-tions for swapping the OrderDAO to one of the remote ser-vices is modeled in Figure 4. First, to swap a DAO, a SpringHotSwappableTargetSource (an object capable of swappingan active component in the application) must be obtained. Next,any resources held by the old DAO implementation or DAOproxy to a remote service must be released. After releasing re-sources, a new proxy to another remote service can be created.Finally, the newly created proxy can be swapped into the appli-cation using theHotSwappableTargetSource. Including therecovery paths in the model ups the total number of states perDAO from 4 to 25.

Healing a local error may require evaluating the globalapplication state. For example, if the Java Transaction API(JTA) is being used to manage transactions, the Pet Store canfail over to any remote service and still provide proper transac-tion behavior. If JTA is not being used to manage transactions,however, the system can only provide transactions across a sin-gle datasource, meaning that all the DAOs must be accessing thesame database instance. Requiring the use of a single databaseinstance prevents an arbitrary service from being chosen. In thenon-JTA situation, the service may only fail over to a remoteservice if the service is accessing the same database instance asall other referenced remote services.

An extension of the OrderDAO recovery State Chart to in-clude the JTA consideration is show in Figure 5. Each transitionto the swap states now includes a guard to ensure that swappingis allowed. A newGlobalSwapControllerhas been added to themodel to only allow swapping when either JTA is present or asingle data source is being referenced by the application’sser-vice composition. Section 4.2 shows how Refresh uses featuremodeling and other techniques to eliminate the need to modelevery potential error state and recovery action.

3

Page 4: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

Figure 4: OrderDAO Recovery Paths State Chart

3.2 Challenge 2: Significant Complexity to Write Re-configuration Code that Can Bring the System froman Arbitrary Error State to a Correct State.

Regardless of the MDE approach used to build the applica-tion healing mechanism, developers must always implementthe application-specific recovery actions. This requirement par-allels the development of enterprise applications and services,where despite the frameworks used, developers are always re-quired to implement the core business logic. Some specializedMDE tools may provide pre-built recovery actions for specificdomains, but in general, nearly every MDE approach requiresdevelopers to write the recovery actions.

For each path from an error state to a recovery state, com-plex recovery logic must be written. The more error statesthat are possible in the application, the more recovery actionsmust be written by developers. These numerous recovery ac-tions can be both expensive to develop and hard to test, whichcan become problematic when projects are already prone to fail-ure and cost overruns.

Figure 5: OrderDAO Recovery Paths State Chart when Ac-counting for JTA

In the Pet Store application, there are four separate DAOs thatcan each be swapped to one of four remote services to avoidfailures. To implement a simple swapping mechanism in thePet Store, the Spring framework provides numerous complexutility classes for hotswapping components and connectingtoremote services, such as Apache Axis web services. Despitethese numerous utility classes (as shown in Section 5), to createan action to swap just the OrderDAO to one of the four remoteservices requires 77 lines of Java code to implement the swap-ping logic and 11 lines of XML code to enable and configure theswapping action in the Pet Store. Although some level of refac-toring and object-oriented design can be used to share commonlogic across actions, implementing each action still requires sig-nificant effort. Section 4.3 shows how microrebooting can sig-nificantly reduce this substantial development burden by load-ing a new service composition derived by a constraint solver.

3.3 Challenge 3: Executing Arbitrary Recovery Actionsin Arbitrary Error States can have Numerous Unfore-seen Side-effects.

Error states are often specified in such a way that the systemas a whole can be in numerous different states that all fall un-der the definition of the same error state. For example, whenthe OrderDAO fails, the Pet Store can have orders in progress,category listings in progress, and numerous other nuanced con-ditions. Building a robust and correct recovery action requirestaking into account the side effects of the recovery action on thecomplex overall state of the application.

For example, what will happen if the local OrderDAO is

4

Page 5: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

swapped with a remote service during the submission of one ormore customer orders? Does the safety of the swap depend onwhether or not a local or JTA-based transaction mechanism isused? These complex nuanced questions are not easy to answerand must be considered for each recovery action implementa-tion. These intricacies make developing a recovery action thatwill not lead to unforeseen problems hard. Section 4.3 how us-ing microrebooting as the basis for recovery eliminates many ofthese hard to predict recovery side-effects and also provides amore well understood state transition mechanism.

4 Modeling and Building Healing Adaptations with Refresh

The challenges in Sections 3.1-3.3 stem from two causes: (1)the requirement that every error state and recovery path must bemodeled explicitly and (2) that developers must implement ev-ery complex recovery action. This section describes our MDEtoolset, calledRefresh, that eliminates these two sources of sub-stantial complexity.

4.1 Overview of Refresh

Refresh is based on the concept of microrebooting [9]. Whenan error is observed in the application, Refresh uses the applica-tion’s component container to shutdown and reboot the applica-tion’s components. Using the application container to shutdownthe failed subsystem takes milliseconds as opposed to the sec-onds required for a full application server reboot. Since itislikely that rebooting in the same configuration (e.g.referencingthe same failed remote service) will not fix the error, Refreshderives a new application configuration and service composi-tion from the application’s feature models that does not containthe failed features (e.g., remote services).

The service composition dictates the remote services usedby the application. The application configuration determinesany local component implementations, such a SOAP messagingclasses, needed to communicate and interact properly with theremote services. After deriving the new application configura-tion and service composition, Refresh uses the applicationcon-tainer to reboot the application into the desired configuration.The overall structure of Refresh is shown in Figure 6.

Refresh interacts directly with the application container, asshown in Figure 6. During the initial and subsequent con-tainer booting processes, Refresh transparently insertsappli-cation probesinto the application to observe the applicationcomponents. Observations from the application componentsaresent back to anevent stream processorthat runs queries againstthe application event data, such as exception events, to identifyerrors. Whenever an application’s service composition needsto be healed,environment probesare used to determine avail-able remote services and global application constraints, such aswhether or not JTA is present.

Refresh uses event stream processing [19] to run queriesagainst the application’s event data and identify feature fail-ures. The initial implementation of Refresh, based on the SpringFramework’s IoC container, uses the Esper event stream pro-cessor [2] for Java. Esper is a high-performance event stream

Figure 6: Refresh Structure

Figure 7: Error Propogation to Refresh

5

Page 6: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

processor that is capable of handling 100,000 events a secondwith 2,000 queries on a single dual-core CPU [1].

Each feature in the feature model that could potentially failis associated with a group of event stream queries. At runtime,when a query associated with a feature returns a result, Refreshis notified that the associated feature has failed, as shown in Fig-ure 7. The data and objects observed and analyzed by Refreshare determined by the query specifications.

Once Refresh is notified of a feature failure, its three maintasks are to use (1) the container to shutdown the application’scomponents, (2) the application’s feature model to derive anewapplication configuration and service composition, and (3)thecontainer to reboot the application in the new configuration.The sequence of events from a feature failure notification totherebooting of the container are shown in Figure 8.

Figure 8: Refresh Reconfiguration, Shutdown, and Launch Re-covery Sequence

To derive a new configuration of the application that does notinclude the failed feature, Refresh transforms the featureselec-tion problem into a constraint satisfaction problem (CSP) usingtechniques that have been developed by us an others in priorwork [22, 6, 23]. Once the feature selection problem is trans-formed into a CSP, a high-performance general purpose con-straint solver, such as the Java Choco [5] solver, is used to derivea new set of features/configuration for the application.

After the new application configuration and service composi-tion is derived, Refresh invokes the container’s shutdown se-quence to properly release resources, abort transactions,andperform other critical activities. The new configuration isin-jected into the container through programmatic calls or by re-generating the application’s configuration files [22]. After theconfiguration is injected into the container, the application islaunched in the new configuration without the failed service, asshown in Figure 9.

4.2 Use Feature Modeling to Capture the Rules for Deriv-ing what is Considered a Correct State

As discussed in Section 3.1, modeling each individual errorstate and recovery path is complex. Refresh uses feature model-ing to avoid requiring developers to model each individual errorstate and recovery path. Feature modeling captures the rules–rather than individual error states and recovery paths–forde-riving what constitutes a correct application configuration and

Figure 9: Refresh Launches the Application in the New Config-uration

service composition. In terms of healing, feature modelingde-scribes:

• The component or service types that are needed to com-pose the application

• The sets of components or services that can serve as theimplementation of a service type in the application’s com-position

• The rules dictating the requirements, such as dependent li-braries, required by each component or service implemen-tation and

• The rules constraining how the choice of one service im-plementation restricts the choices of other component orservice implementations in the same application composi-tion.

When the failure of a feature is observed, Refresh uses thefeature model of the application to derive an alternate set offeatures for the application that does not include the failed fea-ture. For example, in the Pet Store, when theLocalOrderDAOfeature fails, Refresh uses the feature model to derive an alter-nate feature selection for the Pet Store. In the example shown inFigure 10, Refresh chooses a new feature selection that usestheBurlapOrderServicerather than the failedSOAPOrderService.

Automated Feature Selection Using a Constraint Solver:The key to Refresh’s healing capabilities is its ability to use aconstraint solver [10] to derive a new feature selection forthe

6

Page 7: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

Figure 10: Deriving a new Service Composition from the PetStore Feature Model

application automatically. Prior work [22, 5, 6] provides ex-tensive details on the process for transforming a feature selec-tion problem into a constraint satisfaction problem (CSP) [10],which is the input format of a constraint solver, and deriving afeature selection. We briefly cover this mapping below.

A constraint satisfaction problem is a series of variables anda set of constraints over the variables. For example, "A+B<C"is a constraint satisfaction problem over the integer variablesA,B, andC. A constraint solver automatically derives a correctlabeling (values for the variables). The labeling "A = 1,B =

2,C = 4" is a correct labeling of the example CSP.

A selection of features from a feature model can be repre-sented by a set of integer variables with domain 0 or 1. Eachvariable represents a unique feature from the feature model.If the variable representing theHessianOrderServiceis repre-sented by the variableV1, thenV1 = 1 in a labeling of a featureselection CSP means that the feature is selected in the solution.If the labeling containsV1 = 0, it implies that the feature is notselected in the solution. The configuration of an application andits service composition is represented as a set of these variablesthat denote which services and application components are en-abled in a configuration.

Rules dictating the proper composition of the services arespecified as constraints over theVi variables. For example,since only one ofHessianOrderServiceandSOAPOrderServicecan be used at a time by the Pet Store, a constraint can beused to capture this rule. Let,V2 be the variable representingtheSOAPOrderService. This rule is specified as the constraintV1 = 1→V2 = 0. As described in [22], complex rules, such asmemory constraints, can be described using a CSP.

When a feature is flagged as failed, Refresh adds a new con-straint to the feature selection process preventing the failed fea-ture from being selected (e.g., Vi = 0). Refresh then uses a theconstraint solver to derive a new feature selection that canbeused by the application based on the environmental constraints(e.g. JTA vs. No JTA) and feature model composition con-straints (e.g., only one of the order services may be selected ata time).

4.3 Reusing the Component Container’s Shut-down/Configuration/Launch Mechanisms for StateTransitions

Sections 3.2-3.3 show the complexity and large developmentburden of writing recovery actions to heal an application byfail-ing over to alternate services. Refresh attacks the problemwitha combination of code reuse and automation. In particular, itreuses an application container’s ability to shutdown an applica-tion’s components, reconfigure the components (i.e. create thenewly desired service composition), and launch the applicationin the new state (i.e. transition the application into the new ser-vice composition state). By reusing existing mechanisms thatare well-tested and trusted by developers, the need to writecus-tom recovery actions is eliminated.

Moreover, since rebooting in the same application configura-tion with the same service composition is unlikely to fix a failedreference to a service, Refresh automatically derives a newandvalid application configuration and service composition. Thisautomated approach to deriving a new service composition froman application’s feature model allows microrebooting to beap-plied to service composition healing. Normally, with a man-ual recovery action implementation process, developers woulddeduce the correct states to transition the application into andimplement the transition logic. Refresh’s automated derivationprocess eliminates the need for developers to: (1) determinewhere to transition to, (2) decide how to accomplish the transi-tion, and (3) implement the transition.

Container Rebooting-based Healing Reduces Potential Un-intended Side-effects: A key benefit of using the container’sbuilt in component management mechanisms for state transi-tions is that they are guaranteed to bring the non-persistent ap-plication state to the desired correct state. This guarantee helpsto resolve the problems outlined in Section 3.3 of dealing withthe potential of unintended side-effects from recovery actions.

With Refresh, the application container shuts down compo-nents, which releases resources and resets in-memory state, andthen re-launches the application with a clean memory state.With recovery actions, there is the potential that one or moreof the affects on the application will have unforeseen con-sequences to the non-persistent in-memory application state.These unforseen side-effects are not possible with a containerrebooting approach that resets non-persistent state.

A container rebooting approach does not eliminate the pos-sibility that persistent application state, such as database rows,will not be placed into an inconsistent state. The approach does,however, have a number of properties that make this scenariofar less likely than a recovery action approach. First, all compo-nents typicallymustimplement lifecycle methods that are calledby the container to manage the component. If a component doesnot properly handle persistent state on shutdown, it is a flawinthe implementation of the component that could emerge–evenif the application never uses healing mechanisms.

Second, most enterprise applications maintain the consis-tency of persistent application state through transactions. More-over, most enterprise applications use container-managedper-sistence APIs, such as JTA. Even the Non-JTA examples pro-

7

Page 8: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

vided for the Pet Store still use an alternate container-managedpersistence API that works across only a single datasource.When the container is used as the healing transition mechanism,any transactions that are in process will be properly rolledbackor committed by the container during the healing of the appli-cation’s service composition.

5 Applying Refresh to the Java Pet Store

To compare the development effort of including recovery ac-tions into the Pet Store, we implemented the following threeversions of the Spring Pet Store with self-healing service com-positions.

• The first implementation was produced using a purelymanual approach that used Spring’s proxying and aspectinfrastructure to implement the monitoring of the DAOsand SpringHotSwappableTargetSourcesto swap remoteservices on-the-fly.

• The second implementation was produced assuming anMDE tool was provided that could model the error statesand recovery actions for the Pet Store and generate the re-quired monitoring code and recovery path logic but not theimplementations of the recovery actions. We refer to thisMDE approach as theMDE error state/recovery pathap-proach.

• The third implementation was produced using Refresh,which captures the rules for configuring the applicationand its service composition in feature models and uses mi-crorebooting to eliminate the need to implement recoveryactions.

The self-healing for all three implementations was built aroundthe ability to swap failed DAOs with remote services and toswap from failed remote services to other remote services.The modifications for the three implementations are availablefrom [21].

Manual implementation. The top table in Figure 11 showsthe results of the initial implementation efforts. The manual ap-proach required implementing two key classes aServiceSwap-per capable of (1) looking up the Spring HotSwappableTarget-Source for a DAO, (2) connecting to a Hessian, Burlap, SOAP,or RMI remote service, and (3) swapping in the new service forthe failed component/service. As shown in Figure 11, the classrequired 77 lines of code. The second class implemented was aSpringMethodInterceptorthat was used to monitor each invo-cation on a DAO or remote service for Exceptions and call theappropriate ServiceSwapper when an Exception occurred. Thisclass required 20 lines of code. Finally, the components wereincluded in the Pet Store by adding them to the XML configu-ration files for the Pet Store, which required adding 96 linesofXML code.

MDE Error State / Recovery Path Implementation. Theanalysis for the MDE error state/recovery path approach wasbased on a generic model of the minimum effort that would berequired for any MDE adaptation modeling tool and frameworkthat did not provide Spring-specific recovery action implemen-tations. The models were built using State Charts, since it isarguably the most widely used and mature state modeling lan-guage. State Charts also have a number of powerful concepts,such as parallel states, which reduce the total modeling com-plexity.

For the MDE implementation effort analysis, we measuredonly the lines of code required to implement the ServiceSwap-per and to integrate the needed ServiceSwappers into the con-figuration files of the Pet Store. We assumed that all of the logicfor choosing the correct ServiceSwapper to execute, the imple-mentation of the MethodInterceptor, and all configuration coderequired to integrate the method interceptors and their depen-dent proxies into the configuration file would be generated bythe tool. Our experiments thus only measured the cost of mod-eling error states and recovery actions and implementing them.

The MDE error state/recovery action approach used the StateCharts presented in Section 3.1. The full State Chart healingspecification requires 111 states and 102 transitions betweenstates. As seen in Figure 11, the MDE approach still requires77 lines of code to implement the ServiceSwapper recovery ac-tion but eliminates the 31 lines of code needed to implement therecovery path execution logic and the 20 lines of code requiredfor the monitoring implementation.

Refresh implementation. Finally, we implemented the swap-ping capabilities in the Pet Store using Refresh. Refresh’suse ofFeature models required a total of 33 model elements (features)and 29 connections versus the MDE approach’s 111 model ele-ments (states) and 102 connections (transitions). Refreshalsorequired 16 lines of code to specify the Esper queries overthe event stream of the Pet Store to map queries to the fail-ure of one of the Pet Store features. Refresh’s use of the con-tainer’s built-in shutdown/configuration/launch mechanisms forhealing, eliminated the need to implement the code for the Ser-viceSwapper.

Refresh automatically generates the required monitoringcode for the Pet Store (this was assumed for the other MDEapproach as well). Refresh did require 23 more lines of code tobe modified in the configuration file of the Pet Store versus theother MDE approach. These extra lines of configuration codeare a result of adding the Refresh annotations dictating howtodynamically modify the application’s configuration based on afeature selection. Overall, the Refresh approach required55%less implementation effort than the other MDE approach and60% less modeling effort.

Refresh performance. We used Apache JMeter to simulatethe concurrent access of 40 different customers to the Pet Storeand the time required to complete 4,000 orders. We simulatedthe failure of different DAOs to force Refresh to heal the PetStore by swapping remote services for the failed DAOs. Afterthe DAOs were swapped to remote services, we iteratively shut-

8

Page 9: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

Figure 11: Comparing Implementation Effort for the HealingPet Store

down the services used by the Pet Store to force the failover toalternate remote services.

Over the tests, Refresh averaged 151ms from the time an ex-ception indicating a failure was observed until the Pet Storewas reconfigured and rebooted with a new service composition.These times were obtained by running the Pet Store on a 2.0ghzIntel Core DUO on Windows XP with 2 gigabytes of RAM. Theaverage time required by the constraint solver to derive a newfeature selection was 12ms. These times indicate that Refreshcan provide high-performance application healing while reduc-ing modeling and implementation effort.

6 Related Work

Microrebooting [9] is a technique used to restart only the com-ponent, or collection of components in which the failure oc-curred. Refresh uses microrebooting to eliminate the need tomodel and implement recovery actions, as described in Sec-tion 4. The problem with applying microrebooting alone to ser-vice composition healing is that remote services usually can-not be rebooted and thus failures will persist across reboots.Refresh, however, dynamically derives a new service compo-sition and application configuration before rebooting thatelim-inates the reference to the failed service. Reconfigurationof theservice composition allows Refresh to eliminate references tofailed services and prevent an error from persisting acrossre-boots.

Lapouchnian et al. [18] propose using goal modeling tohelp develop autonomic applications. Moreover, Lapouchnian’stechnique uses feature models to help understand the variabil-ity in system objectives. Lapouchnian’s techniques are focusedon developing a design for an autonomic system and also relyon Statecharts. Refresh, in contrast, does not require a specificapplication design–only that the application have different po-tential services or components that it can be composed of. Fur-thermore, as Section 3, Lapouchnian’s use of Statecharts addsa substantial development burden. Refresh does not use errorstate/recovery action based modeling and implementation andthus avoids this development burden.

Crawford and Dan developed a framework, known aseModel [11], to assist in monitoring and adapting a systembased on its environment. One of their primary design goalswas ease of use for model providers and model users interact-ing with the framework. This framework requires a model, in

the form of an XML file, to specify the states to be identified andthe actions to be taken in such situations. The model provider isthus required to identify all potentials states of the system andprovide a specific set of actions to take for each state. Section 3showed the problems associated with specifying error states andrecovery actions. Unlike eModel, Refresh does not require ex-plicit specification of recovery actions and avoids these difficul-ties.

There are a large number of other healing or adaptation ap-proaches [8, 16, 12, 7, 3, 18, 13] that rely on identifying errorstates and then planning and executing some number of recov-ery actions. As shown in Section 3, modeling and implementingrecovery actions is complex and costly. Moreover, as the em-pirical results from Section 5 showed, by eliminating the needto model and implement recovery actions, Refresh produced a55% reduction in implementation effort and a 60% reduction inmodeling effort compared to techniques that require error stateand recovery action modeling.

7 Concluding Remarks

Numerous MDE approaches for building self-healing servicecompositions [8, 16, 12, 7, 3, 18, 13] rely on developers model-ing each potential error state and the recovery paths from eachstate. Regardless of the technique used, developers are alwaysresponsible for implementing the complex application-specificrecovery actions. Moreover, since these approaches use recov-ery actions to transition an application between two arbitrarystates, recovery actions can have unintended side-effectson theapplication, such as producing deadlock or data corruption, thatare hard to identify and avoid.

This paper describes how our Refresh technique uses fea-ture modeling to capture the rules for deriving a correct ser-vice composition state. Our experience using Refresh showedthat leveraging feature models to automatically derive newser-vice compositions when a dependent service fails eliminates thecomplexity of needing to model each individual error state andrecovery action. Moreover, by using microrebooting to transi-tion the application from its failed service composition tothenew service composition, we found that developers need notimplement complex recovery actions. Finally, through resultsobtained from applying Refresh to case studies, we observedthat eliminating the modeling and implementation of recovery

9

Page 10: Creating Self-healing Service Compositions with Feature Models …schmidt/PDF/cservices.pdf · 2007-11-22 · Feature models describe an application in terms of points of variability

actions greatly reduced the cost of creating self-healing servicecompositions.

Refresh is available in open-source form as part of theGEMSModel Intelligenceproject atwww.eclipse.org/gmt/gems.

REFERENCES

[1] Esper faq,http://esper.codehaus.org/tutorials/faq_esper/faq.html#performance.

[2] Event stream intelligence with esper and nesper.http://esper.codehaus.org.

[3] F. Barbier. MDE-based Design and Implementation ofAutonomic Software Components.Cognitive Informatics, 2006.ICCI 2006. 5th IEEE International Conference on, 1, 2006.

[4] H. Barki, S. Rivard, and J. Talbot. Toward an assessment ofsoftware development risk.Journal of Management InformationSystems, 10(2):203–225, 1993.

[5] D. Benavides, S. Segura, P. Trinidad, and A. Ruiz-Cortés. UsingJava CSP solvers in the automated analyses of feature models.Post-Proceedings of The Summer School on Generative andTransformational Techniques in Software Engineering (GTTSE).

[6] D. Benavides, P. Trinidad, and A. Ruiz-Cortes. AutomatedReasoning on Feature Models.17th Conference on AdvancedInformation Systems Engineering (CAiSEŠ05, Proceedings),LNCS, 3520:491–503, 2005.

[7] V. Bhat, M. Parashar, H. Liu, M. Khandekar, N. Kandasamy,andS. Abdelwahed. Enabling Self-Managing Applications usingModel-based Online Control Strategies.Proceedings of the 3rdIEEE International Conference on Autonomic Computing,Dublin, Ireland, June 2006.

[8] R. Calinescu. Model-Driven Autonomic Architecture.Proceedings of the 4th IEEE International Conference onAutonomic Computing, Jacksonville, Florida, USA, June, 2007.

[9] G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox.Microreboot-a technique for cheap recovery.Proceedings of the6th Symposium on Operating Systems Design andImplementation, pages 31–44, 2004.

[10] J. Cohen.Constraint Logic Programming Languages,volume 33. ACM Press, New York, NY, USA, 1990.

[11] C. Crawford and A. Dan. eModel: addressing the need for aflexible modeling framework in autonomic computing.Modeling, Analysis and Simulation of Computer andTelecommunications Systems, 2002. MASCOTS 2002.Proceedings. 10th IEEE International Symposium on, pages203–208, 2002.

[12] Denaro, Giovanni and Pezze, Mauro and Tosi, Davide.Designing Self-Adaptive Service-Oriented Applications.2007.

[13] X. Elkorobarrutia, A. Izagirre, and G. Sagardui. A Self-HealingMechanism for State Machine Based Components.Proceedingsof the 1st International Conference on Ubiquitous Computing:Applications, Technology and Social Issues, Alcalá de Henares,Madrid, Spain, June, 2006.

[14] D. Harel et al. Statecharts: A visual formalism for complexsystems.Science of Computer Programming, 8(3):231–274,1987.

[15] R. Johnson and J. Hoeller.Expert one-on-one J2EEdevelopment without EJB. Wrox, 2004.

[16] K. Joshi, W. Sanders, M. Hiltunen, and R. Schlichting.Automatic Model-Driven Recovery in Distributed Systems.Atthe 24th IEEE Symposium on Reliable Distributed Systems(SRDS’05), pages 25–38, 2005.

[17] K. Kang, S. Kim, J. Lee, K. Kim, E. Shin, and M. Huh. FORM:A feature-; oriented reuse method with domain-; specificreference architectures.Annals of Software Engineering,5:143–168, 1998.

[18] A. Lapouchnian, S. Liaskos, J. Mylopoulos, and Y. Yu. TowardsRequirements-driven Autonomic Systems Design.Proceedingsof the 2005 workshop on Design and evolution of autonomicapplication software, pages 1–7, 2005.

[19] D. Luckham.The Power of Events: An Introduction to ComplexEvent Processing in Distributed Enterprise Systems.Addison-Wesley Longman Publishing Co., Inc. Boston, MA,USA, 2001.

[20] S. Microsystems. Java Pet Store Sample Application.

[21] J. White. Healing pet store case study implementation.http://www.dre.vanderbilt.edu/ jules/petstore-casestudy-code.zip,2007.

[22] J. White, K. Czarnecki, D. C. Schmidt, G. Lenz, C. Wienands,E. Wuchner, and L. Fiege. Automated Model-basedConfiguration of Enterprise Java Applications. InEDOC 2007,October 2007.

[23] J. White, A. Nechypurenko, E. Wuchner, and D. C. Schmidt.Optimizing and Automating Product-Line Variant SelectionforMobile Devices. In11th International Software Product LineConference, September 2007.

10