
1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2016.2536023, IEEE Transactions on Parallel and Distributed Systems


Shadow/Puppet Synthesis: A Stepwise Method for the Design of Self-Stabilization

Alex Klinkhamer and Ali Ebnenasir

Abstract—This paper presents a novel two-step method for automated design of self-stabilization. The first step enables the specification of legitimate states and an intuitive (but imprecise) specification of the desired functional behaviors in the set of legitimate states (hence the term "shadow"). After creating the shadow specifications, we systematically introduce the main variables and the topology of the desired self-stabilizing system. Subsequently, we devise a parallel and complete backtracking search towards finding a self-stabilizing solution that implements a precise version of the shadow behaviors, and guarantees recovery to legitimate states from any state. To the best of our knowledge, the shadow/puppet synthesis is the first sound and complete method that exploits parallelism and randomization along with the expansion of the state space towards generating self-stabilizing systems that cannot be synthesized with existing methods. We have validated the proposed method by creating both a sequential and a parallel implementation in the context of a software tool, called Protocon. Moreover, we have used Protocon to automatically design three new self-stabilizing protocols that we conjecture to require the minimal number of states per process to achieve stabilization (when processes are deterministic): 2-state maximal matching on bidirectional rings, 5-state token passing on unidirectional rings, and 3-state token passing on bidirectional chains.

Index Terms—Self-Stabilization, Distributed computing, Program synthesis


1 INTRODUCTION

Self-stabilization is an important property of today's distributed systems as it ensures convergence in the presence of transient faults (e.g., loss of coordination and bad initialization). That is, from any state/configuration, a Self-Stabilizing (SS) system recovers to a set of legitimate states (a.k.a. invariant) in a finite number of steps. Moreover, starting from its invariant, the executions of an SS system remain in the invariant; i.e., closure. Design and verification of convergence are difficult tasks [1], [2], [3] in part due to the requirements of (i) recovery from any state; (ii) recovery under distribution constraints, where processes can read only the state of their neighboring processes (a.k.a. their locality), and (iii) the non-interference of convergence with closure. One approach to facilitate the development of SS systems is to separate the concerns of closure and convergence for the designer; i.e., separate functional concerns from self-stabilization. Moreover, automating the design steps could potentially decrease development costs. However, there are two important impediments before such a two-step design method. First, for some protocols (e.g., token passing systems like Dijkstra's 4-state chain [1] and Gouda's 3-bit ring [4]), specifying the functional behaviors (in the absence of faults) amounts to specifying their self-stabilizing version!

• The authors are with the Department of Computer Science, Michigan Technological University, Houghton, MI 49931, U.S.A. E-mail: {apklinkh,aebnenas}@mtu.edu

This work was sponsored by the NSF grant CCF-1116546. Superior, a high performance computing cluster at Michigan Technological University, was used in obtaining the experimental results presented in this paper.

That is, the actions enabled in the invariant (a.k.a. closure actions) also provide convergence from illegitimate states. Second, existing automated methods are either incomplete [5], [6], [7], [8] or complete [9], [10] but require exact specification of legitimate states using formal logics (e.g., temporal logic), which are difficult for mainstream engineers to use. This paper presents a novel method that addresses these challenges.

Most existing methods for the design of self-stabilization are either manual [1], [11], [12], [13], [14], [15], [2] or rely on heuristics [5], [6], [7], [8] that may fail to generate a solution for some systems. For example, Awerbuch et al. [11] present a method based on distributed snapshot and reset for locally correctable systems; systems in which the correction of the locality of each process results in global recovery to the invariant. Gouda and Multari [12] divide the state space into a set of supersets of the legitimate states, called convergence stairs, where for each stair closure and convergence to a lower-level stair are guaranteed. Stomp [13] provides a method based on ranking functions for design and verification of self-stabilization. Gouda [2] presents a theory for design and composition of self-stabilizing systems. Demirbas and Arora [16] present a method for stabilization-preserving refinement of abstract designs using wrappers at the specification and implementation levels. Nesterenko and Tixeuil [17] define the notion of ideal stabilization, where all states are considered legitimate, and show how ideal stabilization can facilitate the design and composition of some stabilizing systems. Methods for algorithmic design of convergence [5], [6], [7], [8] are mainly based on sound heuristics that search through the state space of a non-stabilizing system in order to synthesize recovery actions while ensuring non-interference with closure.

However, these heuristics may fail to find a solution while there exists one; i.e., they are sound but incomplete. The approach in [9], [10] is sound and complete, but it employs constraint solvers as a blackbox. By contrast, the shadow/puppet synthesis relies on a backtracking algorithm that is customized for the synthesis of self-stabilization and can be micromanaged towards providing better efficiency (evident by our experimental results). Moreover, to increase the likelihood of finding a solution, we exploit parallelism and randomization in our backtracking algorithm. Bounded synthesis [18] allows for a systematic state space expansion, but it does not consider all states as potential initial states.

This paper proposes a two-step method for automated design of self-stabilization where we enable designers to deal with the concerns of closure and convergence separately, and provide a method for state space expansion. The underlying philosophy behind our work is that the constraints that the synthesis algorithm must follow (e.g., read restrictions due to distribution) should not be imposed on designers. Specifically, the proposed approach includes two layers (see Figure 1): (1) intuitive specification of an invariant and functional behaviors in the absence of faults (i.e., closure behaviors/actions), called the shadow specification, and (2) automated synthesis of a self-stabilizing system that implements a precise characterization of the shadow behavior in the invariant and enables convergence to the specified invariant in terms of some superposed variables, called the puppet system. This shadow/puppet synthesis is inspired by the way a puppeteer wishes to tell a story via shadows of puppets on a screen. A puppet itself can be more intricate than the shadow it casts. To make our desired shadow (functional behavior), we therefore rely on a clever puppeteer (the computer) to construct a puppet (synthesized protocol).

Fig. 1: The proposed shadow/puppet synthesis method.

The algorithmic design of the puppet actions is based on a backtracking search. The backtracking search is conducted in a parallel fashion amongst a fixed number of threads that simultaneously search for an SS solution in the space of all acceptable actions. When a thread finds a combination of design choices (i.e., actions) that would result in the failure of the search (a.k.a. a conflict), it shares this information with other threads, thereby improving resource utilization.

Contributions. The contributions of this work are multi-fold. First, we devise a two-step design method that separates the concerns of closure and convergence for the designer, and enables designers to intuitively specify functional behaviors and systematically include computational redundancy. Second, we propose a parallel and complete backtracking search that finds an SS solution if one exists. If a solution does not exist in the current state space of the program, then designers can include additional puppet variables or alternatively increase the domain size of the existing puppet variables and rerun the backtracking search. Third, we present three different implementations of the proposed method as a software toolset, called Protocon (http://asd.cs.mtu.edu/projects/protocon/), where we provide a sequential implementation and two parallel implementations: one multi-threaded and the other an MPI-based implementation. We also demonstrate the power of the proposed method by synthesizing several new network protocols that all existing heuristics fail to synthesize. These case studies include 2-state maximal matching on bidirectional rings, 5-state token passing on unidirectional rings, and 3-state token passing on bidirectional chains. In [19], [20], we perform detailed serial and parallel benchmarks for simpler systems such as coloring on Kautz graphs [21], which can represent a P2P network topology, the 3-bit token ring of Gouda and Haddix [4], ring orientation, and leader election on a ring.

Organization. Section 2 provides a motivating example for shadow/puppet synthesis. Section 3 introduces the basic concepts of protocols, transient faults, closure and convergence. Section 4 formally states the problem of designing self-stabilization. Section 5 presents the backtracking algorithm. Section 6 provides an overview of our case studies. Section 7 discusses related work. Finally, Section 8 makes concluding remarks and presents future/ongoing work.

2 MOTIVATING EXAMPLE

In order to illustrate the proposed approach, we discuss the design of the well-known token passing protocol in a unidirectional ring (proposed by Dijkstra [1]) using shadow/puppet synthesis. Token passing in a unidirectional ring includes the following requirements: (1) in the absence of faults, there is exactly one token in the ring (i.e., legitimate states or invariant) and the token is circulated along the ring, and (2) if there are multiple tokens in the ring (due to the occurrence of faults), then the protocol will eventually reach a configuration where exactly one token exists thereafter.

Dijkstra [1] formulated this protocol as follows: each process has a variable x with a finite domain of at least N values, where N is the number of processes in the ring. Each process Pi (0 ≤ i < N) can read and write its own variable xi and can also read xi−1, where addition and subtraction are modulo N. The following actions guarantee both closure and convergence: (xN−1 = x0 −→ x0 := x0 + 1) of P0, and (xi−1 ≠ xi −→ xi := xi−1) of each other Pi (1 ≤ i < N). Each action has a guard condition (placed before −→) and a statement (placed after −→) that is executed atomically if the guard evaluates to true. Process P0 has the token iff (if and only if) xN−1 = x0, and any other Pi (1 ≤ i < N) has the token iff xi−1 ≠ xi.

Research Problem. Now, imagine that we did not have Dijkstra's solution and we would like to design a self-stabilizing protocol that meets the aforementioned requirements. We are faced with two options: (1) do what Dijkstra did; that is, in a single step design a protocol that circulates the unique token in legitimate states and ensures convergence to legitimate states if there are multiple tokens, or (2) assume no faults occur and design a system that ensures the existence of a unique token in the absence of faults, and then devise actions that ensure convergence without interfering with closure actions. It is not hard to see that the first option seems more difficult since we have to simultaneously tackle two problems. However, the challenge of the second option is that specifying the behaviors in the absence of faults in terms of protocol variables could be a daunting task for some protocols. For example, these actions capture the closure actions of Dijkstra's token passing protocol: (xN−1 = x0 −→ x0 := x0 + 1) of P0 and (xi−1 = xi + 1 −→ xi := xi−1) of each other Pi (1 ≤ i < N). Observe that the design of these actions is not straightforward. For some protocols (e.g., Gouda's constant space 3-bit token passing [4]), it is even impossible to separate the closure actions from convergence actions (since closure actions provide convergence).

Next, we illustrate the proposed approach (Figure 1) in the context of Dijkstra's token passing protocol.

Layer 1 - Step 1: Specify legitimate states. Based on the first layer of the proposed approach in Figure 1, designers should first specify functional behaviors. This layer includes two steps: the specification of (i) legitimate states, and (ii) the actions that may be executed in legitimate states. First, we consider a shadow Boolean variable toki for each process Pi indicating whether Pi has the token. Thus, the set of legitimate states is simply specified as I = (∃!i : i ∈ ZN : (toki = 1)), capturing the configurations where there is a unique token in the ring. The quantifier ∃! means there exists a unique value of i.

Layer 1 - Step 2: Specify shadow actions. To express that each Pi should pass the token to Pi+1, designers simply write the intuitive action (toki = 1 −→ toki := 0; toki+1 := 1;). Notice that Pi explicitly changes the state of Pi+1, thereby violating the read/write restrictions of the locality of Pi specified on the x variables in Dijkstra's protocol. The fundamental idea behind shadow/puppet synthesis is that such restrictions must not tie the hands of designers in expressing what they want.

Layer 2 - Step 1: Superpose puppet variables. After specifying the shadow variables and actions, designers should augment the shadow specification with the puppet variables (which in the case of the token ring include the x variables) along with the corresponding read/write restrictions for processes. The domain of puppet variables and their read/write restrictions can respectively be obtained from the system requirements and topology.

Layer 2 - Step 2: Automated design using a parallel backtracking search algorithm. This step is fully automated, where designers benefit from the proposed parallel backtracking algorithm (called the synthesis algorithm) to find a solution that is self-stabilizing to the specified legitimate states. We have developed three implementations of this algorithm, namely a centralized version, a multi-threaded version and an MPI-based distributed version. The synthesis algorithm is not allowed to read shadow variables since the shadow variables/actions provide only an outline of what is required to guide the synthesis algorithm. As such, the guards of the actions of the synthesized protocol will only be specified in terms of puppet variables.

Layer 2 - Step 3: Existence of a solution. If the backtracking algorithm fails to find a solution, then designers should expand the state space by either increasing the domain of puppet variables or introducing additional puppet variables. This can be achieved by jumping to Step 1 of Layer 2 and repeating the steps of Layer 2.
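As a rough illustration of this workflow, the sketch below shows the outer loop a designer might script around a synthesis back end. The function synthesize and its parameters are purely hypothetical placeholders for Layer 2 and are not Protocon's actual interface.

def design(topology, shadow_spec, synthesize, max_domain=8):
    """Hypothetical driver for Layer 2: grow the puppet state space until the
    backtracking search (wrapped by `synthesize`) reports a solution."""
    for domain in range(2, max_domain + 1):    # Steps 1 and 3: superpose/expand
        protocol = synthesize(topology, shadow_spec, puppet_domain=domain)
        if protocol is not None:               # Step 2: backtracking succeeded
            return protocol, domain
    return None, None                          # give up after max_domain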

3 PRELIMINARIES

In this section, we present the formal definitions of protocols and self-stabilization. Protocols are defined in terms of their set of variables, their actions and their processes. The definitions in this section are adapted from [1], [14], [22], [2]. For ease of presentation, we use a simplified version of Dijkstra's token ring protocol [1] as a running example.

Protocols. A protocol p comprises N processes {P0, · · · , PN−1} that communicate in a shared memory model under the constraints of an underlying network topology Tp. Each process Pi, where i ∈ ZN and ZN denotes values modulo N, has a set of local variables Vi that it can read and write, and a set of actions (a.k.a. guarded commands [23]) as defined in Section 2. Thus, we have Vp = V0 ∪ · · · ∪ VN−1. The domain of variables in Vi is non-empty and finite. Tp specifies what Pi's neighboring processes are and which one of their variables Pi can read; i.e., Pi's locality.
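One way to make these definitions concrete is the following Python sketch of a shared-memory protocol as data. The class and field names are ours, chosen only to mirror the definitions above; they are not Protocon data structures.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]                       # a global state: variable -> value

@dataclass
class Action:                                # a guarded command of one process
    guard: Callable[[State], bool]           # reads only the process's locality
    update: Callable[[State], Dict[str, int]]  # writes only its own variables

@dataclass
class Process:
    readable: List[str]                      # locality, fixed by the topology Tp
    writable: List[str]                      # the process's own variables Vi
    actions: List[Action] = field(default_factory=list)

@dataclass
class Protocol:
    domains: Dict[str, int]                  # finite, non-empty domains of Vp
    processes: List[Process]

    def successors(self, s: State) -> List[State]:
        """One enabled action of one process per transition (interleaving)."""
        return [{**s, **a.update(s)}
                for proc in self.processes
                for a in proc.actions if a.guard(s)]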

A local state of Pi is a unique snapshot of its locality, and a global state of the protocol p is a unique valuation of the variables in Vp. The state space of p, denoted Sp, is the set of all global states of p, and |Sp| denotes the size of Sp. A state predicate is any subset of Sp specified as a Boolean expression over Vp. We say a state predicate X holds in a state s (respectively, s ∈ X) iff X evaluates to true at s. A transition t is an ordered pair of global states, denoted (s0, s1), where s0 is the source and s1 is the target state of t. A valid transition of p must belong to some action of some process. The set of actions of Pi represents the set of all transitions of Pi, denoted δi. The set of transitions of the protocol p, denoted δp, is the union of the sets of transitions of its processes. A deadlock state is a state with no outgoing transitions. An action grd −→ stmt is enabled in a state s iff grd holds at s. A process Pi is enabled in s iff there exists an action of Pi that is enabled at s.

Example: Token Ring (TR). For simplicity, consider a 3-process (i.e., N = 3) version of Dijkstra's Token Ring (TR) protocol presented in Section 2 (adapted from [1]). Each process Pi (0 ≤ i ≤ 2) includes an integer variable xi with domain {0, 1, 2}. Each Pi can read and write xi and can read xi−1 (where xi−1 denotes x2 when i = 0). The actions of the processes are as specified in Section 2. The state predicate ITR captures the set of states in which only one token exists, where ITR is:

(x0 = x1 ∧ x1 = x2) ∨ (x0 ≠ x1 ∧ x1 = x2) ∨ (x0 = x1 ∧ x1 ≠ x2)
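For instance, a short brute-force check (a sketch we add here in plain Python, not part of the protocol) confirms that this predicate coincides with "exactly one token" over the 27 states of the 3-process TR protocol:

from itertools import product

def token_holders(x):
    """P0 has a token iff x2 = x0; Pi (i = 1, 2) has a token iff x(i-1) != xi."""
    x0, x1, x2 = x
    return [i for i, t in enumerate((x2 == x0, x0 != x1, x1 != x2)) if t]

def I_TR(x):
    x0, x1, x2 = x
    return ((x0 == x1 and x1 == x2) or
            (x0 != x1 and x1 == x2) or
            (x0 == x1 and x1 != x2))

states = list(product(range(3), repeat=3))
assert all(I_TR(x) == (len(token_holders(x)) == 1) for x in states)
print(sum(I_TR(x) for x in states), "of", len(states), "states satisfy I_TR")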

Minimal Actions. Notice that the guard of an action A : grd −→ stmt of a process Pi can be specified in terms of a proper subset of Vi. In such cases, the action A is the union of a set of k > 1 minimal actions grd1 −→ stmt1, · · · , grdk −→ stmtk, where grd = (grd1 ∨ · · · ∨ grdk), and each grdj (1 ≤ j ≤ k) is specified in terms of the values of all variables in Vi (where i ∈ ZN). More precisely, a minimal action of a process Pi includes a single valuation of all readable variables for Pi in its guard and a single valuation of all writable variables for Pi in its assignment statement. For example, consider the action (x2 = x0 −→ x0 := x0 + 1) in the TR protocol. This action is the union of the minimal actions (x2 = 0 ∧ x0 = 0 −→ x0 := 1), (x2 = 1 ∧ x0 = 1 −→ x0 := 2), and (x2 = 2 ∧ x0 = 2 −→ x0 := 0). The proposed backtracking algorithm in Section 5 explores the space of all minimal actions.
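The expansion of an action into its minimal actions can be written down directly. The sketch below (our illustration, with made-up helper names) enumerates the full readable valuations that satisfy a guard and pairs each with the resulting writable valuation, reproducing the three minimal actions listed above for P0:

from itertools import product

def minimal_actions(readable, domains, guard, update):
    """One minimal action per full valuation of the readable variables that
    satisfies the guard, paired with the full valuation it writes."""
    result = []
    for values in product(*(range(domains[v]) for v in readable)):
        local = dict(zip(readable, values))
        if guard(local):
            result.append((local, update(local)))
    return result

# P0's action in the 3-process TR protocol: x2 = x0 --> x0 := x0 + 1 (mod 3)
for grd, stmt in minimal_actions(
        readable=["x2", "x0"], domains={"x2": 3, "x0": 3},
        guard=lambda s: s["x2"] == s["x0"],
        update=lambda s: {"x0": (s["x0"] + 1) % 3}):
    print(grd, "-->", stmt)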

Computations. Intuitively, a computation of a protocol p is an interleaving of its actions. Formally, a computation of p is a sequence σ = ⟨s0, s1, · · ·⟩ of states that satisfies the following conditions: (1) for each transition (si, si+1) in σ, where i ≥ 0, there exists an action (grd −→ stmt) in some process such that grd holds at si and the execution of stmt at si yields si+1, and (2) σ is maximal in that either σ is infinite or, if it is finite, then σ reaches a state sf where no action is enabled. A computation prefix of a protocol p is a finite sequence σ = ⟨s0, s1, · · · , sm⟩ of states, where m > 0, such that each transition (si, si+1) in σ (where i ∈ Zm) belongs to some action grd −→ stmt in some process. The projection of a protocol p on a non-empty state predicate X, denoted δp|X, consists of the transitions of p that start in X and end in X.

Closure and Invariant. A state predicate X is closed in an action grd −→ stmt iff executing stmt from any state s ∈ (X ∧ grd) results in a state in X. We say a state predicate X is closed in a protocol p iff X is closed in every action of p. In other words, closure [2] requires that every computation of p starting in X remains in X. We say a state predicate I is an invariant of p iff I is closed in p.

TR Example. Starting from a state in the predicate ITR, the TR protocol generates an infinite sequence of states, where all reached states belong to ITR.

Remark. Some researchers [24], [22] have a stronger definition for the notion of invariant, where in addition to closure a program must meet its specifications from its invariant. Since in the problem of synthesizing self-stabilization (Problem 4.1) we have a constraint of preserving specifications in the invariant, we use a more relaxed definition of invariant where only closure is required. We also note that a program may have multiple invariants; however, ideally we would like to use the weakest possible invariant.

Convergence and Self-Stabilization. A protocol p strongly converges to I iff from any state in Sp, every computation of p reaches a state in I. A protocol p weakly converges to I iff from any state in Sp, there is a computation of p that reaches a state in I. We say a protocol p is strongly (respectively, weakly) self-stabilizing to I iff I is closed in p and p strongly (respectively, weakly) converges to I. For ease of presentation, we drop the term "strongly" wherever we refer to strong stabilization.
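These definitions translate directly into checks over an explicit transition relation. The sketch below assumes a protocol small enough to enumerate, represented by a dict delta mapping each state to its successor states; the encoding and function names are ours, not tied to any tool.

def is_closed(delta, inv):
    """Closure: every transition that starts in inv also ends in inv."""
    return all(t in inv for s in inv for t in delta.get(s, ()))

def weakly_converges(delta, states, inv):
    """Weak convergence: from every state SOME computation reaches inv.
    Computed as a backward-reachability fixpoint from inv."""
    reach = set(inv)
    grew = True
    while grew:
        grew = False
        for s in states:
            if s not in reach and any(t in reach for t in delta.get(s, ())):
                reach.add(s)
                grew = True
    return reach == set(states)

def strongly_converges(delta, states, inv):
    """Strong convergence: EVERY computation reaches inv; equivalently, the
    states outside inv contain no deadlocks and no cycles."""
    outside = {s for s in states if s not in inv}
    if any(not delta.get(s) for s in outside):
        return False                     # deadlock outside the invariant
    pending = set(outside)
    grew = True                          # peel states whose successors all left
    while grew:
        grew = False
        for s in list(pending):
            if not any(t in pending for t in delta.get(s, ())):
                pending.discard(s)
                grew = True
    return not pending                   # leftovers lie on a cycle (livelock)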

4 PROBLEM STATEMENT

In this section, we formally state the problem of creating a self-stabilizing puppet protocol from a specified shadow protocol. Let p be a non-stabilizing protocol and I be an invariant of p. We manually expand the state space of p by including new variables. Such puppet variables provide computational redundancy in the hopes of giving the protocol sufficient information to detect and correct illegitimate states without forming livelocks. Our experience shows that better performance can be achieved if variables with small domains are included initially. If the synthesis fails, then designers can incrementally increase variable domains or include additional puppet variables. This way, designers can manage the growth of the state space and keep the synthesis time/space costs under control.

Now, let p′ denote the self-stabilizing version of p that we would like to design, and let I′ represent an invariant.

Sp′ denotes the state space of p′; i.e., the state space of p expanded by adding puppet variables. Such an expansion can be reversed by a function H : Sp′ → Sp that maps every state in Sp′ to a state in Sp by removing the puppet variables. The expansion itself is expressed as a one-to-many mapping E : Sp → Sp′ that maps each state s ∈ Sp to a set of states {s′ | s′ ∈ Sp′ ∧ H(s′) = s}. Observe that H and E can also be applied to transitions of p and p′. That is, the function H maps each transition (s′0, s′1), where s′0, s′1 ∈ Sp′, to a transition (s0, s1), where s0, s1 ∈ Sp. Moreover, E((s0, s1)) = {(s′0, s′1) | s′0 ∈ Sp′ ∧ s′1 ∈ Sp′ ∧ H((s′0, s′1)) = (s0, s1)}. Furthermore, each computation (respectively, computation prefix) of p′ in the new state space Sp′ can be mapped to a computation (respectively, computation prefix) in the old state space Sp using H. Our objective is to design a protocol p′ that is self-stabilizing to I′ when transient faults occur. That is, from any state in Sp′, protocol p′ must converge to I′. In the absence of faults, p′ must behave similarly to p. Thus, each computation of p′ that starts in I′ must be mapped to a unique computation of p starting in I.
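Concretely, if states are represented as dictionaries of variable values, H simply drops the puppet variables and E enumerates all of their valuations. The sketch below uses illustrative names (shadow variables tok0/tok1, puppet variables x0/x1) that are not tied to any particular protocol:

from itertools import product

def H(state, puppet_vars):
    """Project a state of p' onto Sp by removing the puppet variables."""
    return {v: val for v, val in state.items() if v not in puppet_vars}

def E(state, puppet_domains):
    """Expand a state of p to every state of p' that H maps back onto it."""
    names = list(puppet_domains)
    return [{**state, **dict(zip(names, vals))}
            for vals in product(*(range(puppet_domains[n]) for n in names))]

s = {"tok0": 1, "tok1": 0}
expanded = E(s, {"x0": 2, "x1": 2})        # four expanded states
assert all(H(e, {"x0", "x1"}) == s for e in expanded)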

We state the problem as follows¹ (the function Pre(δ) takes a set of transitions δ and returns the source states of δ):

Problem 4.1: Synthesizing Self-Stabilization.

• Input: A protocol p and an invariant I, the function H and the mapping E capturing the impact of puppet variables.
• Output: A protocol p′ and an invariant I′ in Sp′.
• Constraints:
  1) Preserve the invariant: I = H(I′)
  2) Preserve transitions in the invariant: (δp|I = H(δp′|I′) \ {(s, s) | s ∈ I}) ∧ ∀(s′0, s′1) ∈ δp′ : (s′0 ∈ I′ =⇒ s′1 ∈ I′)
  3) Preserve deadlock freedom in the invariant: ∀s ∈ Pre(δp|I) : (E(s) ∩ I′) ⊆ Pre(δp′|I′)
  4) Preserve progress in the invariant: ∀s ∈ Pre(δp|I) : (δp′|I′)|E(s) is cycle-free
  5) p′ strongly converges to I′.

The first constraint requires that no states are added/removed to/from I; i.e., I = H(I′). The second constraint requires that each transition of the shadow protocol δp|I is represented by some transitions in δp′|I′, and that all other transitions in δp′ originating from I′ remain in I′ and do not change shadow values. The third and fourth constraints require that non-silent states (i.e., where a process is enabled) of p in I should remain deadlock-free and livelock-free in p′ in order to ensure progress to a state with different shadow values. Finally, p′ must be livelock-free and deadlock-free in ¬I′.

1. This problem statement is an adaptation of the problem of adding fault tolerance in [22].

Example 4.2: 4-State Token Ring

Keeping with the idea of superposition from [20] for the sake of example, we can specify a token ring using a non-stabilizing 2-state token ring. Each process Pi owns a binary variable ti and can read ti−1. The first process P0 is distinguished as Bot, in that it acts differently from the others. Bot has a token when tN−1 = t0, and each other process Pi>0 is said to have a token when ti−1 ≠ ti. Bot has action (tN−1 = t0 −→ t0 := 1 − t0;), and each other process Pi>0 has action (ti−1 ≠ ti −→ ti := ti−1;). Let I denote the legitimate states where exactly one process has a token:

I = ∃! i ∈ ZN : ((i = 0 ∧ ti−1 = ti) ∨ (i ≠ 0 ∧ ti−1 ≠ ti))

To transform this protocol to a self-stabilizing version thereof, we add a binary puppet variable xi to each process Pi. Each process Pi can also read its predecessor's variable xi−1. Let I′ = E(I) be the invariant of this transformed protocol. Let the new protocol p′ have the following actions for Bot and the other processes Pi>0:

Bot : (xN−1 = x0) ∧ (tN−1 ≠ t0) −→ x0 := 1 − x0;
Bot : (xN−1 = x0) ∧ (tN−1 = t0) −→ x0 := 1 − x0; t0 := xN−1;
Pi : (xi−1 ≠ xi) ∧ (ti−1 = ti) −→ xi := 1 − xi;
Pi : (xi−1 ≠ xi) ∧ (ti−1 ≠ ti) −→ xi := 1 − xi; ti := xi−1;

This protocol is stabilizing for all rings of size N ∈ {2, . . . , 7} but contains a livelock when N = 8. In [20], we found that, given this topology and shadow protocol, no self-stabilizing protocol exists for all N ∈ {2, . . . , 8}. Gouda and Haddix [4] give a similar token ring protocol that stabilizes for all ring sizes. They introduce another binary variable readyi to each process.
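These claims can be examined by brute force for small rings. The following self-contained Python sketch (our code, not produced by Protocon) builds the transition system of the four actions above, takes I′ = E(I), and tests only the convergence part of stabilization, i.e., that the illegitimate states contain neither deadlocks nor livelock cycles; the unoptimized N = 8 case enumerates 4^8 states and is noticeably slower.

from itertools import product

def has_token(t, i):
    """Token of the 2-state ring: Bot (i = 0) iff t[N-1] = t[0]; else t[i-1] != t[i]."""
    return t[i - 1] == t[i] if i == 0 else t[i - 1] != t[i]

def legitimate(t):
    """I: exactly one process holds a token (a predicate over the t values only)."""
    return sum(has_token(t, i) for i in range(len(t))) == 1

def successors(state):
    """All interleaved steps of the 4-state protocol (the four actions above)."""
    t, x = state
    n = len(t)
    out = []
    for i in range(n):
        tl, xl = list(t), list(x)
        if i == 0 and x[n - 1] == x[0]:          # Bot is enabled
            xl[0] = 1 - x[0]
            if t[n - 1] == t[0]:
                tl[0] = x[n - 1]
        elif i > 0 and x[i - 1] != x[i]:         # Pi (i > 0) is enabled
            xl[i] = 1 - x[i]
            if t[i - 1] != t[i]:
                tl[i] = x[i - 1]
        else:
            continue
        out.append((tuple(tl), tuple(xl)))
    return out

def converges(n):
    """No deadlocks and no cycles among the states outside I' = E(I)."""
    states = [(t, x) for t in product((0, 1), repeat=n)
                     for x in product((0, 1), repeat=n)]
    succ = {s: successors(s) for s in states}
    bad = {s for s in states if not legitimate(s[0])}
    if any(not succ[s] for s in bad):
        return False                             # deadlock outside I'
    grew = True                                  # peel until a fixpoint;
    while grew:                                  # leftovers form a livelock
        grew = False
        for s in list(bad):
            if not any(nxt in bad for nxt in succ[s]):
                bad.discard(s)
                grew = True
    return not bad

for n in range(2, 9):
    print(n, converges(n))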

Let us check that the superposition preserves the 2-state token ring protocol for a ring of size N = 3. Figure 2 shows the transition systems of the non-stabilizing 2-state protocol p within I and the stabilizing 4-state protocol p′, where each state is a node and each arc is a transition. Legitimate states are boxed, and recurring transitions within these states are black. Recovery transitions are drawn with dashed gray lines. Solid gray lines denote transitions within our maximal choice of I′ but would serve as recovery transitions for smaller choices of I′. Verifying the conditions from Problem 4.1, we find: (1) I = H(I′) is true since I′ = E(I), (2) the shadow protocol is preserved since there exists a transition of p′ that changes rows iff a similar transition exists in the 2-state protocol p, (3) no deadlocks are introduced in the invariant since all states in the boxed rows have outgoing transitions, (4) progress is preserved in the invariant since there are no cycles in any of the boxed rows of p′, and (5) convergence from ¬I′ to I′ holds since there are no livelocks or deadlocks within ¬I′.

Shadow/Puppet vs. Superposition. Compare the shadow/puppet method (Section 2) with the superposition method (Example 4.2) when used to specify a token ring's invariant and behavior.

Fig. 2: Transition systems of the non-stabilizing 2-state and stabilizing 4-state token rings of size N = 3. States of p are labeled by t0t1t2 and states of p′ by x0x1x2; the legend distinguishes legitimate and illegitimate states and closure, convergence, and either transitions.

In the shadow specification of Section 2, each toki variable denotes whether a process has the token, and a token is passed by simply moving a 1 value forward to toki+1. In the non-stabilizing protocol of Example 4.2, each ti variable holds token information in a clever way resulting from a comparison with ti−1, and a token is passed by flipping the ti bit. We see that in the case of superposition, a designer must have some foresight about how the ti variables can be manipulated in a stabilizing protocol. By contrast, the shadow/puppet method gives a designer freedom to define token passing with the toki shadow variables without constraining the specific actions of the synthesized protocol p′. It is therefore not surprising that shadow/puppet synthesis can find protocols that cannot be reasonably expressed with superposition (Section 6).

In Protocon, the two methods differ only by whether the variables used to specify the invariant are marked with the shadow keyword or not. In the case that shadow variables exist, our synthesis method treats them as write-only when constructing p′. When the shadow protocol is silent (i.e., has no actions in its invariant), processes should know the final values of their writable shadow variables. Self-loops are a trivial consequence of this and must be ignored for actions that only assign shadow variables (e.g., Section 6.1). When the shadow protocol is non-silent, processes should know when they are performing shadow actions, but they should also be able to act without affecting the shadow state (i.e., shadow self-loops). Thus, the minimal actions we use must allow shadow variables to not be assigned (e.g., Section 6.3).

Deterministic, Self-Disabling Processes. Theorems 4.3 and 4.4 below show that the assumptions of deterministic and self-disabling processes do not impact the completeness of any algorithm that solves Problem 4.1. In general, convergence is achieved by collaborative actions of all processes. That is, each process partially contributes to the correction of the global state of a protocol. As such, starting at a state s0 ∈ ¬I, a single process may not be able to recover the entire system single-handedly. Thus, even if a process executes consecutive actions starting at s0, it will reach a local deadlock from where other processes can continue their execution towards converging to I. The execution of consecutive actions of a process can be replaced by a single write action of the same process. As such, we assume that once a process executes an action, it will be disabled until the actions of other processes enable it again. That is, processes are self-disabling.

Theorem 4.3: Let p be a non-stabilizing protocol with invariant I. There is an SS version of p to I iff there is an SS version of p to I with self-disabling processes.

Proof: The proof of right to left is straightforward, hence omitted. The proof of left to right is as follows. Let pss be an SS version of p to I, and let Pj be a process of pss. Consider a computation prefix σ = ⟨s0, s1, · · · , sm⟩, where m > 0, ∀i : 0 ≤ i ≤ m : si ∉ I, and all transitions in σ belong to Pj. Moreover, we assume that Pj becomes disabled at sm. Now, we replace each transition (si, si+1) ∈ σ (0 ≤ i < m) by a transition (si, sm). Such a revision will not generate any deadlock states in ¬I. Moreover, if the inclusion of a transition (si, sm), where 0 ≤ i < m − 1, forms a non-progress cycle, then this cycle must have already existed in the protocol because a path from si to sm already existed. Thus, this change does not introduce new livelocks in δp|¬I.

Theorem 4.4: Let p be a non-stabilizing protocol with invariant I. There is an SS version of p to I iff there is an SS version of p to I with deterministic processes.

Proof: Any SS protocol with deterministic processes is an acceptable solution to Problem 4.1; hence the proof of right to left. Let pss be an SS version of p to I with nondeterministic but self-disabling processes. Moreover, let Pj be a self-disabling process of pss with two nondeterministic actions A and B originating at a global state s0 ∉ I, where A takes the state of pss to s1 and B puts pss in a different state s2. The following argument does not depend on s1 and s2 because, by the self-disablement assumption, if s1 and s2 are in ¬I, then there must be a different process other than Pj that executes from there. Otherwise, the transitions (s0, s1) and (s0, s2) recover pss to I.

The state s0 identifies an equivalence class of global states.

Let s′0 be a state in the equivalence class. The local state of Pj is identical in s0 and s′0, and the part of s0 (and s′0) that is unreadable to Pj could vary among all possible valuations. As such, corresponding to (s0, s1) and (s0, s2), we have transitions (s′0, s′1) and (s′0, s′2). To enforce determinism, we remove the action B from Pj. Such removal of B will not make s0 and s′0 deadlocked. Since s′0 is an arbitrary state in the equivalence class, it follows that no deadlocks are created in ¬I. Moreover, removal of transitions cannot create livelocks. Therefore, the self-stabilization property of pss is preserved in the resulting deterministic protocol after the removal of B.

5 SYNTHESIS USING BACKTRACKING

In [3], we have shown that Problem 4.1 is an NP-complete problem in the size of the expanded state space. In this section, we present an efficient and complete backtracking search algorithm to solve Problem 4.1. Backtracking search is a well-studied technique [25] that is easy to implement and can give very good results. Throughout this section, we use "actions" and "minimal actions" interchangeably (unless otherwise stated). Section 5.1 provides a high-level description of the algorithm, and Section 5.2 presents the details of the algorithm. Section 5.3 presents an intelligent method for the inclusion of new actions in candidate solutions. Finally, Section 5.4 discusses some issues related to the optimization of the proposed algorithm.

5.1 Overview of the Search Algorithm

Like any other backtracking search, our algorithm incrementally builds upon a guess, or a partial solution, until it either finds a complete solution or finds that the guess is inconsistent. We decompose the partial solution into two pieces: (1) an under-approximation formed by making well-defined decisions about the form of a solution, and (2) an over-approximation that is the set of remaining possible solutions (given the current under-approximation). In a standard constraint satisfaction problem, a backtracking search builds upon a partial assignment to problem variables. The partial assignment is inconsistent in two cases: (i) the constraints upon assigned variables are broken (i.e., the under-approximation causes a conflict), and/or (ii) the constraints cannot be satisfied by the remaining variable assignments (i.e., the over-approximation cannot contain a solution). Each time a choice is made to build upon the under-approximation, the current partial solution is saved at decision level j and a copy that incorporates the new choice is placed at level j + 1. If the guess at level j + 1 is inconsistent, we move back to level j and discard the choice that brought us to level j + 1. If the guess at level 0 is found to be inconsistent, then enough guesses have been tested to determine that no solution exists.
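The same decision-level discipline can be written as a tiny generic skeleton. The sketch below is a plain recursive rendering of the idea, with invented callback names consistent(...), complete(...), and choices(...); it is not the Protocon implementation detailed in Algorithms 1-3.

def backtrack(under, over, consistent, complete, choices):
    """under/over are the current under- and over-approximations (sets)."""
    if not consistent(under, over):
        return None                          # conflict: the caller backtracks
    if complete(under, over):
        return under                         # under == over: a full solution
    for c in choices(under, over):
        # Decision level j+1: copy the partial solution and commit to choice c.
        solution = backtrack(under | {c}, set(over), consistent, complete, choices)
        if solution is not None:
            return solution
        over = over - {c}                    # c failed: discard it at level j
        if not consistent(under, over):
            return None                      # over-approximation too small now
    return None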

Fig. 3: Overview of the backtracking algorithm. The flowchart initializes the under-approximation to the empty set and the over-approximation to all possible minimal actions, then interleaves ReviseActions (add to the under-approximation or remove from the over-approximation, enforce determinism and self-disablement, and compute the weakest invariant I′ satisfying Problem 4.1), PickAction (choose a candidate action that resolves a deadlock that the fewest candidate actions resolve), and recursive calls to AddStabilizationRec; an inconsistent partial solution triggers backtracking, and inconsistency at the topmost decision level means no solution exists.

In the context of our work, we apply a backtracking search in the space of all valid minimal actions that can be included in a solution. Specifically, we use a set of actions, called delegates, that plays the role of the under-approximation, and another set of actions, called candidates, that contains the remaining actions to potentially include in delegates. Thus, the set (delegates ∪ candidates) constitutes the over-approximation.

Figure 3 illustrates an abstract flowchart of the proposed backtracking algorithm. We start with the non-stabilizing protocol p, its invariant I (which is closed in p), the topology of p′, and the mappings to (E) and from (H) its expanded state space. The algorithm in Figure 3 starts by computing all valid candidate actions (in the expanded state space) that adhere to the read/write permissions of all processes. The initial value of delegates is often the empty set unless there are specific actions that must be in the solution (e.g., to ensure the reachability of particular states). The algorithm in Figure 3 then calls ReviseActions to remove self-loops from candidates (since they violate convergence) and checks for inconsistencies in the partial solution. The designer may give additional constraints that forbid certain actions.

In general, ReviseActions (see the bottom dashed box in Figure 3) is invoked whenever we strengthen the partial solution by adding to the under-approximation or removing from the over-approximation.

It may further remove from the over-approximation by enforcing the determinism and self-disablement constraints (see Theorems 4.3 and 4.4). Then ReviseActions computes the largest possible invariant I′ that could be used by the current partial solution. That is, it finds the weakest predicate I′ for which the constraints of Problem 4.1 can be satisfied using some set of transitions δp′ permissible by the partial solution. The partial solution requires δp′ to include all transitions corresponding to actions in delegates. Additionally, δp′ can include any subset of transitions corresponding to actions in candidates. For example, Constraint 5 of Problem 4.1 stipulates that the transitions of delegates are cycle-free outside of I′ and that the transitions of delegates ∪ candidates provide weak convergence to I′. If such an I′ does not exist, then the partial solution is inconsistent.

If our initialized delegates and candidates give a consistent partial solution, then we invoke the AddStabilizationRec routine. The objective of AddStabilizationRec (see the top dashed box in Figure 3) is to go through all actions in candidates and check their eligibility for inclusion in the self-stabilizing solution. In particular, AddStabilizationRec has a loop that iterates through all actions of candidates until it becomes empty or an inconsistency is found. In each iteration, AddStabilizationRec picks a candidate action to resolve some remaining deadlock at the next decision level. In general, the candidate action can be randomly selected. However, to limit the possible choices, we use an intelligent method for picking candidate actions described in Section 5.3. After picking a new action A, we invoke ReviseActions to add action A to a copy of the current partial solution by including A in the copy of delegates and removing it from the copy of candidates. If the copied partial solution is consistent, then AddStabilizationRec makes a recursive call to itself, using the copied partial solution for the next decision level. If the copied partial solution is found to be inconsistent (either by a call to ReviseActions or by the exhaustive search in the call to AddStabilizationRec), then we remove action A from candidates using ReviseActions. If, after removal of A, the partial solution is consistent, then we continue in the loop. Otherwise, we backtrack since no stabilizing protocol exists with the current under-approximation.

5.2 Details of the Algorithm

This section presents the details of the proposed backtracking method. Notice that we assume that the input to this algorithm is a non-stabilizing shadow protocol already superposed with some new finite-domain puppet variables. Misusing C/C++ notation, we prefix a function parameter with an ampersand (&) if modifications to it will affect its value in the caller's scope (i.e., it is a return parameter).

Algorithm 1 Entry point of the backtracking algorithm for solving Problem 4.1.
AddStabilization(p: protocol, I: state predicate, E: mapping Sp → Sp′, &delegates: protocol actions, forbidden: forbidden actions)
Output: Return true when a solution delegates can be found. Otherwise, false.
1: let candidates be the set of all possible actions that have transitions in E((δp|I) ∪ {(s, s) | s ∈ Sp}).
2: let adds := delegates {Forced actions, if any}
3: delegates := ∅
4: let dels := candidates ∩ forbidden
5: let I′ := ∅
6: if not ReviseActions(p, I, E, &delegates, &candidates, &I′, adds, dels) then
7:   return false
8: end if
9: return AddStabilizationRec(p, I, E, &delegates, candidates, I′)

AddStabilization. Algorithm 1 is the entry point of our backtracking algorithm. The AddStabilization function returns true iff a self-stabilizing protocol is found, which will then be formed by the actions in delegates. Initially, the function determines all possible candidate minimal actions. Next, the function determines which actions are explicitly required (Line 2) or disallowed by additional constraints (Line 4). We invoke ReviseActions to include adds in the under-approximation and remove dels from the over-approximation on Line 6. If the resulting partial solution is consistent, then the recursive version of this function (AddStabilizationRec) is called. Otherwise, a solution does not exist.

AddStabilizationRec. Algorithm 2 defines the main recursive search. Like AddStabilization, it returns true iff a self-stabilizing protocol is found that is formed by the actions in delegates. This function continuously adds candidate actions to the under-approximation delegates as long as candidate actions exist. If no candidates remain, then delegates and the over-approximation delegates ∪ candidates of the protocol are identical. If ReviseActions does not find anything wrong, then delegates is self-stabilizing, hence the successful return on Line 16.

On Line 2 of AddStabilizationRec, a candidate action A is chosen by calling PickAction (Algorithm 4). Any candidate action may be picked without affecting the search algorithm's correctness, but the next section explains a heuristic we use to pick certain candidate actions over others to improve search efficiency. After picking an action, we copy the current partial solution into next_delegates and next_candidates, and add the action A on Line 6. If the resulting partial solution is consistent, then we recurse by calling AddStabilizationRec.

Algorithm 2 Recursive backtracking function to add stabilization.
AddStabilizationRec(p, I, E, &delegates, candidates, I′)
Output: Return true if delegates contains the solution. Otherwise, return false.
1: while candidates ≠ ∅ do
2:   let A := PickAction(p, E, delegates, candidates, I′)
3:   let next_delegates := delegates
4:   let next_candidates := candidates
5:   let I″ := ∅
6:   if ReviseActions(p, I, E, &next_delegates, &next_candidates, &I″, {A}, ∅) then
7:     if AddStabilizationRec(p, I, E, &next_delegates, next_candidates, I″) then
8:       delegates := next_delegates {Assign the actions to be returned}
9:       return true
10:    end if
11:  end if
12:  if not ReviseActions(p, I, E, &delegates, &candidates, &I′, ∅, {A}) then
13:    return false
14:  end if
15: end while
16: return true

If that recursive call finds a self-stabilizing protocol, then it will store its actions in delegates and return successfully. Otherwise, if action A does not yield a solution, we will remove it from the candidates on Line 12. If this removal creates a non-stabilizing protocol, then return in failure; otherwise, continue the loop.

ReviseActions. Algorithm 3 is a key component of the backtracking search. ReviseActions performs five tasks: it (1) adds actions to the under-approximated protocol by moving the adds set from candidates to delegates; (2) removes forbidden actions from the over-approximated protocol by removing the dels set from candidates; (3) enforces self-disablement (Theorem 4.3) and determinism (Theorem 4.4), which results in removing more actions from the over-approximated protocol; (4) computes the maximal invariant I′ and transitions (δp′|I′) in the expanded state space to satisfy Constraints 2-4 of Problem 4.1 given the current under/over-approximations, and (5) verifies Constraints 1 and 5 of Problem 4.1 by ensuring that the invariant I′ captures all of I, the under-approximation is livelock-free in ¬I′, and the over-approximation weakly converges to I′. If the check fails, then ReviseActions returns false.

Algorithm 3 Add adds to the under-approximation and remove dels from the over-approximation.
ReviseActions(p, I, E, &delegates, &candidates, &I′, adds, dels)
Output: Return true if adds can be added to delegates, dels can be removed from candidates, and I′ can be revised accordingly. Otherwise, return false.
 1: delegates := delegates ∪ adds
 2: candidates := candidates \ adds
 3: for A ∈ adds do
 4:   Add each action B ∈ candidates to dels if it belongs to the same process as A and satisfies one of the following conditions:
        • A enables B (enforce self-disabling process)
        • B enables A (enforce self-disabling process)
        • A and B are enabled at the same time (enforce determinism)
      {Find candidate actions that are now trivially unnecessary for stabilization}
 5: end for
 6: candidates := candidates \ dels
 7: Compute the maximal I′ and (δp′|I′) such that:
        • I′ ⊆ E(I)
        • (δp′|I′) transitions can be formed by actions in delegates ∪ candidates
        • All transitions of delegates beginning in I′ are included in (δp′|I′)
        • Constraints 2, 3, and 4 of Problem 4.1 hold
 8: Check Constraints 1 and 5 of Problem 4.1:
        • I = H(I′)
        • The protocol formed by delegates is livelock-free in ¬I′
        • Every state in ¬I′ has a computation prefix by transitions of delegates ∪ candidates that reaches some state in I′
 9: if all checks pass then
10:   adds := ∅
11:   dels := ∅
12:   if CheckForward(p, I, E, delegates, candidates, I′, &adds, &dels) then
13:     if adds ≠ ∅ or dels ≠ ∅ then
14:       return ReviseActions(p, I, E, &delegates, &candidates, &I′, adds, dels)
15:     end if
16:     return true
17:   end if
18: end if
19: return false

Finally, ReviseActions invokes the CheckForward function to infer actions that must be added to the under-approximation or removed from the over-approximation; CheckForward returns false only if it infers that the current partial solution cannot be used to form a self-stabilizing protocol. A trivial version of CheckForward can just return true.

A good ReviseActions implementation should provide early detection for when delegates and candidates cannot be used to form a self-stabilizing protocol. At the same time, since the function is called whenever converting candidates to delegates or removing candidates, it cannot have a high cost. Thus, we ensure that actions in delegates do not form a livelock and that actions in delegates ∪ candidates provide weak stabilization.

A good CheckForward implementation should at least remove candidate actions that are not needed to resolve deadlocks. This can be performed quickly and allows the AddStabilizationRec function to immediately return a solution when all deadlocks are resolved.
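For illustration only, the sketch below shows one possible pruning step of this kind. States and actions are plain hashable values, and the pre mapping (from each action to the set of states in which it is enabled) is an assumption of the sketch, not Protocon's interface; this trivial version never reports an inconsistency.

def check_forward(delegates, candidates, unresolved, pre):
    # Drop candidates whose guards meet no unresolved deadlock state: they are
    # not needed to resolve deadlocks. `delegates` is unused in this sketch.
    adds = set()
    dels = {a for a in candidates if not (pre[a] & unresolved)}
    return True, adds, dels

# usage: deadlock states {3, 5} remain; action 'b' is enabled in neither
ok, adds, dels = check_forward(set(), {'a', 'b'}, {3, 5},
                               pre={'a': frozenset({3, 9}), 'b': frozenset({7})})
assert ok and dels == {'b'}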

Theorem 5.1 (Completeness): The AddStabilization algorithm is complete.

Proof: We show that if AddStabilization returns false, then no solution exists. Since each candidate action is minimal and we consider all such actions as candidates, a subset of the candidate actions would form a self-stabilizing protocol iff such a protocol exists. Observe that AddStabilizationRec follows the standard backtracking [26] procedure, where we (1) add a candidate action to the under-approximation in a new decision level, and (2) backtrack and remove that action from the candidates if an inconsistency (which cannot be fixed by adding to the under-approximation) is discovered by ReviseActions at that new decision level. Even though ReviseActions removes candidate actions in order to enforce deterministic and self-disabling processes, we know by Theorems 4.4 and 4.3 that this will not affect the existence of a self-stabilizing protocol. Thus, since we follow the general template of backtracking [26], the search will test every consistent subset of the initial set of candidate actions where processes are deterministic and self-disabling. Therefore, if our search fails, then no solution exists.

Theorem 5.2 (Soundness): The AddStabilization algorithm is sound.

Proof: We show that if AddStabilization returns true, then it has found a self-stabilizing protocol formed by the actions in delegates. Notice that when AddStabilizationRec returns true, the AddStabilization or AddStabilizationRec function that called it simply returns true with the same delegates set. The only other case where AddStabilization returns true is when candidates is empty in AddStabilizationRec (Line 16). Notice that to get to this point, ReviseActions must have been called and must have returned true after emptying the candidates set and verifying the constraints of Problem 4.1 on Line 8. Therefore, when AddStabilization returns true, the actions of delegates form a self-stabilizing protocol.

5.3 Picking Actions via the Minimum Remaining Values Method

The worst-case complexity of a depth-first backtracking search is determined by the branching factor b and the depth d of its decision tree, evaluating to O(b^d).

Algorithm 4 Pick an action using the minimum remaining values (MRV) method.
PickAction(p, E, delegates, candidates, I′)
Output: Next candidate action to pick.
 1: let deadlock_sets be a single-element array, where deadlock_sets[0] holds a set of deadlocks in ¬I′ ∪ E(Pre(δp)) that actions in delegates do not resolve.
 2: for all action ∈ candidates do
 3:   let i := |deadlock_sets|
 4:   while i > 0 do
 5:     i := i − 1
 6:     let resolved := deadlock_sets[i] ∩ Pre(action)
 7:     if resolved ≠ ∅ then
 8:       if i = |deadlock_sets| − 1 then
 9:         let deadlock_sets[i + 1] := ∅   {Grow array by one element}
10:       end if
11:       deadlock_sets[i] := deadlock_sets[i] \ resolved
12:       deadlock_sets[i + 1] := deadlock_sets[i + 1] ∪ resolved
13:     end if
14:   end while
15: end for
16: for i = 1, . . . , |deadlock_sets| − 1 do
17:   if deadlock_sets[i] ≠ ∅ then
18:     return An action from candidates that resolves a deadlock in deadlock_sets[i].
19:   end if
20: end for
21: return An action from candidates.   {Edge case}

We can tackle this complexity by reducing the branching factor. To do this, we use a minimum remaining values (MRV) method in PickAction. MRV is classically applied to constraint satisfaction problems [25] by assigning a value to the variable with the fewest remaining candidate values. In our setting, we pick an action that resolves a deadlock with the minimal number of remaining actions available to resolve it.

Algorithm 4 shows the details of PickAction, which keeps an array deadlock_sets, where each element deadlock_sets[i] contains all the deadlocks that are resolved by exactly i candidate actions. We initially start with array size |deadlock_sets| = 1 and with deadlock_sets[0] containing all unresolved deadlocks. We then shift deadlocks to the next highest element in the array (bubbling up) for each candidate action that resolves them. After building the array, we find the lowest index i for which the deadlock set deadlock_sets[i] is nonempty, and then return an action that can resolve some deadlock in that set. Line 21 can only be reached if either the remaining deadlocks cannot be resolved (but ReviseActions catches this earlier) or all deadlocks are resolved (but CheckForward can catch this earlier).
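In effect, this bucketing picks a deadlock with the fewest remaining resolvers. The Python sketch below captures that choice under an assumed pre mapping from actions to the states where they are enabled; it is illustrative rather than a transcription of Protocon.

def pick_action_mrv(candidates, unresolved, pre):
    # For each unresolved deadlock, collect the candidate actions that can
    # resolve it, then pick an action covering a deadlock with the fewest resolvers.
    resolvers = {d: [a for a in candidates if d in pre[a]] for d in unresolved}
    resolvable = [d for d in unresolved if resolvers[d]]
    if not resolvable:
        return next(iter(candidates))        # edge case, mirrors Line 21
    hardest = min(resolvable, key=lambda d: len(resolvers[d]))
    return resolvers[hardest][0]

# usage: deadlock 5 has a single resolver 'c', so 'c' is picked first
act = pick_action_mrv({'a', 'b', 'c'}, {3, 5},
                      pre={'a': {3}, 'b': {3}, 'c': {5}})
assert act == 'c'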

5.4 Optimizing the Decision Tree

This section presents the techniques that we use to improve the efficiency of our backtracking algorithm.

Conflicts. Every time a new candidate action is included in delegates, ReviseActions checks for inconsistencies, which involves cycle detection and reachability analysis. These procedures become very costly as the complexity of the transition system grows. To mitigate this problem, whenever an inconsistency is found, we record a minimal set of decisions (a subset of delegates) that causes it. We reference these conflict sets [25] in CheckForward to remove candidate actions that would cause an inconsistency (a small sketch of this bookkeeping appears at the end of this subsection).

Randomization and Restarts. When using the standard control flow of a depth-first search, a bad choice near the top of the decision tree can lead to infeasible runtime. This is the case since the bad decision persists in the partial solution until the search backtracks up the tree far enough to change it. To limit the search time in these branches, we employ a method outlined by Gomes et al. [27] that combines randomization with restarts. In short, we limit the amount of backtracking to a certain height (we use 3). If the search backtracks past the height limit, it forgets the current decision tree and restarts from the root. To avoid trying the same unfruitful decisions after a restart, PickAction randomly selects a candidate action permissible by the MRV method. This approach remains complete since a conflict is recorded for any set of decisions that causes a restart.

Parallel Search. In order to increase the chance of finding a solution, we instantiate several parallel executions of the algorithm in Figure 3; i.e., search diversification. As noted in [27], parallel tasks will generally avoid overlapping computations due to the randomization used in PickAction. We have observed linear speedup when the search tasks are expected to restart several times before finding a solution [19]. The parallel tasks share conflicts with each other to prevent the re-exploration of branches that contain no solutions. In our MPI implementation, conflict dissemination occurs between tasks using a virtual network topology formed by a generalized Kautz graph [21] of degree 4. This topology has a diameter logarithmic in the number of nodes and is fault-tolerant in that multiple paths between two nodes ensure message delivery. That is, even if some nodes are performing costly cycle detection and do not check for incoming messages, they will not slow the dissemination of new conflicts.
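As a small illustration of the conflict bookkeeping described above (a sketch over plain Python sets, not Protocon's data structures), a recorded conflict forbids any candidate whose addition would complete it given the current delegates:

class ConflictSets:
    # Remember small sets of actions found to be jointly inconsistent, and flag
    # candidates that would complete a recorded conflict.
    def __init__(self):
        self.conflicts = []                   # list of frozensets of actions

    def record(self, actions):
        self.conflicts.append(frozenset(actions))

    def forbidden(self, delegates, candidates):
        chosen = set(delegates)
        return {a for a in candidates
                if any(c - chosen == {a} for c in self.conflicts)}

cs = ConflictSets()
cs.record({'a1', 'a4'})                       # a1 and a4 can never coexist
assert cs.forbidden(delegates={'a1'}, candidates={'a4', 'a7'}) == {'a4'}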

Protocol                   Procs   Shadow Self-Loops   MPI Procs   Compute Time
1-Bit Maximal Matching     2–7     Forbidden           1           0.54 secs
4-State Token Ring*        2–8     Forbidden           4           1.50 hrs
5-State Token Ring         2–9     Forbidden           4           23.75 mins
3-State Token Chain*       2–5     Forbidden           4           5.29 secs
3-State Token Chain        2–5     Allowed             4           48.93 secs
4-State Token Chain [1]    2–4     Forbidden           4           10.11 secs
4-State Token Chain [1]    2–4     Allowed             4           1.14 mins
3-State Token Ring [1]     2–5     Forbidden           4           1.02 mins
3-State Token Ring [1]     2–5     Allowed             4           5.22 mins

* No protocol is found to stabilize for all system sizes under consideration.

Fig. 4: Synthesis runtimes for case studies.

6 CASE STUDIES

In this section, we look for exact lower bounds for constant-space maximal matching and token passing protocols using shadow/puppet synthesis. Section 6.1 presents an intuitive way to synthesize maximal matching with shadow variables. Section 6.2 explores the unidirectional token ring with a distinguished process. Section 6.3 explores bidirectional token passing protocols. Each section gives a new self-stabilizing protocol that we conjecture uses the minimal number of states per process to achieve stabilization.

Figure 4 provides the synthesis runtimes of all case studies. The Procs column indicates the system sizes (numbers of processes) that are simultaneously considered during synthesis. For example, the maximal matching case shows 2–7, which means that we are resolving deadlocks without introducing livelocks for six systems of different sizes. These ranges are chosen to increase the chance of synthesizing a generalizable protocol. The Shadow Self-Loops column indicates whether actions within the I′ invariant may leave shadow variables unchanged; this is achieved by modifying Line 7 of Algorithm 3 to enforce that (δp′|I′) ⊆ E(δp). Forbidding these self-loops makes every action of a token passing protocol actually pass a token. The MPI Procs column indicates the number of MPI processes used for synthesis.

6.1 2-State Maximal Matching on a Ring

A matching for a graph is a set of edges that do not share any common vertices. A matching is maximal iff adding any new edge would violate the matching property. In a ring, a set of edges is a matching as long as at least 1 of every 2 consecutive edges is excluded from the set. For the set to be a maximal matching, at least 1 out of every 3 consecutive edges must be included. To see that such a matching is maximal, consider selecting another edge to create a new matching. The edge itself cannot be in the current matching, nor can either of the two adjacent edges; but we have already enforced that one of those three is selected, therefore the new edge cannot be added.

To specify this problem, use a binary shadow variable ei to denote whether the edge between adjacent processes Pi−1 and Pi is in the matching (ei = 1 means the edge is included, otherwise it is excluded). The invariant is therefore the set of states where at least 1 of every 3 consecutive e values equals 1, but at least 1 of every 2 consecutive e values equals 0.

I = ∀i ∈ ZN : (ei−1=1 ∨ ei=1 ∨ ei+1=1) ∧ (ei=0 ∨ ei+1=0)
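For concreteness, this invariant can be checked directly on a vector of e values; the following Python predicate transcribes I, with indices taken modulo the ring size N.

def satisfies_I(e):
    # Every 3 consecutive edges contain a matched one, and no 2 consecutive
    # edges are both matched (indices modulo the ring size).
    n = len(e)
    return (all(e[i - 1] == 1 or e[i] == 1 or e[(i + 1) % n] == 1 for i in range(n))
            and all(e[i] == 0 or e[(i + 1) % n] == 0 for i in range(n)))

assert satisfies_I([1, 0, 1, 0, 1, 0])        # alternating edges form a maximal matching
assert not satisfies_I([0, 0, 0, 1, 0, 1])    # three consecutive unmatched edges violate I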

We would like processes to determine whether their neighboring links are included in a matching, therefore we give each Pi write access to ei and ei+1. Each ei is a shadow variable since it exists only for specification, therefore processes have write-only access to it. Since a matching should not change, there are no shadow actions. For the system's implementation, we give each process Pi a binary puppet variable xi to read and write, along with read access to the xi−1 and xi+1 variables of its neighbors in the ring. We are only guessing that an xi domain size of 2 is large enough to achieve stabilization and to represent (ei, ei+1) values with (xi−1, xi, xi+1) values in some way. Synthesis gives the following protocol, which we believe to be generalizable after verification of rings up to size N = 100. Notice that the e values are fully determined by the x values.

Pi : xi−1=1 ∧ xi=1 ∧ xi+1=1 −→ ei:=1; xi:=0; ei+1:=0;

Pi : xi−1=0 ∧ xi=1 ∧ xi+1=1 −→ ei:=0; xi:=0; ei+1:=0;

Pi : xi−1=0 ∧ xi=0 ∧ xi+1=0 −→ ei:=0; xi:=1; ei+1:=1;

Pi : xi−1=1 ∧ xi=0 −→ ei:=1; ei+1:=0;

Pi : xi−1=0 ∧ xi=0 ∧ xi+1=1 −→ ei:=0; ei+1:=0;

Pi : xi−1=0 ∧ xi=1 ∧ xi+1=0 −→ ei:=0; ei+1:=1;

An implementation of this protocol consists only of puppet variables, and therefore discards the e variables. Of the 6 actions above, only the first 3 modify x values. From these 3, we discard the e variables and combine the first 2 actions (for simplification). This leaves us with the puppet protocol that would be used for implementation, where each Pi has the following actions:

Pi : xi=1 ∧ xi+1=1 −→ xi:=0;

Pi : xi−1=0 ∧ xi=0 ∧ xi+1=0 −→ xi:=1;

Further, we can derive the meaning of the puppet variables by observing how the value of (ei, ei+1) is assigned for each particular value of (xi−1, xi, xi+1). We could do this by observing all 6 actions, but we only need to observe the first 3 since they subsume the last 3. The last 3 actions respectively assign (1, 0), (0, 0), and (0, 1) to (ei, ei+1) to denote that Pi is matched with (i) Pi−1, (ii) itself (i.e., Pi is unmatched), and (iii) Pi+1. We use (xi−1, xi, xi+1) values to represent these cases:

xi−1=1 ∧ xi=0 (Pi matched with Pi−1)

xi−1=0 ∧ xi=0 ∧ xi+1=1 (Pi not matched)

xi−1=0 ∧ xi=1 ∧ xi+1=0 (Pi matched with Pi+1)
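Putting the two implementation actions and this interpretation together, small rings can be checked by brute force. The Python sketch below assumes interleaving (central-daemon) semantics and treats terminal states as the legitimate ones, which is consistent with this protocol being silent; it is an independent sanity check, not the verification procedure used for the N = 100 claim above.

from itertools import product

def enabled_actions(x):
    # Enabled puppet actions of the 2-state matching protocol, as (index, new value) pairs.
    n = len(x)
    acts = []
    for i in range(n):
        left, mid, right = x[(i - 1) % n], x[i], x[(i + 1) % n]
        if mid == 1 and right == 1:
            acts.append((i, 0))               # x_i=1 and x_{i+1}=1 --> x_i:=0
        elif left == 0 and mid == 0 and right == 0:
            acts.append((i, 1))               # 0,0,0 --> x_i:=1
    return acts

def encodes_maximal_matching(x):
    # Interpret e_i = (x_{i-1}=1 and x_i=0) and test the shadow invariant I.
    n = len(x)
    e = [1 if (x[(i - 1) % n] == 1 and x[i] == 0) else 0 for i in range(n)]
    return all(e[i - 1] or e[i] or e[(i + 1) % n] for i in range(n)) and \
           all(not (e[i] and e[(i + 1) % n]) for i in range(n))

def stabilizes(n):
    # Every execution from every state terminates, and every terminal (silent)
    # state encodes a maximal matching.
    states = list(product((0, 1), repeat=n))
    succ = {s: [tuple(v if j == i else s[j] for j in range(n))
                for (i, v) in enabled_actions(s)] for s in states}
    if any(not succ[s] and not encodes_maximal_matching(s) for s in states):
        return False
    live = set(states)
    changed = True
    while changed:                            # strip states with no remaining successor;
        changed = False                       # anything left lies on or reaches a cycle
        for s in list(live):
            if all(t not in live for t in succ[s]):
                live.discard(s)
                changed = True
    return not live

print([stabilizes(n) for n in range(2, 9)])   # expected: all True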

6.2 5-State Unidirectional Token Ring

In the token ring shadow specification of Section 2, each process Pi is given a binary shadow variable toki that denotes whether the process has a token. The invariant is the set of all states where exactly one token exists (∃!i : toki = 1), and each process should eventually pass the token within the invariant (toki = 1 −→ toki := 0; toki+1 := 1;). For synthesis, we give each Pi a puppet variable xi and also let it read xi−1. As in Dijkstra's token ring, we distinguish P0 as Bot to allow its actions to differ from those of the other processes. We also force each action in the invariant to pass a token by forbidding shadow self-loops. With this restriction, we found that no protocol using 4 states per process is stabilizing for all rings of size N ∈ {2, . . . , 8}.

Using 5 states per process (i.e., each xi has domain Z5), the synthesized protocol is not always generalizable, even when we synthesize for all rings of size N ∈ {2, . . . , 9}. However, we can increase our chances of finding a generalizable version by allowing the search to record many solutions and verifying their correctness for larger ring sizes. After sufficiently many synthesize-and-verify iterations, we are left with the following protocol, which we believe to be generalizable after verification of rings up to size N = 30.

Bot : xN−1 = 0 ∧ x0 = 0 −→ x0:=1; tok0:=0; tok1:=1;

Bot : xN−1 = 1 ∧ x0 ≤ 1 −→ x0:=2; tok0:=0; tok1:=1;

Bot : xN−1 > 1 ∧ x0 > 1 −→ x0:=0; tok0:=0; tok1:=1;

Pi : xi−1 = 0 ∧ xi > 1 −→ xi:=⌊xi/4⌋; toki:=0; toki+1:=1;

Pi : xi−1 = 1 ∧ xi ≠ 1 −→ xi:=1; toki:=0; toki+1:=1;

Pi : xi−1 = 2 ∧ xi ≤ 1 −→ xi:=2 + xi; toki:=0; toki+1:=1;

Pi : xi−1 ≥ 3 ∧ xi ≤ 1 −→ xi:=4; toki:=0; toki+1:=1;
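The synthesized actions can also be exercised directly. The Python sketch below simulates the puppet variables under a randomized central daemon from an arbitrary initial state; it is an empirical plausibility check only (the claim above rests on verification of rings up to N = 30), and reading "exactly one enabled process" as the converged condition is our interpretation of the forbidden-self-loop setting.

import random

def enabled(x):
    # Indices of processes with an enabled action (process 0 is Bot).
    n = len(x)
    acts = []
    left, own = x[n - 1], x[0]
    if (left == 0 and own == 0) or (left == 1 and own <= 1) or (left > 1 and own > 1):
        acts.append(0)
    for i in range(1, n):
        left, own = x[i - 1], x[i]
        if (left == 0 and own > 1) or (left == 1 and own != 1) or \
           (left == 2 and own <= 1) or (left >= 3 and own <= 1):
            acts.append(i)
    return acts

def fire(x, i):
    # Apply the (unique) enabled action of process i; guards are mutually exclusive.
    x = list(x)
    left, own = x[(i - 1) % len(x)], x[i]
    if i == 0:                                # Bot
        if left == 0 and own == 0:   x[0] = 1
        elif left == 1 and own <= 1: x[0] = 2
        else:                        x[0] = 0  # left > 1 and own > 1
    else:
        if left == 0 and own > 1:    x[i] = own // 4   # floor(x_i / 4)
        elif left == 1 and own != 1: x[i] = 1
        elif left == 2 and own <= 1: x[i] = 2 + own
        else:                        x[i] = 4          # left >= 3 and own <= 1
    return x

random.seed(1)
n = 7
x = [random.randrange(5) for _ in range(n)]
for _ in range(100_000):                      # randomized central daemon
    acts = enabled(x)
    if not acts:
        print("unexpected deadlock at", x)
        break
    x = fire(x, random.choice(acts))
print("enabled processes at the end:", len(enabled(x)))   # expected: 1 after convergence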

Our previous work [20] used the approach of Example 4.2 to synthesize a 6-state token ring and a 3-bit token ring similar to that of Gouda and Haddix [4]. In doing so, we observe an interesting efficiency trade-off between process memory, convergence speed, and adherence to the shadow protocol. For example, a process may act several times in the 3-bit token ring of Gouda and Haddix [4] without passing a token, but this protocol converges in fewer steps (faster) than our 5-state token ring. Dijkstra's N-state token ring requires much more process memory, yet it converges in fewer steps than the 3-bit protocol while also ensuring that the token is passed with every action.

6.3 3-State Token Chain

Rather than around a ring, we can also pass a token back and forth along a linear (chain) topology. Formally, the token passing behavior can be described by guarded commands. As with the ring, let there be N processes, where each process Pi has a binary shadow variable toki that denotes whether it has a token. Each process Pi can read and write toki−1 when i > 0, and each Pi can read and write toki+1 when i < N−1. Additionally, all processes can read a binary shadow variable fwd that denotes the direction the token is moving (1 means up, 0 means down). That is, a process P0<i<N−1 passes the token to Pi+1 when fwd = 1 and passes it to Pi−1 when fwd = 0. The two end processes P0 and PN−1 must behave differently than the others, therefore they are distinguished as Bot and Top respectively. Both Bot and Top can write fwd, and they assign it when they pass the token in order to change the direction of passing. The full shadow protocol is given by the following actions.

Bot : tok0=1 −→ fwd:=1; tok0:=0; tok1:=1;

Pi : toki=1 ∧ fwd=1 −→ toki:=0; toki+1:=1;

Pi : toki=1 ∧ fwd=0 −→ toki−1:=1; toki:=0;

Top : tokN−1=1 −→ fwd:=0; tokN−2:=1; tokN−1:=0;
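For intuition, the shadow protocol can be executed from a legitimate state; the short Python sketch below (assuming exactly one token is present) shows the token walking up to Top and back down to Bot.

def shadow_step(tok, fwd):
    # One step of the shadow token-chain protocol: the unique token holder acts.
    n = len(tok)
    i = tok.index(1)
    tok = list(tok)
    if i == 0:                       # Bot
        fwd, tok[0], tok[1] = 1, 0, 1
    elif i == n - 1:                 # Top
        fwd, tok[n - 2], tok[n - 1] = 0, 1, 0
    elif fwd == 1:
        tok[i], tok[i + 1] = 0, 1
    else:
        tok[i - 1], tok[i] = 1, 0
    return tok, fwd

tok, fwd = [1, 0, 0, 0], 0
for _ in range(8):
    tok, fwd = shadow_step(tok, fwd)
    print(tok, "fwd =", fwd)         # the token bounces between Bot and Top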

For synthesis, give each process a ternary puppet variable xi. Each P0<i<N−1 can also read xi−1 and xi+1. Likewise, Bot can read x1 and Top can read xN−2. One of the synthesized protocols is as follows, which we conjecture to be generalizable after verification of chains up to size N = 30.

Bot : x0≠1 ∧ x1=2 −→ x0:=1; fwd:=1; tok0:=0; tok1:=1;

Bot : x0≠0 ∧ x1≠2 −→ x0:=0;

Pi : xi−1=1 ∧ xi≠1 −→ xi:=1; toki:=0; toki+1:=1;

Pi : xi−1=0 ∧ xi=1 ∧ xi+1=1 −→ xi:=0;

Pi : xi−1=0 ∧ xi=0 ∧ xi+1=2 −→ xi:=2; toki−1:=1; toki:=0;

Top : xN−2=1 ∧ xN−1≠1 −→ xN−1:=1;

Top : xN−2≠1 ∧ xN−1≠2 −→ xN−1:=2; fwd:=0; tokN−2:=1; tokN−1:=0;

This protocol does not always pass the token within the invariant; however, we found that no such protocol exists (using 3 states per process). As shown by Dijkstra [1], we can obtain a stabilizing protocol with this behavior by either using 4 states per process or allowing Bot and Top to communicate. Using the same shadow specification and the puppet topologies from [1], we synthesized 4-state token chains and 3-state token rings. Both cases appear to give generalizable protocols (verified up to N = 15), regardless of whether we force each action in the invariant to pass a token.

7 RELATED WORK AND DISCUSSION

This section discusses related work on manual and automated design of fault tolerance in general and self-stabilization in particular. Manual methods are mainly based on the approach of design and verify, where one designs a fault-tolerant system and then verifies the correctness of (1) functional requirements in the absence of faults, and (2) fault tolerance requirements in the presence of faults. For example, Liu and Joseph [28] provide a method for augmenting fault-intolerant systems with a set of new actions that implement fault tolerance functionalities. Katz and Perry [29] present a general (but expensive) method for global snapshot and reset towards adding convergence to non-stabilizing systems. Varghese [30] and Afek et al. [31] provide a method based on local checking for global recovery of locally correctable protocols. Varghese [15] also proposes a counter flushing method for the detection and correction of global predicates. Nesterenko and Tixeuil [17] employ a mapping to define all system states as legitimate states of an abstract specification. This effectively removes the need for convergence, but it is not always possible or may require human ingenuity in the specification. Chandy and Misra introduce variable superposition [32], where the functional concerns are specified on a set of variables, and extra implementation details are carried out using a set of superposed variables. Our work further decouples these two sets but uses their superposition as an implicit mapping.

Since it is unlikely that an efficient method exists for algorithmic design of self-stabilization [3], most existing techniques [5], [6], [7], [33], [8] are based on sound heuristics. For instance, Abujarad and Kulkarni [5], [6] present a heuristic for adding convergence to locally-correctable systems. Zhu and Kulkarni [33] give a genetic programming approach for the design of fault tolerance, using a fitness function to quantify how close a randomly-generated protocol is to being fault-tolerant. Farahat and Ebnenasir [7] provide a lightweight method for designing self-stabilization even for non-locally correctable protocols. They also devise [8] a swarm method for exploiting the computational power of computer clusters towards automated design of self-stabilization. While the swarm synthesis method inspires the work proposed in this paper, it has two limitations: it is incomplete and it forbids any change in the invariant. Methods for automated design of fault tolerance [22], [34] have the option of making deadlock states unreachable. This is not an option in the addition of self-stabilization; recovery must be provided from any state in the protocol's state space.

8 CONCLUSIONS AND FUTURE WORK

We presented a two-step method for the automated design of self-stabilizing systems, called shadow/puppet synthesis. In the first step, designers provide a shadow specification consisting of a set of legitimate states, the system topology, and the expected system behaviors in the absence of faults. In the second step, we algorithmically generate a self-stabilizing system, called the puppet system, that accurately implements the behaviors in legitimate states and provides convergence to legitimate states in terms of some superposed variables. To do so, we use a parallel backtracking search to intelligently look for a self-stabilizing solution. We have implemented our approach and have automatically generated self-stabilizing protocols that, to the best of our knowledge, none of the existing heuristics can generate. These protocols include token passing protocols on the topologies given by Dijkstra in [1], 2-state maximal matching, 5-state token ring, 3-state token chain, coloring on Kautz graphs, ring orientation, and leader election on a ring [19].

We are currently investigating several extensions of this work. First, we use theorem proving techniques to figure out why a synthesized protocol may not be generalizable. Then, we plan to incorporate the feedback received from theorem provers into our backtracking method. Another extension is to leverage the techniques used in SMT solvers and apply them in our backtracking search.

ACKNOWLEDGMENT

The authors would like to thank the anonymous referees for their valuable comments and suggestions.

REFERENCES

[1] E. W. Dijkstra, "Self-stabilizing systems in spite of distributed control," Communications of the ACM, vol. 17, no. 11, pp. 643–644, 1974.
[2] M. Gouda, "The theory of weak stabilization," in 5th International Workshop on Self-Stabilizing Systems, ser. Lecture Notes in Computer Science, vol. 2194, 2001, pp. 114–123.
[3] A. Klinkhamer and A. Ebnenasir, "On the hardness of adding nonmasking fault tolerance," IEEE Transactions on Dependable and Secure Computing, vol. 12, no. 3, pp. 338–350, May 2015.
[4] M. G. Gouda and F. F. Haddix, "The stabilizing token ring in three bits," Journal of Parallel and Distributed Computing, vol. 35, no. 1, pp. 43–48, May 1996.
[5] F. Abujarad and S. S. Kulkarni, "Multicore constraint-based automated stabilization," in 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems, 2009, pp. 47–61.
[6] F. Abujarad and S. S. Kulkarni, "Automated constraint-based addition of nonmasking and stabilizing fault-tolerance," Theoretical Computer Science, vol. 412, no. 33, pp. 4228–4246, 2011.
[7] A. Farahat and A. Ebnenasir, "A lightweight method for automated design of convergence in network protocols," ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 7, no. 4, pp. 38:1–38:36, Dec. 2012.
[8] A. Ebnenasir and A. Farahat, "Swarm synthesis of convergence for symmetric protocols," in Proceedings of the Ninth European Dependable Computing Conference, 2012, pp. 13–24.
[9] F. Faghih and B. Bonakdarpour, "SMT-based synthesis of distributed self-stabilizing systems," in 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). Springer, 2014.
[10] F. Faghih and B. Bonakdarpour, "SMT-based synthesis of distributed self-stabilizing systems," ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2015, to appear.
[11] B. Awerbuch, B. Patt-Shamir, and G. Varghese, "Self-stabilization by local checking and correction," in Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, 1991, pp. 268–277.
[12] M. G. Gouda and N. J. Multari, "Stabilizing communication protocols," IEEE Transactions on Computers, vol. 40, no. 4, pp. 448–458, 1991.
[13] F. Stomp, "Structured design of self-stabilizing programs," in Proceedings of the 2nd Israel Symposium on Theory and Computing Systems, 1993, pp. 167–176.
[14] A. Arora and M. G. Gouda, "Closure and convergence: A foundation of fault-tolerant computing," IEEE Transactions on Software Engineering, vol. 19, no. 11, pp. 1015–1027, 1993.
[15] G. Varghese, "Self-stabilization by counter flushing," in The 13th Annual ACM Symposium on Principles of Distributed Computing, 1994, pp. 244–253.
[16] M. Demirbas and A. Arora, "Specification-based design of self-stabilization," IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 1, pp. 263–270, Jan 2016.
[17] M. Nesterenko and S. Tixeuil, "Ideal stabilization," International Journal of Grid and Utility Computing, vol. 4, no. 4, pp. 219–230, Oct. 2013.
[18] B. Finkbeiner and S. Schewe, "Bounded synthesis," International Journal on Software Tools for Technology Transfer, vol. 15, no. 5-6, pp. 519–539, 2013.
[19] A. Klinkhamer and A. Ebnenasir, "Synthesizing self-stabilization through superposition and backtracking," Michigan Technological University, Tech. Rep. CS-TR-14-01, May 2014. [Online]. Available: http://mtu.edu/cs/research/papers/pdfs/CS-TR-14-01.pdf
[20] A. Klinkhamer and A. Ebnenasir, "Synthesizing self-stabilization through superposition and backtracking," in 16th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS). Springer, 2014, pp. 252–267.
[21] M. Imase and M. Itoh, "A design for directed graphs with minimum diameter," IEEE Transactions on Computers, vol. 32, no. 8, pp. 782–784, 1983.
[22] S. S. Kulkarni and A. Arora, "Automating the addition of fault-tolerance," in Formal Techniques in Real-Time and Fault-Tolerant Systems. London, UK: Springer-Verlag, 2000, pp. 82–93.
[23] E. W. Dijkstra, A Discipline of Programming. Prentice-Hall, 1990.
[24] A. Arora and S. S. Kulkarni, "Detectors and correctors: A theory of fault-tolerance components," International Conference on Distributed Computing Systems, pp. 436–443, May 1998.
[25] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall Press, 2009.
[26] D. E. Knuth, The Art of Computer Programming, Volume I: Fundamental Algorithms. Addison-Wesley, 1968.
[27] C. P. Gomes, B. Selman, H. Kautz et al., "Boosting combinatorial search through randomization," AAAI/IAAI, vol. 98, pp. 431–437, 1998.
[28] Z. Liu and M. Joseph, "Transformation of programs for fault-tolerance," Formal Aspects of Computing, vol. 4, no. 5, pp. 442–469, 1992.
[29] S. Katz and K. Perry, "Self-stabilizing extensions for message-passing systems," Distributed Computing, vol. 7, pp. 17–26, 1993.
[30] G. Varghese, "Self-stabilization by local checking and correction," Ph.D. dissertation, MIT, 1993.
[31] Y. Afek, S. Kutten, and M. Yung, "The local detection paradigm and its application to self-stabilization," Theoretical Computer Science, vol. 186, no. 1-2, pp. 199–229, 1997.
[32] K. M. Chandy and J. Misra, Parallel Program Design: A Foundation. Addison-Wesley, 1988.
[33] L. Zhu and S. Kulkarni, "Synthesizing round based fault-tolerant programs using genetic programming," in Stabilization, Safety, and Security of Distributed Systems. Springer, 2013, pp. 370–372.
[34] A. Ebnenasir, "Automatic synthesis of fault-tolerance," Ph.D. dissertation, Michigan State University, May 2005.

Alex Klinkhamer is a PhD student in the Department of Computer Science at Michigan Technological University. Alex received his bachelor's and master's degrees in 2010 and 2013, both from Michigan Tech. His research interests include self-stabilization, distributed systems, and parallel algorithms.

Ali Ebnenasir is an Associate Professor of Computer Science at Michigan Technological University and a senior member of the ACM. Ali received his bachelor's and master's degrees in 1994 and 1998, respectively, from the University of Isfahan and Iran University of Science and Technology. In 2005, he received his PhD from the Computer Science and Engineering Department at Michigan State University (MSU). After finishing his postdoctoral fellowship at MSU in 2006, he joined the Department of Computer Science at Michigan Tech. His research interests include software dependability, formal methods, and parallel and distributed computing.