

Using performability in the design of communication networks

A. Sesmun and L.F. Turner

Abstract: Conventional network design techniques treat the concepts of performance and reliability separately. This approach ensures that reliability requirements are met and performance specifications are satisfied when all components are operational. However, it does not guarantee a graceful degradation of the performance of the network under conditions of failure. In order to derive a fault-tolerant network, it is necessary to design the network with respect to a combined measure of performance and reliability. Such a measure originated in the early 1980s and is referred to as performability. The authors propose a technique that uses performability in the design of communication networks, with the objective of deriving a design methodology for fault-tolerant networks. The benefits of using this approach, compared with conventional design methods, are illustrated by means of a design example.

1 Introduction

Traditionally, performance and reliability are considered separately in network design. Reliability is typically expressed in terms of connectivity and requires that a network remains connected when the most likely failures occur. The performance of the network is optimised separately, assuming that the network is failure-free. Such an approach does not ensure that resources are distributed in such a way as to maintain performance under conditions of failure. In order to design a network that performs well even in the presence of failures, and which delivers the best average performance over a period of time, performance and reliability have to be considered together. In this paper, the concept of performability [1] is used in the design process to generate such fault-tolerant networks.

The application of performability to systems has centred mainly on computer-based systems, including degradable computers [2], multiprocessor systems [3], software [4] and distributed database systems [5]. This paper focuses on the application of the concept to network design.

2 Network performability modelling and evaluation

Prior to its use in design, a means of evaluating the performability of a network is needed. Performability is a concept that seeks to combine performance and reliability and is formally defined by [6] as follows: for a system S with performance Y taking values in accomplishment set A, the performability of S is the probability measure Perf induced by Y where, for any measurable set B of accomplishment levels (B ⊆ A)

$$\mathrm{Perf}(B) = P(Y \in B) = \text{the probability that } S \text{ performs at a level in } B \qquad (1)$$

Performability evaluation involves two steps: the construction of a suitable model, and the solution of the model. Model construction consists of specifying the performance variable Y and determining the base model X to be solved.

© IEE, 2000
IEE Proceedings online no. 20000741
DOI: 10.1049/ip-cdt:20000741
Paper first received 17th September 1999 and in revised form 20th July 2000
A. Sesmun is with Motorola, 16 Euroway, Blagrove, Swindon SN5 8YQ, UK. E-mail: [email protected]
L.F. Turner is with the Department of Electrical and Electronic Engineering, Imperial College of Science, Technology and Medicine, Exhibition Road, London SW7 2BT, UK. E-mail: [email protected]
IEE Proc.-Comput. Digit. Tech, Vol. 147, No. 5, September 2000

The Markov Reward Model provides a means of combining performance and reliability data, and is used in performability modelling of the network. It consists of a continuous-time Markov chain, which represents the failure/repair process, and a reward structure. Each state of the Markov process represents the network operating in a particular configuration, as dictated by the failure of its components. Each state is also associated with a reward which gives an indication of the performance of the network while it operates in that configuration. The rewards are assumed to range from 0 to 1: the better the performance, the higher the reward. The Markovian model is justified considering that the failure and repair of a component at time t do not affect the failure/repair process at time t + 1.
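The structure of such a model can be illustrated with a minimal sketch. It assumes that only single link failures are modelled, so that state 0 is the fully operational network and state i is the network with only link i failed. The class `Link`, the function names and the numerical rates and rewards are hypothetical, chosen only for illustration. The long-run expected reward computed here is a simpler quantity than the distribution of average accumulated reward used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    failure_rate: float        # failures per day (hypothetical units)
    repair_rate: float         # repairs per day, i.e. 1 / mean repair time
    reward_when_failed: float  # reward in [0, 1] while only this link is down


def build_generator(links):
    """Generator matrix Q: state 0 = all links up, state i = only link i down."""
    n = len(links) + 1
    q = [[0.0] * n for _ in range(n)]
    for i, link in enumerate(links, start=1):
        q[0][i] = link.failure_rate   # transition: link i fails
        q[i][0] = link.repair_rate    # transition: link i is repaired
        q[i][i] = -link.repair_rate
    q[0][0] = -sum(link.failure_rate for link in links)
    return q


def steady_state(links):
    """Stationary probabilities from the balance equations pi_0 * lambda_i = pi_i * mu_i."""
    ratios = [link.failure_rate / link.repair_rate for link in links]
    p_up = 1.0 / (1.0 + sum(ratios))
    return [p_up] + [p_up * r for r in ratios]


def long_run_reward(links, reward_operational=1.0):
    """Expected reward per unit time in the long run (not the ACR distribution)."""
    rewards = [reward_operational] + [link.reward_when_failed for link in links]
    return sum(p * r for p, r in zip(steady_state(links), rewards))


# Hypothetical example: two links, rates per day, mean repair time of one day.
links = [Link("1-2", 0.01, 1.0, 0.4), Link("2-3", 0.05, 1.0, 0.7)]
print(build_generator(links))
print(round(long_run_reward(links), 4))
```

Each row of the generator sums to zero, and the reward vector has one entry per configuration, which is the pairing of failure/repair behaviour and performance that the reward model provides.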

Once the model is constructed, performability can be evaluated. Several techniques have been proposed for performability evaluation using different numerical methods [7–10]. A number of modelling tools have also been developed [11, 12] and a survey of a selection of tools is provided in [13]. The randomisation technique as used by De Souza and Gail [14, 15] has been chosen for computation of performability. The technique was implemented in software in C. The measure sought is a distribution of average accumulated reward ACR(t) over a given observation period t, which, given the mapping of performance on to reward, represents a distribution of average performance. If the mapping of performance on to reward is a one-to-one mapping, the distributions correspond to each other. However, the computation involved in the solution technique is affected by the number of distinct rewards allowed, so that a range of performance levels is mapped on to a single reward. In this case, the distribution of average accumulated reward can only give an indication of the average performance over a time period.

Assuming that the better the performance, the higher the reward, the ideal distribution of average performance is expected to take the form of a step function. This ideal case corresponds to one where performance in each state is within the range that is mapped on to a reward of value 1.
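The basic idea behind randomisation (also known as uniformisation) can be sketched for the simpler problem of transient state probabilities. This is an illustrative sketch in Python rather than the C implementation mentioned above, and it does not compute the distribution of ACR(t), which requires the more elaborate method of [14, 15]. The function name and the two-state example are assumptions made for illustration; the plain Poisson recursion used here is only suitable for moderate values of rate × t.

```python
import math


def transient_probabilities(q, p0, t, tol=1e-12):
    """p(t) = sum_k Poisson(k; rate*t) * p0 * P^k, with P = I + Q / rate."""
    n = len(q)
    rate = max(-q[i][i] for i in range(n))  # uniformisation rate
    if rate == 0.0:
        return list(p0)                     # no transitions at all
    # Transition matrix of the uniformised discrete-time chain.
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)            # Poisson probability of 0 jumps
    vec = list(p0)
    result = [weight * v for v in vec]
    accumulated = weight
    k = 0
    while 1.0 - accumulated > tol and k < 100000:
        k += 1
        vec = [sum(vec[i] * p[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k              # Poisson probability of k jumps
        accumulated += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical two-state model: state 0 = link up, state 1 = link down.
lam, mu = 0.05, 1.0                         # per day
q = [[-lam, lam], [mu, -mu]]
print(transient_probabilities(q, [1.0, 0.0], 10.0))
```

For the two-state example the result can be checked against the closed form p_up(t) = μ/(λ + μ) + λ/(λ + μ) · exp(−(λ + μ)t).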

3 Design proposal

3.1 Introductory comments

The proposed design technique consists of three stages: the initial topology generation stage, the connectivity establishment stage and the fault-tolerance augmentation stage. The first two stages are required to design a network which meets initial performance and basic reliability requirements. The most important stage is the fault-tolerance augmentation stage, where the network is augmented in order to satisfy performability specifications. The steps constituting these stages will now be discussed in more detail with reference to the overall design method, which is illustrated in Fig. 1. Besides these three stages, a number of other procedures are required to evaluate the cost and performability of the network.

Fig. 1 Schematic representation of the overall design method
(Flowchart. Inputs: node locations; link costs; switching, set-up and buffer costs; traffic matrix; routing algorithm; failure/repair data. Processing blocks: calculate distance matrix; generation of initial topology; connectivity establishment; calculate cost; compute performance in all states; compute performability; fault tolerance augmentation. Decisions: budget exceeded? (YES/NO, with STOP if no more augmentation is possible); meet specs? (YES/NO, leading to design complete).)

The inputs to the design consist of the node locations, traffic demands and various attributes which contribute to evaluation of the cost of the network. Given the routing algorithm and traffic demands, the performance of the network can be derived in all of the failure states which are considered. Failure and repair data of the different components are required to determine the performability of the network.

3.2 Assumptions

Throughout the design, the following assumptions are made:

1. All node locations are known. Design involves connecting them so as to achieve the best level of performability under cost constraints.
2. A node consists of switching components and buffers for each of the outgoing links at the node. The queueing discipline governing these buffers is First In First Out.
3. The traffic matrix is known prior to design and remains fixed.
4. An optimal routing algorithm based on the Gradient Projection Method [16] is used.
5. Only single link failures are considered. Assuming the average repair time is considerably shorter than the mean time between failure of two links, single failures are more likely than multiple failures.

3.3 Initial topology generation stage

Given the locations of the nodes, the minimal spanning tree connecting the nodes is derived. Generation of this initial topology can be based on distance or traffic demands between the nodes. The traffic flows on all the links are then determined and the square root channel capacity assignment algorithm [17] is applied to evaluate the capacity of the links.
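The two operations of this stage can be sketched as follows. The spanning tree is built with Prim's algorithm on a distance matrix, which is one standard choice and an assumption here, since the paper does not name the algorithm used. The capacity rule shown is the common textbook form of the square root assignment with equal cost per unit of capacity, where each link receives its flow plus a share of the spare capacity proportional to the square root of that flow; the exact variant used in [17] may differ. Function names and numbers are hypothetical.

```python
import math


def minimal_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix; returns tree edges (i, j)."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def square_root_capacities(flows, total_capacity):
    """Assign flow_i + spare * sqrt(flow_i) / sum_j sqrt(flow_j) to each link."""
    spare = total_capacity - sum(flows)
    if spare <= 0:
        raise ValueError("total capacity must exceed total flow")
    root_sum = sum(math.sqrt(f) for f in flows)
    return [f + spare * math.sqrt(f) / root_sum for f in flows]


# Hypothetical example: four nodes on a line, then two link flows.
dist = [[0, 1, 2, 4],
        [1, 0, 1, 3],
        [2, 1, 0, 2],
        [4, 3, 2, 0]]
print(minimal_spanning_tree(dist))
print(square_root_capacities([4.0, 9.0], 23.0))
```

Flows and capacities are assumed to be expressed in the same units (for example, data units per second).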

3.4 Connectivity establishment stage

Failure of any link in the topology generated after execution of the first stage causes the network to be disconnected. In this stage, more links are added in order to improve the connectivity of the network so that at least a 2-connectivity criterion holds. Addition of links at this stage need not be performability-based and can depend solely on cost, or demand.
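A simple way to check the outcome of this stage is to test whether any single link failure disconnects the network. The sketch below interprets the 2-connectivity criterion as 2-edge-connectivity, which is an assumption motivated by the single link failure model; links are treated as undirected node pairs with no duplicates, and the function names are hypothetical.

```python
from collections import deque


def is_connected(n_nodes, links):
    """Breadth-first search from node 0 over undirected links."""
    adjacency = {v: set() for v in range(n_nodes)}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n_nodes


def critical_links(n_nodes, links):
    """Links whose individual failure disconnects the network."""
    return [link for link in links
            if not is_connected(n_nodes, [m for m in links if m != link])]


tree = [(0, 1), (1, 2), (2, 3)]   # spanning tree: every link is critical
ring = tree + [(3, 0)]            # one added link closes the ring
print(critical_links(4, tree))
print(critical_links(4, ring))
```

A spanning tree reports every link as critical, whereas the ring reports none, which matches the motivation for adding links in this stage.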

3.5 Fault-tolerance augmentation stage

Fault-tolerance augmentation involves equipping the network with sufficient resources to enable it to avoid loss of traffic and large delays when failures occur. A direct implication is that sufficient redundancy, coupled with adequate capacity, has to be built into the network for it to cope with any excess traffic under conditions of failure. Two fault-tolerance augmentation mechanisms are proposed. They are:

(i) adding links, and
(ii) augmenting the capacity of existing links.

This stage is iterative in nature and its main steps are outlined below. At the beginning of an iteration, there is a network whose performance can be assessed in all of its failure states, and its performability can thus be evaluated.

Let the performability and cost of the network be Perf_old and Cost_old, respectively.

Step 1. Choose the state where augmentation is to be tried.
Step 2. Identify the most heavily loaded links.
Step 3. Apply the rules of augmentation to decide on possible mechanisms. For each possible augmentation mechanism and resulting topology, compute (Perf_new − Perf_old)/(Cost_new − Cost_old), that is ΔPerf/ΔCost, where Perf_new and Cost_new are associated with this new topology.
Step 4. Choose the best topology with respect to the above ratio, that is, the one giving max ΔPerf/min ΔCost, and apply the augmentation technique.
Step 5. Repeat the steps of the iteration until performability specifications are met within the allocated budget.
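Steps 3 and 4 can be sketched as a selection over candidate topologies. The sketch assumes that performability is expressed as P[ACR(t) ≤ b], as in Section 3.6, so that an improvement is a decrease and the preferred candidate is the one with the most negative ratio, as in the example of Section 4.1. The `Candidate` class, the function names and all numbers are hypothetical; in practice the `perf` and `cost` values would come from the performability and cost evaluation procedures.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    perf: float   # P[ACR(t) <= b] of the augmented network (lower is better)
    cost: float   # cost of the augmented network


def ratio(perf_old, cost_old, candidate):
    """Change in performability per unit of additional cost."""
    return (candidate.perf - perf_old) / (candidate.cost - cost_old)


def choose_augmentation(perf_old, cost_old, candidates, budget):
    """Return the affordable candidate with the most negative ratio, or None."""
    affordable = [c for c in candidates if cost_old < c.cost <= budget]
    if not affordable:
        return None
    return min(affordable, key=lambda c: ratio(perf_old, cost_old, c))


# Hypothetical iteration: current network has perf 0.30 and cost 1000.
candidates = [
    Candidate("add link A", 0.20, 1100.0),          # ratio -0.0010
    Candidate("augment capacity B", 0.25, 1020.0),  # ratio -0.0025
    Candidate("add link C", 0.05, 5000.0),          # exceeds the budget
]
best = choose_augmentation(0.30, 1000.0, candidates, budget=2000.0)
print(best.label)
```

Repeating this selection, and re-evaluating performability after each applied augmentation, gives the iteration of Step 5.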

3.5.1 Choice of state: The choice of the state where augmentation is to be carried out is an important decision. The state of the network here refers to the possible configurations it can be in, as dictated by the occurrence of failures and repairs.

In this paper, when considering design, the state chosen is a failure state. Since the main objective is to achieve a fault-tolerant design, it is reasonable to optimise the network in a state of failure. Resources are placed in the best possible way to reduce performance degradation when a link fails. Even if the network is being considered in a failure state, any augmentation applied should contribute to improving the performance in the fully operational state, although the network might not be optimal in that state. Although only link failures will be considered in the generation of results, the design methodology is applicable to node failures where a node failure is treated as a failure of all links connected to it.

Choice of the state of failure can be based on performance or performability. It makes sense to investigate possibilities of augmentation in the state where most degradation in performance occurs, as in Kang and Tan [18]. However, such an event may be rare and it may be more beneficial, over a period of time, to apply augmentation to a state where less degradation occurs, but where the state occurs more often. The concept of a performability index is proposed as a means of differentiating between such states.

3.5.2 Concept of performability index (PI): A performability index (PI) is proposed as a concept which gives an indication of the relative importance of each component in the network, with importance being related to its failure rate, mean repair time and degree of performance degradation resulting from failure of the component, over a period of time. The rationale behind the PI is to differentiate, for example, between a state of failure where performance degradation resulting from a component failure is fairly low and the component fails frequently, and one in which performance degradation is drastic but such an event occurs very rarely, under the assumption that the repair times are the same. In such a situation, the question is whether it is preferable to apply augmentation to a state of high degradation and low failure rate or to one of less degradation and high failure rate. Mean repair times are also taken into consideration in the evaluation of the index.

The evaluation of the performability index for a component involves considering the component on its own. The model thus consists of two states: one where all components are operational and one where the component under consideration has failed. The index is then defined as P[ACR(t) ≤ r_p], where ACR(t) is the average accumulated reward over an observation period t, and r_p is a specified reward level. Such a definition assumes that failures are independent and that only a single failure occurs at any one time.

In a network, the index associated with each link can be determined. Assuming a mapping of performance on to reward which is such that the poorer the performance, the lower the reward, the distribution of ACR(t) aimed at is a step function u(r − 1), where the maximum reward is equal to 1. Therefore, at a reward of r_p, the higher the value of P[ACR(t) ≤ r_p], the more important is the link whose PI is being evaluated. Thus, the state associated with the highest PI is chosen at this stage.
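The index of a single link can be illustrated with a small simulation of the two-state model. This is a Monte Carlo sketch rather than the randomisation technique used in the paper, and the function names, rates and rewards are hypothetical. It assumes exponentially distributed times to failure and repair, a reward of 1 in the operational state, strictly positive rates, and a network that starts fully operational.

```python
import random


def average_accumulated_reward(fail_rate, repair_rate, reward_failed, t, rng):
    """ACR(t) along one sample path of the two-state model."""
    clock, up, accumulated = 0.0, True, 0.0
    while clock < t:
        sojourn = rng.expovariate(fail_rate if up else repair_rate)
        stay = min(sojourn, t - clock)        # truncate at the end of [0, t]
        accumulated += stay * (1.0 if up else reward_failed)
        clock += sojourn
        up = not up
    return accumulated / t


def performability_index(fail_rate, repair_rate, reward_failed, t, r_p,
                         samples=2000, seed=1):
    """Estimate P[ACR(t) <= r_p] from independent sample paths."""
    rng = random.Random(seed)
    hits = sum(
        average_accumulated_reward(fail_rate, repair_rate, reward_failed, t, rng) <= r_p
        for _ in range(samples)
    )
    return hits / samples


# Hypothetical comparison over 10 days, mean repair time 1 day, r_p = 0.9:
# a rarely failing link with drastic degradation versus
# a frequently failing link with the same degradation.
rare = performability_index(0.01, 1.0, 0.0, 10.0, 0.9)
frequent = performability_index(0.5, 1.0, 0.0, 10.0, 0.9)
print(rare, frequent)
```

Under these assumptions the link with the larger estimate would be regarded as the more important one, and its failure state would be the one chosen for augmentation.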

3.5.3 Augmentation strategy: Once the most heavily loaded links have been identified in a particular state, the augmentation strategy consists of either adding links to deviate traffic from the most heavily loaded links and nodes, or augmenting capacities of the most utilised links to relieve congestion and improve performance. The rules described in Section 3.5.4 can be used to identify the scenario being dealt with. In particular, when design is based on an increase in connectivity, rules 1 to 3 apply and when capacity augmentation is used, rules 4 and 5 apply.

Application of the augmentation mechanism generates a network topology whose cost and performability can be evaluated. Given a number of possible topologies, the performability-based descent factor can be used to select the best one. The process is repeated until the performability objective is attained, or until the budget is exceeded.

3.5.4 General rules for fault-tolerance augmentation: Increase in connectivity and capacity augmentation can be combined to generate the best network. Investigation of different possibilities of where to add links in a network and where to apply capacity augmentation leads to the following rules.

1. If a node is particularly congested, then the mechanism is to try and deviate traffic away from that node. For example, in Fig. 2a, if all existing links into and out of node N are overloaded, then potential links can be added so as to avoid the direction of traffic into node N.
2. If a link is identified as being the most heavily loaded, then the node from which it originates is found. The potential augmentation links are those from that node to other nodes in the network. For example, in Fig. 2b, if link l is overloaded in the direction originating from node N, then there are possible additional links.
3. If the most heavily loaded link is the sole connection between a particular node and the rest of the network, then other links to be considered are those from that node to other nodes in the network. For instance, in Fig. 3a, even if the most heavily loaded link leads into node N, then there are links to be considered for addition.
4. If certain links are identified as being particularly overloaded as described in rules 1 to 3, then augmenting their capacity is worth investigating.
5. If overloaded, or most heavily loaded links, are scattered, as shown in Fig. 3b, then the approach is to assess augmentation of the capacity of each of them.

Fig. 2 Scenarios for application of augmentation rules
a Application of rule 1
b Application of rule 2
(Solid line: existing link; dashed line: potential link.)
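Rule 2 lends itself to a short sketch that enumerates the potential augmentation links. It assumes that simplex links are represented as (origin, destination) pairs with a load value each, and that a potential link is any pair of nodes not yet joined in either direction. The representation, the function name and the loads are hypothetical.

```python
def candidate_links(nodes, links, loads):
    """Rule 2: potential links from the origin of the most heavily loaded link."""
    worst = max(links, key=lambda link: loads[link])
    origin = worst[0]
    # Node pairs that are already joined, regardless of direction.
    joined = {frozenset(link) for link in links}
    return [(origin, n) for n in nodes
            if n != origin and frozenset((origin, n)) not in joined]


# Hypothetical 4-node ring with simplex links in both directions.
nodes = [1, 2, 3, 4]
links = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3), (4, 1), (1, 4)]
loads = {link: 0.5 for link in links}
loads[(1, 2)] = 0.95            # link 1->2 is the most heavily loaded
print(candidate_links(nodes, links, loads))
```

Each candidate returned in this way would then be evaluated for cost and performability before one is selected.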

3.5.5 Choice of topology: Although the aim is to use performability to select the final topology, in this paper different possibilities are investigated for purposes of comparison. The topology chosen at the end of each iteration can be

1. the one performing best in the fully operational state, as is done traditionally,
2. the one which gives the best improvement in performance in the state being assessed,
3. or the one which gives the best improvement in performability.

With 2, performance is optimised in a state of failure but it is possible that the performance in another state of failure may be poor. With 3, in the evaluation of performability, the performance in all states is taken into consideration, and the topology which, on average, performs best over all of the states in which the network can be is chosen. Each state is weighted by the probability that the network will operate in such a configuration, and by the residence time during which it is in that state. This is dictated by the failure rate and mean repair time associated with the components.

3.6 Definition of the performability-based descent factor

If the choice of topology is performability-based, then in the iterations of the fault-tolerance augmentation stage, the performability-based criterion, or descent factor, is given by

$$\frac{\Delta \text{performability}}{\Delta \text{cost}} = \frac{\mathrm{Perf}_{n+1}(b, t) - \mathrm{Perf}_{n}(b, t)}{\mathrm{Cost}_{n+1} - \mathrm{Cost}_{n}} \qquad (2)$$

where Perf_x(b, t) is the performability of the network after x iterations over time period [0, t], with Perf_x(b, t) = P[average accumulated reward ACR(t) ≤ b] over [0, t], and Cost_x is the cost of the network after iteration x.

3.7 Definition of the cost function

Two definitions of the cost function are now considered.

3.7.1 Cost function – Definition 1: Cost is based on the definition used in [18]. A link is assumed to consist of a number of channels of fixed capacity. The cost of a link is taken to be linearly proportional to its capacity, as well as its physical length. A node consists of switching components and buffers. As suggested by Kang and Tan [18], the nodal hardware switching cost is assumed to be proportional to the square of the number of channels connected to it, and the cost of buffers at a node is taken to be linearly proportional to the size of the buffers. Allocation of buffer capacity has to take into consideration the statistics of the arriving stream of packets, and the rate at which these packets can be serviced and directed on to outgoing lines.

Fig. 3 Scenarios for application of augmentation rules
a Application of rule 3
b Application of rule 5
(Solid line: existing link; dashed line: potential link; marked line: existing overloaded or heavily loaded link.)

The cost function is defined as

$$D(N) = \sum_{j=1}^{L} D(N_j) = \sum_{j=1}^{L} \left( \alpha_j + \beta_j N_j + \gamma_j N_j + \delta_j N_j^2 \right) \qquad (3)$$

where N_j is the number of channels constituting simplex link j; α_j is the initial set-up cost of link j; β_j = s_j L_j, where s_j is the cost per channel per km for simplex link j and L_j is the length of the link; γ_j is the cost per buffer per channel for link j; and δ_j represents the switching costs per link j.
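As a worked illustration of eqn. 3, the sketch below evaluates the four contributions for each simplex link: the initial set-up cost, the line cost s_j L_j per channel, the buffer cost per channel, and the switching cost proportional to the square of the number of channels. The class name, field names and all numerical values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class SimplexLink:
    channels: int               # N_j, number of channels
    setup_cost: float           # initial set-up cost of link j
    cost_per_channel_km: float  # s_j
    length_km: float            # L_j
    buffer_cost: float          # cost per buffer per channel
    switching_cost: float       # switching cost coefficient


def link_cost(link):
    """Cost D(N_j) of one simplex link according to the four terms of eqn. 3."""
    n = link.channels
    return (link.setup_cost
            + link.cost_per_channel_km * link.length_km * n   # line cost
            + link.buffer_cost * n                            # buffers
            + link.switching_cost * n ** 2)                   # switching


def network_cost(links):
    """Total cost D(N): sum of D(N_j) over all simplex links."""
    return sum(link_cost(link) for link in links)


# Hypothetical link: 2 channels over 100 km.
example = SimplexLink(channels=2, setup_cost=10.0, cost_per_channel_km=0.5,
                      length_km=100.0, buffer_cost=3.0, switching_cost=1.5)
print(link_cost(example))                 # 10 + 100 + 6 + 6 = 122.0
print(network_cost([example, example]))   # 244.0
```

Because the switching term is quadratic in N_j, capacity augmentation on a link that already has many channels costs more per added channel than on a lightly provisioned one.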

3.7.2 Cost function – Definition 2: Given that performability is evaluated over a period of time, it is reasonable to consider the cost of maintaining the network. The algorithm proposed by Iyer et al. [19] for evaluating moments of accumulated reward is used to determine the average cost of repair incurred over a period of time. Cost is then defined as

$$D_2(N) = \sum_{j=1}^{L} D_1(N_j) + \text{Average repair cost over time duration } [0, t] \qquad (4)$$

where D_1 denotes the cost according to Definition 1 (eqn. 3). The algorithm used is affected neither by the number of distinct rewards used nor by the range of these rewards, so that it is not necessary to map the repair cost for each component on to a reward.

4 Results

4.1 Comparison between use of the two cost functions in the descent factor

A 6-node network, as shown in Fig. 4a, is used to illustrate the use of the two cost functions. The links have failure rates which range from 1 to 5 per 100 days. The mean repair time varies between 24 and 25 hours. Each link also has an average repair cost of between 100 and 300 units for the existing links.

Fig. 4 6-node ring network and 10-node 2-connected network
a 6-node ring network (nodes 1 to 6)
b 10-node 2-connected network
(Axes in both panels: x-co-ordinate, y-co-ordinate.)

Table 1: Cost and performability of network

Network    P[ACR(10) ≤ 0.8]   Cost of setting up the network   Average repair cost of network   Total cost of network
Initial    0.3171             3001.65                          282.22                           3283.87
Network1   0.1745             3253.65                          480.08                           3733.73
Network2   0.2471             3203.38                          298.47                           3501.85

The GPM-based routing algorithm is applied to the above network in all its possible states. The worst performance is obtained when links connecting nodes 5 and 6 fail. The suggested augmentation mechanisms are the addition of links between nodes 1 and 4, giving a network that will be referred to as network1, or the addition of links between nodes 2 and 4, giving a network referred to as network2. Assume that both potential links have the same failure rates and mean repair times. The links between nodes 1 and 4 are associated with an average repair cost of 600 units, whereas those between nodes 2 and 4 have a mean repair cost of 75 units. From a performability point of view, the addition of links between nodes 1 and 4 is more beneficial. However, setting up the network is more expensive. Data pertaining to performability and cost of the network are summarised in Table 1. Performability is taken at the point where reward equals 0.8. The total cost of the network includes the cost of setting up the network and the average repair cost over a time period of 10 days.

Based on the data in Table 1, evaluation of Descent Factor 1 (DF1), which excludes the cost of repair, is as given below.

For network1: DF1 = (0.1745 − 0.3171)/(3253.65 − 3001.65) = −0.000566

For network2: DF1 = (0.2471 − 0.3171)/(3203.38 − 3001.65) = −0.000347

A comparison of the descent factors leads to choice of network1, which has the larger decrease in P[ACR(10) ≤ 0.8] per unit of cost. The higher cost of setting up the network is offset by the improvement in performability.

Consider evaluation of Descent Factor 2 (DF2), which includes the cost of repair.

For network1: DF2 = −3.17 × 10^−4

For network2: DF2 = −3.21 × 10^−4

When the average cost of repair of the links is taken into consideration, network2 is preferred because the investment required by network1 is not justified by the degree of improvement in performability. Given that the fault-tolerance needs to be evaluated over a period of time, it is meaningful to include the average cost of repair expected to be incurred. In the example considered, although the two networks behave in a similar way in respect of component failure and repair, network1 is more costly to maintain.
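The arithmetic behind the two descent factors can be reproduced directly from the figures in Table 1. The sketch below is only a recomputation aid; the dictionary layout and function names are hypothetical. DF1 uses the set-up cost alone, and DF2 adds the average repair cost over the 10-day period.

```python
TABLE_1 = {
    "initial":  {"perf": 0.3171, "setup": 3001.65, "repair": 282.22},
    "network1": {"perf": 0.1745, "setup": 3253.65, "repair": 480.08},
    "network2": {"perf": 0.2471, "setup": 3203.38, "repair": 298.47},
}


def cost(name, include_repair):
    """Set-up cost, optionally plus the average repair cost (total cost)."""
    row = TABLE_1[name]
    return row["setup"] + (row["repair"] if include_repair else 0.0)


def descent_factor(name, include_repair):
    """Change in P[ACR(10) <= 0.8] per unit of added cost, relative to the initial network."""
    d_perf = TABLE_1[name]["perf"] - TABLE_1["initial"]["perf"]
    d_cost = cost(name, include_repair) - cost("initial", include_repair)
    return d_perf / d_cost


for name in ("network1", "network2"):
    df1 = descent_factor(name, include_repair=False)
    df2 = descent_factor(name, include_repair=True)
    print(f"{name}: DF1 = {df1:.6f}, DF2 = {df2:.3e}")

# The preferred network is the one with the most negative descent factor.
candidates = ("network1", "network2")
print(min(candidates, key=lambda n: descent_factor(n, False)))
print(min(candidates, key=lambda n: descent_factor(n, True)))
```

The selection reverses between the two cost definitions because the repair cost of network1 adds substantially more to its total cost than that of network2.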

4.2 Results of application of performability to design

The capability of the proposed design technique will be demonstrated with the design of a 10-node network. Given the locations of the nodes and the traffic demands, the configuration shown in Fig. 4b is generated after execution of the second stage of design. This topology is subjected to fault-tolerance augmentation. Three different approaches will be investigated. They are referred to as the pure performance approach, the joint performance/performability approach, and the pure performability approach. The pure performance approach corresponds to conventional design. In the joint performance/performability approach, performance is used in the choice of state and performability is used in the choice of topology. Both the choice of state and topology are based on performability in the pure performability approach.

The results are organised as follows: in sets A and B, the three different approaches are compared in terms of the performability of the final network obtained when a certain stopping criterion for design is reached. In set C, the evolution of the fault tolerance augmentation stage in the three approaches is demonstrated by examining a chosen performability figure of the network generated as the iterations proceed. In set D, the design approaches are compared in terms of the performance of the final network in the fully operational state.

4.3 Set A – Capacity augmentation based design

In this case, capacity augmentation is used as the augmentation mechanism. The stopping criterion for design is set to be the stage where performance in the fully operational state is less than 20, with performance being measured as the average number of data units in the network.

The results obtained are summarised in Table 2 and the performability distributions of the networks are plotted in Fig. 5, as a distribution of average accumulated reward for a period of 10 days. The performability of the initial network is shown by the graph labelled initial. The remaining labels represent the type of approach used in design.

From Table 2, it can be seen that the performance of the initial network in some states of failure is highly degraded as compared with the level obtained in a fully operational state. On augmentation according to the pure performance approach, performance is optimised in the fully operational state, but with no regard for performance in states of failure. Given that the capacity of the overall network is increased, performance is enhanced to various degrees in the states of failure.

Application of a performability-based design aims at improving performance in all the possible network states. As shown in Table 2, at a stage where the stopping criterion is satisfied, performance in the failure states is considerably improved when choice of the topology is performability-based. This translates into the distribution of average accumulated reward for networks designed using performability-based methods being closer to the ideal step function than the one generated by a pure performance approach (see Fig. 5). In the performability-based designs, with both the pure performability and joint performance/performability methods, 25 augmentations were sufficient to meet the requirements, whereas 34 iterations were necessary in the pure performance approach.

Table 2: Performance levels of the networks designed

| Configuration | Initial network | Design based on pure performance | Design based on performance/performability | Design based on pure performability |
|---|---|---|---|---|
| Fully operational | 56.56 | 19.83 | 19.67 | 19.82 |
| Links (1–2) fail | 1290657.00 | 1178432.00 | 25.09 | 25.10 |
| Links (1–3) fail | 1.21e7 | 1.06e7 | 39.74 | 40.16 |
| Links (2–6) fail | 944.89 | 21.69 | 259.78 | 33.81 |
| Links (3–4) fail | 1477054.00 | 1477022.00 | 41.72 | 41.46 |
| Links (3–5) fail | 8722924.00 | 8630677.00 | 45.83 | 46.45 |
| Links (4–5) fail | 478.94 | 19.87 | 232.28 | 441.15 |
| Links (5–6) fail | 105.92 | 38.16 | 28.59 | 28.30 |
| Links (5–8) fail | 1.27e7 | 5333499.00 | 136.52 | 136.84 |
| Links (6–7) fail | 113508.90 | 25.39 | 89.85 | 89.17 |
| Links (7–10) fail | 1880191.00 | 25480.16 | 51.98 | 49.10 |
| Links (8–9) fail | 8735316.00 | 2941809.00 | 38.64 | 38.22 |
| Links (9–10) fail | 7971638.00 | 1672424.00 | 47.59 | 46.16 |
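As a reading aid for Table 2, the following Python sketch picks out the worst failure state for each design. It assumes that lower values mean better performance (the stopping criterion in Section 4.4 is stated as "less than 30"); the names `TABLE_2`, `APPROACHES` and `worst_failure_state` are illustrative only and are not part of the authors' design software.

```python
# Table 2 values transcribed as printed; lower is ASSUMED to be better.
APPROACHES = (
    "initial",
    "pure performance",
    "performance/performability",
    "pure performability",
)

TABLE_2 = {
    "Fully operational": (56.56, 19.83, 19.67, 19.82),
    "Links (1-2) fail": (1290657.00, 1178432.00, 25.09, 25.10),
    "Links (1-3) fail": (1.21e7, 1.06e7, 39.74, 40.16),
    "Links (2-6) fail": (944.89, 21.69, 259.78, 33.81),
    "Links (3-4) fail": (1477054.00, 1477022.00, 41.72, 41.46),
    "Links (3-5) fail": (8722924.00, 8630677.00, 45.83, 46.45),
    "Links (4-5) fail": (478.94, 19.87, 232.28, 441.15),
    "Links (5-6) fail": (105.92, 38.16, 28.59, 28.30),
    "Links (5-8) fail": (1.27e7, 5333499.00, 136.52, 136.84),
    "Links (6-7) fail": (113508.90, 25.39, 89.85, 89.17),
    "Links (7-10) fail": (1880191.00, 25480.16, 51.98, 49.10),
    "Links (8-9) fail": (8735316.00, 2941809.00, 38.64, 38.22),
    "Links (9-10) fail": (7971638.00, 1672424.00, 47.59, 46.16),
}


def worst_failure_state(table, column):
    """Return (configuration, value) with the largest value among failure states."""
    failures = {
        name: row[column]
        for name, row in table.items()
        if name != "Fully operational"
    }
    worst = max(failures, key=failures.get)
    return worst, failures[worst]


for index, approach in enumerate(APPROACHES):
    name, value = worst_failure_state(TABLE_2, index)
    print(f"{approach}: worst failure state is {name} ({value:g})")
```

Under this reading, the worst failure state of the two performability-based designs stays in the hundreds, whereas that of the pure performance design remains of the order of 10^7.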

4.4 Set B – Increase in connectivity based design

In this case, the design is stopped when performance in the fully operational state is less than 30. In Figs. 6 and 7, the results are shown in terms of a distribution of average accumulated reward for a time period of 10 days. The networks being compared are obtained after 5 iterations for the pure performability approach, 6 iterations for the joint performance/performability approach, and 8 iterations when design is based on a pure performance criterion.

From Fig. 6, adding links according to all three approaches results in a considerable improvement in the distribution of average accumulated reward. This is due to performance being enhanced in all the states in which the network can be. In Fig. 7, where a more detailed view is given, it is clear that performability-based design methods yield networks whose distribution of average accumulated reward is closer to the ideal step function. The addition of links according to a pure performance criterion contributes to an enhanced performance in all the states, but the performability-based methods are better. Furthermore, in this specific example, networks that are more fault-tolerant are obtained on the addition of fewer links than when design is carried out according to a pure performance method. A joint performance/performability approach generates the best network.

Fig. 5 Performability distributions of the networks designed using capacity augmentation. Horizontal axis: reward r; vertical axis: P[ACR(t) ≤ r]. Curves: initial, joint performance/performability, pure performance, pure performability.

Fig. 6 Performability distributions of the networks designed using increase in connectivity. Horizontal axis: reward r; vertical axis: P[ACR(t) ≤ r]. Curves: initial, joint performance/performability, pure performance, pure performability.

Fig. 7 Performability distributions of the networks designed using increase in connectivity over reward range [0.4, 0.9]. Horizontal axis: reward r; vertical axis: P[ACR(t) ≤ r]. Curves: performance/performability, pure performability, pure performance.

IEE Proc.-Comput. Digit. Tech, Vol. 147, No. 5, September 2000

4.5 Set C – Comparison at different numbers of iterations

In this section, the effects of applying different fault-tolerance mechanisms to the network in Fig. 4b are observed as the iterations of design proceed. Performability is taken at the point where reward equals 0.8, and is therefore given by P[ACR(t) ≤ 0.8], where time t is set to 10 days. Figs. 8 and 9 show how this point-performability varies after different numbers of iterations for all three approaches. In Fig. 8, results are obtained when the design is based on capacity augmentation only. An increase in connectivity is used as the sole augmentation mechanism in the generation of results for Fig. 9.
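The point-performability P[ACR(t) ≤ 0.8] can be illustrated with a minimal Monte Carlo sketch in Python. This is not the evaluation method of the paper, which mentions the randomisation technique; simulation is swapped in here only because it is short. The two-state up/down model, the rates and the rewards are all hypothetical values chosen for illustration, as are the function names.

```python
import random


def average_accumulated_reward(rates, rewards, horizon, rng):
    """Simulate one sample path and return its time-averaged reward.

    State 0 is 'fully operational', state 1 is 'failed' (hypothetical model).
    rates[s] is the exit rate of state s; rewards[s] is its reward in [0, 1].
    """
    state, remaining, accumulated = 0, horizon, 0.0
    while remaining > 0:
        # Exponential sojourn time, truncated at the end of the observation period.
        sojourn = min(rng.expovariate(rates[state]), remaining)
        accumulated += rewards[state] * sojourn
        remaining -= sojourn
        state = 1 - state  # alternate between the two states
    return accumulated / horizon


def point_performability(rates, rewards, horizon, level, runs=2000, seed=0):
    """Estimate P[ACR(t) <= level] as the fraction of paths at or below level."""
    rng = random.Random(seed)
    hits = sum(
        average_accumulated_reward(rates, rewards, horizon, rng) <= level
        for _ in range(runs)
    )
    return hits / runs


if __name__ == "__main__":
    # Hypothetical numbers: 0.1 failures/day, 2 repairs/day, reward 0.2 while failed.
    estimate = point_performability((0.1, 2.0), (1.0, 0.2), horizon=10.0, level=0.8)
    print(f"Estimated P[ACR(10 days) <= 0.8] = {estimate:.3f}")
```

A lower estimate means that less probability mass lies at or below the reward level 0.8, which is the sense in which lower point-performabilities are better in Figs. 8 to 10.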

In the results derived for Fig. 9, the cost of the network was not taken into consideration. Including cost in the descent factor gives the results shown in Fig. 10. Given that all the potential additional links are assumed to have equal failure rates and mean repair times, and to require the same cost of repair, they are all associated with the same average cost of repair over a period of time. This cost is therefore not taken into consideration when comparing the different topologies, and Definition 1 is used in the evaluation of the cost of the network and the performability-based descent factor.
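The reason the repair term can be ignored is that adding the same constant to every candidate leaves their ordering unchanged. The Python sketch below illustrates this under stated assumptions: both cost definitions are modelled as plain sums, and the candidate links and their costs are hypothetical. The paper's exact definitions may differ in form.

```python
def cost_definition_1(component_costs):
    """Cost as the sum of the component costs only (assumed form)."""
    return sum(component_costs)


def cost_definition_2(component_costs, average_repair_costs):
    """Component costs plus the average repair cost of each component (assumed form)."""
    return sum(component_costs) + sum(average_repair_costs)


def rank(candidates, cost):
    """Return candidate names ordered from cheapest to most expensive."""
    return sorted(candidates, key=lambda name: cost(candidates[name]))


# Hypothetical candidate links: (component cost, average repair cost).
# The repair term is identical for every candidate, as assumed in the text.
CANDIDATES = {
    "link A": (40.0, 3.0),
    "link B": (25.0, 3.0),
    "link C": (60.0, 3.0),
}

by_definition_1 = rank(CANDIDATES, lambda c: cost_definition_1([c[0]]))
by_definition_2 = rank(CANDIDATES, lambda c: cost_definition_2([c[0]], [c[1]]))
print(by_definition_1 == by_definition_2)
```

If the candidate links had different failure rates or repair costs, the two rankings could diverge and the second definition would be needed.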

When cost is neglected, the performability-based approaches outperform the pure performance approach, since lower point-performabilities are obtained as the iterations proceed. A comparison between the pure performability and joint performance/performability approaches is not conclusive, in the sense that there is no indication as to which approach produces a better network faster. If cost is included in the descent factor, the advantage of using a performability-based design as compared to a pure performance approach is less apparent.

In the specific example considered, the performability-based methods yield networks that are more fault-tolerant after fewer iterations than is the case when the pure performance approach is used. The implication is that a better network can be obtained using reduced resources if design is based on performability. The network is better in terms of its ability to withstand failures, that is, in terms of its performance in its failure states. This does not necessarily mean that performance in the fully operational state is better, but that it is at an acceptable level. The reason for this is that by optimising in a state of failure, and by basing the choice of the topology on performability, resources are being used in such a way as to improve the average performance over all the allowable states. It is possible that in so doing, resources are better positioned so that fewer additional links (or less extra capacity) are required to achieve the acceptable performance levels in all the states. From an economic viewpoint, the implication is that, depending on the locations of the additional resources, a more fault-tolerant network can be obtained at a lower cost.

4.6 Set D – Comparison of the design approaches in terms of performance

In this section, the way in which performance in the fully operational state varies as the iterations proceed is investigated. In Figs. 11 and 12, results are shown for networks designed using capacity augmentation and increase in connectivity, respectively.

Fig. 8 Point-performabilities of networks designed using capacity augmentation after every 3 iterations. Horizontal axis: number of iterations; vertical axis: P[ACR(t) ≤ 0.8]. Curves: pure performance, performance/performability, pure performability.

Fig. 9 Point-performabilities of networks designed using increase in connectivity after each iteration. Horizontal axis: number of iterations; vertical axis: P[ACR(t) ≤ 0.8]. Curves: pure performance, performance/performability, pure performability.

Fig. 10 Point-performabilities of networks designed using increase in connectivity when cost is considered. Horizontal axis: number of iterations; vertical axis: P[ACR(t) ≤ 0.8]. Curves: pure performance, performance/performability, pure performability.

The pure performance approach aims at optimising performance in the fully operational state, whereas the performability-based approaches do not. Interestingly, in this specific example, results show that the performability-based approaches generate networks with a better performance in the fully operational state as iterations proceed. From Fig. 12, the joint performance/performability approach produces the best network, and the pure performance and pure performability approaches alternate in producing a better network during the course of the design. However, in Fig. 11, the performability-based approaches outperform conventional design at all iterations. The argument is that although performability-based methods optimise performance in states of failure, this does not necessarily mean that they are detrimental to the improvement in performance that can be achieved in the fully operational state.

Fig. 11 Performance of networks designed using capacity augmentation in the fully operational state. Horizontal axis: number of iterations; vertical axis: performance/number of data units in network. Curves: pure performance, joint performance/performability, pure performability.

Fig. 12 Performance of networks designed using increase in connectivity in the fully operational state. Horizontal axis: number of iterations; vertical axis: performance/number of data units in network. Curves: pure performance, joint performance/performability, pure performability.

5 Conclusions

A design methodology has been proposed which uses a performability-based descent factor during execution of its iterations. The main objective has been to investigate the effects of using a combined measure of performance and reliability on the design of fault-tolerant networks. The steps of a design approach have been described in which an iteration of design consists mainly of choosing a state for assessment, applying augmentation mechanisms and choosing a topology, in that order. The approach was evaluated in software and was not tested in practical case studies.

The two characteristics that distinguish the proposed design methodology from conventional design techniques are that the state chosen for optimisation is a failure state and that performability is used to select the best topology at the end of an iteration. Choice of state may be based on performance or on performability, using a performability index. A performability index has been proposed as a means of evaluating the relative importance of a component in a network, taking into consideration its failure rate, mean repair time and the degradation it causes in performance when it fails.

The results obtained show that using performability in the choice of topology at the end of an iteration definitely generates a better network in terms of its ability to withstand failures. The benefits of using performability in the choice of the topology are clear, but when it comes to the choice of the state, using a performability index does not always yield better results. It was expected that by indicating which is the most important component, a performability-based choice of state would result in a more fault-tolerant design than a performance-based choice. This does not always happen, and the reason is that augmentation in a particular state has an effect on performance in all the states in which the network can be. Therefore, augmentation in the worst state with respect to performance may have a more beneficial effect on performance in all other states than augmentation in the worst state from a performability point of view.

Clearly, in most cases, design needs to take into consideration the cost of the network. Two definitions have been considered. In both definitions, cost is defined in terms of the components of the network, but the second definition also includes the average cost of repair associated with a component.

Overall, the price paid in a performability-based design is in terms of the amount of computation involved. In conventional network design, only the failure-free state is considered. Therefore, if n possible ways of augmenting the network need to be compared, performance has to be evaluated in n states. If design is based on performability, and if k failure states are allowed, then for each possible configuration, performance has to be evaluated for (k + 1) states, giving a total of n(k + 1) states. Furthermore, comparison between the networks requires evaluation of their performability, where computation depends on the size of the network, the length of the observation period, the number of different rewards considered, and the maximum rate L chosen, if the randomisation technique is used. Therefore, as the size of the network to be designed increases, the benefits of using the performability-based approach are offset by the computational and memory requirements, but this can be dealt with to some degree by the application of state-space reduction techniques to the performability model.
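To make the count concrete, here is a worked example with assumed figures: k = 12 is taken to match the twelve failure configurations listed in Table 2, and n = 5 candidate augmentations per iteration is a purely hypothetical value.

```latex
\begin{align*}
\text{conventional design:} \quad & n = 5 \text{ performance evaluations},\\
\text{performability-based design:} \quad & n(k+1) = 5 \times (12 + 1) = 65 \text{ performance evaluations}.
\end{align*}
```

On top of the 65 performance evaluations, the performability-based design also requires the performability of each of the 5 candidate networks to be evaluated before they can be compared.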


6 Acknowledgments

The work reported in this paper was supported by the BNDO under grant number N00014-96-1-1270, and managed by the United States Office of Naval Research.

7 References

1 MEYER, J.F.: 'On evaluating the performability of degradable computing systems', IEEE Trans. Comput., 1980, 29, (8), pp. 720–731

2 CONSTANTINESCU, C., and SANDOVICI, C.: 'Performability evaluation of a gracefully degrading microcomputer', Comput. Ind., 1993, 22, (2), pp. 181–186

3 LOPEZ-BENITEZ, N., and TRIVEDI, K.S.: 'Multiprocessor performability analysis', IEEE Trans. Reliab., 1993, 42, (4), pp. 579–587

4 TAI, A.T., MEYER, J.F., and AVIZIENIS, A.: 'Performability enhancement of fault-tolerant software', IEEE Trans. Reliab., 1993, 42, (2), pp. 227–237

5 TAI, A.T., and MEYER, J.F.: 'Performability management in distributed database systems: an adaptive concurrent control protocol'. Proceedings of the Fourth International Workshop on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS '96, 1996, pp. 212–216

6 MEYER, J.F.: 'Performability: a retrospective and some pointers to the future', Performance Evaluation 14, 1992, Elsevier Science Publishers B.V., pp. 139–156

7 ISLAM, S.M.R., and AMMAR, H.H.: 'Numerical solutions of Markov Reward Models using Laguerre functions'. First International Workshop on Numerical solutions of Markov chains, Stewart, W.J. (Ed.), 1990, Marcel Dekker Inc., pp. 645–648


8 DONATIELLO, L., and GRASSI, V.: 'On evaluating the cumulative performance distribution of fault-tolerant computer systems', IEEE Trans. Comput., 1991, 40, (11), pp. 1301–1307

9 PATTIPATI, K.R., LI, Y., and BLOM, H.A.P.: 'A unified framework for the performability evaluation of fault-tolerant computer systems', IEEE Trans. Comput., 1993, 42, (3), pp. 312–326

10 NABLI, H., and SERICOLA, B.: 'Performability analysis: a new algorithm', IEEE Trans. Comput., 1996, 45, (4), pp. 491–494

11 CIARDO, G., MUPPALA, J., and TRIVEDI, K.S.: 'SPNP: Stochastic Petri net package'. Proc. PNPM'89, IEEE Computer Society Press, 1989, pp. 142–151

12 COUVILLION, J.A., FREIRE, R., JOHNSON, R., OBAL, W.D. II, QURESHI, A., RAI, M., SANDERS, W.H., and TVEDT, J.E.: 'Performability modelling with UltraSAN', IEEE Softw., 1991, pp. 69–80

13 HAVERKORT, B.R., and NIEMEGEERS, I.G.: 'Performability modelling tools and techniques', Performance Evaluation 25, 1996, Elsevier Science Publishers B.V., pp. 17–40

14 DE SOUZA E SILVA, E., and GAIL, H.R.: 'Calculating availability and performability measures of repairable computer systems using randomization', J. Assoc. Comput. Mach., 1989, 36, (1), pp. 171–193

15 DE SOUZA E SILVA, E., and GAIL, H.R.: 'Performability analysis of computer systems: from model specification to solution', Performance Evaluation 14, 1992, Elsevier Science Publishers B.V., pp. 157–196

16 BERTSEKAS, D., and GALLAGER, R.: 'Data Networks' (Prentice-Hall International Inc., 1987)

17 KLEINROCK, L.: 'Queueing systems, vol. II: Computer applications' (John Wiley & Sons, 1976)

18 KANG, C.G., and TAN, H.H.: 'Fault-tolerant capacity and flow assignment in packet switched networks'. Proceedings of IEEE MILCOM 92, San Diego, 1992, pp. 165–171

19 IYER, B.R., DONATIELLO, L., and HEIDELBERGER, P.: 'Analysis of performability for stochastic models of fault-tolerant systems', IEEE Trans. Comput., 1986, 35, (10), pp. 902–907
