
Robust Network Design and Robustness Factor

by

Armin Ghayoori

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2013 by Armin Ghayoori

Abstract

Robust Network Design and Robustness Factor

Armin Ghayoori

Doctor of Philosophy

Graduate Department of Electrical and Computer Engineering

University of Toronto

2013

This thesis presents a robust design approach for communication networks that includes capacitation and routing strategy design. Robustness is a mandatory property of core networks, which must respond to perturbations in network parameters to maintain stable performance and reliable service delivery to different customers. Our proposed design approach is applicable to any system that is modelled by a weighted directed graph. To quantify robustness, we borrow and develop concepts and properties from the Markov chain literature as well as from graph-theoretic discussions of survivability. We propose a new robustness definition for Markov chains, which has several applications in network design. We define robustness as the sensitivity of the mean first passage time between any two states of the Markov chain. This sensitivity is measured by the variation of the mean first passage times under perturbations in the transition probabilities. We show that this definition of robustness is related to the sensitivity of the betweenness of a node/state in a Markov chain, which is defined as the number of visits by a random walker that wanders around the Markov chain according to its transition probabilities. It has been shown that, for an infinite walk, the ratio of the number of visits to a state to the total number of hops converges to the stationary probability of that state. Therefore, an analogy can be drawn between the well-known condition number and the robustness factor in a Markov chain. We also extend the robustness factor definition to network design problems and show that the robustness factor can be used as a design criterion. The newly defined robustness factor is a function of the network capacitation, routing, and external input and output traffic. We also emphasize the importance of the newly discovered graph theoretic metric, called the Kemeny constant, in network design problems. We discuss how a function of the Kemeny constant and the robustness factor limits the sensitivity of network performance parameters to perturbations in the network.


Acknowledgements

This thesis is the result of years of research at the University of Toronto. During this period, I had the opportunity to work with great people. Their support and contributions had a great impact on my work.

I would like to express my sincere gratitude to my advisor, Prof. Leon-Garcia, for the continuous support of my Ph.D. study and research, and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the research and writing of this thesis.

I thank my fellow labmates at the University of Toronto: Leila Shayanpour, Hadi Bannazadeh, Ali Tizghadam, Ali Shariat, Weiwei Li, Tang Tang, Alireza Bigdeli, Hazem Soliman, Houman Rastegarfar, Hesam Rahimi, Agop Koulakezian, Nadeem Abji and Aakash Nigam, for their valuable discussions and for their help with this thesis.

I would like to thank my committee members, Professor B. Liang, Professor E. Sousa, Professor B. Li and Professor Ilow, for their valuable feedback on my research work.

I also dedicate this thesis to the memory of my father, who I wish were here to share this moment with me. I would like to express my deepest gratitude to my mother and my brothers for all their support and encouragement.


Contents

1 Introduction
1.1 Challenges in Future Networks Management Systems
1.2 Research Objectives and Thesis Contributions

2 Literature Review
2.1 Future Network Architecture Design Goals
2.2 Control and Management in Next Generation Networks
2.2.1 Functional Entities in Control and Management Plane
2.2.2 Maintaining Service Continuity in Next Generation Networks
2.3 Network Modeling
2.3.1 Networks and Weighted Graphs
2.3.2 Criticality and Random Walk Analysis
2.4 Robust network design
2.4.1 Robust Network Design vs. Weights Perturbations
2.4.2 Robust Network Design vs. Perturbations in External Traffic

3 Robustness and Perturbation Effects in Markov Chains
3.1 Markov Chains and Perturbation Effects
3.2 Fundamental Matrix and Kemeny Constant
3.3 Betweenness and a New Definition of Robustness
3.4 Robustness Factor in Markov Chain
3.5 Perturbation Effect and Betweenness
3.6 Concluding Remarks and Examples

4 Random Walk Models and Network Design
4.1 Random Walk on a Graph
4.2 Robustness Factor and Networks
4.3 Other Robustness Measures
4.3.1 Robustness and connectivity in Graphs
4.3.2 Robustness and Markov chains
4.4 Summary

5 Robust Network Performance Evaluation
5.1 Performance Metrics and Network Design
5.2 Robustness Factor and Network Performance Parameters
5.2.1 Robustness Factor Impact on Average Delay and Throughput
5.2.2 Power Metric
5.2.3 Gravity Model and Robustness Factor
5.3 Network Design Examples
5.3.1 Parking-Lot Problem
5.3.2 Rocketfuel Topology
5.3.3 Optimal Routing and Capacity Assignment
5.3.4 Robust Routing and Minimum Delay Routing
5.3.5 Resource Assignment with Shortest Path Routing
5.4 Summary

6 Resource Allocation Based on the Gravitation Law
6.1 Overview
6.2 Gravitation Model and Resource Planning
6.3 Multinomial Coefficients and Gravity Model
6.4 Gravitation Model with Cost
6.4.1 Sensitivity Analysis
6.5 Optimizing the Number of Active Access Points
6.6 Summary

7 Conclusions and Future Works
7.1 Contribution
7.1.1 Robustness in Markov Chain
7.1.2 Graph Connectivity and Robustness Factor
7.1.3 Robust Network Design
7.1.4 Robust Network Design in Bipartite Graphs
7.2 Future Works
7.2.1 Distributed Robust Design
7.2.2 Applications

A
A.1 Proof of Theorem 1

A.5 Proof of Proposition 1

B
B.1 Proof of Theorem 5
B.2 Proof of Theorem 6
B.3 Proof of Proposition 3
B.4 Proof of Proposition 6

C
C.1 Proof of Proposition 9
C.2 Proof of Proposition 10

C.5 Proof of Theorem 7

D
D.1 Proof of Theorem 8
D.2 Proof of Theorem 9
D.3 Proof of Theorem 10
D.4 Proof of Theorem 11

Bibliography


List of Tables

2.1 Link Capacity
4.1 Graph and Digraph Comparison
5.1 Notations
5.2 Optimization Criteria
5.3 Optimal Weights for Uniform Traffic


List of Figures

2.1 Network Control and Management Architecture
2.2 Management Loops
2.3 Graph Modeling of a Capacitated Network
2.4 15-node/21-link Pacific Bell network
2.5 Scenario Graph Model
2.6 Interval Graph
2.7 MIRA Example

3.1 Markov Chain with Linear Architecture
3.2 Kemeny Constant for Linear Architecture vs. size
3.3 Ring Markov Chain
3.4 Kemeny Constant vs. size
3.5 Tree Markov Chain
3.6 Kemeny Constant for a Tree Markov Chain vs. size
3.7 $9 \times 9$ torus
3.8 Kemeny Constant for a Torus Markov Chain vs. size
3.9 Full Mesh Markov Chain
3.10 Kemeny Constant for a Full Mesh Markov Chain vs. size
3.11 Kemeny Constant vs. average number of connections per node

4.1 Network Design Considerations

5.1 Directed Graph
5.2 Network Traffic Flows
5.3 Parking-lot Topology
5.4 Rocketfuel Network Topology
5.5 Average Delay vs. changes in external input traffic
5.6 Average Delay sensitivity vs. perturbation in external input traffic
5.7 Average Delay sensitivity vs. perturbation in routing R
5.8 Average Traffic sensitivity vs. perturbation in routing R
5.9 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2} \sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. changes in external input traffic distributions
5.10 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2} \sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. perturbations in routing R
5.11 Total Traffic Sensitivity of $\min \sqrt{9K^2 + 4K'^2} \sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. changes in R
5.12 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2} \sum_i \chi_i$ and minimum delay routing with respect to the changes in external input traffic distribution
5.13 … $\min \sqrt{9K^2 + 4K'^2} \sum_i \chi_i$ and minimum delay routing with respect to the perturbations in R
5.14 … $\min \sqrt{9K^2 + 4K'^2} \sum_i W_i \chi_i$ and minimum delay routing with respect to the changes in external input traffic distribution
5.15 … $\min \sqrt{9K^2 + 4K'^2} \sum_i W_i \chi_i$ and minimum delay routing with respect to the perturbations in R
5.16 Average Delay Sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in external input traffic
5.17 Average delay of maximum entropy routing and Minimum $U_1$ vs. perturbations in external input traffic
5.18 Average Delay Sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in routing
5.19 Total traffic sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in routing
5.20 Optimum resource assignment ($\min T_{avg}$) for CPSF routing
5.21 Optimum resource assignment ($\min U_1$) for CPSF routing
5.22 Blocking probability
5.23 Blocking probability after Node 4 failure

6.1 Newton’s law of universal gravitation
6.2 Coulomb’s law of universal gravitation
6.3 Two Access Points and 3 Demand nodes, $p_{i,n} > 0$, $1 \le i \le 2$ and $1 \le n \le 3$
6.4 Two Access Points and 1 Demand node, $1 \le i \le 2$ and $n = 1$
6.5 Demand Distribution $\lambda_T = 11450$ RU
6.6 Access Point Distributions
6.7 Allocated Resource Units from AP1, $C_1 = 6000$ RU

6.11 Demand Distribution $\lambda_T = 3808$ RU

6.14 Allocated Resource Units from AP3 and AP4, $C_3 = C_4 = 1000$ RU

7.1 Hierarchical Network Design
7.2 Minimum Power Routing
7.3 Transportation Network [1]


Chapter 1

Introduction

The complexity and size of today’s network architectures are increasing continuously. This includes the interconnection of heterogeneous networks with different technologies. In addition, the need to offer end-to-end communications for services such as multimedia applications requires mobility support and hand-off support across different network technologies.

Applications’ demand for resources magnifies the need to monitor and manage the functionalities and configurations of the network across a heterogeneous environment. This functionality is maintained by expert human operators, who monitor the performance of the system and change its configuration in response to the perturbations that may occur in the system.

Future networks are driven by innovations in services and network capabilities and should allow an evolutionary path from existing networks to a unified network that can support different applications with different requirements. In addition, applications such as high-bandwidth radio, cloud computing and peer-to-peer applications, with their increasing numbers of users, necessitate an efficient utilization of resources to provide adequate Quality of Service (QoS) and Quality of Experience (QoE). The infrastructure design of the future network architecture plays an important role in end-to-end application support across a wide range of networks, from mobile to fixed networks, and across different services, from text messaging to high-bandwidth multimedia applications and other applications envisaged for Future Networks.

In addition, newly emerged applications are characterized by variable traffic that is difficult to predict. Therefore, designing a static system from a traffic estimate, without any robust design or adaptation, is not efficient, and bandwidth guarantees and QoS commitments cannot be provided to customers. To overcome the high variability of the traffic, service providers resort to manual routing adaptation and heavy over-provisioning.

Hence, due to the explosion of network size and heterogeneity, designing an efficient network management system with minimum human intervention has become a challenging research problem. Different control and management systems have been proposed for future network architectures, and all of these architectures agree on the need for an autonomous control and management system that can manage its own adaptation to network changes, considering performance, fault, and security concerns, without intervention by human administrators.

The main objective of this thesis is to develop a design approach for communication networks that must be robust to perturbations in the system. The perturbations include changes in the routing distribution, capacitation and external traffic patterns in the network.

In the following section, we provide a brief discussion of challenges in future network architectures.

1.1 Challenges in Future Networks Management Systems

Future network architectures are migrating to one consolidated IP network (Next Generation Network)

that supports a variety of services from delay sensitive to high bandwidth applications. Service providers

are currently moving towards exploiting a single IP packet network that covers different networks and

services.

Today’s IP networks suffer from the cascading meltdown effect of small local failures. In addition, the design and implementation of control and management systems in IP networks are difficult. This complexity comes from the direct interaction of the control and management planes with the heterogeneous resource pool. In other words, the control and management plane functionalities must deal with a wide variety of network protocols and technology-dependent network resources.

Different approaches are considered in designing future network control and management architectures. With the growth of IP networks and the need for more advanced quality of service handling than the current best-effort strategy, many changes have been made to the control and management planes to adapt to new requirements. These changes effectively increase the complexity and fragility of data networks. Continuing to use temporary remedies for data networks can cause more problems, which exacerbate the network control and management difficulties. The new idea underlying management systems is inspired by the theory of evolution [115].

An autonomic self-organizing system consists of self-managing components. The self-management concept can be divided into four categories: self-configuring, self-healing, self-optimizing for the current state of the system, and self-optimizing for future turbulences [80].

The ability of the system to configure itself in response to changes is called self-configuring. There should be minimal or no intervention to deploy a policy or to adapt to changes in the IT environment. A system with self-healing capability should be able to detect failures in components such as software or hardware and to take proper action to resolve the problem. The self-managing system should also be capable of allocating resources to users with different requirements in order to satisfy service level requirements and to conserve system resources for future changes in the environment, such as the admission of new users.

The management system attempts to achieve an optimum steady state that provides the required level of service. Therefore, the optimization problem considers both the short-term and long-term behaviours of the network, which highlights the need for a control system that monitors the system at different time scales and makes appropriate decisions through different control loops.

The concept of an autonomic system and its application in different areas has been investigated in several projects [80], [93], [43], [86]. The pioneering IBM autonomic computing proposal [80] started a new wave of autonomics concentrating on computing resources. Autonomia [93] provides a tool for application developers to specify management and control schemes for maintaining a wide range of resource requirements. The application developers specify the requirements and maintenance scheme in the Application Management Editor (AME), and the Autonomic Middle-ware Service (AMS) builds the execution environment. Automate [43] provides a framework for autonomic grid applications. It separates policy from the grid infrastructure to organize the mechanisms that deal with the heterogeneity of resources and applications.

The managed resources can come from a single source or a combination of sources, which are monitored by sensors throughout the network. In [84], an outline of a generic autonomic service architecture is proposed to provide a cost-effective approach for servicing, managing and maintaining varying numbers of incoming applications and services for both computing and network resources. In [84] and [88], with more complete discussions in [41], everything from application services such as VoIP and gaming to underlying services such as IP packet transport is called a service. Using this definition, services are divided into basic services, which cannot be divided into other services, and composite services, which are composed of several basic services.

New views of future network architecture focus on redesigning the control and management space to cope with future requirements. Different approaches are followed in research activities around the world. One goal of future network architecture is to reduce the complexity of management systems by separating the decision plane from the protocols that govern the interaction between network elements [115], [41], [83], [31]. Three key principles are proposed for the design of the control and management plane: network-level objectives, network-wide views, and direct control [115]. Considering these principles, the control functionalities are divided into four planes: Decision, Dissemination, Discovery, and Data. The underlying network elements (bridges, routers, etc.) forward packets under the control of the decision plane. The main assumption is direct access to and control over all network resources and network elements.

In the future network literature, every change is considered as mobility. Therefore, to provide seamless movement capability, a moving network should have the following properties [79]:

• It should have the ability to discover itself and its surrounding environment.

• It should be able to dynamically configure itself under varying and unpredictable changes.

• It should be able to dynamically extract rules and methods of interaction with different neighbors with various characteristics.

• It should be robust to unpredictable events which may cause some of its control elements to malfunction, and it should be able to recover their basic functionalities.

• It should be able to provide safety and security for service applications.

As mentioned earlier, decisions in control and management space are based on three important

principles: Network-level Objectives, Network Wide Views, and Direct Control.

• Network-level Objectives: Performance, reliability and policy objectives should be specified for the entire network, independently of the underlying network elements. As an example, suppose that network provider A wants to restrict the access of network provider B's users to some services offered by some of the nodes in network A. Implementing this policy only in some of the edge routers may allow users in network B to violate these restrictions in some other way (e.g. through newly added routers which are not properly configured).

• Network Wide Views: The management and control plane should have access to the current state of the data plane, such as network elements and resource limitations. These states are as follows [90]:

– Dynamic State: States which are dynamically accumulated and processed.

– Configuration State: These states are determined by the administration unit and are sent in the form of configuration commands. The link weights, which are important factors in the execution of routing protocols, are examples of configuration states.

This information plays an important role in controlling and managing any data network. Providing such up-to-date information can be the duty of different nodes in the network, which monitor the existing communication links and network elements and discover newly added elements and links installed as the network develops. These resources and elements can even be added and deleted dynamically, as is the case in peer-to-peer networks.

• Direct Control: Direct control reflects the fact that only the functional entities of the control and management space are responsible for configuring the data plane. For example, routing tables in routers are currently configured by distributed algorithms, and it is very difficult to provide control and management functionalities such as traffic engineering on top of these distributed algorithms. Moving the decision-making responsibility out of the data plane reduces the complexity of implementing the different requirements of future networks.

In this section, we discussed the design criteria of future network architectures. We illustrated that autonomic resource management is an inevitable part of any future network system. An autonomic system is capable of adapting itself to perturbations in the system environment. It contains several control loops, which monitor the system performance parameters and take action when necessary. These actions vary from minor modifications to a reconfiguration of the whole system, depending on the changes in the system environment.

The main purpose of this thesis is to design a system that has a low sensitivity to perturbations.

The low sensitivity of performance parameters to the perturbations is a very important property of a

system that is constantly facing perturbations in its parameters. We interpret the insensitivity of the

system to perturbations as robustness. In a robust system, there is less need to reconfigure the system

to adapt to the changes which may occur during system operation.

1.2 Research Objectives and Thesis Contributions

This thesis attempts to provide a solution to the robust design problem in autonomic systems. To this end, we first investigate the robustness issue in related literature, such as that on Markov chains and graph theory. We propose a new robustness measure for Markov chains that provides a qualitative apparatus for robust Markov chain design problems. We then extend the solution to network design problems and evaluate its performance using several examples.
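As a rough illustration of the kind of quantity a Markov chain robustness measure works with, the following Python sketch computes mean first passage times for a small chain and estimates, by finite differences, how one passage time changes when a little probability is moved between two transitions of the same state. The triangle chain, the function names and the finite-difference step are illustrative assumptions only; they are not the formulation of the thesis, which is developed in Chapter 3.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_first_passage_times(P, target):
    """Expected number of hops to first reach `target` from every state.

    Solves m_i = 1 + sum_{k != target} P[i][k] * m_k for all i != target.
    The entry for `target` itself is set to 0 by convention.
    """
    n = len(P)
    others = [i for i in range(n) if i != target]
    A = [[(1.0 if i == k else 0.0) - P[i][k] for k in others] for i in others]
    times = dict(zip(others, solve(A, [1.0] * len(others))))
    times[target] = 0.0
    return [times[i] for i in range(n)]


def perturb(P, row, take_from, give_to, eps):
    """Copy P and move probability eps from P[row][take_from] to P[row][give_to]."""
    Q = [list(r) for r in P]
    Q[row][take_from] -= eps
    Q[row][give_to] += eps
    return Q


def mfpt_sensitivity(P, source, target, row, take_from, give_to, eps=1e-6):
    """Finite-difference rate of change of the source -> target passage time."""
    base = mean_first_passage_times(P, target)[source]
    Q = perturb(P, row, take_from, give_to, eps)
    return (mean_first_passage_times(Q, target)[source] - base) / eps


if __name__ == "__main__":
    # Hypothetical 3-state chain: a symmetric random walk on a triangle.
    P = [[0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5],
         [0.5, 0.5, 0.0]]
    print(mean_first_passage_times(P, target=2))
    print(mfpt_sensitivity(P, source=0, target=2, row=0, take_from=1, give_to=2))
```

A chain whose passage times change little under such perturbations would be considered more robust in this informal sense. The perturbation keeps each row summing to one, so the perturbed matrix is still a valid transition matrix as long as eps is small enough to keep every entry non-negative.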

We can summarize the contributions of this thesis as follows:

• In Chapter 3, we provide a new definition of Markov chain robustness as the sensitivity of the travel time in a Markov chain. This sensitivity is directly related to the betweenness sensitivity of states when a random walker wanders around and visits different states multiple times. The general definition of robustness in Markov chains allows the random walker to continue walking for infinite time, and the sensitivity of the Markov chain for each state is defined as the sensitivity of the ratio of the total number of visits to that state to the total number of hops the random walker has taken. We change this definition by specifying the destination of the random walker, and we calculate the sensitivity of the total number of visits to each state when the starting and stopping states are known.

• We discuss an important quantity in Markov chain theory and graph theory called the Kemeny constant. We show that the robustness of a Markov chain or weighted graph can be discussed using the Kemeny constant. We also discuss the relation between the Kemeny constant and the Laplacian of the graph. We show that for an unweighted d-graph, and for a weighted graph in which all nodes have equal node weight, the Kemeny constant and the sum of the inverses of the Laplacian eigenvalues are directly proportional. In addition, we show the relation between the Kemeny constant and connectivity measures of the graph, such as the isoperimetric constant and the algebraic connectivity.

• A network design problem can be divided into three different parts. The first part is network resource distribution. In this part, the network designer distributes resources in a network that

involves topology design of the network as well as network capacitation. The second part of the

network design is to provide a routing strategy between source-destination pairs in a network. In

designing a routing strategy, different considerations are taken into account such as network perfor-

mance optimization as well as capacity constraints. In the third part of a network design problem,

the external demand can be shaped in such a way that the network resources are utilized efficiently

and network performance requirements are satisfied. These different network design problems are

widely discussed in the literature by considering optimization of different network design

criteria.

In chapters 4 and 5, we propose a new set of design approaches in data networks. We define a

robustness factor for each of the nodes in a network. This robustness factor is a function of network

capacity assignment and routing distribution as well as external input-output traffic pattern. We

show that there are different choices as network design criteria based on the node robustness

factors. We discuss different objective functions and evaluate the performance of the network

designed based on these different objectives. We show how the robustness factors affect the total

average delay and throughput of the network. We show that optimizing the weighted sum of the

robustness factors provides an optimum trade-off between delay and maximum throughput of the

network. We also show the similarity between the weighted sum of robustness factors and the

Kleinrock’s power metric. We prove that the new robustness factor limits the sensitivity of the

average end-to-end delay and traffic distribution in the network.

• In chapter 6, we decompose the external traffic into two parts. The first part is the total traffic entering the node from external sources. The other part is a square matrix that describes the

destination preference of traffic entering at each node. We show that when the preference matrix

has equal rows, the model is called the gravity model, which is a reasonable estimate of the external

traffic distribution in backbone networks.

• We also propose a convex Entropy-based robust design that can be calculated using convex optimization tools for any distribution of sources and destinations. We develop the optimization

scheme and prove the robustness property for bipartite graphs, which are suitable choices for

modeling last-mile wireless communication networks.
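
To make the quantities named in these contributions concrete, the following minimal sketch computes mean first passage times and the Kemeny constant for a small Markov chain. It is an illustration only: the two-state transition matrix `P`, the helper names, and the convention that the first passage time from a state to itself is zero are assumptions made here, not definitions taken from the later chapters.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [list(map(Fraction, row)) + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary(P):
    """Stationary distribution: pi P = pi with the entries of pi summing to 1."""
    n = len(P)
    A = [[P[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    A[-1] = [1] * n  # replace one balance equation by the normalisation
    return solve(A, [0] * (n - 1) + [1])


def mean_first_passage_to(P, target):
    """Expected number of hops to reach `target` from every state."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1 if i == k else 0) - P[i][k] for k in others] for i in others]
    times = solve(A, [1] * len(others))
    result = {target: Fraction(0)}  # assumed convention: zero from itself
    result.update(zip(others, times))
    return result


def kemeny_constant(P, start=0):
    """Stationary-weighted sum of mean first passage times from `start`."""
    pi = stationary(P)
    return sum(pi[j] * mean_first_passage_to(P, j)[start] for j in range(len(P)))


# Hypothetical two-state chain used only for illustration.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
print(kemeny_constant(P, start=0), kemeny_constant(P, start=1))
```

For this chain both calls return 4/3, which illustrates why the quantity is called a constant: it does not depend on the starting state.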

Chapter 2

Literature Review

One of the main challenges of future network architectures is to design an autonomic resource man-

agement system to provide a proper framework for service delivery. To address this challenge, a new

network architecture should be established. In this chapter, we review the next generation network

architecture design. We discuss general considerations for future networks and provide a framework that

is relatively common between different proposed approaches. We show that future management systems

should be autonomic and should be able to handle the perturbations in the system, such as failures in

different parts of network or changes in traffic patterns. To cope with perturbations, control loops are

considered. We discuss how control loops are implemented hierarchically to adapt to different scales of

change. In our proposed approach, we design networks in such a way that the network performance

parameters have low sensitivity to network changes. This insensitivity makes the control loop execution

less frequent and the system more stable.

2.1 Future Network Architecture Design Goals

A network architecture is a set of abstract design guidelines. It helps the design process to meet the

requirements by following design principles. It is located between physical resources and users’ applica-

tions. It should be able to adapt to variations in users’ requests and evolution of network technologies,

configurations and topologies.

To make this architecture stable, we should design a stable network that does not need reconfiguration

frequently due to changes in requests and/or physical resources.

The design requirements of future network architectures should be based on the following consider-

ations [34]:

• Large Switching and Communication Link Capacity: To evaluate the design performance of next generation networks, the increasing traffic trend should be considered. It is shown in [34] that the

traffic level in a typical exchange point doubles every 18 months.

• Scalability: Future networks will be composed of several different networks that can be extremely diverse. In addition to human users, the number of machine-to-machine communications is expected

to increase considerably, which necessitates a scalable framework for future network architectures.


• Openness: The network should be able to maintain an appropriate level of competition. The key to having a competitive network is to standardize the interfaces and technologies. In addition, different

mechanisms should be developed to enable users to provide innovative services.

• Robustness: A network should not be affected by perturbations and failures that can happen in different parts of the network. Network architecture should provide an autonomic approach to deal

with the changes in network topology, traffic patterns, policies and security concerns to maintain

service continuity for users. These adaptation procedures should be transparent to the applications

which are using the network resources.

• Safety: In addition to establishing a secure connection between a source and a destination, network security should be concerned with the identity of users. Security issues are of utmost importance

in banking, credit systems and certifications. The network should also prove to be safe and robust during a

catastrophe.

• Diversity: The performance of a communication network should be independent of specific applications and usage trends. It should be able to provide resources for diverse communication

requirements from computer-centric traffic to telephony applications.

• Ubiquity: The network should comprehensively monitor user activities and network performance. However, in human activity monitoring, privacy concerns should be considered and a balanced

trade-off should be provided between privacy protection and transparency in network activities.

• Integration and Simplification: Network service providers should be able to provide services to a wide range of applications by integrating the common parts and collecting various functions. These

integrations and simplifications increase network reliability and facilitate extensibility of network

services.

• Electric Power Conservation: The number of routers and data centres that should be used in a network is increasing rapidly. Each network device consumes kilowatts of power, which necessitates

provisioning several megawatts of power for a communication network. In addition, the concerns

for green and sustainable energy consumption should also be considered for future network design.

• Extendibility: Networks should be easily extendable using universal communications to overcome language and physical obstacles for self-reformation.

2.2 Control and Management in Next Generation Networks

Future control and management systems should be capable of integrating different networks to provide a

homogeneous network for users [85]. This transparency of heterogeneity in network structure necessitates

the design of a control and management plane which can make an agreement for cooperation on demand

without a need for offline negotiation and reconfiguration.

This approach should have the capability to be generalized for distinct network architectures, such as

vehicular networks and sensor networks. In addition, end-users in future networks are not just single

nodes; they can own a network of devices at home, in the office and around the body.


Figure 2.1: Network Control and Management Architecture

2.2.1 Functional Entities in Control and Management Plane

In future network architectures, a network is a collection of nodes and network elements which share

a common control space. The control space has proper functionalities to communicate with other

control spaces. This architecture consists of different modular components interacting through interfaces.

These interfaces make components independent of specific implementation and technology. Different

components of the network architecture are shown in fig. 2.1. As shown in fig. 2.1, control and

management functionalities are located in the network control plane.

The control and management functions have access to network elements and nodes through Resource

Interfaces. The Resource Interface (RI) provides a homogeneous view of access technologies. It virtualizes

the resources provided by different technologies in a unified form.

Different functional entities of the Control Space (CS) between two networks communicate through

a Network Interface (NI). NI plays an important role in composition of networks. Networks can also

trade resources through on the fly compositions. Having network composition capability is an important

feature of future network architectures facilitating the support of seamless handover for mobile users.

Composition is achieved when Composition Agreement (CA) and agreement lifetime are created between

two networks through the negotiation between functional entities in CS.


Services in the application layer profit from network functionalities through the Service Interface

(SI). Using service interface, applications only handle their end-to-end communications with their peers

without a need to increase their complexity of handling transport and network functionalities.

The control functionalities of the CS are grouped into two parts, each of which has different

functional entities [84]:

• Resource management functionalities that support and manage user plane connectivity:

– Bearer and Overlay Management (BOM): BOM offers end-to-end services to applications

through SI. In order to provide QoS for the users, networking and transport protocols utilize

the so-called Service-aware Transport Overlays [51], [16] controlled by Bearer and Overlay

Management.

– Flow and Mobility Management (FMM): FMM ensures that end-to-end bearers are not af-

fected by connectivity changes and movement events in underlying connections.

– Access Management (AM): AM manages resources, configures and establishes flows. It also

monitors and discovers access links.

– Trigger and Context Management (TCM): TCM controls, broadcasts and collects context

information. Mobility related events and state changes are handled by Trigger and Context

Management FE [103].

• Administrative Domain Management Functionalities:

– Security Domain Management: These functions manage resource grouping in a common man-

agement and control plane based on security policies. This allows controllable and authorized

use of Control Space functionalities.

– Composition Control: Composition control functions administrate the composition feature of

networks. They govern negotiation and agreement realization between composed autonomous

CSs.

– Compensation and INQA: Compensation and INQA include inter-network quality of service

agreement and service level agreement functionalities.

– Network Management: Network management functions configure and maintain policy databases.

The network management subsystem also collects user preferences, mobility triggers and net-

work status.

2.2.2 Maintaining Service Continuity in Next Generation Networks

Service maintenance is an important design factor in next generation networks. The main concerns are

heterogeneity of underlying networks and high variability of demand traffic.

Therefore, the control and management system is responsible for initiating, configuring, maintaining

and shutting down Service-aware Adaptive Transport. This adaptive transport has all the functionalities

required to provide end-to-end connection. The end-to-end connection in the user plane is established

using three categories of network devices: Routers, Processors and Caches. Routers forward the data

on the network according to the dynamically configured routing tables to enhance QoS by reducing

the risk of suboptimal routing in the overlay network [82]. Processors process incoming data (e.g.


virus scanning) and also provide control functions (e.g. SIP Proxy, Real Time Streaming Protocols

Functionalities). Caches store incoming data flows for different purposes, such as generating

deliberate extra delays, e.g. to compensate for jitter.

By interacting with different functional entities, such as maintaining network connectivity, service

authorization and metering and QoS monitoring for Inter-AN Communications, the management system

provides end users with an acceptable Quality of Experience [81].

Routing algorithms provide service aware routing for QoS-sensitive applications [82]. In route

selection, both path and service components are considered to make a decision. Routing algorithms consist

of three different modules. The first is the topology module. The topology is created after connection setup

and can be modified during the application session. Topology design is done based on application

requirements, e.g. a tree topology is created for multi-cast applications. The QoS collector accumulates QoS status

information through network sensors and the local resource manager and abstracts it into a form which

can be used by routing algorithms. The route computing module computes the best route based on the

topology generated by the topology module and the metrics collected by the QoS collector.

In order to gather information about network state, sensors are used. Sensors are software compo-

nents distributed in heterogeneous networks. These sensors monitor the network resources and context

information. Sensors provide information and simplify integration of resource information. They are

broadly categorized into two types: Node Sensors (e.g. monitor memory and CPU usage of nodes) and

Network Sensors (e.g. BW sensors which monitor the bandwidth usage of links).

Two approaches can be followed for end-to-end bandwidth measurement: passive and active. In

passive measurement, sensors only monitor the passing packets and compute the available bandwidth

of a path as the minimum of the available bandwidths of the links along the routing path. In active

measurement, sensors actively send probing packets into the network. Active measurements provide

more efficient and more flexible estimation for wireless networks because of their fast changing topology

[78]. TOPP (Train of Packet Pair) [12], SLoPS (Self-Loading Periodic Stream) [11] and SLOT [5] are

some of the proposed active end-to-end bandwidth measurement schemes which have different estimation

accuracy and probing time. These techniques send streams of probing packets to estimate end-to-end

available bandwidth.
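
The passive rule described above, where the path bandwidth is limited by its most constrained link, can be sketched in a few lines. This is only an illustration: the function name, the two-hop path and the capacity and load figures are hypothetical and do not come from any of the cited measurement schemes.

```python
def available_bandwidth(path, capacity, load):
    """Return the bottleneck available bandwidth (Mbps) along a path.

    path     -- list of directed links, each a (source, destination) tuple
    capacity -- dict mapping link -> capacity in Mbps
    load     -- dict mapping link -> currently observed traffic in Mbps
    """
    if not path:
        raise ValueError("path must contain at least one link")
    # The available bandwidth of a link is its unused capacity; the path
    # is limited by the link with the least unused capacity.
    return min(capacity[link] - load.get(link, 0) for link in path)


# Hypothetical values for a two-hop path A -> B -> C.
capacity = {("A", "B"): 100, ("B", "C"): 60}
load = {("A", "B"): 70, ("B", "C"): 20}
path = [("A", "B"), ("B", "C")]

bottleneck = available_bandwidth(path, capacity, load)
print(bottleneck)
```

In this example the first hop has 30 Mbps unused and the second has 40 Mbps unused, so the first hop determines the end-to-end value even though it has the larger capacity.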

The information provided by sensors is processed by QoS collectors which monitor the QoS of every

flow in overlay networks. QoS collectors provide triggers and signaling for better resource utilization

and QoS preservation. The monitoring and management loops are shown in fig. 2.2. As can be

seen, control and management functionalities can be divided into two parts. The first part is the local

management that makes short term decisions according to the feedback received from users and sensors

in the network. The other part of the management system is responsible for re-configuring the network

as a long term adaptation.

A similar structural approach is followed by the AKARI project. It proposes five different

sub-architectures that have different goals in the overall network architecture [2]. These five modules are

defined to provide a hierarchical architecture and control loops to react to small perturbations and big

changes in the network. Similar to the architecture discussed previously, AKARI provides interfaces for

the control plane to physical resources, applications and control planes of other autonomous systems.

AKARI introduces a multiple access mechanism called Packet Division Multiple Access (PDMA).

PDMA is a communication scheme that shares all the bandwidth among the access points in the same

interference domain. It uses CSMA/CA for interference control within a cell and among interfering cells.


Figure 2.2: Management Loops

Therefore, bandwidth can be adaptively distributed in the network between different cells.

The SAVI network architecture design aims to provide flexible infrastructure that involves backbone

data centres, optical backhaul, smart edges and wireless access. It provides effective, efficient, reliable

and adaptive support for large scale distributed applications with a wide range of requirements, which

seems inevitable for future network scenarios. This architecture should be adaptive to incoming demand

characteristics and should be able to change characteristics and behaviours of the system to cope with the

perturbations that can happen in every autonomic system. To have a reconfigurable, scalable and efficient

management system, SAVI proposes a control and management plane over the physical resources. A

similar approach is followed in other network architecture designs such as GENI and FIND in the US [91],

[26] and EuroNGI in Europe [65].

In all of the discussed network architectures, control loops are considered for resource provisioning

in networks. This hierarchical architecture consists of multiple loops managing resources in different

levels based on the scale of perturbations in the network. The purpose of this thesis is to provide a

framework for robust network design in which the performance of the network is not affected by the

perturbations happening in resources, routing and traffic pattern and alleviate the need for frequent

network reconfiguration.

2.3 Network Modeling

In the past few decades, extensive research activities have been dedicated to studying and analyzing the

behaviors and properties of complex systems in fields such as sociology, physics, biology and data networks. The

results of these research activities reveal surprising similarities between different complex systems.

To explain these similarities, graph-theoretical approaches are used to describe and to

characterize the properties of complex systems. Many complex systems can be decomposed into several

different components with a variety of inter-component interactions. Systems that contain different

interacting components can be modeled by a mathematical object called a graph. A graph is a collection

of nodes and edges connecting the nodes. Nodes are considered to be the interacting components of the


Figure 2.3: Graph Modeling of a Capacitated Network

system and edges demonstrate the interaction between the nodes.

For example, in an ecological system, nodes can be considered as different species and links represent

the interaction of the species population.

The resulting graph can be directed or undirected. In directed graphs, the interaction of the different

components of the system is not symmetric. As an example of an undirected graph, suppose that we

model the people at a party by nodes in the graph and define the interaction between

the nodes as whether or not they shake hands. This system can be modeled by an undirected graph

in which the interactions of nodes (Hand shaking) are modeled by undirected links. However, if we

define a different interaction of nodes for the same system such as having specific knowledge about other

participants in the party, this interaction is not symmetric and the system cannot be modeled by an

undirected graph. Another example of an undirected graph is a resistive circuit in electrical networks.

However, if directional elements such as diodes are used in the network, the undirected graph model is

not a proper tool to model the network.

In the next section, we concentrate on data network models and describe how a weighted graph is used

to characterize and to model a capacitated network.

2.3.1 Networks and Weighted Graphs

In general, communication networks are modeled by weighted graphs. Network elements, such as routers,

are represented as nodes and network links are represented as edges in the graph. The weight of a link

in the network is considered to be the capacity of that link.

Based on the weighted graph modeling, analysis of the network is simplified and the flow of the

packets can be tracked using graph representation of the network.

As an example, consider the network in fig. 2.3. As we can see, we have 3 routers in the network

labeled A, B and C. We consider that the communication links are installed in the network according to

table 2.1.


Table 2.1: Link Capacity

Link      Capacity
A → B     100 Mbps
A → C     80 Mbps
B → C     60 Mbps
B → A     100 Mbps
C → B     200 Mbps
C → A     50 Mbps

This network can be modeled as a directed graph, since the capacities of the two directions of a link can

differ (e.g. A → C and C → A in table 2.1), with the weight matrix W as shown in Eq. (2.1).

\[
W = \begin{bmatrix} 0 & w_{AB} & w_{AC} \\ w_{BA} & 0 & w_{BC} \\ w_{CA} & w_{CB} & 0 \end{bmatrix} \qquad (2.1)
\]

Elements of matrix W are assumed to be the capacities of the links connecting nodes in the network.
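
As a small sketch of Eq. (2.1), the capacities listed in table 2.1 can be assembled into the weight matrix W as follows. The function and variable names are illustrative choices, and a missing link is represented by a weight of zero, which is an assumption of this sketch.

```python
# Link capacities in Mbps, copied from table 2.1.
capacity = {
    ("A", "B"): 100, ("A", "C"): 80,
    ("B", "C"): 60, ("B", "A"): 100,
    ("C", "B"): 200, ("C", "A"): 50,
}
nodes = ["A", "B", "C"]


def weight_matrix(nodes, capacity):
    """Build W with W[i][j] = capacity of the link from nodes[i] to nodes[j]."""
    return [[capacity.get((u, v), 0) for v in nodes] for u in nodes]


def is_symmetric(W):
    """True when every link has the same weight in both directions."""
    n = len(W)
    return all(W[i][j] == W[j][i] for i in range(n) for j in range(n))


W = weight_matrix(nodes, capacity)
for row in W:
    print(row)
print("symmetric:", is_symmetric(W))
```

The resulting matrix is not symmetric (for instance, the capacity from A to C is 80 Mbps while the capacity from C to A is 50 Mbps), so the direction of each link has to be kept in the graph model.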

The weighted graph modeling of the data networks is not limited to wire-line networks. Wireless

networks are also modeled by directed weighted graphs. However, in constructing the network and de-

signing the topology, different constraints should be considered. One of the constraints is the interference

in the network. Since wireless channels are shared among different users, topology design should be done

in a way that limits the interference on other links.

Topology design in wireless networks has many different goals, such as reducing the energy con-

sumption, minimizing interference, increasing connectivity, communication efficiency and so on. Some

of these design aspects are contradictory. For example, to increase the connectivity of the network, we

may want to establish a full mesh network. However, increasing the number of active links is not an

optimum choice when considering the minimum power or minimum interference design goals.

In addition, power management algorithms should be employed in conjunction with topology design

algorithms to provide a power efficient and low interference topology.

To consider the interference effect of wireless links, [49] proposed the concept of conflict graph. The

conflict graph shows the interfering groups of links that cannot be active simultaneously.

It is shown that the conflict graph modeling has the capability to model different aspects of wireless

networks such as multiple radio interfaces and multiple radio channels per user.
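
A conflict graph can be sketched as a mapping from each link to the links it cannot be active with. The interference rule used below (two links conflict when they share an endpoint) is a deliberately simple assumption chosen for illustration; it is not the interference model of [49], and the four-link chain topology is hypothetical.

```python
from itertools import combinations


def build_conflict_graph(links):
    """Map each link to the set of links it conflicts with.

    Assumption for this sketch: two links conflict when they share a node.
    """
    conflicts = {link: set() for link in links}
    for a, b in combinations(links, 2):
        if set(a) & set(b):
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def can_be_active_together(active, conflicts):
    """True if no two links in `active` are adjacent in the conflict graph."""
    return all(b not in conflicts[a] for a, b in combinations(active, 2))


# Hypothetical chain topology A - B - C - D - E.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
conflicts = build_conflict_graph(links)

print(can_be_active_together([("A", "B"), ("C", "D")], conflicts))
print(can_be_active_together([("A", "B"), ("B", "C")], conflicts))
```

A set of links that can be scheduled simultaneously corresponds to an independent set of the conflict graph, which is what the second function checks.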

Graph modeling has also been used in optical networks. One of the simplest methods of network

modeling in optical networks is to model the network by a weighted graph in which the weights of links

are set to be one without considering the availability of wavelengths. Routing in this network model

is done using hop distance based shortest path first routing (HD-SPF). In another approach, links are

weighted based on the total number of wavelengths and link length. This modeling approach is called

hybrid weighting and routing is accomplished based on hybrid weighted shortest path first algorithm

(HW-SPF) [114]. Fig 2.4 shows the 15-node Pacific Bell optical network.

However, in these network models, there is always uncertainty about the network parameters.

This uncertainty can result in unexpected network performance. Therefore, in any network design

problem, we should also consider the sensitivity of the network performance to the changes in design


Figure 2.4: 15-node/21-link Pacific Bell network

parameters. In the next section, we discuss different approaches to the network robust design problem.

2.3.2 Criticality and Random Walk Analysis

In [108], a new approach in traffic engineering is introduced. This approach is based on the betweenness

concept borrowed from graph theory. Based on the concept of betweenness, the criticality of path and

link are defined. The criticality concept is a measure of the centrality of a link and shows how critical a

link is for connecting different flows in a network.

The main core of the criticality based approach is to determine critical links in the network and

to try to avoid routing over those links. In addition, the criticality method can be used as a network

design criterion to distribute resources in a network. Using criticality based approach, more bandwidth

is assigned to critical links to avoid congestion in the network.

The simulation results show the superiority of the proposed design approach over other well-known strategies

such as MIRA and constrained shortest path: the criticality based design achieves a lower

blocking probability than these strategies.

To analyze the robustness property of the criticality based design, [107] considers a random walk

on the undirected weighted graph of resources. It shows that for uniform external

traffic, the criticality is a global metric in the network that can be considered as an optimization criterion.

The minimum criticality network has the minimum sensitivity to perturbations in the resource distribution.
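
A random walk on an undirected weighted graph can be sketched as follows. The sketch assumes the usual construction in which the walker leaves a node over a link with probability proportional to that link's weight; this construction, the helper names and the three-node weight matrix are assumptions for illustration and are not taken from [107].

```python
from fractions import Fraction


def transition_matrix(W):
    """Row-normalise the weight matrix: p_ij = w_ij / sum_k w_ik."""
    P = []
    for row in W:
        strength = sum(row)
        if strength == 0:
            raise ValueError("node with no attached weight")
        P.append([Fraction(w, strength) for w in row])
    return P


def strength_distribution(W):
    """Each node's share of the total weight in the graph."""
    total = sum(sum(row) for row in W)
    return [Fraction(sum(row), total) for row in W]


def step(dist, P):
    """Advance a probability distribution over nodes by one hop."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical symmetric weight matrix of an undirected 3-node graph.
W = [[0, 2, 1],
     [2, 0, 3],
     [1, 3, 0]]

P = transition_matrix(W)
pi = strength_distribution(W)
print(pi, step(pi, P) == pi)
```

For a symmetric weight matrix the distribution proportional to node weight is left unchanged by one hop, so it is the stationary distribution of the walk. In the long run, the fraction of hops that the walker spends at a node therefore tracks that node's share of the total weight.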

The applications of random walk routing are discussed in different networks such as wireless sensor

networks. In wireless sensor networks, the random walk model is gaining popularity because of its simplicity,

low overhead and inherent robustness [69], [105]. In wireless sensor networks, a large number of small

nodes are used. These networks are subject to structural changes due to channel fluctuations, node

failures and other factors. Random walk algorithms can provide a robust routing strategy at the expense

of QoS support in wireless sensor networks.

Random walk routing analysis is a common model to analyze different aspects of communication

networks. In [15], a random walk model is used to propose an approach for topology inference. This paper

goes further and shows that the proposed approach also works for general routing strategies.


In this thesis, we propose a new robustness factor and analyze its property using an open Jackson

network.

2.4 Robust network design

There is a transition from best effort networks to multi-service networks that provide service for a wide

range of applications with different Quality of Service requirements. The rapid development of high-

speed links and switches enables the service providers to consider different over-provisioning strategies

to maintain service continuity for the users.

The problem in routing strategy design is to avoid bottlenecks in the network. It is shown in [48]

that the bottleneck can happen in both inter-domain and intra-domain topologies illuminating the need

for both efficient capacitation and routing in data networks.

New challenges also arise due to QoS requirements of network applications, which include, but are

not limited to, bandwidth, delay and jitter requirements. In addition, node/link failures and other

perturbations in the network can occur frequently. Therefore, minimization of the perturbation effect

on the network performance should also be considered.

In this thesis, robustness is studied by considering perturbations in different parameters of the net-

work. The perturbations can happen in routing distribution, capacitation and the external input-output

traffic patterns. In addition, there is always uncertainty about the design parameters in the design

process of every real-world system. To overcome this uncertainty problem, different approaches are

followed.

In this section, we address the network design problem. We aim to design a network in such a way

that its performance parameters become robust to network perturbations. The network perturbations

include changes in capacity distribution and in routing tables as well as changes in traffic patterns.

Classical network design optimization problems include optimizing network performance parameters,

for example minimizing average delay or maximizing throughput [60]. Another design aspect that can be considered is the stability

of the operating point. In other words, if changes happen in network settings, stability describes how

far the operating point deviates from the optimum point. This perturbation can occur with respect to

changes in capacity of links, demand traffic variations and routing. These variations can affect flow of

traffic (the links traffic load) and/or the total average delay as defined in [71], [60].

The robustness definition is tied to system sensitivity [27]. A system is called robust if the variations

in system environment result in minimum variations in the system performance parameters. Therefore,

measuring and minimizing the sensitivity of a system is one of the main concerns in the robust system design

literature [107], [27]. However, considering robustness may not always result in the best-performing

system. Thus, a trade-off always exists among robustness, performance and cost. Due to the scale

of current networks, many resource management schemes have been developed aiming for robust design

[107], [13], [92], [89].
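
The idea of measuring robustness through sensitivity can be illustrated with a small numerical sketch. It assumes that every link behaves as an independent M/M/1 queue, so that the average delay is the sum of load / (capacity - load) over the links divided by the total external demand. This delay model, the function names and all numerical values are assumptions made for illustration; they are not the robustness factor developed in this thesis.

```python
def average_delay(loads, capacities, total_demand):
    """Average network delay, assuming each link is an M/M/1 queue."""
    for lam, cap in zip(loads, capacities):
        if lam >= cap:
            raise ValueError("link load must stay below link capacity")
    per_link = (lam / (cap - lam) for lam, cap in zip(loads, capacities))
    return sum(per_link) / total_demand


def delay_sensitivity(loads, capacities, total_demand):
    """Analytic partial derivative of the delay w.r.t. each link capacity."""
    return [-lam / ((cap - lam) ** 2) / total_demand
            for lam, cap in zip(loads, capacities)]


def finite_difference(loads, capacities, total_demand, index, h=1e-4):
    """Central-difference estimate of the same derivative, as a check."""
    up = list(capacities)
    down = list(capacities)
    up[index] += h
    down[index] -= h
    return (average_delay(loads, up, total_demand)
            - average_delay(loads, down, total_demand)) / (2 * h)


# Hypothetical two-link network; rates are in packets per second.
loads = [30.0, 50.0]
capacities = [100.0, 60.0]
total_demand = 40.0

sensitivities = delay_sensitivity(loads, capacities, total_demand)
print(sensitivities)
```

In this example the second link carries a load close to its capacity, and the delay is far more sensitive to a change in its capacity than to a change in the capacity of the lightly loaded first link.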

System robustness is defined as the resilience of the system to perturbations that occur during system

operation. Robustness can also be defined as the capability of the system to adapt to new conditions

after disturbances occur in the input-output and/or the system parameters. The

authors in [111] pulled together the robustness literature, in which robustness is also called adaptation,

coping and resilience. These expressions have similar definitions; however, different disciplines approach them from

distinct perspectives. In the next section, we discuss the robust design strategies in the network.


2.4.1 Robust Network Design vs. Weights Perturbations

Routing is a critical operation in networks. Computing an optimal route between source-destination

pairs is one of the main concerns in different networks from transportation to data networks. In data

networks, different routing strategies such as shortest path and multi-path routing are developed. These

different routing strategies have several benefits and impairments which should be addressed in network

design problems. For example, the shortest path routing is the most common routing in the network

that minimizes total traffic in the network. The classical algorithm for the shortest path routing strategy

is Dijkstra’s algorithm, which iteratively computes the shortest paths from a source node to all other nodes in the network. The calculation

of shortest path in a network has been one of the most intensely studied problems in communication

networks. However, dynamic routings that consider alternative paths rather than shortest path reduce

the congestion probabilities. In addition, in wireless networks with variable link qualities, the shortest

path is not recommended. The other routing strategy is to balance the load distribution and to mini-

mize total traffic in the network. Load balancing strategies improve network service capability by not

overloading central links and nodes in the network.
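
As a minimal sketch of the shortest path computation mentioned above, the following Python code runs Dijkstra's algorithm from one source node. The five-node graph, its link costs and the names `dijkstra` and `example_graph` are illustrative assumptions and are not taken from the thesis.

```python
import heapq

def dijkstra(graph, source):
    """Return shortest-path costs from `source` to every reachable node.

    `graph` maps each node to a dict {neighbor: non-negative link cost}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical network used only for illustration.
example_graph = {
    "A": {"B": 2, "D": 1},
    "B": {"C": 3},
    "C": {"E": 1},
    "D": {"E": 4, "B": 1},
    "E": {},
}
shortest = dijkstra(example_graph, "A")
```

Running the function once per source node yields all-pairs shortest paths, which is what a link-state routing protocol needs at each router.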

In [112], routing strategies are classified as follows:

• Intra-domain and inter-domain network optimization.

• Circuit-switching-based routing (e.g. multi-protocol label switching) and IP-based routing.

• Offline and online routing, based on the time-scale of traffic estimation and capacitation validity.

• Source-destination type: single source-single destination, single source-multiple destinations, and multiple sources-multiple destinations.

The efficiency of any routing strategy depends on the availability of information about network capacitation, topology and the traffic matrix (TM). The traffic matrix describes the end-to-end traffic between every source-destination pair in a network. Therefore, traffic matrix estimation plays an important role in network management. There are two sources of information for traffic estimators. The first is the service level agreements between users and the service provider, and the second is the network monitoring system, which provides aggregate information on the total amount of traffic passing through each node and link. However, because of network and traffic dynamics, in the form of perturbations in traffic distribution and network resources, adaptive and robust management strategies should be applied in networking systems.

Uncertainty about system parameters appears in many analyses. The complexity lies in the number of parameters that are allowed to vary. In the context of routing, uncertainty can arise in the link costs and traffic demands of a network. One of the best-known routing strategies in communication networks is shortest path routing. However, shortest path routing is defined for a network with known and constant link costs.

To overcome uncertainty in networks, robust shortest path routing has been discussed in the literature [39],[117]. In [39],[117], a set of known scenarios is considered, each corresponding to a predetermined set of link costs, and each cost scenario can be realized with some probability. The robust shortest path algorithm then finds the path whose maximum cost over all scenarios is minimized. Fig. 2.5 shows an example of a discrete set of scenarios. For instance, it can be seen in Fig. 2.5 that there are three different costs for the route ADE (i.e. AD + DE) in the network, namely 1 + 1, 2 + 2 and 3 + 3.
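
The min-max criterion described above can be sketched in a few lines of Python. The sketch enumerates all simple paths by brute force, which is only practical for very small graphs and is not the algorithm of [39] or [117]. The costs of links AD and DE follow the three scenarios quoted in the text; the alternative route ABE and its costs are hypothetical and are added only so that there is something to compare against.

```python
def all_simple_paths(graph, src, dst, path=None):
    """Yield every cycle-free path from src to dst as a list of nodes."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, ()):
        if nxt not in path:
            yield from all_simple_paths(graph, nxt, dst, path)

def path_cost(path, costs):
    """Sum the link costs of `path` under one cost scenario."""
    return sum(costs[(u, v)] for u, v in zip(path, path[1:]))

def minmax_robust_path(graph, scenarios, src, dst):
    """Return the path whose worst-case cost over all scenarios is smallest."""
    best_path, best_worst = None, float("inf")
    for p in all_simple_paths(graph, src, dst):
        worst = max(path_cost(p, s) for s in scenarios)
        if worst < best_worst:
            best_path, best_worst = p, worst
    return best_path, best_worst

graph = {"A": ["D", "B"], "D": ["E"], "B": ["E"]}
scenarios = [
    # AD and DE as in the text; AB and BE are assumed values.
    {("A", "D"): 1, ("D", "E"): 1, ("A", "B"): 2, ("B", "E"): 2},
    {("A", "D"): 2, ("D", "E"): 2, ("A", "B"): 2, ("B", "E"): 2},
    {("A", "D"): 3, ("D", "E"): 3, ("A", "B"): 2, ("B", "E"): 3},
]
robust_path, worst_cost = minmax_robust_path(graph, scenarios, "A", "E")
nominal_cost_ade = path_cost(["A", "D", "E"], scenarios[0])
```

With these assumed numbers, route ADE is the cheapest in the first scenario but has a worst-case cost of 6, so the min-max criterion selects ABE, whose worst-case cost is 5.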


Figure 2.5: Scenario Graph Model

Figure 2.6: Interval Graph

However, changes in network parameters are neither discrete nor limited to a few values. Therefore, the scenario model is not a proper choice for real network design. In an alternative model, an interval of cost values is considered for each link [30],[53]. Using this model, a robust deviation criterion is proposed [53], [77], [120] in which routing is done for a single source-destination pair by considering an interval [Lij , Uij ] for link ij. The robust deviation of a path is its worst-case excess cost over the shortest path, taken over all cost realizations within the intervals. An example of an interval graph is shown in Fig. 2.6. It is shown that the robust deviation problem in the context of shortest path routing is NP-hard [118], [120]. An integer programming formulation of the robust deviation shortest path problem is proposed in [53] that can be applied only to acyclic directed graphs with a small width (acyclic graphs are graphs without any cycle).
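
The robust deviation criterion can be illustrated with a brute-force sketch, which is not the integer programming formulation of [53]. It takes the robust deviation of a path to be its largest excess cost over the best path, across all cost realizations in the intervals. Because this excess is a convex function of the link costs, it is enough to test the scenarios in which every link sits at its lower or its upper bound. The graph and the interval values below are hypothetical.

```python
from itertools import product

def simple_paths(graph, src, dst, path=()):
    """Yield every cycle-free path from src to dst as a tuple of nodes."""
    path = path + (src,)
    if src == dst:
        yield path
        return
    for nxt in graph.get(src, ()):
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path)

def cost(path, c):
    return sum(c[(u, v)] for u, v in zip(path, path[1:]))

def robust_deviation_path(graph, intervals, src, dst):
    """Return (path, deviation) minimizing the worst-case regret.

    `intervals` maps each link (i, j) to its cost interval (L_ij, U_ij).
    All 2^m extreme scenarios are enumerated, so this is exponential in
    the number of links m and only meant for tiny examples.
    """
    paths = list(simple_paths(graph, src, dst))
    links = list(intervals)
    best = None
    for p in paths:
        worst_regret = 0
        for choice in product(*(intervals[e] for e in links)):
            c = dict(zip(links, choice))
            regret = cost(p, c) - min(cost(q, c) for q in paths)
            worst_regret = max(worst_regret, regret)
        if best is None or worst_regret < best[1]:
            best = (p, worst_regret)
    return best

graph = {"A": ["B", "D"], "B": ["E"], "D": ["E"]}
intervals = {
    ("A", "B"): (1, 4), ("B", "E"): (1, 2),
    ("A", "D"): (2, 3), ("D", "E"): (2, 3),
}
best_path, deviation = robust_deviation_path(graph, intervals, "A", "E")
```

In this example ABE has a worst-case regret of 2 (its cost 6 against 4 for ADE), while ADE has a worst-case regret of 4, so ABE is the robust deviation path.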

2.4.2 Robust Network Design vs. Perturbations in External Traffic

The other source of perturbation in a network is the external traffic pattern. Fluctuation of external traffic is an inherent property of network traffic, and it can cover a wide range of changes in the traffic volume. Different approaches have been considered to account for this variability. In this section, we first discuss the concept of oblivious routing, which uses multiple paths between source-destination pairs and designs the network to be prepared for the worst-case scenarios of network traffic load. The other approach discussed is to use a cutoff threshold that limits the expected amount of change in the network.

To account for perturbations in input traffic, oblivious routing (i.e. routing without any knowledge of the current state of the network) is proposed. Oblivious routing is designed for the worst-case scenario of the input traffic distribution and depends only on the source, the destination and path randomization, if randomization is allowed. In oblivious routing, a set of candidate paths is selected for each source-destination pair, and a packet can only choose from this predetermined set to travel from the source to the destination. It is shown that for an undirected graph, a polynomial-time algorithm can be used to construct the oblivious routing strategy [14].

Figure 2.7: MIRA Example

Other popular adaptive routing algorithms are the Minimum Interference Routing Algorithm (MIRA) [64] and Profile Based Routing (PBR) [101], both of which are based on minimum-interference theory. Minimum interference routing builds on MPLS traffic engineering and routes each flow so as to minimize the interference with the traffic of other source-destination pairs. The concept of interference is the basis of the MIRA algorithm. The interference caused by a flow between one source-destination pair is the impact of that flow on the maximum routable traffic between other sources and destinations. Consider the source-destination pair (S1,D1) in Fig. 2.7. The max flow of (S1,D1) is the upper bound on the total traffic that can be routed from S1 to D1 under the constraints imposed by the network capacity distribution. It can be seen that the flows associated with (S2,D2) and (S3,D3) can cause interference and decrease the maximum routable flow of (S1,D1). Critical links in MIRA are defined as the links over which routing traffic causes interference with other source-destination pairs. Finding the minimum interference routing is an NP-hard problem, and many research activities provide heuristic algorithms to decrease the complexity of MIRA [52].
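
The notion of interference used above, namely the reduction in the max flow of one source-destination pair caused by routing another pair's traffic, can be sketched as follows. This is not the MIRA algorithm itself. It only measures interference for two candidate routes on a hypothetical six-node network, using a plain Edmonds-Karp max-flow routine. All node names, capacities and function names are assumptions.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max flow; `capacity` maps (u, v) to link capacity."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse residual arcs
    neighbors = {}
    for (u, v) in residual:
        neighbors.setdefault(u, []).append(v)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        flow += push

def route(capacity, path, demand):
    """Return a copy of `capacity` with `demand` units reserved along `path`."""
    reduced = dict(capacity)
    for link in zip(path, path[1:]):
        reduced[link] -= demand
    return reduced

# Hypothetical network: link A->B is shared by both pairs.
capacity = {
    ("S1", "A"): 1, ("S2", "A"): 1, ("A", "B"): 1,
    ("B", "D1"): 1, ("B", "D2"): 1, ("S1", "D1"): 1,
}
baseline = max_flow(capacity, "S2", "D2")
via_shared = route(capacity, ["S1", "A", "B", "D1"], 1)
via_direct = route(capacity, ["S1", "D1"], 1)
interference_shared = baseline - max_flow(via_shared, "S2", "D2")
interference_direct = baseline - max_flow(via_direct, "S2", "D2")
```

Routing the (S1,D1) demand over the shared link A-B removes all routable traffic for (S2,D2), whereas the direct link causes no interference, so A-B plays the role of a critical link in this example.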

There are several limitations associated with using interference as a routing criterion [107]. The first is that MIRA considers interference only on single source-destination pairs and cannot find critical links for aggregated traffic from different source-destination pairs. In addition, MIRA concentrates only on the interference effect without considering the available capacity for carrying the traffic. Therefore, there is a chance that a flow request is rejected even though there are enough resources to admit it. Another limitation of the MIRA algorithm is its complexity, which makes it computationally inefficient to implement in large networks.

In [102], three different topologies, i.e. Parking Lot, Concentrator and Distributor, are discussed, and it is shown that MIRA routing does not perform well on these structures and can cause a high dropping rate of flow requests. These structures are common and can occur frequently in different parts of the network.

In addition, the main concern of MIRA is bandwidth; other QoS requirements are not considered, which is not appropriate for multi-service networks.

Another approach to handling traffic variations is to design the network based on the distribution of changes in the traffic pattern. In this approach, the variation of external traffic is limited by a threshold. How far this threshold should lie from the mean traffic volume is decided based on the acceptable risk of the service provider and the distribution of the traffic volume. A continuous distribution is considered for the demand volume between sources and destinations. Different mathematical models are designed to avoid the risk of under-utilized resources, which arises when the network is dimensioned for worst-case scenarios that occur with low probability and which imposes unnecessary extra charges on service providers.
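
As a small worked example of the threshold idea, suppose (as an assumption for illustration) that the demand on one source-destination pair is Gaussian with a known mean and standard deviation. The threshold for an acceptable risk is then the corresponding quantile of that distribution. The numbers and the function name `capacity_threshold` are hypothetical.

```python
from statistics import NormalDist

def capacity_threshold(mean, std, risk):
    """Smallest capacity c such that P(demand > c) <= risk.

    Assumes Gaussian demand; `risk` is the overflow probability the
    service provider is willing to accept.
    """
    return NormalDist(mean, std).inv_cdf(1.0 - risk)

# Hypothetical demand: mean 100 units, standard deviation 15 units.
threshold = capacity_threshold(mean=100.0, std=15.0, risk=0.05)
worst_case_like = capacity_threshold(mean=100.0, std=15.0, risk=1e-6)
```

Accepting a 5% overflow risk places the threshold about 1.64 standard deviations above the mean, noticeably below the capacity needed for a near-worst-case design, which is the saving in otherwise under-utilized resources that this approach targets.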

In [74], a stochastic traffic engineering framework is proposed that considers a continuous Gaussian distribution for the external traffic pattern between any source and destination in a network. Its objective is to maximize the served demand while considering the probability of occurrence of each demand pattern. A mean-risk model is used that enables service providers to maximize the revenue of serving demands and, through a risk measure, to avoid the risk of revenue loss. However, a Gaussian distribution is not applicable to traffic patterns with a heavy tail. To handle heavy-tailed traffic, [40] proposes a new risk measure that captures the long tail of the traffic pattern.

Chapter 3

Robustness and Perturbation Effects in Markov Chains

Many empirical systems are modeled by Markov chains, and Markov chain analysis simplifies the analysis of complex systems. Therefore, an approach to designing a robust Markov chain enhances the robustness analysis of the wide range of systems that can be modeled by Markov chains.

In this chapter, we provide a theoretical framework for the robust design of Markov chains. We present a new definition of robustness that is tightly related to robust network design. The new definition is based on the amount of change in node betweenness when a perturbation occurs in the transition probabilities. We propose a new robustness factor that is more comprehensive than condition numbers as a measure of Markov chain sensitivity. We also discuss the applications of the proposed robustness factor to the betweenness centrality of weighted and unweighted graphs in the following sections.

3.1 Markov Chains and Perturbation Effects

A Markov chain is a mathematical model with a finite or countable number of states in which the transitions between states are governed by a transition probability matrix $R = [r_{ij}]$. The next transition in the Markov chain depends only on its current state. The stationary distribution of a Markov chain is a probability distribution $\nu$ such that

$$\nu R = \nu \tag{3.1}$$

A Markov chain is said to be irreducible if any state can be reached from any other state in a finite number of steps. In a homogeneous Markov chain, the transition matrix $R$ does not change over time:

$$R_n(i, j) = R_{n+1}(i, j) \tag{3.2}$$

where $R_n(i, j)$ is the transition probability from state $i$ to state $j$ at step $n$. Markov chains are frequently used to model real-world complex systems. These systems are subject to perturbations. The perturbation effect and Markov chain robustness are discussed in the Markov chain literature. It is assumed that


the transition matrix $R$ of an irreducible and homogeneous Markov chain with stationary distribution $\nu$ is perturbed to a matrix $R'$ with stationary distribution $\hat{\nu}$ by a perturbation matrix $E = R - R'$. Extensive research focuses on describing the effect of the perturbation on the stationary distribution $\nu$. Nearly all previous studies give upper bounds on the change in the stationary distribution as a function of the norm of the perturbation matrix, $\|E\|_\infty$ (with $E = [e_{ij}]$ and $\|E\|_\infty = \max_i \sum_j |e_{ij}|$), as shown in (3.3):

$$\left\| \nu^T - \hat{\nu}^T \right\| \le \kappa \, \|E\|_\infty \tag{3.3}$$

where $\kappa$ is called the condition number and is used as the measure of sensitivity. Element-wise condition numbers and relative condition numbers are also considered, which have the following forms:

$$|\nu_j - \hat{\nu}_j| \le \kappa_j \, \|E\|_\infty \tag{3.4}$$

$$\frac{|\nu_j - \hat{\nu}_j|}{\nu_j} \le \kappa_j \, \|E\|_\infty \tag{3.5}$$

Different condition numbers have been proposed. The majority of perturbation bounds are functions of the fundamental matrix $Z$ of the Markov chain. The fundamental matrix $Z$ is the group inverse of $I - R$, i.e. $Z = (I - R)^{\sharp}$. We discuss the definition of the fundamental matrix and its properties later in this chapter. Some examples of proposed condition numbers are shown in the following equations [96], [47], [36], [58]:

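$$\kappa_1 = \|Z\|_\infty, \tag{3.6}$$

$$\kappa_2 = \frac{\max_j \left( z^{\sharp}_{jj} - \min_i z^{\sharp}_{ij} \right)}{2}, \tag{3.7}$$

$$\kappa_3 = \max_{i,j} |z_{ij}|. \tag{3.8}$$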
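
The bound (3.3) with the condition number of (3.6) can be checked numerically on a small example. The sketch below uses NumPy and a hypothetical three-state chain. It measures the change in the stationary distribution in the 1-norm, which is an assumption because (3.3) does not fix the vector norm, and it computes the group inverse through the identity $(I - R)^{\sharp} = (I - R + \xi\nu)^{-1} - \xi\nu$. The helper names are illustrative.

```python
import numpy as np

def stationary(R):
    """Row vector nu with nu R = nu and sum(nu) = 1 (irreducible chain assumed)."""
    n = R.shape[0]
    M = (np.eye(n) - R).T
    M[-1, :] = 1.0          # replace one balance equation by sum(nu) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(M, b)

def group_inverse(R):
    """Z = (I - R)^# computed as (I - R + 1 nu)^{-1} - 1 nu."""
    n = R.shape[0]
    R_inf = np.outer(np.ones(n), stationary(R))
    return np.linalg.inv(np.eye(n) - R + R_inf) - R_inf

# Hypothetical chain and a small perturbation of its first row.
R = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
R_pert = np.array([[0.45, 0.35, 0.2],
                   [0.2, 0.6, 0.2],
                   [0.3, 0.3, 0.4]])
E = R - R_pert

nu, nu_hat = stationary(R), stationary(R_pert)
kappa_1 = np.linalg.norm(group_inverse(R), np.inf)   # max absolute row sum
change = np.linalg.norm(nu - nu_hat, 1)
bound = kappa_1 * np.linalg.norm(E, np.inf)
```

For this example `change` stays below `bound`, as the inequality requires. The gap between the two indicates how conservative the condition number is for this particular perturbation.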

These condition numbers are mostly calculated as functions of the elements of the fundamental matrix. Although these bounds provide good numerical measures of the robustness of a Markov chain against perturbations, they do not convey qualitative information about its structure. In other words, the robustness of the Markov chain is not evident from its structure. Other condition numbers, which use the passage time as a measure of Markov chain robustness, are the following [23], [18]:

$$\kappa_4 = \frac{1}{2} \max_j \left[ \frac{\max_{i \ne j} m_{ij}}{m_{jj}} \right] \tag{3.9}$$

$$\kappa_5 = K, \qquad K = \sum_j m_{ij} \nu_j \tag{3.10}$$

Mean first passage time (MFPT) can sometimes help to discuss the expected sensitivity of the Markov

chain models merely by observing the structure of the chain without the need for computing different


Markov chain parameters such as fundamental matrix and stationary distributions.

In this chapter, we propose a new definition of the robustness of a Markov chain. To define the new robustness measure, we first define the node (state) betweenness for each node in the Markov chain. We define the node betweenness as the number of times that a random walker wandering around the network visits the node. It can be shown that as the number of hops of the random walker goes to infinity, the ratio of the number of visits to a node to the total number of hops converges to the stationary probability of that node. However, we consider the case in which the random walker starts at a specific node and stops at a specific destination. The node betweenness therefore depends strongly on the structure of the Markov chain and on the source and destination positions. In later chapters, we use the analogy between packet flows in packet networks and the random walk model to relate robust Markov chain design to the network design problem.
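
The convergence of visit frequencies to the stationary distribution can be illustrated with a short simulation. The three-state chain, the seed and the number of hops are arbitrary choices made for this sketch, and the function names are hypothetical. The stationary distribution is obtained by power iteration, which is adequate for a small regular chain.

```python
import random

def stationary_by_power_iteration(R, iterations=1000):
    """Approximate nu with nu R = nu by repeatedly applying R to a row vector."""
    n = len(R)
    nu = [1.0 / n] * n
    for _ in range(iterations):
        nu = [sum(nu[i] * R[i][j] for i in range(n)) for j in range(n)]
    return nu

def visit_frequencies(R, start, hops, rng):
    """Fraction of hops that land on each state during one long random walk."""
    n = len(R)
    visits = [0] * n
    state = start
    for _ in range(hops):
        state = rng.choices(range(n), weights=R[state])[0]
        visits[state] += 1
    return [v / hops for v in visits]

# Hypothetical transition matrix (rows sum to one).
R = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

rng = random.Random(0)            # fixed seed for reproducibility
freq = visit_frequencies(R, start=0, hops=100_000, rng=rng)
nu = stationary_by_power_iteration(R)
max_gap = max(abs(f - p) for f, p in zip(freq, nu))
```

For this chain the exact stationary distribution is (9/28, 3/7, 1/4), and the simulated frequencies approach it as the number of hops grows. A walk that stops at a fixed destination, as considered in the text, does not have this limit, which is why the resulting betweenness depends on the source and destination.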

We define Markov chain robustness as the sensitivity of node betweenness to perturbations in the transition matrix $R$. Using this definition, we propose a robustness factor, analogous to the condition number, that can be used as a design criterion. We show that the proposed robustness factor is tightly related to the passage times of the Markov chain, which gives us a powerful design tool in the context of robust Markov chain design. As discussed in later chapters, this view of Markov chain robustness has interesting applications in communication network design problems.

As the condition number examples show, the fundamental matrix and the Kemeny constant play an important role in characterizing the robustness of a Markov chain. In the next section, we define the fundamental matrix and the Kemeny constant and discuss their characteristics.

3.2 Fundamental Matrix and Kemeny Constant

To continue our discussion, we first define the fundamental matrix in Markov chain theory. We begin by defining

$$A = I - R$$

where $R$ is a row stochastic matrix. Since $A$ belongs to a multiplicative group, its inverse within that group is called the group inverse of $A$ and is denoted by $A^{\sharp}$ [24]. The powers $R^n$ of a regular transition matrix approach a probability matrix $R^{\infty}$ whose rows all equal the same probability vector $\nu$, i.e.

$$R^{\infty} = \xi_{n \times 1} \, \nu$$

where $\xi_{n \times 1}$ is the $n \times 1$ vector of all ones. In particular, the relation between the group inverse $A^{\sharp}$ and the limiting matrix $R^{\infty}$ is given by [44]

$$R^{\infty} = I - A A^{\sharp} \tag{3.11}$$

The group inverse $A^{\sharp}$ is the unique matrix satisfying the following three equalities:

$$A A^{\sharp} A = A, \qquad A^{\sharp} A A^{\sharp} = A^{\sharp}, \qquad A A^{\sharp} = A^{\sharp} A \tag{3.12}$$

The group inverse $A^{\sharp}$ is also called the resolvent, fundamental matrix or Green's function, i.e. $Z = A^{\sharp}$ [32]:

$$Z = (I - R^{\infty}) + (R - R^{\infty}) + (R^2 - R^{\infty}) + \dots = (I - (R - R^{\infty}))^{-1} - R^{\infty} \tag{3.13}$$

This definition of the fundamental matrix differs from the variant used in [44, 56, 73], in which $Z' = (A + \xi_{n \times 1} \nu)^{-1} = A^{\sharp} + \xi_{n \times 1} \nu$. However, in most applications involving the fundamental matrix, the term $\xi_{n \times 1} \nu$ is redundant [73]. The elements $z_{ij}$ of the fundamental matrix defined in (3.13) can be interpreted as the difference between the expected number of visits to state $j$ when starting in state $i$ and when starting in equilibrium:

$$z_{ij} = \nu_j (m_{\nu j} - m_{ij}), \qquad m_{\nu j} = \sum_i m_{ij} \nu_i \tag{3.14}$$

Accordingly, the fundamental matrix has the following properties:

$$\sum_i \nu_i z_{ij} = 0 \tag{3.15}$$

$$\sum_j z_{ij} = 0 \tag{3.16}$$
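
The identities (3.11) to (3.13), (3.15) and (3.16) can be verified numerically. The following NumPy sketch does so for a hypothetical three-state regular chain. The fundamental matrix is computed from the closed form in (3.13) and compared with a truncated version of the series, where the truncation at 200 terms is an arbitrary choice.

```python
import numpy as np

# Hypothetical regular transition matrix (rows sum to one).
R = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
n = R.shape[0]
ones = np.ones(n)

# Stationary row vector: solve nu (I - R) = 0 together with sum(nu) = 1.
M = (np.eye(n) - R).T
M[-1, :] = 1.0
nu = np.linalg.solve(M, np.eye(n)[-1])

R_inf = np.outer(ones, nu)              # limiting matrix, every row equals nu
A = np.eye(n) - R
Z = np.linalg.inv(A + R_inf) - R_inf    # closed form on the right of (3.13)

# Truncated series on the left of (3.13): sum over k of (R^k - R_inf).
series = sum(np.linalg.matrix_power(R, k) - R_inf for k in range(200))

checks = {
    "(3.11)": np.allclose(R_inf, np.eye(n) - A @ Z),
    "(3.12)": (np.allclose(A @ Z @ A, A)
               and np.allclose(Z @ A @ Z, Z)
               and np.allclose(A @ Z, Z @ A)),
    "(3.13)": np.allclose(series, Z),
    "(3.15)": np.allclose(nu @ Z, 0.0),
    "(3.16)": np.allclose(Z @ ones, 0.0),
}
```

Every entry of `checks` evaluates to true for this chain. The same script can be pointed at any other regular transition matrix as a quick sanity check before the fundamental matrix is used in further calculations.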

An interesting result is shown in [72] that describes the effect of a perturbation on the fundamental matrix. Consider a Markov chain with transition matrix $R$ that is perturbed by an error matrix $E = R - R'$ (with $E \xi_{n \times 1} = 0$). The fundamental matrix of the perturbed Markov chain then has the following form:

$$(A + E)^{\sharp} = A^{\sharp} (I + E A^{\sharp})^{-1} - (I - A A^{\sharp})(I + E A^{\sharp})^{-1} A^{\sharp} (I + E A^{\sharp})^{-1} \tag{3.17}$$

We use equation (3.17) to introduce the robustness measure.

In the following part, we discuss the concept of mean first passage time and introduce the Kemeny constant of a connected graph. The Kemeny constant is an interesting characteristic of the graph, and we show that it also has a robustness interpretation in graph theory.

For a Markov chain started at state $X_0 = i$, $m_{ij}$ is the average waiting time until the chain hits state $j$ for the first time:

$$m_{ij} = \mathrm{E}\left[ \min \{ t > 0 : X_t = j \} \mid X_0 = i \right] \tag{3.18}$$

The stationary distribution of this Markov chain is a row vector $\nu$ such that $\nu R = \nu$. Therefore, the mean first passage time from state $i$ to equilibrium is

$$m_{i\nu} = \sum_j m_{ij} \nu_j$$

It is proven that $m_{i\nu}$ is constant and does not depend on $i$ [32]; the value $m_{i\nu} = K$ is the Kemeny constant (the seek time of the chain). Therefore we have

$$K = \sum_j \nu_j \sum_i \nu_i m_{ij} = \sum_j m_{ij} \nu_j$$

The Kemeny constant is an interesting quantity for ergodic Markov chains and is independent of the initial state of the Markov chain. In terms of the fundamental matrix, the Kemeny constant can be written as

$$K = \sum_j z_{jj} = \mathrm{trace}(Z) \tag{3.19}$$

considering that $\sum_j z_{ij} = 0$.

Now consider the eigenvalues of the transition matrix $R$, ordered as

$$|\beta_n| \le \dots \le |\beta_2| \le \beta_1 = 1 \tag{3.20}$$

The following lemma gives the relation between the Kemeny constant and the eigenvalues of the transition matrix [68].

Lemma 1

$$K = \sum_{i=2}^{n} \frac{1}{1 - \beta_i}$$
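
The three expressions for the Kemeny constant, namely the weighted sum of mean first passage times, the trace in (3.19) and the eigenvalue sum of Lemma 1, can be compared numerically. The NumPy sketch below uses a hypothetical three-state chain. It assumes the convention $m_{jj} = 0$ for the diagonal passage times, which is the convention under which the trace and eigenvalue formulas agree with the passage-time sum, and it obtains the off-diagonal passage times from $m_{ij} = (z_{jj} - z_{ij}) / \nu_j$.

```python
import numpy as np

# Hypothetical regular transition matrix; its eigenvalues are 1, 0.3 and 0.2.
R = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
n = R.shape[0]

# Stationary row vector nu.
M = (np.eye(n) - R).T
M[-1, :] = 1.0
nu = np.linalg.solve(M, np.eye(n)[-1])

# Fundamental matrix Z = (I - R)^#.
R_inf = np.outer(np.ones(n), nu)
Z = np.linalg.inv(np.eye(n) - R + R_inf) - R_inf

# Mean first passage times with the assumed convention m_jj = 0.
M_hit = (np.diag(Z)[None, :] - Z) / nu[None, :]

# Sanity check: for i != j the times satisfy m_ij = 1 + sum_k r_ik m_kj.
off_diag = ~np.eye(n, dtype=bool)
first_step_ok = np.allclose((M_hit - R @ M_hit)[off_diag], 1.0)

K_rows = M_hit @ nu                 # sum_j m_ij nu_j, one value per start i
K_trace = np.trace(Z)               # equation (3.19)
betas = np.linalg.eigvals(R)
others = np.delete(betas, np.argmin(np.abs(betas - 1.0)))
K_eig = np.real(np.sum(1.0 / (1.0 - others)))   # Lemma 1
```

All entries of `K_rows` coincide, which illustrates that the value does not depend on the starting state, and they match both `K_trace` and `K_eig`. For this chain the common value is 1/0.7 + 1/0.8 = 75/28.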

The Kemeny constant plays an important role in our future discussion of the robustness of the Markov

architecture.

In the next section, we discuss the random walk betweenness based on the transition probability

matrix. Random walk betweenness
