

  • Robust Network Design and Robustness Factor

    by

    Armin Ghayoori

    A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

    Graduate Department of Electrical and Computer Engineering, University of Toronto

    © Copyright 2013 by Armin Ghayoori

  • Abstract

    Robust Network Design and Robustness Factor

    Armin Ghayoori

    Doctor of Philosophy

    Graduate Department of Electrical and Computer Engineering

    University of Toronto

    2013

    This thesis presents a robust design approach for communication networks that covers both capacitation and routing strategy design. Robustness is a mandatory property of core networks: they must respond to perturbations in network parameters while maintaining stable performance and reliable service delivery to different customers. The proposed design approach is applicable to any system that can be modelled by a weighted directed graph. To quantify robustness, we borrow and develop concepts and properties from the Markov chain literature as well as from graph-theoretic discussions of survivability. We propose a new definition of robustness for Markov chains, which has several applications in network design. We define robustness as the sensitivity of the mean first passage time between any two states of the Markov chain, measured as the variation of the mean first passage times in response to perturbations in the transition probabilities. We show that this definition of robustness is related to the sensitivity of the betweenness of a node/state in a Markov chain, which is defined as the number of visits by a random walker that wanders around the Markov chain according to its transition probabilities. It has been shown that, for an infinite walk, the ratio of the number of visits to the total number of hops converges to the stationary probabilities. Therefore, an analogy can be drawn between the well-known condition number and the robustness factor in a Markov chain. We also extend the definition of the robustness factor to network design problems and show that it can be used as a design criterion. The newly defined robustness factor is a function of the network capacitation, the routing, and the external input and output traffic. We also emphasize the importance in network design problems of the newly discovered graph-theoretic metric called the Kemeny constant. We argue that a function of the Kemeny constant and the robustness factor limits the sensitivity of network performance parameters to perturbations in the network.


  • Acknowledgements

    This thesis is the result of years of research at University of Toronto. During this period, I had the

    opportunity to work with great people. Their support and their contribution had a great impact on my

    work.

    I would like to express my sincere gratitude to my advisor, Prof. Leon-Garcia, for his continuous support of my Ph.D. study and research, and for his patience, motivation, enthusiasm and immense knowledge. His guidance helped me throughout the research and the writing of this thesis.

    I thank my fellow labmates at University of Toronto: Leila Shayanpour, Hadi Bannazadeh, Ali

    Tizghadam, Ali Shariat, Weiwei Li, Tang Tang, Alireza Bigdeli, Hazem Soliman, Houman Rastegarfar,

    Hesam Rahimi, Agop Koulakezian, Nadeem Abji and Aakash Nigam for their valuable discussions and

    for their help in this thesis.

    I would like to thank my committee members, Professor B. Liang, Professor E. Sousa, Professor B. Li and Professor Ilow, for their valuable feedback on my research work.

    I also dedicate this thesis to the memory of my father, who I wish were here to share this moment with me. I would like to express my deepest gratitude to my mother and my brothers for all their support and encouragement.


  • Contents

    1 Introduction
    1.1 Challenges in Future Networks Management Systems
    1.2 Research Objectives and Thesis Contributions

    2 Literature Review
    2.1 Future Network Architecture Design Goals
    2.2 Control and Management in Next Generation Networks
    2.2.1 Functional Entities in Control and Management Plane
    2.2.2 Maintaining Service Continuity in Next Generation Networks
    2.3 Network Modeling
    2.3.1 Networks and Weighted Graphs
    2.3.2 Criticality and Random Walk Analysis
    2.4 Robust Network Design
    2.4.1 Robust Network Design vs. Weights Perturbations
    2.4.2 Robust Network Design vs. Perturbations in External Traffic

    3 Robustness and Perturbation Effects in Markov Chains
    3.1 Markov Chains and Perturbation Effects
    3.2 Fundamental Matrix and Kemeny Constant
    3.3 Betweenness and a New Definition of Robustness
    3.4 Robustness Factor in Markov Chain
    3.5 Perturbation Effect and Betweenness
    3.6 Concluding Remarks and Examples

    4 Random Walk Models and Network Design
    4.1 Random Walk on a Graph
    4.2 Robustness Factor and Networks
    4.3 Other Robustness Measures
    4.3.1 Robustness and Connectivity in Graphs
    4.3.2 Robustness and Markov Chains
    4.4 Summary

    5 Robust Network Performance Evaluation
    5.1 Performance Metrics and Network Design
    5.2 Robustness Factor and Network Performance Parameters
    5.2.1 Robustness Factor Impact on Average Delay and Throughput
    5.2.2 Power Metric
    5.2.3 Gravity Model and Robustness Factor
    5.3 Network Design Examples
    5.3.1 Parking-Lot Problem
    5.3.2 Rocketfuel Topology
    5.3.3 Optimal Routing and Capacity Assignment
    5.3.4 Robust Routing and Minimum Delay Routing
    5.3.5 Resource Assignment with Shortest Path Routing
    5.4 Summary

    6 Resource Allocation Based on the Gravitation Law
    6.1 Overview
    6.2 Gravitation Model and Resource Planning
    6.3 Multinomial Coefficients and Gravity Model
    6.4 Gravitation Model with Cost
    6.4.1 Sensitivity Analysis
    6.5 Optimizing the Number of Active Access Points
    6.6 Summary

    7 Conclusions and Future Works
    7.1 Contribution
    7.1.1 Robustness in Markov Chain
    7.1.2 Graph Connectivity and Robustness Factor
    7.1.3 Robust Network Design
    7.1.4 Robust Network Design in Bipartite Graphs
    7.2 Future Works
    7.2.1 Distributed Robust Design
    7.2.2 Applications

    A
    A.1 Proof of Theorem 1
    A.2 Proof of Theorem 2
    A.3 Proof of Theorem 3
    A.4 Proof of Theorem 4
    A.5 Proof of Proposition 1

    B
    B.1 Proof of Theorem 5
    B.2 Proof of Theorem 6
    B.3 Proof of Proposition 3
    B.4 Proof of Proposition 6

    C
    C.1 Proof of Proposition 9
    C.2 Proof of Proposition 10
    C.3 Proof of Proposition 11
    C.4 Proof of Proposition 12
    C.5 Proof of Theorem 7

    D
    D.1 Proof of Theorem 8
    D.2 Proof of Theorem 9
    D.3 Proof of Theorem 10
    D.4 Proof of Theorem 11

    Bibliography

  • List of Tables

    2.1 Link Capacity
    4.1 Graph and Digraph Comparison
    5.1 Notations
    5.2 Optimization Criteria
    5.3 Optimal Weights for Uniform Traffic

  • List of Figures

    2.1 Network Control and Management Architecture
    2.2 Management Loops
    2.3 Graph Modeling of a Capacitated Network
    2.4 15-node/21-link Pacific Bell network
    2.5 Scenario Graph Model
    2.6 Interval Graph
    2.7 MIRA Example
    3.1 Markov Chain with Linear Architecture
    3.2 Kemeny Constant for Linear Architecture vs. size
    3.3 Ring Markov Chain
    3.4 Kemeny Constant vs. size
    3.5 Tree Markov Chain
    3.6 Kemeny Constant for a Tree Markov Chain vs. size
    3.7 9 × 9 torus
    3.8 Kemeny Constant for a Torus Markov Chain vs. size
    3.9 Full Mesh Markov Chain
    3.10 Kemeny Constant for a Full Mesh Markov Chain vs. size
    3.11 Kemeny Constant vs. average number of connections per node
    4.1 Network Design Considerations
    5.1 Directed Graph
    5.2 Network Traffic Flows
    5.3 Parking-lot Topology
    5.4 Rocketfuel Network Topology
    5.5 Average Delay vs. changes in external input traffic
    5.6 Average Delay Sensitivity vs. perturbation in external input traffic
    5.7 Average Delay Sensitivity vs. perturbation in routing R
    5.8 Average Traffic Sensitivity vs. perturbation in routing R
    5.9 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. changes in external input traffic distributions
    5.10 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. perturbations in routing R
    5.11 Total Traffic Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_{ij} r_{ij} p_{ij} \chi_i$ and Minimum Delay Routing vs. changes in R
    5.12 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_i \chi_i$ and minimum delay routing with respect to the changes in external input traffic distribution
    5.13 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_i \chi_i$ and minimum delay routing with respect to the perturbations in R
    5.14 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_i W_i \chi_i$ and minimum delay routing with respect to the changes in external input traffic distribution
    5.15 Average Delay Sensitivity of $\min \sqrt{9K^2 + 4K'^2}\,\sum_i W_i \chi_i$ and minimum delay routing with respect to the perturbations in R
    5.16 Average Delay Sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in external input traffic
    5.17 Average Delay of maximum entropy routing and Minimum $U_1$ vs. perturbations in external input traffic
    5.18 Average Delay Sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in routing
    5.19 Total Traffic Sensitivity of maximum entropy routing and Minimum $U_1$ vs. perturbations in routing
    5.20 Optimum resource assignment ($\min T_{avg}$) for CPSF routing
    5.21 Optimum resource assignment ($\min U_1$) for CPSF routing
    5.22 Blocking probability
    5.23 Blocking probability after Node 4 failure
    6.1 Newton’s law of universal gravitation
    6.2 Coulomb’s law of universal gravitation
    6.3 Two Access Points and 3 Demand nodes, $p_{i,n} > 0$, $1 \le i \le 2$ and $1 \le n \le 3$
    6.4 Two Access Points and 1 Demand node, $1 \le i \le 2$ and $n = 1$
    6.5 Demand Distribution, $\lambda_T = 11450$ RU
    6.6 Access Point Distributions
    6.7 Allocated Resource Units from AP1, $C_1 = 6000$ RU
    6.8 Allocated Resource Units from AP2, $C_2 = 6000$ RU
    6.9 Allocated Resource Units from AP3, $C_3 = 1000$ RU
    6.10 Allocated Resource Units from AP4, $C_4 = 1000$ RU
    6.11 Demand Distribution, $\lambda_T = 3808$ RU
    6.12 Allocated Resource Units from AP1, $C_1 = 6000$ RU
    6.13 Allocated Resource Units from AP2, $C_2 = 6000$ RU
    6.14 Allocated Resource Units from AP3 and AP4, $C_3 = C_4 = 1000$ RU
    7.1 Hierarchical Network Design
    7.2 Minimum Power Routing
    7.3 Transportation Network [1]

  • Chapter 1

    Introduction

    The complexity and size of today’s network architectures are increasing continuously, in part because heterogeneous networks built on different technologies are being interconnected. In addition, offering end-to-end communications for services such as multimedia applications requires mobility support and hand-off support across different network technologies.

    Applications’ demand for resources magnifies the need to monitor and manage the functionalities and configurations of the network across a heterogeneous environment. This functionality is maintained by expert human operators who monitor the performance of the system and change its configuration in response to the perturbations that occur in it.

    Future networks are driven by innovations in services and network capabilities, and should allow an evolutionary path from existing networks to a unified network that can support different applications with different requirements. In addition, applications such as high-bandwidth radio, cloud computing and peer-to-peer applications, with an increasing number of users, necessitate efficient utilization of resources to provide adequate Quality of Service (QoS) and Quality of Experience (QoE). The infrastructure design of a future network architecture plays an important role in end-to-end application support across a wide range of networks, from mobile to fixed, and across different services, from text messaging to high-bandwidth multimedia applications and other applications envisaged for future networks.

    In addition, newly emerged applications are characterized by variable traffic that is difficult to predict. Designing a static system from a traffic estimate, without any robust design or adaptation, is therefore inefficient, and bandwidth guarantees and QoS commitments cannot be provided to customers. To cope with the high variability of the traffic, service providers resort to manual routing adaptation and heavy over-provisioning.

    Hence, due to the explosion in network size and heterogeneity, designing an efficient network management system that requires minimal human intervention has become a challenging research problem. Different control and management systems have been proposed for future network architectures, and all of them agree on the need for an autonomous control and management system that can manage its own adaptation to network changes, taking performance, fault and security concerns into account, without intervention by human administrators.

    The main objective of this thesis is to develop a design approach for communication networks that are required to be robust to perturbations in the system. These perturbations include changes in the routing distribution, the capacitation and the external traffic patterns of the network.

    In the following section, we provide a brief discussion of challenges in future network architectures.

    1.1 Challenges in Future Networks Management Systems

    Future network architectures are migrating to one consolidated IP network (Next Generation Network)

    that supports a variety of services from delay sensitive to high bandwidth applications. Service providers

    are currently moving towards exploiting a single IP packet network that covers different networks and

    services.

    Today’s IP networks suffer from the cascading meltdown effect of small local failures. In addition, control and management systems for IP networks are difficult to design and implement. This complexity comes from the direct interaction of the control and management planes with a heterogeneous resource pool. In other words, the control and management plane functionalities must deal with a wide variety of network protocols and technology-dependent network resources.

    Different approaches have been considered in designing future network control and management architectures. With the growth of IP networks and the need for more advanced quality-of-service handling than the current best-effort strategy, many changes have been made to the control and management planes to adapt them to new requirements. These changes effectively increase the complexity and fragility of data networks. Continuing to apply temporary remedies to data networks can cause more problems, which exacerbate the difficulties of network control and management. The new idea underlying management systems is inspired by the theory of evolution [115].

    An autonomic self-organizing system consists of self-managing components. The self-management concept can be divided into four categories: self-configuring, self-healing, self-optimizing for the current state of the system, and self-optimizing for future turbulence [80].

    The ability of the system to configure itself in response to changes is called self-configuring. Minimal or no human intervention should be needed to deploy a policy or to adapt to changes in the IT environment. A system with self-healing capability should be able to detect failures in components such as software or hardware and to take proper action to resolve the problem. A self-managing system should also be capable of allocating resources to users with different requirements so as to satisfy service level requirements while conserving system resources for future changes in the environment, such as the admission of new users.

    The management system attempts to reach an optimum steady state that provides the required level of service. The optimization problem therefore considers both the short-term and the long-term behaviour of the network, which highlights the need for a control system that monitors the system at different time scales and makes appropriate decisions through different control loops.

    The concept of an autonomic system and its application in different areas has been investigated in several projects [80], [93], [43], [86]. The pioneering IBM autonomic computing proposal [80] started a new wave of autonomics concentrating on computing resources. Autonomia [93] provides a tool for application developers to specify management and control schemes for maintaining a wide range of resource requirements. The application developers specify the requirements and the maintenance scheme in the Application Management Editor (AME), and the Autonomic Middleware Service (AMS) builds the execution environment. Automate [43] provides a framework for autonomic grid applications. It separates policy from the grid infrastructure in order to organize the mechanisms that deal with the heterogeneity of resources and applications.

    The managed resources can come from a single source or from a combination of sources, and they are monitored by sensors throughout the network. In [84], an outline of a generic autonomic service architecture is proposed that provides a cost-effective approach to servicing, managing and maintaining varying numbers of incoming applications and services for both computing and network resources. In [84] and [88], with a more complete discussion in [41], everything from application services such as VoIP and gaming to underlying services such as IP packet transport is called a service. Under this definition, services are divided into basic services, which cannot be decomposed into other services, and composite services, which are composed of several basic services.

    New views of future network architecture focus on redesigning the control and management space to cope with future requirements, and research activities around the world follow different approaches. One goal of future network architectures is to reduce the complexity of management systems by separating the decision plane from the protocols that govern the interaction between network elements [115], [41], [83], [31]. Three key principles are proposed for the design of the control and management plane: network-level objectives, network-wide views, and direct control [115]. Based on these principles, the control functionalities are divided into four planes: Decision, Dissemination, Discovery, and Data. The underlying network elements (bridges, routers, etc.) forward packets under the control of the decision plane. The main assumption is direct access to, and control over, all network resources and network elements.

    In the future network literature, every change is treated as mobility. To provide seamless movement capability for moving networks, a network should therefore have the following properties [79]:

    • It should be able to discover itself and its surrounding environment.

    • It should be able to dynamically configure itself under varying and unpredictable changes.

    • It should be able to dynamically extract rules and methods of interaction with different neighbors with various characteristics.

    • It should be robust to unpredictable events that may cause some of its control elements to malfunction, and it should be able to recover their basic functionalities.

    • It should be able to provide safety and security for service applications.

    As mentioned earlier, decisions in control and management space are based on three important

    principles: Network-level Objectives, Network Wide Views, and Direct Control.

    • Network-level Objectives: The specification of performance, reliability and policy should be independent of the underlying network elements and should apply to the entire network. As an example, suppose that network provider A wants to restrict the access of network provider B’s users to some services offered by some of the nodes in network A. Implementing this policy only in some of the edge routers may allow the users in network B to violate the restriction in some other way (e.g. through newly added routers that are not properly configured).

    • Network Wide Views: The management and control plane should have access to the current state of the data plane, such as network elements and resource limitations. These states are as follows [90]:

    – Dynamic State: States that are dynamically accumulated and processed.

    – Configuration State: States that are determined by the administration unit and sent in the form of configuration commands. The link weights, which are important factors in the execution of routing protocols, are examples of configuration states.

    This information plays an important role in controlling and managing any data network. Providing such up-to-date information can be the duty of different nodes in the network, which monitor the existing communication links and network elements and discover newly added elements and links installed as the network grows. These resources and elements can even be added and deleted dynamically, as is the case in peer-to-peer networks.

    • Direct Control: Direct control reflects the fact that only the functional entities of the control and management space are responsible for configuring the data plane. For example, routing tables in routers are configured by distributed algorithms, and it is very difficult to build control and management functionalities such as traffic engineering on top of these distributed algorithms. Moving the decision-making responsibility out of the data plane reduces the complexity of implementing the different requirements of future networks.

    In this section, we discussed the design criteria of future network architectures. We illustrated that

    autonomic resource management is an inevitable part of any future network system. An autonomic

    system is capable of adapting itself to the perturbations that can happen in the system environment. An

    autonomic system contains several control loops. These control loops monitor the system performance

    parameters and take actions when it is necessary. These actions vary from minor modifications to the

    whole system reconfiguration based on the changes in the system environment.

    The main purpose of this thesis is to design a system that has a low sensitivity to perturbations.

    The low sensitivity of performance parameters to the perturbations is a very important property of a

    system that is constantly facing perturbations in its parameters. We interpret the insensitivity of the

    system to perturbations as robustness. In a robust system, there is less need to reconfigure the system

    to adapt to the changes which may occur during system operation.

    1.2 Research Objectives and Thesis Contributions

    This thesis makes an attempt to provide a solution to a robust design problem in autonomic systems. To

    provide a solution, we first investigate the robustness issue in related literature such as Markov

    chains and graph theory. We propose a new robustness measure for the Markov chain that provides a

    qualitative apparatus for robust Markov chain design problems. We extend the solution to the network

    design problems and evaluate the performance using several examples.

    We can summarize the contributions of this thesis as follows:

    • In chapter 3, we provide a new definition of Markov chain robustness as the sensitivity of the travel time in a Markov chain. This sensitivity definition is directly related to the betweenness

    sensitivity of states, when a random walker starts wandering around and visiting different states

    multiple times. The general definition of robustness in Markov chains allows the random walker to

    continue walking for infinite time and the sensitivity of the Markov chain for each state is defined

    as the sensitivity of proportion of total number of visits to that state to total number of hops the

    random walker has taken. We change this definition by specifying the destination of the random

    walker and calculate the sensitivity of total number of visits to each state when the starting and

    stopping states are known.

    • We discuss an important factor in Markov chain and graph theory that is called the Kemeny constant. We show that robustness of a Markov chain / weighted graph can be discussed using the

    Kemeny constant. We also discuss the relation between the Kemeny constant and the Laplacian

    of the graph. We show that for an unweighted d-graph and for weighted graph with equal node

    weight for all nodes, the Kemeny constant and summation of the inverse of Laplacian Eigenvalues

    are directly proportional. In addition, we show the relation between the connectivity measures of

    the graph such as Iso-perimetric constant of the graph and algebraic connectivity with the Kemeny

    constant.

    • A network design problem can be divided into three different parts. The first part is network resource distribution. In this part, the network designer distributes resources in a network that

    involves topology design of the network as well as network capacitation. The second part of the

    network design is to provide a routing strategy between source-destination pairs in a network. In

    designing a routing strategy, different considerations are taken into account such as network perfor-

    mance optimization as well as capacity constraints. In the third part of a network design problem,

    the external demand can be shaped in such a way that the network resources are utilized efficiently

    and network performance requirements are satisfied. These different network design problems are

    widely discussed in the literature by considering optimization of different network design

    criteria.

    In chapters 4 and 5, we propose a new set of design approaches in data networks. We define a

    robustness factor for each of the nodes in a network. This robustness factor is a function of network

    capacity assignment and routing distribution as well as external input-output traffic pattern. We

    show that there are different choices as network design criteria based on the node robustness

    factors. We discuss different objective functions and evaluate the performance of the network

    designed based on these different objectives. We show how the robustness factors affect the total

    average delay and throughput of the network. We show that optimizing the weighted sum of the

    robustness factors provides an optimum trade-off between delay and maximum throughput of the

    network. We also show the similarity between the weighted sum of robustness factors and

    Kleinrock’s power metric. We prove that the new robustness factor limits the sensitivity of the

    average end-to-end delay and traffic distribution in the network.

    • In chapter 6, we decompose the external traffic into two parts. The first part is the total traffic entering the node from an external source. The other part is a square matrix that describes the

    destination preference of traffic entering at each node. We show that when the preference matrix

    has equal rows, the model is called the gravity model, which is a reasonable estimate of the external

    traffic distribution in backbone networks.

    • We also propose a convex Entropy-based robust design that can be calculated using convex optimization tools for any distribution of sources and destinations. We develop the optimization

    scheme and prove the robustness property for bipartite graphs which are suitable choices for

    modeling last-mile wireless communication networks.
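
    The Markov chain quantities named in the contributions above (mean first passage time and the Kemeny constant) can be illustrated with a minimal sketch. It assumes an ergodic chain given by a row-stochastic matrix and uses the convention in which the first passage time from a state to itself is zero; other texts add one to the resulting constant. The helper names and the two-state chain are hypothetical and are not taken from the thesis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary_distribution(p):
    """Solve pi P = pi together with sum(pi) = 1 for an ergodic chain."""
    n = len(p)
    # Balance equations for the first n - 1 states, then the normalization row.
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n - 1)]
    a.append([1.0] * n)
    return solve(a, [0.0] * (n - 1) + [1.0])


def mean_first_passage_times(p, target):
    """Expected number of steps to reach `target` from every state (zero from target itself)."""
    others = [s for s in range(len(p)) if s != target]
    a = [[(1.0 if i == j else 0.0) - p[i][j] for j in others] for i in others]
    times = dict(zip(others, solve(a, [1.0] * len(others))))
    times[target] = 0.0
    return times


def kemeny_constant(p, start):
    """Sum over destinations j of pi_j * m(start, j); the value does not depend on `start`."""
    pi = stationary_distribution(p)
    return sum(pi[j] * mean_first_passage_times(p, j)[start] for j in range(len(p)))


# Hypothetical two-state chain: state 0 is left with probability 0.2, state 1 with 0.3.
P = [[0.8, 0.2],
     [0.3, 0.7]]
constants = [kemeny_constant(P, start) for start in range(len(P))]
```

    For this two-state chain both entries of `constants` equal 1 / (0.2 + 0.3) = 2 up to rounding, which illustrates that the constant is independent of the starting state.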

  • Chapter 2

    Literature Review

    One of the main challenges of future network architectures is to design an autonomic resource man-

    agement system to provide a proper framework for service delivery. To address this challenge, a new

    network architecture should be established. In this chapter, we review the next generation network

    architecture design. We discuss general considerations for future networks and provide a framework that

    is relatively common between different proposed approaches. We show that future management systems

    should be autonomic and should be able to handle the perturbations in the system, such as failures in

    different parts of network or changes in traffic patterns. To cope with perturbations, control loops are

    considered. We discuss how control loops are implemented hierarchically to adapt to different scales of

    change. In our proposed approach, we design networks in such a way that the network performance

    parameters have low sensitivity to network changes. This insensitivity makes the control loop execution

    less frequent and the system more stable.

    2.1 Future Network Architecture Design Goals

    A network architecture is a set of abstract design guidelines. It helps the design process to meet the

    requirements by following design principles. It is located between physical resources and users’ applica-

    tions. It should be able to adapt to variations in users’ requests and evolution of network technologies,

    configurations and topologies.

    To make this architecture stable, we should design a stable network that does not need reconfiguration

    frequently due to changes in requests and/or physical resources.

    The design requirements of future network architectures should be based on the following consider-

    ations [34]:

    • Large Switching and Communication Link Capacity: To evaluate design performance of next generation networks, the increasing traffic trend should be considered. It is shown in [34] that the

    traffic level in a typical exchange point doubles every 18 months.

    • Scalability: Future networks will be composed of several different networks that can be extremely diverse. In addition to human users, the number of machine to machine communications is expected

    to increase considerably, which necessitates a scalable framework for future network architectures.

    • Openness: The network should be able to maintain an appropriate level of competition. The key to have a competitive network is to standardize the interfaces and technologies. In addition, different

    mechanisms should be developed to enable users to provide innovative services.

    • Robustness: A network should not be affected by perturbations and failures that can happen in different parts of the network. Network architecture should provide an autonomic approach to deal

    with the changes in network topology, traffic patterns, policies and security concerns to maintain

    service continuity for users. These adaptation procedures should be transparent to the applications

    which are using the network resources.

    • Safety: In addition to establishing a secure connection between a source and a destination, network security should be concerned with the identity of users. Security issues are of utmost importance

    in banking, credit systems and certifications. The network should also prove to be safe and robust during a

    catastrophe.

    • Diversity: The performance of a communication network should be independent of specific applications and usage trends. It should be able to provide resources for diverse communication

    requirements from a computer-centric traffic to telephony applications.

    • Ubiquity: The network should comprehensively monitor user activities and network performance. However, in human activity monitoring, privacy concerns should be considered and a balanced

    trade-off should be provided between privacy protection and transparency in network activities.

    • Integration and Simplification: Network service providers should be able to provide services to a wide range of applications by integrating the common parts and collecting various functions. These

    integrations and simplifications increase network reliability and facilitate extensibility of network

    services.

    • Electric Power Conservation: The number of routers and data centres that should be used in a network is increasing rapidly. Each network device consumes kilowatts of power, which necessitates

    provisioning several megawatts of power for a communication network. In addition, the concerns

    for green and sustainable energy consumption should also be considered for future network design.

    • Extendibility: Networks should be easily extendable using universal communications to overcome language and physical obstacles for self-reformation.

    2.2 Control and Management in Next Generation Networks

    Future control and management systems should be capable of integrating different networks to provide a

    homogeneous network for users [85]. This transparency of heterogeneity in network structure necessitates

    the design of a control and management plane which can make an agreement for cooperation on demand

    without a need for offline negotiation and reconfiguration.

    This approach should have the capability to be generalized for distinct network architectures, such as

    vehicular networks and sensor networks. In addition, end-users in future networks are not just a single

    node, they can own a network of devices, in home, office and around the body.

    Figure 2.1: Network Control and Management Architecture

    2.2.1 Functional Entities in Control and Management Plane

    In future network architectures, a network is a collection of nodes and network elements which share

    a common control space. The control space has proper functionalities to communicate with other

    control spaces. This architecture consists of different modular components interacting through interfaces.

    These interfaces make components independent of specific implementation and technology. Different

    components of the network architecture are shown in fig. 2.1. As shown in fig. 2.1, control and

    management functionalities are located in the network control plane.

    The control and management functions have access to network elements and nodes through Resource

    Interfaces. The Resource Interface (RI) provides a homogeneous view of access technologies. It virtualizes

    the resources provided by different technologies in a unified form.

    Different functional entities of the Control Space (CS) between two networks communicate through

    a Network Interface (NI). NI plays an important role in composition of networks. Networks can also

    trade resources through on the fly compositions. Having network composition capability is an important

    feature of future network architectures facilitating the support of seamless handover for mobile users.

    Composition is achieved when Composition Agreement (CA) and agreement lifetime are created between

    two networks through the negotiation between functional entities in CS.

    Services in the application layer profit from network functionalities through the Service Interface

    (SI). Using service interface, applications only handle their end-to-end communications with their peers

    without a need to increase their complexity of handling transport and network functionalities.

    The control functionalities of the CS are grouped into two parts, each of which has different

    functional entities [84]:

    • Resource management functionalities that support and manage user plane connectivity:

    – Bearer and Overlay Management (BOM): BOM offers end-to-end services to applications

    through SI. In order to provide QoS for the users, networking and transport protocols utilize

    the so-called Service-aware Transport Overlays [51], [16] controlled by Bearer and Overlay

    Management.

    – Flow and Mobility Management (FMM): FMM ensures that end-to-end bearers are not af-

    fected by connectivity changes and movement events in underlying connections.

    – Access Management (AM): AM manages resources, configures and establishes flows. It also

    monitors and discovers access links.

    – Trigger and Context Management (TCM): TCM controls, broadcasts and collects context

    information. Mobility related events and state changes are handled by Trigger and Context

    Management FE [103].

    • Administrative Domain Management Functionalities:

    – Security Domain Management: These functions manage resource grouping in a common man-

    agement and control plane based on security policies. This allows controllable and authorized

    use of Control Space functionalities.

    – Composition Control: Composition control functions administrate the composition feature of

    networks. They govern negotiation and agreement realization between composed autonomous

    CSs.

    – Compensation and INQA: Compensation and INQA include inter-network quality of service

    agreement and service level agreement functionalities.

    – Network Management: Network management functions configure and maintain policy databases.

    The network management subsystem also collects user preferences, mobility triggers and net-

    work status.

    2.2.2 Maintaining Service Continuity in Next Generation Networks

    Service maintenance is an important design factor in next generation networks. The main concerns are

    heterogeneity of underlying networks and high variability of demand traffic.

    Therefore, the control and management system is responsible for initiating, configuring, maintaining

    and shutting down Service-aware Adaptive Transport. This adaptive transport has all the functionalities

    required to provide end-to-end connection. The end-to-end connection in the user plane is established

    using three categories of network devices: Routers, Processors and Caches. Routers forward the data

    on the network according to the dynamically configured routing tables to enhance QoS by reducing

    the risk of suboptimal routing in the overlay network [82]. Processors process incoming data (e.g.

    virus scanning) and also provide control functions (e.g. SIP Proxy, Real Time Streaming Protocols

    Functionalities). Caching is done to store incoming data flows for different purposes such as generating

    deliberate extra delays, e.g. to compensate for jitter.

    By interacting with different functional entities, such as maintaining network connectivity, service

    authorization and metering and QoS monitoring for Inter-AN Communications, the management system

    provides end users with an acceptable Quality of Experience [81].

    Routing algorithms provide a service aware routing for QoS-sensitive applications [82]. In route selection,

    both path and service components are considered in making a decision. Routing algorithms consist

    of three different subsections. The first subsection is topology which is created after connection setup

    and can be modified during application session. Topology design is done based on application require-

    ments, e.g. a tree topology is created for multi-cast applications. QoS collector accumulates QoS status

    information through network sensors and local resource manager and abstracts them into a form which

    could be used by routing algorithms. Route computing module computes the best route based on the

    topology generated by the topology module and metrics collected by QoS feedback collectors.

    In order to gather information about network state, sensors are used. Sensors are software compo-

    nents distributed in heterogeneous networks. These sensors monitor the network resources and context

    information. Sensors provide information and simplify integration of resource information. They are

    broadly categorized into two types: Node Sensors (e.g. monitor memory and CPU usage of nodes) and

    Network Sensors (e.g. BW sensors which monitor the bandwidth usage of links).

    Two approaches can be followed for end-to-end bandwidth measurements: passive and active. In

    passive measurement, sensors only monitor the passing packets and compute the available bandwidth

    as the minimum of the available bandwidths of the links along the routing path. In

    active measurement, by contrast, sensors actively send probing packets into the network. Active measurements provide

    more efficient and more flexible estimation for wireless networks because of their fast changing topology

    [78]. TOPP (Train of Packet Pair) [12], SLoPS (Self-Loading Periodic Stream) [11] and SLOT [5] are

    some of the proposed active end-to-end bandwidth measurement schemes which have different estimation

    accuracy and probing time. These techniques send streams of probing packets to estimate end-to-end

    available bandwidth.
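
    The passive estimate described above, the minimum available bandwidth over the links of the routing path, can be sketched as follows. The function name, the path and the capacity and load figures are hypothetical illustrations rather than values from the cited measurement schemes.

```python
def available_bandwidth(path, capacity, load):
    """Return the bottleneck available bandwidth (capacity minus observed load) along a path."""
    links = list(zip(path, path[1:]))  # consecutive node pairs form the links of the path
    return min(capacity[link] - load[link] for link in links)


# Hypothetical link capacities and passively observed loads, both in Mbps.
capacity = {("A", "B"): 100, ("B", "C"): 60}
load = {("A", "B"): 40, ("B", "C"): 30}

bottleneck = available_bandwidth(["A", "B", "C"], capacity, load)
```

    Here link A to B has 60 Mbps to spare and link B to C has 30 Mbps, so the end-to-end estimate is limited by the second link.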

    The information provided by sensors is processed by QoS collectors which monitor the QoS of every

    flow in overlay networks. QoS collectors provide triggers and signaling for better resource utilization

    and QoS preservation. The monitoring and management loops are shown in fig. 2.2. As can be

    seen, control and management functionalities can be divided into two parts. The first part is the local

    management that makes short term decisions according to the feedback received from users and sensors

    in the network. The other part of the management system is responsible for re-configuring the network

    as a long term adaptation.

    A similar structural approach is followed by the AKARI project. It proposes five different sub-

    architectures that have different goals in the overall network architecture [2]. These five modules are

    defined to provide a hierarchical architecture and control loops to react to small perturbations and big

    changes in the network. Similar to the architecture discussed previously, AKARI provides interfaces for

    the control plane to physical resources, applications and control planes of other autonomous systems.

    AKARI introduces a multiple access mechanism that is called Packet Division Multiple Access (PDMA).

    PDMA is a communication scheme that shares all the bandwidth among the access points in the same

    interference domain. It uses CSMA/CA for interference control within a cell and among interfering cells.

    Figure 2.2: Management Loops

    Therefore, bandwidth can be adaptively distributed in the network between different cells.

    The SAVI network architecture design aims to provide flexible infrastructure that involves backbone

    data centres, optical backhaul, smart edges and wireless access. It provides effective, efficient, reliable

    and adaptive support for large scale distributed applications with a wide range of requirements that

    seems inevitable for future network scenarios. This architecture should be adaptive to incoming demand

    characteristics and should be able to change characteristics and behaviours of the system to cope with the

    perturbations that can happen in every autonomic system. To have a reconfigurable, scalable and efficient

    management system, SAVI proposes a control and management plane over the physical resources. A

    similar approach is followed in other network architecture designs such as GINI and FIND in the US [91],

    [26] and EuroNGI in Europe [65].

    In all of the discussed network architectures, control loops are considered for resource provisioning

    in networks. This hierarchical architecture consists of multiple loops managing resources in different

    levels based on the scale of perturbations in the network. The purpose of this thesis is to provide a

    framework for robust network design in which the performance of the network is not affected by the

    perturbations happening in resources, routing and traffic pattern and alleviate the need for frequent

    network reconfiguration.

    2.3 Network Modeling

    In the past few decades, extensive research activities have been dedicated to study and to analyze the

    behaviors and properties of complex systems in fields such as sociology, physics, biology and data networks. The

    results of these research activities illuminate surprising similarities between different complex systems.

    To explain these similarities, graph-theoretical approaches are used to demonstrate and to charac-

    terize the property of complex systems. Many different complex systems are decomposed into several

    different components with a variety of inter-component interactions. These systems that contain different

    interacting components can be modeled by a mathematical object called a graph. A graph is a collection

    of nodes and edges connecting the nodes. Nodes are considered to be the interacting components of the

    Figure 2.3: Graph Modeling of a Capacitated Network

    system and edges demonstrate the interaction between the nodes.

    For example, in an ecological system, nodes can be considered as different species and links represent

    the interaction of the species population.

    The resulting graph can be directed or undirected. In directed graphs, the interaction of the different

    components of the system is not symmetric. As an example of an undirected graph, suppose that we

    model the people at a party as nodes in the graph and define the interaction between

    the nodes as whether or not they shake hands. This system can be modeled by an undirected graph

    in which the interactions of nodes (Hand shaking) are modeled by undirected links. However, if we

    define a different interaction of nodes for the same system such as having specific knowledge about other

    participants in the party, this interaction is not symmetric and the system cannot be modeled by an

    undirected graph. Another example of an undirected graph is a resistive circuit in an electrical network.

    However, if directional elements such as diodes are used in the network, the undirected graph model is

    not a proper tool to model the network.

    In the next section, we concentrate on data network models and describe how a weighted graph is used

    to characterize and to model a capacitated network.

    2.3.1 Networks and Weighted Graphs

    In general, communication networks are modeled by weighted graphs. Network elements, such as routers

    are represented as nodes and network links are represented as edges in the graph. The weight of a link

    in the network is considered to be the capacity of that link.

    Based on the weighted graph modeling, analysis of the network is simplified and the flow of the

    packets can be tracked using graph representation of the network.

    As an example, consider the network in fig. 2.3. As we can see, we have 3 routers in the network

    labeled as A, B and C. We consider that the communication links are installed in the network according to

    table 2.1.

    Table 2.1: Link Capacity

    Link      Capacity
    A → B     100 Mbps
    A → C     80 Mbps
    B → C     60 Mbps
    B → A     100 Mbps
    C → B     200 Mbps
    C → A     50 Mbps

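    This network can be modeled as a directed graph with the weight matrix W as shown in Eq. (2.1).
    The graph is directed because the capacities in table 2.1 are not symmetric (for example, A → C is 80 Mbps while C → A is 50 Mbps).

    W = \begin{bmatrix} 0 & w_{AB} & w_{AC} \\ w_{BA} & 0 & w_{BC} \\ w_{CA} & w_{CB} & 0 \end{bmatrix} \qquad (2.1)

    Elements of matrix W are assumed to be the capacity of the links connecting nodes in the network.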
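
    As a small illustration, the weight matrix of Eq. (2.1) can be filled in from the capacities listed in table 2.1. The function and variable names below are hypothetical; only the node labels and capacities come from the table.

```python
# Link capacities in Mbps, copied from table 2.1 (source node, destination node).
capacity_mbps = {
    ("A", "B"): 100, ("A", "C"): 80,
    ("B", "C"): 60, ("B", "A"): 100,
    ("C", "B"): 200, ("C", "A"): 50,
}
nodes = ["A", "B", "C"]


def weight_matrix(nodes, capacity):
    """Build W with W[i][j] equal to the capacity of the link from node i to node j."""
    index = {name: i for i, name in enumerate(nodes)}
    w = [[0] * len(nodes) for _ in nodes]
    for (src, dst), value in capacity.items():
        w[index[src]][index[dst]] = value
    return w


W = weight_matrix(nodes, capacity_mbps)

# A symmetric W would allow an undirected model; this one is not symmetric.
is_symmetric = all(W[i][j] == W[j][i] for i in range(len(W)) for j in range(len(W)))
```

    The resulting rows are [0, 100, 80], [100, 0, 60] and [50, 200, 0], so the matrix is not symmetric and link direction matters for this network.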

    The weighted graph modeling of the data networks is not limited to wire-line networks. Wireless

    networks are also modeled by directed weighted graphs. However, in constructing the network and de-

    signing the topology, different constraints should be considered. One of the constraints is the interference

    in the network. Since wireless channels are shared among different users, topology design should be done

    in a way that limits the interference on other links.

    Topology design in wireless networks has many different goals, such as reducing the energy con-

    sumption, minimizing interference, increasing connectivity, communication efficiency and so on. Some

    of these design aspects are contradictory. For example, to increase the connectivity of the network, we

    may want to establish a full mesh network. However, increasing the number of active links is not an

    optimum choice when considering the minimum power or minimum interference design goals.

    In addition, power management algorithms should be employed in conjunction with topology design

    algorithms to provide a power efficient and low interference topology.

    To consider the interference effect of wireless links, [49] proposed the concept of conflict graph. The

    conflict graph shows the interfering groups of links that cannot be active simultaneously.

    It is shown that the conflict graph modeling has the capability to model different aspects of wireless

    networks such as multiple radio interfaces and multiple radio channels per user.
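
    A conflict graph can be sketched in a few lines under a deliberately simple interference rule. The assumption here, which is ours and not necessarily the model of [49], is that two links conflict whenever they share an endpoint. The link set and function names are hypothetical.

```python
from itertools import combinations


def build_conflict_graph(links):
    """Map each link to the set of links it conflicts with (assumed rule: shared endpoint)."""
    conflicts = {link: set() for link in links}
    for a, b in combinations(links, 2):
        if set(a) & set(b):  # the two links have a node in common
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts


def can_be_active_together(active_links, conflicts):
    """True if no two of the given links are adjacent in the conflict graph."""
    return all(b not in conflicts[a] for a, b in combinations(active_links, 2))


# Hypothetical four-link chain A - B - C - D - E.
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
conflicts = build_conflict_graph(links)

ok = can_be_active_together([("A", "B"), ("C", "D")], conflicts)
clash = can_be_active_together([("A", "B"), ("B", "C")], conflicts)
```

    A set of links that can be scheduled simultaneously corresponds to an independent set of the conflict graph, which is what `can_be_active_together` checks.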

    Graph modeling has also been used in optical networks. One of the simplest methods of network

    modeling in optical networks is to model the network by a weighted graph in which the weights of links

    are set to be one without considering the availability of wavelengths. Routing in this network model

    is based on hop-distance shortest path first routing (HD-SPF). In another approach, links are

    weighted based on the total number of wavelengths and link length. This modeling approach is called

    hybrid weighting and routing is accomplished based on hybrid weighted shortest path first algorithm

    (HW-SPF) [114]. Fig 2.4 shows the 15-node Pacific Bell optical network.
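
    The difference between the two weightings can be illustrated with a shortest path computation. The text does not give the hybrid weight formula, so the sketch below assumes, purely for illustration, a weight equal to link length divided by the number of wavelengths. The topology and all attribute values are hypothetical.

```python
import heapq


def build_graph(links, weight_of):
    """Build an undirected adjacency map, weighting each link with weight_of(attributes)."""
    graph = {}
    for (u, v), attrs in links.items():
        weight = weight_of(attrs)
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or (inf, []) if target is unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical optical links with a length in km and a number of wavelengths.
links = {
    ("A", "B"): {"length_km": 100, "wavelengths": 4},
    ("B", "D"): {"length_km": 100, "wavelengths": 4},
    ("A", "C"): {"length_km": 50, "wavelengths": 8},
    ("C", "E"): {"length_km": 50, "wavelengths": 8},
    ("E", "D"): {"length_km": 50, "wavelengths": 8},
}

# HD-SPF: every link has weight one, so the path with the fewest hops wins.
hop_graph = build_graph(links, lambda attrs: 1)
# Assumed hybrid weight: longer links cost more, links with more wavelengths cost less.
hybrid_graph = build_graph(links, lambda attrs: attrs["length_km"] / attrs["wavelengths"])

hop_cost, hop_path = shortest_path(hop_graph, "A", "D")
hybrid_cost, hybrid_path = shortest_path(hybrid_graph, "A", "D")
```

    With these numbers the hop-distance weighting selects the two-hop route through B, while the assumed hybrid weighting prefers the three-hop route through C and E because its links are shorter and carry more wavelengths.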

    However, in these network models, there is always an uncertainty about the network parameters.

    This uncertainty results in unexpected network performance. Therefore, in any network design

    problem, we should also consider the sensitivity of the network performance to the changes in design

    Figure 2.4: 15-node/21-link Pacific Bell network

    parameters. In the next section, we discuss different approaches to the robust network design problem.

    2.3.2 Criticality and Random Walk Analysis

    In [108], a new approach in traffic engineering is introduced. This approach is based on the betweenness

    concept borrowed from graph theory. Based on the concept of betweenness, the criticality of path and

    link are defined. The criticality concept is a measure of the centrality of a link and shows how critical a

    link is for connecting different flows in a network.

    The main core of the criticality based approach is to determine critical links in the network and

    to try to avoid routing over those links. In addition, the criticality method can be used as a network

    design criterion to distribute resources in a network. Using criticality based approach, more bandwidth

    is assigned to critical links to avoid congestion in the network.

    The simulation results show that the criticality based design gives a better blocking probability

    than other well-known strategies such as MIRA and constrained shortest path.

    To analyze the robustness of the criticality based design, [107] considers a random walk

    on the undirected weighted graph of resources. It shows that for uniform external

    traffic, the criticality is a global metric in the network that can be considered as an optimization criterion.

    A minimum criticality network has the minimum sensitivity to the perturbations in resource distributions.
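
    A random walk on an undirected weighted graph can be sketched as follows. The sketch assumes the usual construction in which the walker leaves a node along each link with probability proportional to the link weight; it does not compute the criticality metric of [107]. The weight values and function names are hypothetical.

```python
def transition_matrix(w):
    """Row-normalize the weight matrix: p[i][j] = w[i][j] / sum_k w[i][k]."""
    return [[wij / sum(row) for wij in row] for row in w]


def strength_distribution(w):
    """Node strength (sum of incident weights) divided by the total strength."""
    strengths = [sum(row) for row in w]
    total = sum(strengths)
    return [s / total for s in strengths]


# Hypothetical symmetric weight matrix of a three-node undirected network.
w = [[0, 2, 1],
     [2, 0, 3],
     [1, 3, 0]]

p = transition_matrix(w)
pi = strength_distribution(w)

# One step of the chain applied to pi; for an undirected graph this returns pi again.
pi_next = [sum(pi[i] * p[i][j] for i in range(len(w))) for j in range(len(w))]
```

    For an undirected weighted graph the strength-proportional vector is the stationary distribution of the walk, so `pi_next` matches `pi` up to rounding. Nodes with more attached capacity are therefore visited more often in the long run.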

    The applications of random walk routing are discussed in different networks such as wireless sensor

    networks. In wireless sensor networks, random walk model is gaining popularity because of its simplicity,

    low overhead and inherent robustness [69], [105]. In wireless sensor networks, a large number of small

    nodes are used. These networks are subject to structural changes due to channel fluctuations, node

    failures and other factors. Random walk algorithms can provide a robust routing strategy at the expense

    of QoS support in wireless sensor networks.

    Random walk routing analysis is a common model to analyze different aspects of communication

    networks. In [15], random walk model is used to propose an approach for topology inference. This paper

    goes further and shows that the proposed approach also works for general routing strategies.

    In this thesis, we propose a new robustness factor and analyze its property using an open Jackson

    network.

    2.4 Robust network design

    There is a transition from best effort networks to multi-service networks that provide service for a wide

    range of applications with different Quality of Service requirements. The rapid development of high-

    speed links and switches enables the service providers to consider different over-provisioning strategies

    to maintain service continuity for the users.

    The problem in routing strategy design is to avoid bottlenecks in the network. It is shown in [48]

    that the bottleneck can happen in both inter-domain and intra-domain topologies illuminating the need

    for both efficient capacitation and routing in data networks.

    New challenges also arise due to QoS requirements of network applications, which include but are

    not limited to bandwidth, delay and jitter requirements. In addition, node/link failures and other

    perturbations in the network can occur frequently. Therefore, minimization of the perturbation effect

    on the network performance should also be considered.

    In this thesis, robustness is studied by considering perturbations in different parameters of the net-

    work. The perturbations can happen in routing distribution, capacitation and the external input-output

    traffic patterns. In addition, there is always uncertainty about the design parameters in the design

    process of every real-world system. To overcome this uncertainty problem, different approaches are

    followed.

    In this section, we address the network design problem. We aim to design a network in such a way

    that its performance parameters become robust to network perturbations. The network perturbations

    include changes in capacity distribution and in routing tables as well as changes in traffic patterns.

    Classical network design problems optimize network performance parameters, for example by minimizing

    average delay or maximizing throughput [60]. Another design aspect that can be considered is the stability

    of the operating point. In other words, if changes happen in network settings, stability describes how

    far the operating point deviates from the optimum point. This perturbation can occur with respect to

    changes in capacity of links, demand traffic variations and routing. These variations can affect flow of

    traffic (the links traffic load) and/or the total average delay as defined in [71], [60].

    The robustness definition is tied to system sensitivity [27]. A system is called robust if the variations

    in system environment result in minimum variations in the system performance parameters. Therefore,

    measuring and minimizing sensitivity of a system is one of the main concerns in robust system design

    literature [107], [27]. However, designing for robustness may not always yield the best-performing

    system. Thus, a trade-off always exists among robustness, performance and cost. Due to the scale

    of current networks, many resource management schemes have been developed aiming for robust design

    [107], [13], [92], [89].

    System robustness is defined as the system's resilience to perturbations that occur during system

    operation. Robustness can also be defined as the capability of the system to adapt to new conditions

    after disturbances in the input-output behaviour and/or the system parameters. The

    authors in [111] survey the robustness literature, in which robustness is also called adaptation,

    coping and resilience. These terms have similar definitions but carry distinct perspectives in

    different disciplines. In the next section, we discuss robust design strategies for networks.

    2.4.1 Robust Network Design vs. Weights Perturbations

    Routing is a critical operation in networks. Computing an optimal route between source-destination

    pairs is one of the main concerns in different networks from transportation to data networks. In data

    networks, different routing strategies such as shortest path and multi-path routing are developed. These

    different routing strategies have several benefits and impairments which should be addressed in network

    design problems. For example, the shortest path routing is the most common routing in the network

    that minimizes total traffic in the network. The classical algorithm for the shortest path routing strategy

    is Dijkstra’s algorithm that iteratively calculates all the shortest paths in the network. The calculation

    of shortest path in a network has been one of the most intensely studied problems in communication

    networks. However, dynamic routings that consider alternative paths rather than shortest path reduce

    the congestion probabilities. In addition, in wireless networks with variable link qualities, the shortest

    path is not recommended. The other routing strategy is to balance the load distribution and to mini-

    mize total traffic in the network. Load balancing strategies improve network service capability by not

    overloading central links and nodes in the network.
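
As an illustration of the shortest path computation mentioned above, the following is a minimal sketch of Dijkstra's algorithm with a binary heap. The five-node topology and its link costs are hypothetical and are not taken from the thesis.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest-path cost from `source` to every reachable node.

    `graph` maps each node to a dict {neighbor: non-negative link cost}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical topology with made-up link costs.
example_graph = {
    "A": {"B": 2, "D": 1},
    "B": {"C": 2, "E": 5},
    "C": {"E": 1},
    "D": {"E": 4},
    "E": {},
}
distances = dijkstra(example_graph, "A")
print(distances)
```

Because the costs are fixed inputs, the result says nothing about how the chosen paths behave when link costs change, which is the gap the robust variants discussed later try to close.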

    In [112], routing strategies are classified as follows,

    • Intra-domain and inter-domain network optimization.

    • Circuit switch based routing (e.g. multi-protocol label switching) and IP-based routing.

    • Offline and online routing based on the time-scale of traffic estimation and capacitation validity.

    • Source-Destination type: single source-single destination, single source-multiple destinations, and multiple sources-multiple destinations.

    The efficiency of any routing strategy depends on the availability of information about network

    capacitation, topology and the traffic matrix (TM). The traffic matrix describes end-to-end traffic between any

    source-destination pair in a network. Therefore, the traffic matrix estimations play an important role in

    network management. There are two sources of information for traffic estimators. The first source is the

    service level agreements between user and service provider and the second source is network monitor-

    ing system of the network which provides an aggregated information of total amount of traffic passing

    through each node and link. However, due to the network and traffic dynamics in the form of perturba-

    tions in traffic distribution and network resources, adaptive and robust managing strategies should be

    applied in networking systems.

    Uncertainty about system parameters appears in many analyses. The complexity lies in the

    number of parameters that are allowed to vary. In the context of routing, uncertainty can arise in

    link costs and traffic demands in a network. One of the best-known routing strategies in communication

    networks is the shortest path routing. However, the shortest path routing is defined for a network with

    known and constant link costs.

    To overcome the uncertainty in networks, robust shortest path routing is discussed in the

    literature [39], [117]. In [39], [117], a set of known scenarios corresponding to a predetermined set of link

    costs is considered, in which each cost scenario can be realized with some probability. The robust

    shortest path algorithm then finds a path whose maximum cost over all scenarios

    is minimized. Fig. 2.5 shows an example of a discrete set of scenarios.

    As an example for different scenarios in the network, it can be seen in fig. 2.5 that there are 3 different

    costs for the route ADE (i.e. AD +DE) in the network, i.e. 1 + 1, 2 + 2, 3 + 3.
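
The min-max criterion described above can be sketched by brute force on a tiny graph. The costs of links AD and DE follow the three scenarios quoted in the text (1, 2 and 3); the alternative route through a node B and its costs are hypothetical additions for illustration, and the exhaustive path enumeration is only practical for very small graphs.

```python
def simple_paths(adj, src, dst, visited=()):
    """Yield every loop-free path from `src` to `dst` as a list of nodes."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for nxt in adj.get(src, ()):
        if nxt not in visited:
            yield from simple_paths(adj, nxt, dst, visited)

def path_cost(path, link_costs):
    """Sum the link costs along `path` under one cost scenario."""
    return sum(link_costs[(u, v)] for u, v in zip(path, path[1:]))

def min_max_path(adj, scenarios, src, dst):
    """Return (worst-case cost, path) minimizing the maximum cost over scenarios."""
    return min(
        (max(path_cost(p, sc) for sc in scenarios), p)
        for p in simple_paths(adj, src, dst)
    )

adj = {"A": ["B", "D"], "B": ["E"], "D": ["E"]}
scenarios = [
    # AD and DE as in the text; AB and BE are hypothetical.
    {("A", "D"): 1, ("D", "E"): 1, ("A", "B"): 2, ("B", "E"): 2},
    {("A", "D"): 2, ("D", "E"): 2, ("A", "B"): 2, ("B", "E"): 2},
    {("A", "D"): 3, ("D", "E"): 3, ("A", "B"): 2, ("B", "E"): 2},
]
worst_cost, path = min_max_path(adj, scenarios, "A", "E")
print(worst_cost, path)
```

In this toy instance route ADE is the cheapest in the first scenario, but its worst-case cost is 6, so the min-max criterion selects the hypothetical route ABE with a worst-case cost of 4.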

    Figure 2.5: Scenario Graph Model

    Figure 2.6: Interval Graph

    However, changes in network parameters are not discrete and limited. Therefore, the scenario model

    is not a proper choice for real network design. In the other model, an interval of cost values is considered

    for each link [30], [53]. Using this model, a robust deviation criterion is proposed [53], [77], [120] in

    which the routing is done for a single source-destination pair by considering an interval $[L_{ij}, U_{ij}]$ for link

    $ij$. An example of an interval graph is shown in fig. 2.6. It is shown that the robust deviation problem in

    the context of the shortest path routing is NP-hard [118], [120]. An integer programming formulation

    of the robust deviation shortest path problem is proposed in [53] that can be applied only on an acyclic

    directed graph with a small width (Acyclic graphs are the graphs without any cycle).
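
A small sketch of the robust deviation criterion on an interval graph follows. It assumes the standard worst-case characterization from the interval shortest path literature: the regret of a path is largest when its own links sit at their upper bounds and all other links at their lower bounds. The graph, the intervals and the brute-force search are hypothetical illustrations, not the integer program of [53].

```python
def simple_paths(adj, src, dst, visited=()):
    """Yield every loop-free path from `src` to `dst` as a list of nodes."""
    visited = visited + (src,)
    if src == dst:
        yield list(visited)
        return
    for nxt in adj.get(src, ()):
        if nxt not in visited:
            yield from simple_paths(adj, nxt, dst, visited)

def robust_deviation(path, adj, intervals, src, dst):
    """Worst-case regret of `path` when each link cost lies in its interval."""
    on_path = set(zip(path, path[1:]))
    # Assumed worst case: links on the path at U, every other link at L.
    costs = {e: (hi if e in on_path else lo) for e, (lo, hi) in intervals.items()}

    def cost(p):
        return sum(costs[(u, v)] for u, v in zip(p, p[1:]))

    return cost(path) - min(cost(p) for p in simple_paths(adj, src, dst))

adj = {"A": ["B", "D"], "B": ["E"], "D": ["E"]}
# Hypothetical cost intervals (L, U) per directed link.
intervals = {
    ("A", "D"): (1, 3), ("D", "E"): (1, 3),
    ("A", "B"): (2, 3), ("B", "E"): (2, 3),
}
deviation, path = min(
    (robust_deviation(p, adj, intervals, "A", "E"), p)
    for p in simple_paths(adj, "A", "E")
)
print(deviation, path)
```

Enumerating all paths is exponential in general, which is in line with the NP-hardness result cited above; the sketch is only meant to make the criterion concrete.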

    2.4.2 Robust Network Design vs. Perturbations in External Traffic

    The other source of perturbation in network is the external traffic pattern. The external traffic fluctuation

    is an inevitable nature of the network traffic. This fluctuation can cover a wide range of changes in the

    traffic volume. Different approaches are considered to account for the variability of the traffic. In this

    section, we first discuss the concept of oblivious routing that considers multi-path between source and

    destination pairs and design the network in order to be prepared for the worst case scenarios in network

    traffic loads. The other approach which is discussed is to consider a cutoff threshold to limit the expected

    amount of changes in network.

    To consider perturbations in input traffic, oblivious routing (i.e. without any knowledge of the

    Figure 2.7: MIRA Example

    current state of the network) is proposed. Oblivious routing is designed based on the worst case scenario

    of the input traffic distribution and depends only on source, destination and path randomization,

    if randomization is allowed. In oblivious routing, a set of optional paths is selected for each source-

    destination pair and a packet can only select from this predetermined set to travel from a source to

    a destination. It is shown that for an undirected graph, we can use a polynomial time algorithm to

    construct the oblivious routing strategy [14].

    Other popular adaptive routing algorithms are the Minimum Interference Routing Algorithm (MIRA)

    [64], and the Profile Based Routing (PBR) [101] that are based on minimum-interference theory. Min-

    imum interference routing is based on MPLS traffic engineering and tries to do the routing based on

    the minimization of interference to other source-destination pairs traffic. The concept of interference in

    routing is the basis of MIRA algorithm. The interference caused by a flow between one source destination

    pair is the impact of that flow on the maximum routable traffic between other sources and destinations.

    Consider a source destination pair (S1,D1) in fig. 2.7. The max flow of (S1,D1) is the upper-bound on

    total traffic that can be routed from S1 to D1 by considering constraints on network capacity distribution.

    It can be seen that other flows associated with (S2,D2) and (S3,D3), can cause interference and decrease

    the maximum routable flow of (S1,D1). Critical links in MIRA are defined as the links over which routing

    traffic causes interference on other source-destination pairs. Finding a minimum

    interference routing is an NP-hard problem, and many research activities provide heuristic

    algorithms to decrease the complexity of MIRA [52].
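
The notion of interference can be made concrete with a max-flow computation. The sketch below uses an Edmonds-Karp max flow on a hypothetical topology (it is not the network of fig. 2.7) and compares the maximum routable flow of (S1,D1) after a 5-unit demand of (S2,D2) is placed either on a path sharing link X-Y or on a detour. It only illustrates the interference idea; it is not the MIRA algorithm itself.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max flow; `capacity` maps (u, v) -> link capacity."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse residual links
    neighbors = {}
    for (u, v) in residual:
        neighbors.setdefault(u, []).append(v)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck along the path, then push flow through it.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck

def reserve(capacity, path, demand):
    """Return a copy of `capacity` with `demand` units removed along `path`."""
    reduced = dict(capacity)
    for link in zip(path, path[1:]):
        reduced[link] -= demand
    return reduced

# Hypothetical topology: link X->Y is shared by both source-destination pairs.
links = {
    ("S1", "X"): 10, ("X", "Y"): 10, ("Y", "D1"): 10,
    ("S2", "X"): 10, ("Y", "D2"): 10,
    ("S2", "Z"): 10, ("Z", "D2"): 10,
}
before = max_flow(links, "S1", "D1")
after_shared = max_flow(reserve(links, ["S2", "X", "Y", "D2"], 5), "S1", "D1")
after_detour = max_flow(reserve(links, ["S2", "Z", "D2"], 5), "S1", "D1")
print(before, after_shared, after_detour)
```

Routing the (S2,D2) demand over the shared link halves the max flow of (S1,D1), while the detour leaves it untouched; in this toy case X-Y plays the role of a critical link.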

    There are several limitations associated with using interference as a routing criterion [107]. The first

    one is that MIRA is capable of considering interference on single source-destination pair and cannot

    find critical links for aggregated traffic from different source-destination pairs. In addition, MIRA only

    concentrates on the interference effect without considering available capacity for passing the traffic.

    Therefore, there is a chance that a flow request is rejected despite the fact that there are enough

    resources to admit it. The other limitation of the MIRA algorithm is its complexity, which makes it

    computationally inefficient to implement for large networks.

    In [102], three different topologies, i.e. Parking Lot, Concentrator and Distributor, are discussed and

    it is shown that the MIRA routing cannot perform well on these structures and can cause high dropping

    rate of flow requests. These structures are common structures that can happen frequently in different

    parts of the network.

    In addition, MIRA is mainly concerned with bandwidth and does not consider other QoS requirements,

    which is not appropriate for a multi-service network.

    Another approach to consider the traffic variations is to design the network based on distribution of

    changes in the traffic pattern. In this approach, the variation of external traffic is limited by a threshold.

    The decision of how much this threshold should be away from the mean traffic volume is made based

    on the acceptable risk of the service provider and the distribution of traffic volume. In this approach, a

    continuous distribution is considered for the demand volume between sources and destinations. Different

    mathematical models are designed to avoid the risk of under-utilized resources, which impose unnecessary

    extra charges on service providers by the worst case scenarios that can happen with low probabilities.
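
To make the threshold idea concrete, the sketch below assumes, purely for illustration, that the demand volume is Gaussian with a known mean and standard deviation, and picks the threshold as the volume that is exceeded with a given acceptable risk. The numbers and the function name `demand_threshold` are hypothetical.

```python
from statistics import NormalDist

def demand_threshold(mean, std_dev, risk):
    """Traffic volume exceeded with probability `risk` under a Gaussian demand."""
    return NormalDist(mean, std_dev).inv_cdf(1.0 - risk)

# Hypothetical demand: mean 100 units, standard deviation 15, 5% accepted risk.
threshold = demand_threshold(mean=100.0, std_dev=15.0, risk=0.05)
print(round(threshold, 2))
```

Lowering the accepted risk pushes the threshold further from the mean and therefore increases the provisioned capacity, which is the cost side of the trade-off described above.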

    In [74], a stochastic traffic engineering framework is proposed that considers a continuous Gaussian

    distribution for external traffic patterns between any source and destination in a network. Its objective

    is to maximize serving demands considering the probability of occurrence of the demand pattern. The

    mean-risk model is used that enables service providers to maximize the revenue of serving demands as

    well as to avoid the risk of revenue falls by considering a risk measure. However, considering a Gaussian

    distribution for the external traffic is not applicable in traffic patterns with a heavy tail. To consider a

    heavy-tail traffic, [40] proposes a new risk measure to capture the long tail of traffic pattern.

    Chapter 3

    Robustness and Perturbation Effects in Markov Chains

    Many empirical systems are modeled by Markov chains. Markov chain analysis simplifies the analysis

    of complex systems. Therefore, providing an approach in designing a robust Markov chain enhances the

    robustness analysis of a wide range of systems that can be modeled by Markov chains.

    In this chapter, we provide a theoretical framework for robust design of Markov chain. We present

    a new definition for robustness that is tightly related to the network robust design. The new definition

    is based on the amount of changes of node betweenness when a perturbation happens in transition

    probabilities. We propose a new robustness factor that is more comprehensive than condition numbers

    as a measure of Markov chain sensitivity. We will also discuss the applications of the proposed robustness

    factor in betweenness centrality of weighted and unweighted graphs in the next sections.

    3.1 Markov Chains and Perturbation Effects

    A Markov chain is a mathematical model with a finite or countable number of states in which

    the transitions between states are governed by a transition probability matrix $R = [r_{ij}]$. The next transition

    in the Markov chain depends only on its current state. The stationary distribution of a Markov chain

    is a probability distribution ν such that

    νR = ν (3.1)
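
A stationary distribution satisfying (3.1) can be approximated by repeatedly multiplying a row vector by R. The sketch below assumes a regular (irreducible and aperiodic) chain so that the iteration converges; the three-state matrix is a hypothetical example.

```python
def stationary_distribution(R, tol=1e-12, max_iter=10_000):
    """Approximate the row vector nu with nu R = nu by power iteration."""
    n = len(R)
    nu = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(nu[i] * R[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, nu)) < tol:
            return nxt
        nu = nxt
    return nu

# Hypothetical three-state chain (each row sums to one).
R = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
nu = stationary_distribution(R)
print(nu)
```

For this chain, detailed balance gives ν = (0.25, 0.5, 0.25), which the iteration reproduces.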

    A Markov chain is said to be irreducible if any state can be reached from any other state in a finite number of

    steps. In a homogeneous Markov chain, the transition matrix R does not change over time, i.e.

    \[ R_n(i,j) = R_{n+1}(i,j) \tag{3.2} \]

    where $R_n(i,j)$ is the transition probability from state i to state j at step n. Markov chains are frequently

    used to model real-world complex systems. These systems are subject to perturbations. The perturbation

    effect and Markov chain robustness are discussed in the Markov chain literature. It is assumed that the

    transition matrix R of an irreducible and homogeneous Markov chain with stationary distribution ν

    is perturbed to a matrix R′ with a stationary distribution ν̂ by a perturbation matrix E = R − R′.

    Extensive research activities focus on describing the effect of the perturbation on the stationary distribution ν.

    Nearly all previous studies give upper bounds for the change in the stationary distribution as

    a function of the norm of the perturbation matrix, $\|E\|_\infty$ (with $E = [e_{ij}]$ and $\|E\|_\infty = \max_i \sum_j |e_{ij}|$), as shown in

    (3.3).

    \[ \|\nu^T - \hat{\nu}^T\| \le \kappa \|E\|_\infty \tag{3.3} \]

    where κ is called the condition number and is used as the measure of sensitivity. Element-wise condition

    numbers and relative condition numbers are also considered which have the following forms

    \[ |\nu_j - \hat{\nu}_j| \le \kappa_j \|E\|_\infty \tag{3.4} \]

    \[ \frac{|\nu_j - \hat{\nu}_j|}{\nu_j} \le \kappa_j \|E\|_\infty \tag{3.5} \]

    Different condition numbers have been proposed. The majority of perturbation bounds are functions

    of the fundamental matrix of the Markov chain, i.e. Z. The fundamental matrix Z is the group inverse

    of I − R, i.e. $Z = (I - R)^{\#}$. We will discuss the definition of the fundamental matrix and its properties in

    the next parts of this chapter. Some examples of proposed condition numbers are shown in the following

    equations [96], [47], [36], [58].

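    \[ \kappa_1 = \|Z\|_\infty \tag{3.6} \]

    \[ \kappa_2 = \frac{\max_j \left( z^{\#}_{jj} - \min_i z^{\#}_{ij} \right)}{2} \tag{3.7} \]

    \[ \kappa_3 = \max_{i,j} |z_{ij}| \tag{3.8} \]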
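
The three condition numbers above can be evaluated numerically on a small chain. The sketch below is only illustrative: it reads both $z_{ij}$ and $z^{\#}_{ij}$ as entries of the group inverse $Z = (I - R)^{\#}$, computes Z through the identity $(I - R + \xi\nu)^{-1} - \xi\nu$, and uses a hypothetical three-state chain with a hypothetical perturbation of its first row. It then compares the largest element-wise change of the stationary distribution with $\kappa_2 \|E\|_\infty$ and $\kappa_3 \|E\|_\infty$.

```python
def stationary(R, sweeps=5000):
    """Stationary row vector of a regular chain by power iteration."""
    n = len(R)
    nu = [1.0 / n] * n
    for _ in range(sweeps):
        nu = [sum(nu[i] * R[i][j] for i in range(n)) for j in range(n)]
    return nu

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(M)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def group_inverse(R, nu):
    """Z = (I - R)^# via the identity (I - R + xi nu)^{-1} - xi nu."""
    n = len(R)
    shifted = [[(i == j) - R[i][j] + nu[j] for j in range(n)] for i in range(n)]
    inv = inverse(shifted)
    return [[inv[i][j] - nu[j] for j in range(n)] for i in range(n)]

def inf_norm(M):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

# Hypothetical chain and a hypothetical perturbation of its first row.
R = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
R_pert = [[0.6, 0.4, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
n = len(R)
nu, nu_hat = stationary(R), stationary(R_pert)
Z = group_inverse(R, nu)

kappa1 = inf_norm(Z)
kappa2 = max(Z[j][j] - min(Z[i][j] for i in range(n)) for j in range(n)) / 2
kappa3 = max(abs(z) for row in Z for z in row)

E = [[R[i][j] - R_pert[i][j] for j in range(n)] for i in range(n)]
change = max(abs(a - b) for a, b in zip(nu, nu_hat))
print(kappa1, kappa2, kappa3, change, inf_norm(E))
```

In this example the observed change is well below both bounds, which illustrates that such condition numbers are worst-case measures and can be conservative for a particular perturbation.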

    These condition numbers are mostly calculated as a function of fundamental matrix elements. Al-

    though these bounds provide good numerical measures of robustness of Markov chain against the per-

    turbations, they do not convey qualitative information about the structure of the Markov chain. In

    other words, the robustness of the Markov chain is not clear from the structure of the Markov chain.

    Other condition numbers, which use the passage time as a measure of Markov chain robustness, are

    shown in the following [23], [18]

    \[ \kappa_4 = \frac{1}{2} \max_j \left[ \frac{\max_{i \neq j} m_{ij}}{m_{jj}} \right] \tag{3.9} \]

    \[ \kappa_5 = K, \qquad K = \sum_j m_{ij} \nu_j \tag{3.10} \]

    Mean first passage time (MFPT) can sometimes help to discuss the expected sensitivity of the Markov

    chain models merely by observing the structure of the chain without the need for computing different

    Markov chain parameters such as fundamental matrix and stationary distributions.

    In this chapter, we propose a new definition of robustness of a Markov chain. In order to define the

    new robustness measure, we should first define the node (state) betweenness for each node in the Markov

    chain. We define the node betweenness as the number of times a random walker that wanders around in

    the network, visits the node. It can be shown that if the number of hops of the random walker goes to

    infinity, the proportion of times the random walker visits a node to total number of hops converges to the

    stationary distribution of the Markov chain. However, we consider a case in which the random walker

    starts at a specific node and ends walking at a specific destination. Therefore, the node betweenness

    becomes very dependent on the structure of the Markov chain and the source-destination positions. In

    future chapters, we use the analogy between the packet flows in the packet networks and the random

    walk model to relate the Markov chain robust design to the network design problem.

    We define the Markov chain robustness as the sensitivity of node betweenness to perturbations

    in the transition matrix R. By using this definition, we propose a robustness factor analogous to

    the condition number that can be used as a design criterion. We show that the new proposed robustness

    factor is tightly related to the passage time of the Markov chain that gives us a powerful design tool in

    the context of robust Markov chain design. As it will be discussed in future chapters, this consideration

    of the Markov chain robustness has interesting applications in communication network design problems.

    As shown in the condition number examples, the fundamental matrix and the Kemeny constant play an

    important role in characterizing the robustness of the Markov chain. In the next section, we define the

    fundamental matrix and the Kemeny constant and discuss their interesting characteristics.

    3.2 Fundamental Matrix and Kemeny Constant

    To continue our discussion, we first define the fundamental matrix in the Markov Chain Theory. In

    order to define the fundamental matrix we begin by defining

    \[ A = I - R \]

    where R is a row stochastic matrix. Since A belongs to a multiplicative group, its inverse is called the

    group inverse of A and is denoted by $A^{\#}$ [24]. The powers $R^n$ of a regular transition matrix approach

    a probability matrix $R^\infty$ whose rows are all equal to the same probability vector ν, i.e.

    \[ R^\infty = \xi_{n \times 1} \nu \]

    where $\xi_{n \times 1}$ is the all-ones $n \times 1$ vector. In particular, the relation between the group inverse $A^{\#}$ and the

    limiting matrix $R^\infty$ is given by [44]

    \[ R^\infty = I - A A^{\#} \tag{3.11} \]

    The group inverse matrix $A^{\#}$ is the unique matrix satisfying the following three equalities

    \[ A A^{\#} A = A, \qquad A^{\#} A A^{\#} = A^{\#}, \qquad A A^{\#} = A^{\#} A \tag{3.12} \]

    The group inverse matrix $A^{\#}$ is also called the resolvent, fundamental matrix or Green's function, i.e. $Z = A^{\#}$ [32]

    \[ Z = (I - R^\infty) + (R - R^\infty) + (R^2 - R^\infty) + \cdots = (I - (R - R^\infty))^{-1} - R^\infty \tag{3.13} \]

    This definition of the fundamental matrix is different from the variant used in [44, 56, 73], in which

    $Z' = (A + \xi_{n \times 1} \nu)^{-1} = A^{\#} + \xi_{n \times 1} \nu$. However, in most applications involving the fundamental matrix, the

    term $\xi_{n \times 1} \nu$ is redundant [73]. The elements $z_{ij}$ of the fundamental matrix defined in (3.13) can be

    interpreted as the difference between the expected number of visits to state j when starting in state i and

    when starting in equilibrium.

    \[ z_{ij} = \nu_j (m_{\nu j} - m_{ij}), \qquad m_{\nu j} = \sum_i m_{ij} \nu_i \tag{3.14} \]

    Accordingly, we have the following properties for the fundamental matrix,

    \[ \sum_i \nu_i z_{ij} = 0 \tag{3.15} \]

    \[ \sum_j z_{ij} = 0 \tag{3.16} \]
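
The identities of this section can be checked numerically. The sketch below builds Z from a truncated version of the series (3.13), which assumes a regular chain so that the series converges, and then evaluates the residuals of (3.11), (3.12), (3.15) and (3.16). The three-state chain and the number of series terms are hypothetical choices.

```python
def matmul(A, B):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def stationary(R, sweeps=5000):
    """Stationary row vector of a regular chain by power iteration."""
    n = len(R)
    nu = [1.0 / n] * n
    for _ in range(sweeps):
        nu = [sum(nu[i] * R[i][j] for i in range(n)) for j in range(n)]
    return nu

def fundamental_matrix(R, nu, terms=200):
    """Partial sum of the series (3.13): sum over k of (R^k - R_inf)."""
    n = len(R)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # R^0 = I
    Z = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                Z[i][j] += power[i][j] - nu[j]  # every row of R_inf equals nu
        power = matmul(power, R)
    return Z

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical three-state chain.
R = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
n = len(R)
nu = stationary(R)
Z = fundamental_matrix(R, nu)
A = [[(i == j) - R[i][j] for j in range(n)] for i in range(n)]
AZ, ZA = matmul(A, Z), matmul(Z, A)
I_minus_AZ = [[(i == j) - AZ[i][j] for j in range(n)] for i in range(n)]

residuals = {
    "(3.11) R_inf = I - A Z": max_abs_diff([nu] * n, I_minus_AZ),
    "(3.12) A Z A = A": max_abs_diff(matmul(AZ, A), A),
    "(3.12) Z A Z = Z": max_abs_diff(matmul(ZA, Z), Z),
    "(3.12) A Z = Z A": max_abs_diff(AZ, ZA),
    "(3.15) sum_i nu_i z_ij = 0": max(
        abs(sum(nu[i] * Z[i][j] for i in range(n))) for j in range(n)),
    "(3.16) sum_j z_ij = 0": max(abs(sum(row)) for row in Z),
}
for name, value in residuals.items():
    print(name, value)
```

All residuals are at the level of floating-point round-off for this chain, which is consistent with Z being the group inverse of A.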

    An interesting result is shown in [72] to describe the perturbation effect on the fundamental matrix.

    Consider that a Markov Chain with transition matrix R is perturbed by an error matrix $E = R - R'$ (with $E \xi_{n \times 1} = 0$).

    Therefore, the fundamental matrix of the perturbed Markov Chain has the following form

    \[ (A + E)^{\#} = A^{\#} (I + E A^{\#})^{-1} - (I - A A^{\#}) (I + E A^{\#})^{-1} A^{\#} (I + E A^{\#})^{-1} \tag{3.17} \]

    We use the equation (3.17) to introduce the robustness measure.

    In the following part, we discuss the mean first passage time concept and introduce the Kemeny

    constant of a connected graph. The Kemeny constant is an interesting characteristic of the graph and

    we show that it also has a robustness interpretation in graph theory.

    For a Markov chain started at state $X_0 = i$, $m_{ij}$ is the average waiting time until the chain hits state j for

    the first time,

    \[ m_{ij} = \mathbb{E}\left[ \min \{ t > 0 : X_t = j \} \mid X_0 = i \right] \tag{3.18} \]

    The stationary distribution of this Markov chain is a row vector ν such that νR = ν. Therefore, the

    mean first passage time from state i to equilibrium is

    \[ m_{i\nu} = \sum_j m_{ij} \nu_j \]

    It is proven that miν is constant and does not depend on i [32] and the value of miν = K is the Kemeny

    constant (seek time of the chain). Therefore we have,

    \[ K = \sum_j \nu_j \sum_i \nu_i m_{ij} = \sum_j m_{ij} \nu_j \]

    Kemeny constant is an interesting quantity for ergodic Markov chains and is independent of the initial

    state of the Markov chain. In terms of the fundamental matrix, the Kemeny constant can be described

    as

    \[ K = \sum_j z_{jj} = \mathrm{trace}(Z) \tag{3.19} \]

    considering that $\sum_j z_{ij} = 0$.
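
The independence of the Kemeny constant from the starting state can be observed numerically. The sketch below computes mean first passage times by fixed-point iteration on a hypothetical three-state chain. It assumes the convention $m_{jj} = 0$, which is the one consistent with $K = \mathrm{trace}(Z)$ for $Z = A^{\#}$; if $m_{jj}$ is instead taken as the mean return time $1/\nu_j$, every sum below grows by exactly one.

```python
def stationary(R, sweeps=5000):
    """Stationary row vector of a regular chain by power iteration."""
    n = len(R)
    nu = [1.0 / n] * n
    for _ in range(sweeps):
        nu = [sum(nu[i] * R[i][j] for i in range(n)) for j in range(n)]
    return nu

def hitting_times(R, target, sweeps=5000):
    """Mean number of steps to first reach `target` (0 when starting there)."""
    n = len(R)
    m = [0.0] * n
    for _ in range(sweeps):
        # One-step recursion: m_i = 1 + sum over k != target of R[i][k] * m_k.
        m = [0.0 if i == target else
             1.0 + sum(R[i][k] * m[k] for k in range(n) if k != target)
             for i in range(n)]
    return m

# Hypothetical three-state chain.
R = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
n = len(R)
nu = stationary(R)

columns = [hitting_times(R, j) for j in range(n)]           # columns[j][i] = m_ij
M = [[columns[j][i] for j in range(n)] for i in range(n)]   # M[i][j] = m_ij

kemeny_by_start = [sum(M[i][j] * nu[j] for j in range(n)) for i in range(n)]
print(kemeny_by_start)
```

Every starting state yields the same value, K = 3, for this chain. Its eigenvalues are 1, 0.5 and 0, so the eigenvalue expression 1/(1 − 0.5) + 1/(1 − 0) also gives 3.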

    Now consider the eigenvalues of the transition matrix R, ordered as

    \[ |\beta_n| \le \cdots \le |\beta_2| \le \beta_1 = 1 \tag{3.20} \]

    The following lemma illustrates the relation between the Kemeny constant and the eigenvalues of the

    transition matrix [68].

    Lemma 1

    \[ K = \sum_{i=2}^{n} \frac{1}{1 - \beta_i} \]
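
As a quick sanity check of Lemma 1, consider a generic two-state chain with hypothetical transition probabilities a and b. The worked example below assumes the convention $m_{jj} = 0$, the one under which $K = \mathrm{trace}(Z)$ holds for the group inverse.

```latex
\begin{align*}
R &= \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \quad 0 < a, b < 1,
&
\nu &= \left( \frac{b}{a+b}, \; \frac{a}{a+b} \right), \\
m_{12} &= \frac{1}{a}, \quad m_{21} = \frac{1}{b}, \quad m_{11} = m_{22} = 0,
&
K &= \sum_j m_{1j} \nu_j = \frac{1}{a} \cdot \frac{a}{a+b} = \frac{1}{a+b}, \\
\beta_1 &= 1, \quad \beta_2 = 1 - a - b,
&
\sum_{i=2}^{2} \frac{1}{1-\beta_i} &= \frac{1}{a+b} = K .
\end{align*}
```

Starting from the second state gives the same value, since $m_{21}\nu_1 = \frac{1}{b} \cdot \frac{b}{a+b} = \frac{1}{a+b}$.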

    The Kemeny constant plays an important role in our future discussion of the robustness of the Markov

    architecture.

    In the next section, we discuss the random walk betweenness based on the transition probability

    matrix. Random walk betweenness