UNIVERSITI PUTRA MALAYSIA - psasir.upm.edu.mypsasir.upm.edu.my/id/eprint/56191/1/FK 2013 125RR.pdfProcess management is a fundamental problem in distributed systems. It is fast becoming

UNIVERSITI PUTRA MALAYSIA

PEYMAN BAYAT

FK 2013 125

NEW SYNCHRONIZATION PROTOCOL FOR DISTRIBUTED SYSTEM WITH TCP EXTENSION

© COPYRIG

HT UPM

NEW SYNCHRONIZATION PROTOCOL FOR DISTRIBUTED SYSTEM

WITH TCP EXTENSION

By

PEYMAN BAYAT

Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia,

in Fulfillment of the Requirement for the Degree of Doctor of Philosophy

August 2013

© COPYRIG

HT UPM

i

COPYRIGHT

All materials contained within the thesis, including without limitation text, logos,

icons, photographs and all other artwork, is copyright material of Universiti Putra

Malaysia unless otherwise stated. Use maybe made of any material contained within

the thesis for non-commercial purposes from the copyright holder. Commercial use

of material may only be made with the express, prior, written permission of

Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia

© COPYRIG

HT UPM

ii

To my Parents and my Wife

For their Love

© COPYRIG

HT UPM

iii

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment

of the requirement for the degree of Doctor of Philosophy

NEW SYNCHRONIZATION PROTOCOL FOR DISTRIBUTED SYSTEM

WITH TCP EXTENSION

By

PEYMAN BAYAT

August 2013

Chairman: Associate Professor Abd. Rahman Ramli, PhD

Faculty: Engineering

Process management is a fundamental problem in distributed systems. It is fast

becoming a major performance and design issue for concurrent programming on

modern architectures and for the design of distributed systems. One of the major

duties in process management (synchronization) is mutual exclusion control.

Previous studies related to the topic do not use enough messages in the distributed

system. Hence, the presented solutions are unable to identify active processes that are

currently related to critical sections and dead in the network communication.

Specially, if a fault happened for a process that has a coordinator role or in a critical

section, the inability to detect the faults would cause the system to crash. Faulty

processes may cause some other active processes to become inactive during the

queuing or within the using a critical section time. In addition, the system requires

adding some messages to avoid starvation and deadlock.

On the other hand, previous researches have focused only on time stamp, which is

unfair because there are other critical parameters that are not being considered. The

© COPYRIG

HT UPM

iv

thesis shows that it is possible to model the processes of distributed system in the

form of a three dimensional matrix, so as to optimize fault-tolerant for mutual

exclusion and critical section management. In this regard, the research also presents a

new approach of the race models for distributed mutual exclusion. The new

components such as participation of time stamp, time action, and other parameters

such as special priority are introduced. The matrix is defined as such based on

weight, which is able to solve problems of critical sections in order to obtain better

state of fault-tolerance. After embedding the presented solution, a new protocol is

created and applied. Thus, the new protocol is available to the communicating

packets across all computer networks. The new messages and parameters could add

to the available option part in the TCP packet header format, which has some free

places.

Another aspect of distributed systems, which considered in the thesis, is process

allocation to resources (such as critical sections). Optimization of process allocation

causes to decrease the network traffic and fairer allocation, and therefore,

optimization of fault-tolerance.

The new protocol is simulated with using the OPNET and the MATLAB software.

According to previous researches that evaluated performance to measure fault-

tolerance in this thesis, the network performances are evaluated and compared. The

achieved results should continue to reach a steady state. These results show that the

proposed algorithm, on average, has 7.97% higher fault-tolerance. In the worst

condition, it optimized 3.99% higher than the most fault-tolerant previous works.

© COPYRIG

HT UPM

v

The resulting algorithm does not involve variability in the hardware type nor is it

based on specific distributed application software or databases. Finally, in sensitive

industries, this protocol is capable to offer services on a fault tolerant infrastructure.

© COPYRIG

HT UPM

vi

Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia sebagai

memenuhi keperluan untuk ijazah Doktor Falsafah

PROTOKOL PENYEGERAKAN BARU UNTUK SISTEM TERAGIH

DENGAN TCP TAMBAHAN

Oleh

PEYMAN BAYAT

Ogos 2013

Pengerusi: Profesor Madya Abd. Rahman Ramli, PhD

Fakulti: Kejuruteraan

Proses pengurusan adalah masalah asas dalam sistem teragih. Ia pantas menjadi

persembahan utama dan isu reka bentuk untuk pengaturcaraan serentak pada seni

bina moden dan reka bentuk sistem teragih. Salah satu tugas utama dalam

pengurusan proses (penyegerakan) adalah kawalan pengecualian bersama.

Kajian sebelum ini yang berkaitan dengan topik yang tidak menggunakan mesej yang

cukup dalam sistem teragih. Oleh itu, penyelesaian yang dikemukakan tidak dapat

mengenal pasti proses aktif yang kini yang berkaitan dengan bahagian-bahagian

kritikal dan mati dalam komunikasi rangkaian. Khas, jika kerosakan yang berlaku

untuk proses yang mempunyai peranan penyelaras atau di bahagian yang kritikal,

ketidakupayaan untuk mengesan kesilapan akan menyebabkan sistem untuk

kemalangan. Proses yang rosak boleh menyebabkan beberapa proses aktif yang lain

untuk menjadi aktif semasa beratur itu atau dalam menggunakan seksyen kritikal

masa. Di samping itu, sistem ini memerlukan menambah beberapa mesej untuk

mengelakkan kebuluran dan kebuntuan.

© COPYRIG

HT UPM

vii

Sebaliknya, kajian sebelum ini telah memberi tumpuan hanya pada setem masa, yang

tidak adil kerana terdapat parameter kritikal lain yang tidak dipertimbangkan. Tesis

ini menunjukkan bahawa ia adalah mungkin untuk model proses sistem teragih

dalam bentuk matriks tiga dimensi, untuk mengoptimumkan kesalahan-toleran untuk

pengecualian bersama dan kritikal pengurusan bahagian. Dalam hal ini, kajian ini

juga memberi pendekatan baru model perlumbaan untuk pengecualian bersama

diedarkan. Komponen baru seperti penyertaan setem masa, tindakan masa, dan

parameter lain seperti keutamaan khas akan diperkenalkan. Matriks ditakrifkan

sebagai apa-apa berdasarkan berat badan, yang mampu menyelesaikan masalah

seksyen kritikal dalam usaha untuk mendapatkan keadaan yang lebih baik daripada

kesalahan-toleransi. Selepas menerapkan penyelesaian yang dikemukakan, protokol

baru dibuat dan digunakan. Oleh itu, protokol baru ini boleh didapati untuk paket

berkomunikasi di semua rangkaian komputer. Mesej baru dan parameter boleh

menambah pilihan bahagian yang terdapat di paket TCP header format, yang

mempunyai beberapa tempat percuma.

Satu lagi aspek sistem teragih, yang dianggap dalam tesis, adalah proses untuk

memperuntukkan sumber-sumber (seperti bahagian kritikal). Mengoptimumkan

peruntukan menyebabkan proses untuk mengurangkan trafik rangkaian dan

pengagihan lebih adil, dan oleh itu, pengoptimuman kesalahan-toleransi.

Protokol baru adalah simulasi dengan menggunakan Opnet dan perisian MATLAB.

Menurut kajian sebelumnya yang dinilai prestasi untuk mengukur kesalahan-

toleransi dalam tesis ini, persembahan rangkaian dinilai dan dibandingkan.

© COPYRIG

HT UPM

viii

Keputusan yang dicapai perlu diteruskan untuk mencapai keadaan yang stabil.

Keputusan ini menunjukkan bahawa algoritma yang dicadangkan itu, secara purata,

mempunyai 7.97% lebih tinggi kesalahan-toleransi. Dalam keadaan yang paling

teruk, ia dioptimumkan 3.99% lebih tinggi daripada kebanyakan karya kesalahan-

bertoleransi sebelumnya.

Algoritma yang terhasil tidak melibatkan kepelbagaian dalam jenis perkakasan tidak

juga berdasarkan perisian aplikasi khusus diedarkan atau pangkalan data. Akhirnya,

dalam industri-industri sensitif, protokol ini mampu untuk menawarkan

perkhidmatan di atas infrastruktur kesalahan toleran.

© COPYRIG

HT UPM

ix

ACKNOWLEDGEMENT

The author wishes to express his gratitude to his supervisor, Assoc. Prof. Dr. Abd

Rahman Ramli, who was abundantly helpful and offered invaluable guidance and

support. Deepest gratitude is also due to the committee members, Assoc. Prof. Dr.

Mohammad Iqbal Saripan and Dr. Syamsiah Mashohor whose advice, knowledge,

and experience provided a path of success for this research.

© COPYRIG

HT UPM

x

I certify that a Thesis Examination Committee has met on 23th

August, 2013 to

conduct the final examination of Peyman Bayat on his thesis entitled “new

synchronization protocol for distributed system with tcp extension” in accordance

with the Universities and University College Act 1971 and the Constitution of the

Universiti Putra Malaysia [P.U.(A) 106] 15 March 1998. The committee

recommends that the student be awarded the Doctor of Philosophy.

Members of the Thesis Examination Committee were as follows:

Mohd Adzir Mahdi, PhD

Professor

Faculty of Engineering

University Putra Malaysia

(Chairman)

Mohammad Hamiruc Marhaban, PhD

Associate Professor



(Internal Examiner)

Nasir Sulaiman, PhD

Associate Professor

Faculty of Computer Science and IT


(Internal Examiner)

Zahir Tari, PhD

Professor

School of Computer Science and IT

Royal Melbourne Institute of Technology, Melbourne

Australia

(External Examiner)

NORITAH OMAR, PhD

Associate Professor and Deputy Dean

School of Graduate Studies

Universiti Putra Malaysia

Date: 17 October 2013

© COPYRIG

HT UPM

xi

This thesis was submitted to the Senate of Universiti Putra Malaysia and has been

accepted as fulfillment of the requirement for the degree of Doctor of Philosophy.

The members of the Supervisory Committee were as follows:

Abd. Rahman Ramli, PhD

Associate Professor



(Chairman)

Mohammad Iqbal Saripan, PhD

Associate Professor



(Member)

Syamsiah Mashohor, PhD

Senior Lecturer



(Member)

_____________________________________

BUJANG BIN KIM HUAT, PhD Professor and Dean

School of Graduate Studies


Date:

© COPYRIG

HT UPM

xii

DECLARATION

I hereby confirm that:

- This thesis is my original work;

- Quotations, illustrations and citations have been dully referenced;

- This thesis has not been submitted previously or concurrently for any other

degree at any other institutions;

- Intellectual properly from the thesis and copyright of thesis are fully owned

by Universiti Putra Malaysia, as according to the University Putra Malaysia

(Research) Rules 2012;

- Written permission must be obtain from supervisor and the office of Deputy

Vice-Chancellor (Research and Innovation) before thesis is published (in the

form of written, printed or in electronic form) including books, journals,

modules, proceedings, popular writings, seminar papers, manuscripts, posters,

reports, lecture notes, learning modules or any other materials as stated in the

Universiti Putra Malaysia (Research) Rules 2012;

- There is no plagiarism or data falsification/fabrication in the thesis, and

scholarly integrity is upheld as according to the Universiti Putra Malaysia

(research) Rules 2012. He thesis has undergone plagiarism detection

software.

Signature: _____________ Date: 23 August 2013

Name and Matric No.: Peyman Bayat, GS19481

© COPYRIG

HT UPM

xiii

TABLE OF CONTENTS

Page

ABSTRACT iii

ABSTRAK vi

ACKNOWLEDGEMENTS ix

APPROVAL x

DECLARATION xii

LIST OF TABLES xvi

LIST OF FIGURES xvii

LIST OF ABBREVIATIONS xx

CHAPTER

1 INTRODUCTION 1

1.1 Introduction 1

1.2 Problem statement 3

1.3 Objectives 4

1.4 Contribution 6

1.5 Scope of research 6

1.6 Thesis organization 7

2 LITERATURE REVIEW 9

2.1 Introduction 9

2.2 Distributed systems architecture 10

2.3 Message Passing in Distributed System 11

2.4 Fault-tolerance 12

2.4.1 Synchronization 14 2.4.2 Process management 18

2.5 Centralized algorithms 18

2.6 Process Allocation 25 2.6.1 Pigeon Hole Principle (PHP) 29

2.7 TCP model 30 2.7.1 TCP performance Analysis 31

2.7.2 Fairness optimization based on wireless TCP 32

2.8 Summary 33

3 RESEARCH METHODOLOGY 35

© COPYRIG

HT UPM

xiv

3.1 Introduction 35

3.2 Distributed system model 37

3.3 DMX algorithm features and measures 40 3.3.1 Properties of DMX approaches 40 3.3.2 Measures of DMX approaches 42

4 DESIGN AND SIMULATION OF A NEW PROTOCOL 43

Distributed system simulation 43

4.1 State diagrams in OPNET 46

4.2 New queue creation and normalization 52

4.3 Calculation of model‟s parameters 56

4.4 A practical example 57

4.5 Scheduler simulation in MATLAB 58 4.5.1 Tightly-coupled distributed systems 58 4.5.2 Loosely-coupled distributed systems 61

4.6 Job (process) allocation extension 61 4.6.1 PHP extension 62 4.6.2 Extension of job allocation competitive model 63

4.7 IPC based complementary algorithm 65 4.7.1 The new protocol steps 68 4.7.2 The new protocol operations 74 4.7.3 Optimization 76

4.8 Important scenarios in the absence of coordinator 80 4.8.1 A process crash 80 4.8.2 An in-region process leaves the CS 83 4.8.3 A process requests for CS 85 4.8.4 Start-up time 87

4.9 Extended centralized algorithms 89 4.9.1 Extension 1: coordinator crashes 90

4.9.2 Extension 2: unrelated process crashes 92 4.9.3 Extension 3: requested process crashes 92

4.9.4 Extension 4: in-region process crashes 94 4.9.5 Extension 5: receiving requests in disorder 96

4.10 Apply the new presented protocol into the TCP 98

4.11 Processing complexity in coordinator 98

4.12 Correctness and proof 99

4.13 Errors in distributed systems 102

4.14 Reliability and mathematical discussion 103

4.15 Summary 103

5 RESULTS AND DISCUSSIONS 105

© COPYRIG

HT UPM

xv

5.1 Introduction 105

5.2 Crashing and fault-tolerance 106

5.3 Fault-tolerance in a restart able distributed system 106

5.4 Fault-tolerance in non-restart able distributed systems 107

5.5 Distributed System Error Modeling Results 108

5.6 Errors in simulated distributed systems 109

5.7 Results from adding the IPC messages to the system 110

5.8 Results from adding the other parameters to the system 112

5.9 Network traffic evaluation 113

5.10 Errors 115

5.11 Fault-tolerance in a non-bottleneck (Single point of failure) 117

5.12 The process (job) allocation results 118

5.13 Deadlocks in the new distributed system 120

5.14 Some of assumed network characteristics 121

5.15 Comparison of the new and the past network behaviors 123

5.16 Packet loss 125

5.17 Comparison of the presented protocol and Attiya et al.‟s solution 128

with Regression 128

5.18 Summary 131

6 CONCLUSIONS AND SUGGESTION FOR FUTURE WORKS 132

6.1 Conclusions 132

6.2 Comparison with Previous Researches 134

6.3 Discussion 134

6.4 Suggestion for future works 135

REFERENCES 140

BIODATA OF STUDENT 151

LIST OF PUBLICATION 152

© COPYRIG

HT UPM

xvi

LIST OF TABLES

Table Page

2.1 Failure Models (Cristian, 1991; Hadzilacos and Toueg, 1993) 14

2.2 Characteristics of TCP (Akyildiz et al., 2002; Bhandarkar

and Reddy, 2004)

31

3.1 Five processes and their related parameter's values 54

5.1 Coefficients of Attiya et al. algorithm 130

5.2 Corporation of the three parameters 130

© COPYRIG

HT UPM

xvii

LIST OF FIGURES

Figure Page

2.1 A client-server concept which can develop to peer to peer

architecture. 11

2.2

Conditions of: (a) a safe server, (b) after execute a process server

crashes which for the previous request does not problem but next

requests will down and (c) the server after receiving a request

crashed, so the system is down.

12

2.3 A basic centralized algorithm, (a) Process 1 asks, (b) Process 2

asks, (c) Process 1 exits. 20

2.4

In the Chang et al. algorithm, a dynamic star is exists and the

mentioned data structure performs on machines with critical

section.

23

2.5 Work flow of philosophers situations (Attiya et al.) 24

2.6 TCP parts and characteristics. 32

3.1 Flow diagram of the research methodology 38

4.1 Infrastructure of the created distributed system in OPNET. 45

4.3 Second layer of a client in OPNET 46

4.4 Second layer of server in OPNET 47

4.5 The state diagram of application 49

4.6 The state diagram of TCP in TCP layer 49

4.7 Queues for lower and higher layers communication designed in

TCP layer 50

4.8 The state diagram of wireless LAN MAC 51

4.9 The state diagram of Queue 1 – 2 51

4.10 The state diagram of Queue Fair 52

4.11 The state diagram of CPU 53

4.12

Source codes in MATLAB, to create 500 ordered experiments by

multiple the generated random numbers to the five obtained

processes‟ parameters.

55

4.13 A scheduler assigns the jobs to machines in a distributed system. 63

4.14 Induction lemma 63

4.15 Extension of PHP in terms of number. 66

4.16

New protocol and process behaviors, the numbers inside the

circles are process numbers in different statuses, C is coordinator,

the right rectangle (with the process number, which is executing

the CS) is CS and the left rectangle is the flag to control the CS.

67

4.17 The new protocol staring, considering coordinator status during

the related sequence time. 68

© COPYRIG

HT UPM

xviii

4.18 Situations after coordinator election. 72

4.19 Steps one and two in process operations. 72

4.20 Steps one and two of new protocol. 72

4.21 Step 3 in the process level. 74

4.22 Step three of the new complementary protocol. 74

4.23 Coordinator and flag behavior in the new complementary

protocol. 76

4.24 New epoch in the new complementary protocol. 77

4.25 Optimization for the new complementary protocol. 78

4.26 The new complementary protocol for a centralized DMX

algorithm. 80

4.27 Different conditions of a crashed process. 82

4.28 The solution when an in-region process crashes, during the

coordinator‟s absence. 82

4.29 In-region process crash scenario, while coordinator is down. 83

4.30 Requesting process crash scenario, while coordinator is down. 76

4.31 An in-region process leaves the CS during coordinator‟s absence. 86

4.32 A process before send a “Request”, receives a “New Epoch”. 87

4.33 A process requests CS during coordinator‟s absence. 88

4.34 A new coordinator at the distributed system‟s starting time. 90

4.35 Crashed coordinator discovery, by the distributed system‟s

processes. 92

4.36 Extension 1 92

4.37 When a process, which is not related to a critical section crashes,

the system do nothing. 93

4.38 When a requested process in the queue of a critical section‟s

crashes. 95

4.39 Extension 3. 95

4.40 The protocol, when an in-region process crashes. 97

4.41 Extension 4 97

4.42 Ordering received disorder requests, in a critical section queue. 98

5.1 Comparison of performance in restart able and the new presented

algorithm in a distributed system 107

5.2 Comparison of performance in non-restart able distributed system 109

5.3 Errors in the distributed system 109

5.4 Scope and non-scope errors 111

© COPYRIG

HT UPM

xix

5.5 Performance from adding messages to the system 112

5.6 Inverse relationship between system error and performance 113

5.7

Comparison of system performance with the proposed algorithm

and Attiya et al.‟s algorithm without crashing error and based on

traffic evaluation

115

5.8 Error rate for k = 0.1 117

5.9 Error rate for k = 0.9 117

5.10 Performance of three machines which are working concurrently 118

5.11 During the 20 Second the number of requests is less than the

distributed system capacity 120

5.12 Some of messages lost because of hardware limitation 121

5.13 The number of deadlocks which cause to crash in Ding et al. at a

long time 122

5.14 Throughput in the case of bits per second and packets per second 123

5.15 Number of collisions in the distributed system 123

5.16 Queuing delay in the whole of the system 124

5.17

the result of optimized process allocation, the blue line is

applying the new presented algorithm in a computer member of

the system

125

5.18 The result of optimized process allocation, the blue line is

applying the new presented algorithm in the whole of the system 126

5.19 Packet loss, the blue line is the Attiya et al.‟s algorithm loss rang

and the red line is the presented loss 127

5.20 Traffic in the distributed system after crashing. The blue line is

the Attiya et al.‟s algorithm and the red line is the new algorithm 128

5.21

Difference between utilization of the distributed system, the blue

line is Attiya et al.‟s algorithm and the red line is the presented

algorithm

129

5.22

The increased values, which are achieved for performance

optimization by applying the embedded fault-tolerant protocol

into the distributed system.

131

© COPYRIG

HT UPM

xx

LIST OF ABBREVIATIONS

IPC Inter Process Communication

TCP Transmission Control Protocol

DCS Distributed Control Systems

PHP Pigeon Hole Principle

ISO International Standard Organization

MIMD Multi Instruction Multi Data

OS Operating System

OSI Open Systems Interconnection

TSL Test Set Lock

DMX/DME Distributed Mutual Exclusion

LAN Local Area Network

PDA Personal Digital Assistant

MAC Medium Access Control

STFQ Start Time Fair Queuing

DCF Distributed Coordination Function

NTP Network Time Protocol

PTP Precision Time Protocol

DTM Dynamic synchronous Transfer Mode

GPS Global Positioning System

WAN Wide Area Network

RPC Remote Procedure Call

CA Cristian Algorithm

BMC Best Master Clock

QoS Quality of Service

CS Critical Section

Ack Acknowledgement

DBMS Data Base Management System

AYA Are You Alive?

Gb Gigabit

LB Laxity Based

COTRC Channel Occupancy Time based Rate Control

BSS Basic Service Set

http://en.wikipedia.org/wiki/Dynamic_synchronous_Transfer_Mode

© COPYRIG

HT UPM

xxi

BDP Bandwidth Delay Product

NS Network Simulator

IF Initialization Function

IA I‟m Alive

IP Internet Protocol

© COPYRIG

HT UPM

CHAPTER 1

1 INTRODUCTION

1.1 Introduction

Yearly, large numbers of people die due to machinery accidents such as aircraft and

air flight control, train control, sensitive industrial in factories or electricity

generations faulty (Giua and Seatzu, 2008). All crashes cost persons and their money

and time. Today, most of these systems are computerized and network-based,

therefore, sensitive networks (like mentioned systems) are needed to ensure survival

and fairness resulting from fault-tolerant systems (Eger and Killat, 2007). Crash-free

systems are possible through the design of distributed systems in sensitive networks

(Lau, 2005). Fault-tolerance in general is the ability of a system to continue operating

as expected, despite internal or external failures.

To achieve fault-tolerance, distributed systems should address different types of

uncertainties: processor non-determinism, uncertain message delivery times,

unknown message ordering in a queue, and processor/communication failures

(Lodha and Kshemkalyani, 2000). Processor non-determinism is caused by process

scheduling algorithms (Alonso et al., 2007). Processor and communication failures

occur due to hardware and software failures.

Mutual exclusion algorithms for distributed systems are divided in three groups

consist of: centralized algorithms, distributed algorithms and token ring algorithms

© COPYRIG

HT UPM

2

(Li et al., 2010). One of the most practical groups is centralized algorithms

(Vincenzo et al., 1994; Agrawal and El-Abbadi, 1991), but they have disadvantages

of having lower fault-tolerance and being a single point of failure in distributed

systems, hence leading to the decrease in fault-tolerance of the entire system (Attiya

et al., 2010; Simon et al., 2001).

In a centralized algorithm, in which a centralized machine or process performs

synchronization and all other processes, should communicate with this process or

machine. Each process has to get permission from this centralized coordinator to

enter a critical section (Agrawal and El-Abbadi, 1991).

Centralized algorithms onto a special part of a distributed system to decide on mutual

exclusion and on whose turn it is to enter the Critical Section (CS). These algorithms

are divided into two types. First, those algorithms that have a central referee to

decide and second those do not have central referee, in which rather they decide

according to participating most of the workstations (Tanenbaum, 2008).

As a base and important protocol in TCP/IP protocol status, TCP has been

extensively verified to data transmission on these networks (Biaz and Vaidya, 2005).

TCP utilizes the technology of positive acknowledgement with retransmission to

solve the instability under level IP protocol and offers reliable data transmission

service for application protocols (Fu et al., 2004). For the networks that are based on

TCP, there are two communication peers, which are client(s) and server(s) (Zabir et

al., 2004). A TCP connection is normally initiated by client‟s request and by server‟s

© COPYRIG

HT UPM

3

response messages. After establishing the TCP connection, the server will offer data

service to clients through this connection (Casetti et al., 2002).

1.2 Problem statement

The following are problems that exist in centralized algorithms‟ fault-tolerance,

which will be solved in this thesis:

The first problem statement is process crash, based on process synchronization‟s

algorithms (IPC), including below different situations:

If the coordinator crashes, in this case, there is no process to manage or to control

critical section, hence the past queue will be lost (De Sa et al., 2012; Attiya et al.,

2010).

If a process in a critical section queue crashes, so, when the process reached to the

critical section and entered the section, starvation will happen to other processes.

Next, after this process, they will not know if the process inside the critical section is

dead or otherwise (De Sa et al., 2012; Attiya et al., 2010; Ding et al., 2008).

If a process in the critical section crashes, a starvation happens for other processes in

the related queue, because they never can enter the critical section ((De Sa et al.,

2012; Attiya et al., 2010; Ding et al., 2008).

© COPYRIG

HT UPM

4

The second problem statement is a system crash due to lack of some parameters for

critical section entrance (De Sa et al., 2012; Attiya et al., 2010). There are some other

important parameters in addition to the timestamps in distributed systems, such as,

time action (execution time), emergency priority or delay (Yaashuwanth and

Ramesh, 2010; Li and Zhou, 2009; Alpcan and Bas, 2005; Iordache et al., 2002;

Galli, 2000). Additionally, previous works did not consider these parameters

together, which is important for aspect of fairness (De Sa et al., 2012; Attiya et al.,

2010; Ding et al., 2008).

The third problem statement is process allocating to resources in distributed systems

(Ding et al., 2008). A fault-tolerant distributed system need to be fair in process

allocation to it‟s resources (Lodha and Kshemkalyani, 2000). For a critical section,

an optimized process allocation can be fairer and keeps distributed system alive,

because of avoid to some faults in such system (Yumin et al., 2010).

1.3 Objectives

The main goal is to optimize fault-tolerance. Of course, according to the previous

researches, to evaluate fault-tolerant in a system, the system performance should be

evaluated (Ekwall and Schiper, 2011; Wang et al., 2008; Wang and Wu, 2007; Yang

et al., 2005; Wang et al., 2005; Pleisch and Schiper, 2003).

The first objective is optimization in the process crash problem of mutual exclusion

and critical section in centralized algorithms. The new protocol in centralized

approach should cover coordinator crash so that if a coordinator crashes, all of the

© COPYRIG

HT UPM

5

processes in the past queue should be recoverable and the ordering problem should

be solves by some addition messages in Inter Process Communication (IPC). Also, if

any process in the critical section or in the related queue crashes, it should not to

causes crash whole the system. In these cases, identifying the dead processes in

critical and sensitive areas is important to ensure there will be no starvation.

The second objective is to propose additional parameters such as priority, time action

(or duration of the activation process in critical section), and delay. These parameters

complement the timestamp parameter especially when they cause a process into the

critical section queue or coordinator does not crash, then the system will be more

fault-tolerant. While previous researches only depend on one variable (time stamp)

(De Sa et al., 2012; Attiya et al., 2010; Ding et al., 2008), the new approach is

depend on additional variables .With this, at least one of four conditions of deadlock

may be avoided and the system becomes a deadlock-free system (Li et al., 2008; Chu

et al., 1997). In many cases, this also means that processes will need to grant mutual

exclusive access to the critical section (Conga and Bader, 2006). Of course, the new

obtained queue should be reordered according to the important parameters by using a

fast sorter algorithm.

The third objective is optimization of process allocation into resources, especially for

processes in a race condition to critical sections entrance in a distributed system. The

presented solution for a resource is based on Pigeon Hole Principle (PHP) algorithm

and in the case of extension, the approach should be conflict free, and it is based on

induction lemma.

© COPYRIG

HT UPM

6

1.4 Contribution

The primary contribution of this thesis is fault-tolerant and efficient mutual exclusion

algorithms. The new protocol has the following advantages:

Identify a dead process whether it is in the critical section or it‟s queue,

and if the dead process has a coordinator effect. This is to avoid starvation

in the network and related to the first objective (Lawley and Reveliotis,

2001).

Increase fairness with involve other important parameters beyond the

timestamp depending on characteristics of a distributed system. This

contribution is related to second objective, and causes to have a deadlock

free and so, more fault-tolerant distributed system.

Allocate jobs or processes better to the existing machine(s) based on

mathematical proofs, and avoid to resource conflicting related to the third

objective.

1.5 Scope of research

From the practical view, attention to characteristics in distributed systems is very

important. Distributed systems are known to exhibit certain differences in a network.

In this dissertation, the scope of the study lies within the software network and its

related layers. The choosing of a protocol and tracking its changes in the open source

© COPYRIG

HT UPM

7

condition (for example Linux OS) will achieve fairness and is able to extend beyond

the single dimension of time space. This implies that adding new messages to the

distributed system in suitable layer(s) is possible.

The proposed algorithm is scoped to areas related to a distributed operating system

with a micro kernel and so, other distributed operating systems (based on monolithic

kernel) are not in scope.

Hardware problems are not including to the project and also, assumed there are N

processes in the distributed system. The system is based on asynchronous message

passing and process assumed work correctly which they are not various (Joung,

2000). The distributed system has FIFO channels for data (packet) communications,

(Lee et al., 2008; Leung and Li, 2003).

There are several errors can happen in a distributed system but in this research will

focused only on errors related to the critical section.

1.6 Thesis organization

Chapter 2 details out the literature reviews of previous works related to the

centralized algorithm, distributed mutual exclusion, as well as optimization.

Optimizations are performed in the proposed algorithms and approaches in order to

obtain better fault-tolerance in centralized algorithms and fault-tolerant process

allocation.

© COPYRIG

HT UPM

8

Chapter 3 consists of the methodology including the system model, distributed

system assumptions, identification of new message(s) and improvement to the

previous algorithms.

Chapter 4 further deliberates on fault-tolerance, steps and points of the new protocol,

extensions in the new DMX algorithm, race conditions as well as the competitive

model for processes involving the new parameters. It also presents a simulation in

two simulation software, OPNET and MATLAB using various Distributed

Computing Toolbox and programming. Finally, this chapter presents the process

allocation with Pigeon Hole Principle (PHP) and its related aspects to the entire

distributed system using the Induction Lemma.

Chapter 5 illustrates the results of simulation with figures and considers in depth the

effects of the new protocol and its parameters through comparison with findings

using previous solutions.

Finally, Chapter 6 discusses the analysis from a comparison of result with previous

algorithms. The chapter finally concludes with some indication for future practical

works based on the proposed presented algorithm and approach.

© COPYRIG

HT UPM

140

REFERENCES

Agrawal, D. and El-Abbadi, A. (1991). An efficient and fault-tolerant solution of

distributed mutual exclusion. ACM Transaction Journal on Computer Systems.

9: 1-20.

Akyildiz, I.F., Zhang, X. and Fang, J. (2002). TCP-Peach: enhancement of TCP-

peach for satellite IP networks. IEEE Communications Letters. 6(7): 303-305.

Akyildiz, L. A., Morabito, G. and Palauo, S. (2001). TCP-Peach: A new congestion

control scheme for satellite IP networks. IEEE/ACM Transactions on

Networking. 9(3): 301-321.

Alonso, S. H., Perez, M. R. and Viega, M. F. (2007). Fairness in presence of

backward traffic. IEEE Communications Letters. 11(3): 273-285.

Alpcan, T. and Bas, T. (2005). A utility-based congestion control scheme for

Internet-style networks with delay. IEEE Transactions on Networks. 13(6):

1261-1274.

Atreya, R., Mittal, N. and Peri, S. (2007). A quorum-based group mutual exclusion

algorithm for a distributed system with dynamic group set. IEEE Transactions

on Parallel and Distributed Systems. 18(10): 1345-1360.

Attiya, H., Kogan, A. and Jennifer, L. (2010). Efficient and robust local mutual

exclusion in mobile Ad Hoc networks. IEEE Transactions on Mobile

Computing. 9 (3): 361-375.

Bae, C. S. and Cho, D. H. (2007). Fairness-aware adaptive resource allocation

scheme in multihop OFDMA systems. IEEE Letters on Communications. 11(2):

134-136.

Banaszak, Z. and Krogh, B. H. (1990). Deadlock avoidance in flexible

manufacturing systems with concurrently competing process flows. IEEE

Transactions on Robot Automatics. 6(6): 724-734.

Barkaoui, J. F. and Peyre, P. (1996). On liveness and controlled siphons in Petri

Nets. Springer-Verlag. 1091: 57-72.

Basile, F. ,Chiacchio, P. and Giua, A. (2007). An optimization approach to Petri Net

monitor design. IEEE Transactions on Automatic Control. 52(2): 306-311.

Basile, F., Carbone, C. and Chiacchio, P. (2007). Feedback control logic for

backward conflict free choice nets. IEEE Transactions on Automatic Control.

52(3): 387-400.

Beauquier, J., Cantarell, S., Datta, A.K. and Petit, F. (2003). Group mutual exclusion

in tree networks. Journal of Information Science and Engineering (JISE). 19(3):

415–432.

© COPYRIG

HT UPM

141

Bhandarkar, S. and Reddy, A.L.N. (2004). TCP-DCR: making TCP robust to non-

congestion events. Lecture Notes in Computer Science. 3 (4): 712-724.

Bianchi G. (2000). Performance analysis of the IEEE 802.11 distributed coordination

function. IEEE Journal Selected Areas Communications. 18(3): 535-547.

Biaz, S. and Vaidya, N.H. (2005). De-randomizing congestion losses to improve

TCP performance over wired-wireless networks. IEEE/ ACM Transactions on

Networking. 13(3): 596-608.

Bini, E. and Buttazzo, G. (2004). Schedule ability analysis of periodic fixed priority

systems. IEEE Transactions on Computers. 53(11): 1462-1473.

Boer, E. R. and Murata, T. (1994). Generating basis siphons and traps of Petri Nets

using the sign incidence matrix. IEEE Transactions on Circuits Systems. 41(4):

266-271.

Bohacek, S., Hespanha, J.P., Lee, J. L. C. and Obraczka, K. (2006). A New TCP for

persistent packet reordering. IEEE/ACM Transactions on Networking. 14(2):

369-382.

Bouyekhf, R. and Moudni, A. E. (2005). On the analysis of some structural

properties of Petri Nets. IEEE Transactions on Systems, MAN and Cybernetics.

35(6): 748-794.

Bradley L. (2005). Testing Concurrent Java Components. Published doctoral

dissertation, University of Queensland, Australia.

Cali, F., Conti, M. and Gregori, E. (2000). Dynamic tuning of the IEEE 802.11

protocol to achieve a theoretical throughput limit. IEEE/ACM Transactions on

Networks. 8(6): 785-799.

Cao, G. and Singhal, M. (2001). A delay-optimal quorum-based mutual exclusion

algorithm for distributed systems. IEEE Transactions on Parallel and

Distributed Systems. 12(12): 1256-1268.

Casetti, C., Gerla, M., Mascolo, S., Sanadidi, M.Y. and Wang, R. (2002). TCP west

wood: end-to-end congestion control for wired/wireless networks. Wireless

Networks. 8(5): 467-479.

Chen, X., Wong, S. C. and Tse, C. K. (2010). Adding randomness to modeling

Internet TCP-RED systems with interactive gateways. IEEE Transactions on

Circuits and Systems-II. 57(4): 300-304.

Choi, J., Yoo, J. and Kim, C. K. (2006). A novel performance analysis model for an

IEEE 802.11 wireless LAN. IEEE Communications Letter. 10(5): 335-337.

Choi, L., Yoo, J. and Kim, C. K. (2008). A distributed fair scheduling scheme with a

new analysis model in IEEE 802.11 wireless LANs. IEEE Transactions on

Vehicular Technology. 57(5): 3083-3093.

© COPYRIG

HT UPM

142

Chu, F. and Xie, X. L. (1997). Deadlock analysis of Petri Nets using siphons and

mathematical programming. IEEE Transactions on Robot Automatics. 13(6):

793-804.

Conga, G. and Bader, D. (2006). Designing irregular parallel algorithms with mutual

exclusion and lock-free protocols. Journal of Parallel and Distributed

Computing. 66: 854-866.

Cortes, J., Martinez, S. and Bullo, F. (2006). Robust rendezvous for mobile

autonomous agents via proximity graphs in arbitrary dimensions. IEEE

Transactions on. Automatic Control. 51(8): 1289-1298.

Cristian, F. (1989). A probabilistic approach to distributed clock synchronization.

IEEE Distributed Computing Systems. 11(1): 288-296.

Cuevas, A., Moreno, J. I., Vidales, P. and Einsiedler, H. (2006). The IMS platform:

A solution for next generation network operators to be more than bit pipes. IEEE

Communication Nagezine, Special Issue on Advances of Service Platform

Technologies. 44(8): 75-81.

De Sa, A. G. C, Heimfarth, T., De Oliviera, H. E. and De Freitas, E. P. (2012). An

algorithm to coordinate missions in wireless sensor networks. IEEE Latin

America Transaction. 10(2): 1595-1602.

Delporte-Gallet, C., Fauconnier, H., Guerraoui, R. and Kouznetsov, P. (2005).

Mutual exclusion in asynchronous systems with failure detectors. Journal of

Parallel and Distributed Computing (JPDC). 65(4): 492–505.

Deng, J. and Chang, R. S. (1999). A priority scheme for IEEE 802.11 DCF access

method. IEICE Transactions on Communications. 82(1): 96-102.

Dhillon, B. S. (2005). Reliability, Quality and Safety for Engineers. Boca Raton,

Florida: CRC Press.

Dimarogonas, D. V., Loizou, S. G., Kyriakopoulos, K. J. and Zavlanos, M. M.

(2006). A feedback stabilization and collision avoidance scheme for multiple

independent non-point agents, Automatica Journal. 42(2): 229-243.

Ding, Z. J., Jiang, C. and Zhou, M. C. (2008). Deadlock checking for oneplace

unbounded Petri nets based on modified reach ability trees. IEEE Transactions

on Systems, MAN and Cybernetics. 38(3): 881-883.

Eddy, W. M., Ostermann, S. and Allman, M. (2004). New techniques for making

transport protocols robust to corruption-based LOSS. ACM Computer

Communication. 4(9): 87-96.

Eger, K. and Killat, U. (2007). Fair resource allocation in peer-to-peer networks

(extended version). Journal of Computer Communications. 30(16): 3046-3054.

© COPYRIG

HT UPM

143

Eidson, J. (2002). Synchronizing measurement and control systems. Sensors

Magazine. 19(11): 21-35.

Ergen, M., Coleri, S. and Varaiya, P. (2003). QoS aware adaptive resource allocation

techniques for fair scheduling in OFDMA based broadband wireless access

systems. IEEETransaction on Wireless Networks. 7(4): 56-68.

Ekwan R. and Schiper A. (2011). A fault-tolerant token-based atomic broadcast

algorithm. IEEE Transactions on Dependable and Secure Computing. 8 (5):

625-639.

Ezpeleta, J., Colom, J. M. and Martinez, J. (1995). A Petri net based deadlock

prevention policy for flexible manufacturing systems. IEEE Transactions on

Robot Automatics. 11(2): 173-184.

Fanti, M. P. and Zhou, M. C. (2004). Deadlock control methods in automated

manufacturing systems. IEEE Transactions on System, MAN and Cybernets.

34(1): 5-22.

Feldman, M. and Chuang, J. (2005). Overcoming free-riding behavior in peer-to-peer

systems. ACM SIGecom Exchanges Journal. 5(4): 41-50.

Floyd, S., Mahdavi, J., Mathis, M. and Podolsky, M. (2000). An extension to the

selective acknowledgement (SACK) option for TCP. IETF RFC. 28: 83-95.

Foh, C. H. and Tantra, J. W. (2005). Comments on IEEE 802.11 saturation

throughput analysis with freezing of backoff counters. IEEE Communications

Letter. 9(2): 130-132.

Franks, G., Al-Omari, T., Woodside, M., Das, O. and Derisavi, S. (2009). Enhanced

modeling and solution of layered queuing networks. IEEE Transactions on

Software Engineering. 35(2): 148-161.

Fu, C.P. and Liew, S.C. (2003). TCP Veno: TCP enhancement for transmission over

wireless access networks. IEEE Journal Selected Areas in Communications.

21(2): 216-228.

Fu, Z., Meng, X. and Lu, S. (2004). A transport protocol for supporting multimedia

streaming in mobile ad hoc networks. IEEE journal on selected areas in

communications. 21(10): 112-126.

Galli, D.L. (2000). Distributed Operating Systems, Concepts & Practice:

Washington: Prentice-Hall publication.

Giua, A. and Seatzu, C. (2008). Modeling and supervisory control of railway

networks using Petri Net. IEEE Transactions on Automation Science and

Engineering. 5(3): 431-445.

© COPYRIG

HT UPM

144

Gusella, R. and Zatti, S. (1989). The accuracy of the clock synchronization achieved

by TEMPO in Berkeley Unix. IEEE Transactions on Software Engineering.

15(7): 847-853.

Haiyun, L. and Lu, S. (2005). A topology-independent wireless fair queuing model in

Ad Hoc networks. IEEE Journal on Selected Areas in Communications. 23 (3):

585-597.

Han, C. C., Shin, K. G. and Wu, J. (2003). A fault-tolerant scheduling algorithm for

real-time periodic tasks with possible software faults. IEEE Transaction on

Computers. 52(3): 362-372.

Hesuan, H., Zhou, M. C. and Li, Z. W. (2009). Liveness enforcing supervision of

video streaming systems using non-sequential Petri Net. IEEE Transactions on

Multimedia. 11(8): 1457-1467.

Holloway, L. E., Khare, A. S. and Gong, Y. (2009). Computing bounds for forbidden

state reachability functions for controlled Petri Net. IEEE Transactions on

Systems, MAN and Cybernetics. 34(2): 219-228.

Hruz, B. and Zhou, M. C. (2007). Modeling and control of discrete event dynamic

systems. Springer. 121-132.

Iordache, M. V. (2003). Methods for the supervisory control of concurrent systems

based on Petri Net abstractions. Doctoral dissertation published, University

Notre Dame, Britain.

Iordache, M. V., Moody, J. O. and Antsaklis, P. J. (2002). Synthesis of deadlock

prevention supervisors using Petri Net. IEEE Transaction on Robot Automatics.

18(1): 59-68.

Jadbabaie, A., Lin, J. and Morse, A. S. (2003). Coordination of groups of mobile

autonomous agents using nearest neighbor rules. IEEE Transactions on

Automatic Control. 48(6): 998-1001.

Ji, M., Azuma, S. and Egerstedt, M. (2006). Role-assignment in multi-agent

coordination. International Journal Assistant Robot Mechatronics. 7(1): 32-40.

Jiang, J., Lai, T. H. and Soundarajan, N. (2002). Distributed dynamic channel

allocation in mobile cellular networks. IEEE Transactions on Parallel and


Joffroy, B. and Sebastien, C. (2003). Group mutual exclusion in tree networks.

Journal of Information Science and Engineering. 19: 415-432.

Joshi, T., Mukherjee, A., Yoo, Y. and Agrawal, D. P. (2008). Airtime fairness for

IEEE 802.11 multi-rate networks. IEEE Transactions on Mobile Computing.

7(4): 513-527.

© COPYRIG

HT UPM

145

Joung, Y. J. (2003). Quorum-based algorithms for group mutual exclusion. IEEE

Transactions on Parallel and Distributed Systems. 14(5): 463-476.

Joung, Y. J. (2000). Asynchronous group mutual exclusion, Distributed Computing

(DC). 13(4): 189–206.

Joung, Y. J. (2002). The congenial talking philosophers‟ problem in computer

networks. Distributed Computing (DC). 12(3): 155–175.

Joung, Y. J. (2003). Quorum-based algorithms for group mutual exclusion, IEEE

Transactions Parallel and Distributed Systems (TPDS). 14(5): 463–475.

Kanhere, S. S., Sethu, H. and Parekh, A. B. (2002). Fair and efficient packet

scheduling using elastic round robin. IEEE Transactions on Parallel and


Lafferriere, G., Williams, A., Caughman, J. and Veerman, J. J. P. (2005).

Decentralized control of vehicle formations. System Control Letter. 54(9): 899-

910.

Lamport, L. (1978). Time, clocks and the ordering of events in a distributed system.

Communication of the ACM. 21: 558-564.

Lamport, L. (1990). Concurrent reading and writing of clocks. ACM Transaction

Journal on Computer Systems. 8: 305-310.

Lau, V. K. N. (2005). Proportional fair space–time scheduling for wireless

communications. IEEE Transactions on Communications. 53(8): 1353-1360.

Lee, J. F., Liao, W. and Chang, M. (2007). Inter-frame-Space (IFS) based distributed

fair queuing for proportional fairness in IEEE 802.11 WLANs. IEEE

Transactions on Vehicular Technology. 46(3): 1366-1373.

Lee, J. S., Zhou, M. C. and Hsu, P. L. (2008). Multi-paradigm modeling for hybrid

dynamic systems using Petri Net framework. IEEE Transactions on Systems,

Man and Cybernetics. 38(2): 493-498.

Lee, Y., Park, I. and Choi, Y. (2002). Improving TCP performance in multipath

packet forwarding networks. Journal of Communications and Networks. 4(2):

148-157.

Leung, K. C. and Li, V. O. K. (2003). Flow assignment and packet scheduling for

multipath routing. Journal Communications and Networks. 5(3): 230-239.

Leung, K. C. and Li, V. O. K. (2006). Generalized load sharing for packet-Switching

networks I: theory and packet-based algorithm. IEEE Transactions on Parallel

and Distributed Systems. 17(7): 694-702.

Leung, K. C. and Ma, C. (2005). Enhancing TCP performance to persistent packet

reordering. Journal of Communications and Networks. 7(3): 385-393.

© COPYRIG

HT UPM

146

Leung, K. C., Li, V. O. K. and Yang, D. (2007). An overview of packet reordering in

transmission control protocol (TCP): problems, solutions, and challenges. IEEE

Transactions on Parallel and Distributed Systems. 18(4): 522-535.

Li, Z. W. and Zhou, M. C. (2004). Elementary siphons of Petri Net and their

application to deadlock prevention for flexible manufacturing systems. IEEE

Transactions on Systems, MAN and Cybernetics. 34(1): 38-51.

Li, Z. W. and Zhou, M. C. (2008). A survey and comparison of Petri Net-based

deadlock prevention policies for flexible manufacturing systems. IEEE

Transactions on Systems, MAN and Cybernetics. 38(2): 173-188.

Li, Z. W. and Zhou, M. C. (2009). A divide and-conquer strategy to deadlock

prevention in flexible manufacturing systems. IEEE Transactions Systems, MAN

Cybernetics. 39(2): 156-169.

Li, Z. W. and Zhou, M. C. (2009). Deadlock resolution in automated manufacturing

systems: a novel Petri Net approach. Springer. 34-42.

Li, Z. W., Yan, M. M. and Zhou, M. C. (2010). Synthesis of structurally simple

supervisors enforcing generalized mutual exclusion constraints in Petri Net.

IEEE Transactions on Systems, MAN and Cybernetics-Part C. 40(3): 330-340.

Liu, J. and Singh, S. (2001). ATCP: TCP for mobile Ad Hoc networks. IEEE Journal

Selected Areas in Communications. 19(7): 1300-1315.

Lodha, S. and Kshemkalyani, A. (2000). A fair distributed mutual exclusion

algorithm. IEEE Transactions on Parallel and Distributed Systems. 11(6): 537-

549.

Luo, J., Wu, W., Su, H. and Chu, J. (2009). Supervisor synthesis for enforcing a class

of generalized mutual exclusion constraints on Petri Net. IEEE Transactions on

Systems, MAN and Sybernetics-Part A. 39(6): 1237-1246.

Mittala, N. and Mohanb, P. K. (2007). A priority-based distributed group mutual

exclusion algorithm when group access is non-uniform. IEEE Transactions on

Systems, MAN and Cybernetics-Part C. 40(3): 797-815.

Mills, D. L. (1989). Internet time synchronization: the Network Time Protocol

(NTP). IEEE Transactions Communications. 39(10): 1482-1493.

Moln´ar, I. (2009). Multiuser access in MIMO/OFDM systems. IEEE Transactions

on Communications. 53(1): 107–116.

Naimi, M. and Thiare, O. (2009). Distributed mutual exclusion based on causal

ordering. Journal of Computer Science. 5(5): 398-404.

© COPYRIG

HT UPM

147

Nesargi, S. and Prakash, R. (2002). Distributed wireless channel allocation in

networks with mobile base stations. IEEE Transactions on Vehicular

Technology. 51(6): 1407-1421.

Nga, J. H. C., Iu, H. H. C., Ling, B. W. K. and Lam, H. K. (2008). Analysis and

control of bifurcation and chaos in average queue length in TCP/RED model.

International Journal of Bifurc Chaos. 18(8): 2449-2459.

Nga, J. H. C., Iu, H. H. C., Ling, S. H. and Lam, H. K. (2008). Comparative study of

stability in different TCP/RED models. Chaos Solutions Fractals. 37(4): 977-

987.

Nishida, H. and Nguyen, T. (2010). A global contribution approach to maintain

fairness in P2P networks. IEEE Transactions on Parallel and Distributed

Systems. 21(6): 812-826.

Olfati, R. S. and Murray, R. M. (2004). Consensus problems in networks of agents

with switching topology and time-delays. IEEE Transactions on Automatic

Control. 49(9): 1520-1533.

Pattara, A. W., Krishnamurthy, P. and Banerjee, S. (2003). Distributed mechanisms

for quality of service in wireless LANs. IEEE Transactions on Wireless

Communications. 10(3): 26-34.

Peng, Li, Haisang, Wu. and Jensen, E. D. (2006). A utility accrual-scheduling

algorithm for real-time activities with mutual exclusion resource constraints.

IEEE Transactions on Computers. 55(4): 454-469.

Piroddi, L., Cordone, R. and Fumagalli, I. (2008). Selective siphon control for

deadlock prevention in Petri Net. IEEE Transactions on Systems, MAN and

Cybernetices. 38(6): 1337-1348.

Piroddi, L., Cossalter, M. and Ferrarini, L. (2007). A resource decoupling approach

for deadlock prevention in FMS. International Journal of Advance

Manufacturing Technology. 10: 1307-1319.

Pleisch, S. and Schiper A. (2003). Fault-tolerant mobile agent execution. IEEE

Transactions on Computers. 52 (2): 209-222.

Ramaswamy, S. E. and Joshi, S. B. (1996). Deadlock free schedules for automated

manufacturing workstations. IEEE Transaction on Robot Automatics. 12(3):

391-400.

Raymond, K. (1989). A distributed algorithm for multiple entries to a critical section.

ACM Communications. 30(4): 189-193.

Ren, W. and Beard, R. (2005). Consensus seeking in multi-agent systems under

dynamically changing interaction topologies. IEEE Transactions on Automatic

Control. 50(5): 655-661.

http://www.informatik.uni-trier.de/~ley/db/journals/ipl/ipl30.html#Raymond89

© COPYRIG

HT UPM

148

Ricart, G. and Agrawala, A.K. (1981). An optimal algorithm for mutual exclusion in

computer networks. Communication of the ACM. 24: 9-17.

Roczniak, A., Saddik, A.E. and Levy, P. (2007). INCA: qualitative reference

framework for incentive mechanisms in P2P networks. International Journal on

Computer Applications in Technology. 29(1): 71-80.

Srinivasan, S. and Jha, N. (1999). Safety and reliability driven task allocation in

distributed systems. IEEE Transaction on Parallel and Distributed Systems.

10(3): 238-251.

Sathiaseelan, A. and Radzik, T. (2004). Improving the performance of TCP in the

case of packet reordering. Lecture Notes in Computer Science. 3079- 63-73.

Schenato, L. (2008). Optimal estimation in networked control systems subject to

random delay and packet drop. IEEE Transactions on Automatic Control. 53(5):

1311-1317.

Stallings, W. (2012). Operating Systems. New York: Prentice-Hall Press.

Stefan, S. and Thomas, A. (1997). Eraser: a dynamic data race detector for

multithreaded programs multithreaded programs. ACM Transactions on

Computer Systems. 15(4): 391–411.

Sundell, H. and Tsigas, P. (2005). Fast and lock-free concurrent priority queues for

multi-thread systems. Journal of Parallel and Distributed Computing. 65: 609 –

627.

Tanenbaum, A.S. and Steen, M.V. (2008). Distributed system, principles and

paradigms. Amsterdam: Prentice Hall publication.

Toyomura, M., Kamei, S. and Kakugawa, H. (2003). A quorum-based distributed

algorithm for group mutual exclusion. IEEE Transaction on Distributed and

Parallel Systems. 12(7): 231-242.

Trivellato, M. and Benvenuto, N. (2010). State control in networked control systems

under packet drops and limited transmission bandwidth. IEEE Transactions on

Communications. 58(2): 611-622.

Vaidya, N., Dugar, A., Gupta, S. and Bahl, P. (2005). Distributed fair scheduling in

a wireless LAN. IEEE Transactions on Mobile Computing. 4(6): 616-629.

Vidales, P., Baliosion, J., Serrat, J., Mapp, G., Stejano, F. and Hopper, A. (2005).

Autonomic system for mobility support in 4G networks. IEEE Journal of

Selected Area. 23(12): 2288-2304.

Vincenzo, P. (1994). Design of fault-tolerant distributed control systems. IEEE

Transactions on Instrumentation and Measurment. 43(2): 257-264.

© COPYRIG

HT UPM

149

Wang, F. and Zhang, Y. (2008). Improving TCP performance over mobile Ad-Hoc

networks with out-of-order detection and response. ACM MOBIHOC. 2: 217-

225.

Wang, Y. and Wu, H. (2007). Delay/fault-tolerant mobile sensor network (DFT-

MSN): a new paradigm for pervasive information gathering. IEEE Transaction

on Mobile Computing. 6(9): 1021-1034.

Wang, T. Y., Han, Y. S., Varshney, P. K. and Chen P. N. (2005). Distributed fault-

tolerant classification in wireless sensor networks. IEEE Journal on Selected

Areas in Communications. 23(4): 724-734.

Wang, T. Y., Chang, L. Y., Duh D. R. and Wu J. Y. (2008). Fault-tolerant decision

fusion via collaborative sensor fault detection in wireless sensor network. IEEE

Transaction on Wireless Communications. 7(2): 756-768.

Wen-xia, L., Nian, L., Young-feng, F., Li-xin, Z. and Xin, Z. (2009). Reliability

analysis of wide area measurement system based on the centralized distributed

model. IEEE Transaction Communication Magazine. 97(8): 1-6.

Wu, N., Zhou, M. C. and Li, Z. (2008). Resource-oriented Petri Net for deadlock

avoidance in flexible assembly systems. IEEE Transactions on Systems, MAN

and Cybernetics. 38(1): 56-69.

Wua, W., Caoa, J. and Yang, J. (2008). A fault-tolerant mutual exclusion algorithm

for mobile ad hoc networks. Pervasive and Mobile Computing. 4: 139–160.

Xia, Q., Jin, J. and Hamdi, M. (2008). Active queue management with dual virtual

proportional integral queues for TCP uplink/downlink fairness in infrastructure

WLANs IEEE Transactions on Wireless Communications. 7(6): 2261-2271.

Xing, K., Zhou, M. C., Yang, X. and Tian, F. (2009). Optimal Petri Net-based

polynomial-complexity deadlock avoidance policies for automated

manufacturing systems. IEEE Transactions on Systems, MAN and Cybernetics.

39(1): 188-199.

Xu, S. and Saadawi, T. (2001). Does the IEEE 802.11 MAC protocol work well in

multi hop wireless ad hoc networks? IEEE Communications Magazine. 54-66.

Yaashuwanth, C. and Ramesh, R. (2010). Intelligent time slice for round robin in real

time operating systems. IJRRAS. 2(2): 126-131.

Yaiche, H., Mazumdar, R. and Rosenburg, C. (2000). A game theoretic framework

for bandwidth allocation and pricing in broadband networks. IEEE/ACM

Transaction on Networks. 8(5): 667-678.

Yang, J., Jiang, Q., Manivannan, D. and Singal M. (2005). A fault-tolerant

distributed channel allocation scheme for cellular networks. IEEE Transactions

on Computers. 54(5): 616-629.

© COPYRIG

HT UPM

150

Zabir, S. M. S. and Kitagata, G. (2004). An efficient TCP flow Control over satellite

networks. Telecommunication Systems. 25(3): 371-400.

Documents

UNIVERSITI PUTRA MALAYSIA - psasir.upm.edu.mypsasir.upm.edu.my/id/eprint/56191/1/FK 2013 125RR.pdfProcess management is a fundamental problem in distributed systems. It is fast becoming