i
DESIGN AND PERFORMANCE ANALYSIS OF FAULT TOLERANT TTCAN SYSTEMS
by
AAKASH ARORA
THESIS
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
MASTER OF SCIENCE
August 2005
MAJOR: COMPUTER ENGINEERING
Approved by:
Advisor Date
i
2
TABLE OF CONTENTS
ACKNOWLEDGEMENTS…………………………………………………….…...... i
LIST OF TABLES…………………………………………………………………….. iv
LIST OF FIGURES…………………………………………………………………… v
1 INTRODUCTION…………………………………………………………………… 1
2 LITERATURE REVIEW……………………………………………………………. 6
2.1 Fault Tolerance Theory…………………………………………………… 6
2.1.1 Classification of failures …………………………………………7
2.1.2 Classification of Failures in CAN and TTCAN........................ 7
2.2 Error Rate in TTCAN……………………………………………………… 8
2.3 Introducing Fault Tolerance in TTCAN………………………………….. 9
3 PROPOSED FAULT TOLERANCE TECHNIQUES FOR TTCAN……………13
3.1 Assumptions……………………………………………………………….13
3.2 Mailbox Method……………………………………………………………14
3.2.1 Proposed Architecture………………………………………….14
3.2.2 Algorithm…………………………………………………………15
3.2.3 Analysis…………………………………………………………..18
3.3 Arbitration Window Method………………………………………………20
3.3.1 Proposed Architecture………………………………………….21
3.3.2 Algorithm…………………………………………………………22
3.3.3 Analysis…………………………………………………………..24
3.4 Asynchronous Redundant Bus Method…………………………………25
3.4.1 Proposed Architecture………………………………………….26
2
3
3.4.2 Algorithm…………………………………………………………27
3.4.3 Analysis…………………………………………………………..29
4 PERFORMANCE ANALYSIS……………………………………………………..32
4.1 Performance Analysis of Mailbox Method……………………………...32
4.1.1 Results …………………………………………………………..32
4.1.2 Analysis………………………………………………………….36
4.2 Performance Analysis of Arbitration Window Method………………...38
4.2.1 Results …………………………………………………………..38
4.2.2 Analysis………………………………………………………….40
4.3 Performance Analysis of Asynchronous Redundant Bus Method…..41
4.3.1 Results…………………………………………………………...42
4.3.2 Analysis…………………………………………………………..44
4.4 Comparison of Proposed Techniques ………………………………… 46
5 CONCLUSION………………………………………………………………………48
6 FUTURE WORK……………………………………………………………………50
REFERENCES……………………………………………………………………….. 51
ABSTRACT…………………………………………………………………………….54
AUTOBIOGRAPHICAL STATEMENT……………………………………………...56
3
4
LIST OF TABLES
Table I: Bit error rates………………………………………………………………….8
Table II: Testing environment………………………………………………………...9
Table III: Failures per hour…………………………………………………………...9
Table IV: Delay in the case of Mailbox Window Method………………………….36
Table V: Delay in case of Arbitration Window Method ……………………………40
Table VI: Number of messages per hour missing deadline………………………43
Table VII: Comparison of Worst Case Delay……………………………………….46
Table VIII: Comparison of probability of unsuccessful delivery…………………..47
4
5
LIST OF FIGURES
Figure 1: Progression of application of electronics in vehicles……………………..1
Figure 2: TTCAN System Matrix ……………………………………………………...3
Figure 3: Safety critical applications of in-vehicle networks………………………..4
Figure 4: Proposed System Architecture……………………………………………15
Figure 5: A System Matrix with a Mailbox Window in every basic cycle………...16
Figure 6: Proposed Algorithm for Mailbox window Method……………………….17
Figure 7: A TTCAN System Matrix for the Arbitration Window Method………….22
Figure 8: Algorithm for Proposed Arbitration Window method……………………23
Figure 9: TTCAN system with redundant bus………………………………………26
Figure 10: Proposed TTCAN architecture with asynchronous redundant bus….27
Figure 11: Algorithm for Proposed Asynchronous Redundant Bus method…….28
Figure 12: Calculation of delay……………………………………………………….29
Figure 13: Effect of mailbox window…………………………………………………33
Figure 14: Effect of number of time windows in a basic cycle…………………....34
Figure 15: Effect of average failure rate per second (λ )………………………....35
Figure 16: Effect of Arbitration Window method…………………………………...39
Figure 17: Effect of average failure rate per second ……………………………...39
Figure 18: Average latency versus Secondary bus speed……………………….43
Figure 19: Effect of Asynchronous Redundant Bus Technique……………….....44
5
1
1 INTRODUCTION
The automobile world is witnessing a paradigm shift. Most of the hydraulic and
mechanical systems are being replaced by electronics. Today the cost of
electronics in high-end cars can amount to 50% of the total cost of the car [1].
This cost is increasing by 9-16 % every year. The number of microprocessors in
a car varies from 70-100 [2, 3]. These microprocessors have applications in
almost all major vehicle systems like brakes, steering, air bag deployment,
emission control, engine control and other critical systems [4-6]. Figure 1 [7]
depicts the replacement of hydraulic and mechanical systems by electronics and
role of electronics in major systems inside vehicle.
Figure 1: Progression of application of electronics in vehicles (Courtesy of Leen
and Heffernan [7])
1
2
In order to be effective, sensors, actuators and microprocessors in a vehicle must
communicate using appropriate network protocols. A fine comparison of 40 such
protocols that exist since early 1980’s can be found in [7], [8] and [9]. By far CAN
(Controller Area Network) has been the most dominating protocol in Europe and
is now being used in the USA as well. The fact that last year almost 300 million
CAN microchips were sold [10], establishes the worldwide acceptance of CAN.
CAN was invented by Robert Bosch in the1980’s. The basic features of CAN like
high speed serial interface, low cost physical medium, short data lengths, fast
reaction times, multi-master and peer-to-peer communication, error detection and
containment [11] make it the common choice for many designers.
Factors like cost, stability, reliability and safety demand a distributed, safety
critical real-time computing platform [3]. But the event-triggered nature of CAN
prevents it from being used for deterministic real-time communications. Message
latency is not guaranteed in CAN and increases with higher bus message traffic
load [7]. This pushed research on time-based scheduling of messages for in-
vehicle networks in the early 1990’s. As a result, various time-triggered
architectures started to emerge. One of these, TTCAN, is based on unchanged
standard of CAN protocol [12].
TTCAN defines a system matrix that consists of basic cycles. Each Basic cycle
has various windows. Some of these are exclusive windows, some are arbitrating
windows and some are free windows. A frame synchronization entity has all the
time-triggered information and is responsible for triggering specific messages in
their corresponding exclusive time windows. Event triggered messages follow
2
3
standard CAN arbitration and are scheduled for arbitrating windows. Free
windows are reserved for future expansion of the network. A global timer
synchronizes all local timers at the start of each basic cycle using a message
known as the reference message. The above scheme is shown in Figure 2.
TTCAN is now accepted as ISO CD 11898-4(draft). Software independent
hardware for testing TTCAN is already available [13, 14].
Figure 2: TTCAN System Matrix (Courtesy of can-cia.org)
Need For Fault Tolerance
Almost all applications of in-vehicle networks are in safety critical applications, as
shown in Figure 3 [7].
3
4
Figure 3: Safety critical applications of in-vehicle networks (Courtesy of Leen and
Heffernan [7])
Thus it is vital for all in-vehicle networks to be fault tolerant. Though research is
going on, no systematic fault tolerance schemes exist for TTCAN [14]. Standard
CAN protocol allows retransmission if an error occurs during normal transmission
of a message. To protect the definitive latency of messages this feature was
taken out in TTCAN. TTCAN does not allow automatic retransmission of
messages during exclusive windows and arbitrating windows [15]. This implies
that if an error occurs during these windows, messages will be lost. Since, as
discussed in the beginning, communication of safety critical messages is one of
4
5
the main purposes of using TTCAN, loss of these messages defeats the purpose
of this new protocol, and much more than that puts the life of passengers at risk.
A study [16] shows that it is possible to have up to 2480 failures per hour in a
TTCAN system in the worst case. Hence to protect the lives of passengers, it is
important to make TTCAN completely fault tolerant. The main objective of the
thesis work presented here is to increase the fault tolerance of the TTCAN
systems by addressing the above mentioned issue. The work presented here
makes an attempt to achieve this goal by proposing and analyzing three new
techniques for increasing fault tolerance. The first two techniques ameliorate the
design of the system matrix to reduce the probability of unsuccessful delivery of a
message. These methods have been discussed analytically. The third technique
describes a method of incorporating an asynchronous redundant bus. This
method has been analyzed using simulations.
The work has been organized as following. Chapter 2 presents a survey of
existing literature on fault tolerance in time-triggered protocols, focusing on Time
Triggered CAN. Chapter 3 describes the proposed schemes for increasing fault
tolerance in TTCAN. It goes on to discuss the required architectures and
algorithms for successful implementation of these methods. Also various
parameters like worst-case response time, latency, average rate of failure per
second, etc that are used for analysis and simulation, are described in this
chapter. Parameters like probability of unsuccessful delivery and delay in delivery
of a message facing error are studied. Chapter 5 presents the conclusion and
Chapter 6 presents the future work.
5
6
2 Literature Review
2.1 Fault Tolerance Theory
Fault tolerance is defined as the ability of a system to respond gracefully to an
unexpected hardware or software failure. According to another definition, it is the
ability of a system or a component to continue normal operation despite the
presence of faults.
In case of X-by-wire systems, the definition and theory of fault tolerance provided
by Cedric Wilwert, et al in [17] is most suited. The paper defines fault tolerance
as a means of dependability, as much as fault prevision, fault prevention and
fault elimination. The authors go on further to distinguish and define errors, faults
and failures. As per their definition,
• Failure is a condition when the service delivered deviates from the
specified service. The failure is also the effect of an error on the service.
• Error is the part of the state of system which is susceptible to produce a
failure, and this is also the manifestation of a fault in the system.
• Fault is the supposed or assumed cause of an error.
The “fault-error-failure” chain is recursive, i.e., fault in a system will lead to an
error which will result in a failure. This failure will become a fault of some other
systems and the chain will continue. A nice example is of coding failure caused
by reasoning errors. This failure becomes a dormant fault in software. When
activated, this fault will lead to errors in data being processed by software. The
output will deviate from the expected outcome, thus causing failure.
6
7
The aim of fault tolerance is to provide means for graceful degradation of a
system in a manner that deviation of actual output from expected output is
minimum possible.
2.1.1 Classification of failures
For analyzing fault tolerance of a system it is necessary to have a failure model
for the system. Failures can be defined in different dimensions [17]. Two such
basic dimensions are value domain and temporal domain.
In the value domain, failures can be categorized as Byzantine failures, Coherent
failures and Fail-Silent failures. The Byzantine failure occurs when a failure is
perceived differently in non faulty nodes of the network. If all nodes perceive the
failure identically then the failure is known as Coherent failure. If no
communication is received by the nodes when they expect to receive it, the
failure is known as Fail-silent.
In the temporal domain, failures are categorized as permanent failures, transient
failures and intermittent. If the failure is irreparable then it is known as Permanent
failure. Transient failures are caused by external and transient stimulus.
Intermittent failures are caused by conception faults that are repeated
intermittently.
2.1.2 Classification of Failures in CAN and TTCAN Systems
Most of the fault tolerance studies for in-vehicle networks have been done on
coherent or fail salient transient failure models. Some studies [18, 16] categorize
7
8
the failures in a CAN system as Inconsistent Message Duplicate (IMD) failures
and Inconsistent Message Omission (IMO) failures.
In a CAN system, sometimes nodes may receive the same message twice, this is
called Inconsistent Message Duplicate (IMD) failure whereas in some other
cases, nodes may not receive a message, this is known as Inconsistent Message
Omission (IMO) failure. As discussed in Chapter 1, in a TTCAN system
retransmission of a message is not allowed in case of an error. Thus, in a
TTCAN system we only have IMO failures.
2.2 Error Rate in a TTCAN System According to [17] for X-by-wire systems a failure rate of better than 10-9 is
required. However, in reality not much actual data for error conditions and rates
is available [18]. For most analytical purposes, bit error rates of 10-4 (worst case)
to 10-6 (Best Case) have been considered. Probably the most practical data is
available in [16] and is shown Table I. These data were collected under
conditions shown in Table II. Table III [16] depicts the number of IMO failures in a
TTCAN system under various bit error rates.
Table I: Bit error rates Benign
Environment Normal
EnvironmentAggressive
Environment Bits
Transmitted 2.02 x 1011 1.98 x 1011 9.79 x 1010
Bit errors 6 609 25239 Bit error rate 3.0 x 10-11 3.1 x 10-9 2.6 x 10-7
8
9
Table II: Testing environment
Available Bus Bandwidth 1Mbps
Used Bus Bandwidth 250Kbps Length of CAN bus 30m
Time slot 400 Microseconds Data frame 8 bytes
Table III: Failures per hour Bit error rate IMO Failures per hour
10-4 2840
10-5 286
10-6 2.87
The paper cited above mentioned that these tests were carried out in academic
conditions (benign), in normal factory conditions (normal) and in a factory with
arc welding taking place at a distance of 2 meter (aggressive).
For safety critical applications, it is essential to design a system to withstand the
worst-case scenarios. As is clear from Table III, the number of IMO failures is too
high in the worst-case scenarios. Thus, it is of utmost importance to make a
TTCAN system fault tolerant.
2.3 Introducing Fault Tolerance in a TTCAN System
Fault tolerance can be introduced using both hardware and software measures.
For example, to introduce fault tolerance in a TTA (Time Triggered Architecture
based on TTP/C) system, redundant nodes, duplicate channels (hardware) and
algorithms (software) controlling functions such as membership agreement,
clique avoidance and clock synchronization [19] are used.
9
10
TTCAN provides fault tolerance for global clock synchronization by providing
potential time master nodes. These nodes can become the time master in an
event of failure of actual time master. Thus, the time synchronous nature of a
TTCAN system is preserved [13, 15, 20]. But other than that, no systematic fault
tolerant scheme exists for a TTCAN system [14]. For example, as described in
the problem statement in Chapter 1, in case of occurrence of an error,
retransmission of a message in not allowed. This leads to loss of safety critical
messages, thus endangering lives of passengers.
Different studies have been carried out to address fault tolerance issues in
TTCAN systems. Mϋller, et al proposed a fault tolerant TTCAN in ref [21]. They
proposed to introduce fault tolerance in a TTCAN system by providing
synchronous redundant channels. These channels vary in number from two to
three. The work focuses mainly on the topologies and synchronization issues.
The authors try to solve problems related to phase synchronization of cycle time,
global time and rate. The main drawback of this technique is that it requires the
redundant bus to be dedicated. This means that the redundant bus cannot be
shared between two networks. This problem is inherent to almost all redundant
bus implementations that are synchronous in nature. Another drawback is that
this solution requires the speed of the redundant bus to be exactly same as that
of the primary bus. Same speed of primary and secondary busses is required to
maintain the synchronous nature of the solution.
Colnarič and Verber [22] proposed to use the redundant bus for sharing of bus
load. In this technique there are multiple timetables in Frame Synchronization
10
11
Entity. Each bus has its own timetable for normal circumstances. In case of a
failure, load of one bus is moved to the other. There exists a separate timetable
for such a condition. The failure is detected using a monitor and diagnostic time
slot. In this slot, the time master polls each bus. If there is no reply from any of
the busses, that bus is assumed to have the failure and the load of that bus is
shifted to the other bus. There are several issues related to this proposed
solution. First, the solution has not been proved by analysis, simulation or
experiments. Second, under normal circumstances the bus utilization is very low.
This is done in order to accommodate the load of the secondary bus in case of a
failure. Also, when load of one bus is transferred to another, some messages are
eliminated from the timetable thus decreasing the overall output.
In another study [23], Broster, et al, have tried to handle the IMO failures. In this
case no redundant bus is used. The solution is software based. They have
proposed a new system matrix for reducing the probability of an unsuccessful
delivery. In the system matrix proposed by them, each exclusive window is
repeated twice and is placed next to each other. First the system is studied with
a normal system matrix and the probability of failure is calculated. Then the
system matrix is redesigned as stated above, i.e., by transmitting each message
twice, one after another. Again probability of a failure is calculated. The analysis
is based on poisson’s fault model. The results of this study show that probability
of an unsuccessful delivery decreases in the second case, i.e., when each
message is transmitted twice. Though the scheme produces good results, it is far
less than efficient. Transmitting every message two times forces the period of
11
12
each message to be more than the case when only single transmission is
required. Increasing the period of any message is not a good idea, especially in
case of safety critical messages.
Though work has been done on defining fault tolerance for Time Triggered CAN
systems, collecting bit error rates, failure rates and introducing techniques to
handle failures, no satisfactory solution exists. Especially, not much work has
been done to address the problem of message omission in case of an error. In
Chapter 3, three schemes are proposed to solve this issue.
12
13
3 PROPOSED FAULT TOLERANCE TECHNIQUES FOR TTCAN
As concluded in Chapter 2, not much work has been done on making TTCAN
networks fault tolerant. Especially, the issue of the loss of messages due to
prohibition of retransmission in exclusive time windows has not been addressed
effectively. This section of the thesis work presents three techniques for
addressing this particular problem. Section 3.1 of this thesis describes the
assumptions for the three proposed techniques. Section 3.2 explains the first
proposed technique, which is referred to as the Mailbox Window method. Section
3.3 describes the second proposed technique, which is known as the Arbitration
Window method. Section 3.4 presents the third proposed technique, which is
called the Asynchronous Redundant Bus method.
3.1 Assumptions
The work presented in this section assumes a coherent (in value domain),
transient (in time domain) failure model. Also it is being assumed that for any
practical fault tolerant system, occasional delivery failures are acceptable and
expected and that hard deadlines cannot be made in any form of electrical
communications that are subject to unpredictable faults [23]. The distribution of
faults is considered to follow Poisson’s distribution. All the system design
analysis presented in the following work is for the worst-case scenarios. Also it is
essential to mention here that TTCAN uses the error detection mechanism
identical to that of the CAN which includes Cyclic Redundancy Check (CRC) and
Acknowledgement field in each message.
13
14
3.2 Mailbox Method
The basic objective of this method is to increase the fault tolerance of a TTCAN
system by decreasing the probability of an unsuccessful delivery of a message
and delay in delivery of the message that is caused by the prohibition of
retransmission in an exclusive window. The Mailbox Window method makes an
attempt to decrease this probability by providing an “extra chance” to the failed
message. This chance is provided by attaching an extra window at end of each
basic cycle called the Mailbox window. Any failed message can be dynamically
rescheduled for either any forthcoming arbitrary window in the current basic cycle
or for the Mailbox window. The proposed architecture, algorithm and analysis are
presented in the following sections.
3.2.1 Proposed Architecture The proposed architecture is based on layered network approach. The physical
layer consists of the transceivers and the transfer medium. On the top of this
layer is the TTCAN controller that enforces the protocol. The controller is in turn
supervised by a software layer, which takes care of scheduling and exception
handling algorithms. The TTCAN controller maintains two pairs of receive and
transmit buffers. The transmit buffers are represented by P_Tx_Buf and
S_Tx_Buf. Similarly, the receive buffers are P_Rx_Buf and S_Rx_Buf (where P
and S stand for Primary and Secondary, respectively). Figure 4 shows the above
mentioned architecture.
The physical layer can be implemented using a twisted pair of copper wires.
Available SAE standards for CAN, for example SAEJ2241 etc can be used for
14
15
communications. The TTCAN controller can be either a custom built or an off-
the-shelf controller manufactured by BOSCH [13].
SUPERVISOR Software
TTCAN CONTROLLER SP Physical layer
TRANSCIEVER
CAN Bus
Figure 4: Proposed System Architecture
3.2.2 Algorithm The algorithm proposed here for ensuring the delivery of safety critical messages
is built on the top of TTCAN. In the proposed system-matrix design, each basic
cycle contains a reserved time window at its end, known as the MAILBOX
WINDOW. This system matrix is shown in Figure 5.
If any message in a basic cycle is not transmitted due to an error, the supervisor
stores the message in the secondary transmit buffer and schedules it for the
mailbox window of the current basic cycle. In a real-time scenario, this window
acts like a mailbox for the messages that couldn’t go through the bus. If there is
15
16
an arbitrary window in the basic cycle after the point of an error but before the
mailbox window, then the message that encountered an error is scheduled for
that arbitrary window.
Mailbox Msg BFree Msg C Msg A Ref
Mailbox Msg CARB Merged Msg B Ref
Mailbox Msg CMsg CMsg B ARB Ref
Mailbox MsgCFree Msg BMsg A Ref
Figure 5: A System Matrix with a Mailbox Window in every basic cycle
Architecturally, the only changes that are required are in the role of the
supervisor and addition of an extra pair of the Tx and Rx buffers, as shown in
Figure 4. When there is no error, the whole system behaves just like a regular
TTCAN system [12, 15, and 20]. The frame synchronization entity controls the
whole system of exclusive, arbitrating and free windows. However, in the case of
an error, the role of the supervisor increases. If an error occurs during the
transmission of a message through the primary bus, the supervisor sends an
error frame on the primary bus while it sends the failed message to the
S_Tx_Buf. Generally, most CAN chips maintain their transmission (Tx) queues
on a first in first out (FIFO) basis [24]. In this work also, all messages in all
queues are maintained on a FIFO basis. The S_Tx_Buf is flushed at the start of
16
17
each system matrix. This is done in order to prevent the overflow of messages, if
any, from one system matrix to another. The following flowchart, as shown in
Figure 6, describes the whole algorithm.
Start of System Matrix
Flush S_Tx_Buf
Send next scheduled message to specified node
Queue the message in S_Tx_Buf
YESNO Error
NOIf the
message was last message
of matrix
Schedule message for mail -box (or Arbitrary window, if present) window of current cycle
YES
Finish
Figure 6: Proposed Algorithm for Mailbox window Method
17
18
3.2.3 Analysis
The performance parameters chosen for the proposed scheme are the
probability of unsuccessful delivery and the latency (delay) of message
transmission. The goal of the proposed technique is to reduce the value of both
parameters.
The probability of a failure can be calculated as described below [23].
Considering Poisson’s distribution of errors, the probability of m errors in time t, is
given by:
!)(
mtePmt
tλλ−
= ………………………………………………… (1)
Where, λ is the average number of failures per second. For a successful delivery,
the value of m should be zero. Thus for a successful delivery, Equation (1)
reduces to the following equation.
tt eP λ−= ………………………………………………………………. (2)
The probability of an unsuccessful delivery can be calculated by subtracting
Equation (2) from one:
teP λ−−=1 …………………………………………………………… (3)
To consider the worst-case scenario, t in the above equation is replaced by ,
where is the worst-case transmission time for a message. It takes into
account a message with the maximum number of stuff bits and error frames.
ic
ic
18
19
We can extend Equation (3) to state that the probability of an unsuccessful
delivery in n attempts is
ncieP )1( λ−−= ………………………………………………………………………………. (4)
Now by adding a mailbox window in a basic cycle with N time windows this
probability can be reduced to the following value
N
ncieP1
)1(+−−= λ
……………………………………………………. (5)
Mathematically this can be viewed as giving each time window another 1/N
attempt for message delivery.
As described in Chapter 1, in a TTCAN system, the retransmission of messages
are not allowed in exclusive and arbitrating windows. This essentially means that
only one attempt can be made for a delivery. Thus in Equations (4) and (5) the
value of can be taken as one. Hence, Equations (4) and (5) can be rewritten
as Equation (6) and (7). The probability of an unsuccessful delivery in an
exclusive window of a TTCAN system is
n
)1( iceP λ−−= ………………………………………………………………………………. (6)
If a TTCAN system contains a Mailbox Window at the end of every basic cycle,
then the probability of an unsuccessful delivery in an exclusive window is given
by
NcieP11
)1(+−−= λ
………………………………………………... (7)
19
20
For the analysis purposes, the values of λ are taken as 0.7, 0.08 and 0.00079 for
the worst-case, normal and benign scenarios, respectively. These values are
proved in [16] and are shown in Table III.
Table III: Failures per hour
Bit error rate IMO Failures per hour
10-4 2840
10-5 286
10-6 2.87
If the mailbox window is not considered, a message that encountered an error
will be transmitted after a time period equal to its periodicity. However with the
mailbox window this message can be transmitted earlier than that. This
improvement in the quality of service is expressed as following:
Suppose in a basic cycle, the length of window i is t . If an error occurs in the nith
window of a basic cycle, then the message will be delivered in the mailbox
window (the worst-case scenario). Thus the delay in time (D) is
D= =∑ ………………………………………………….... (8) ∑+=
N
niit
1∑==
−n
ii
N
ii tt
00
3.3 Arbitration Window Method
The proposed Mailbox Window method, as described in Section 3.1, is suitable
for cases where the length of a basic cycle is small. Otherwise, the latency of the
message that encountered an error tends to go beyond acceptable limits. Also, in
a rare case, if two or more messages encounter errors during one basic cycle,
20
21
then only one message is able to make it through the Mailbox Window. To
overcome these shortcomings, a new method is proposed in this section.
As mentioned in Chapter one, a TTCAN system matrix consists of three types of
windows: the exclusive windows, arbitration windows and free windows. The
exclusive windows are reserved for specific messages as per timetable laid out in
frame synchronization entity. The free windows are reserved for future expansion
of the network. The arbitration windows are unreserved. Any message can
compete for the bus during the arbitration window. In case of a collision, as per
CAN specifications, a message with a higher priority wins the bus during the
bitwise arbitration.
The basic idea of the method presented here is to use arbitration windows in a
system matrix design in such a manner that it increases the fault tolerance of the
TTCAN system. It is being assumed that the lowest priority safety critical
message has a higher priority than the highest priority non-safety critical
message. This helps in ensuring that during a collision in an arbitration window,
the safety critical message gets the access to the bus.
3.3.1 Proposed Architecture
The proposed architecture for this technique is identical to the one presented for
the mailbox window method as shown in Figure 4.
21
22
3.3.2 Algorithm
The TTCAN system matrix for the arbitration window method is constructed in
such a manner that an arbitration window follows every exclusive window as
shown in Figure 7.
Arb Msg B Free Arb Msg A Ref
Arb Msg C Free Arb Msg D Ref
Free ArbMsg C Arb Msg B Ref
Arb MsgCFree ArbMsg A Ref
Figure 7: A TTCAN System Matrix for the Arbitration Window Method
Like the Mailbox Window method, this method is also a dynamic rescheduling
scheme. The algorithm works as follows. During the normal course of operation,
the frame synchronization entity keeps on sending messages to their respective
time windows. In case of a delivery failure in an exclusive window due to a
transmission error, the supervisor stores the message in the secondary
transmission buffer and schedules this message for the next window. Since in
the arbitration window method, an arbitration window follows every exclusive
window, the failed message is immediately scheduled for the next arbitration
window. During the arbitration, this message might collide with some other
messages. However, since any safety critical message has higher priority than all
22
23
the non-safety critical messages, the failed safety critical message will get the
bus during the second attempt. This serves the purpose of increasing fault
tolerance. The algorithm is shown in Figure 8.
Start of System Matrix
Flush S_Tx_Buf
Send next scheduled message to specified node
Queue the message in S_Tx_Buf
YESNO Error
NOIf the msg
was last msg of matrix
Schedule msg for following Arbitrating
window
YES
Finish
Figure 8: Algorithm for Proposed Arbitration Window method
23
24
3.3.3 Analysis
The performance analysis parameters here are the probability of an unsuccessful
delivery and delay in the delivery of the message. The analysis can be extended
from the analysis of the Mailbox Window method. In the Mailbox Window
method, the mailbox window was viewed as providing an extra 1/N (where N is
equal to the number of windows in a basic cycle) chance to the message that
encountered error. In the present case, each message gets one complete (as
compared to 1/N chance in the previous case) extra chance in the form of
arbitrating window that follows its exclusive window. This means that each
message gets two chances of delivery. Thus, for the arbitrary window case,
Equation (7) can be extended to express the probability of an unsuccessful
failure as:
2)1( iceP λ−−= ……………………………………………… (9)
The worst-case delay in this case is equal to the length of longest exclusive
window. This is true because if transmission failure occurs in an exclusive
window the message will get transmitted in the following window which as per
our design is an arbitration window. Hence the transmission will be completed
after a delay of one window length. The length of the longest window can be
calculated as shown below. This calculation has been proved in a study
described in Ref [25]. Also Ref [26] provides an applet to calculate worst case
time for transmission of a CAN message for a given bus speed and data load for
a message.
24
25
bitefsstuff
datastufffixdatafixdata ).tl
1-l ll-l1l(l t +
++++= ……………… (10)
Where
fixl =length of fixed size bits in a message subject to bit stuffing
datal =length of data field
efsl =length of bits which are not subjected to bit stuffing
stuffl = number of stuff bits
bitt =bit timing
Also for error frames
bititerdeflagerror tllt )2( lim+= ……….………………………………………… (11)
Length of a time window in worst case can be calculated by adding (10) and (11).
Delay = errordatawindow ttt += …………………….……………….………..… (12)
3.4 Asynchronous Redundant Bus Method
In this method a TTCAN system is considered in which a non-synchronous
secondary bus is attached to each node. This arrangement is shown Figure 9. In
the system matrix no mailbox window is considered. The redundant bus need not
be dedicated to a single network. It can be used on a bandwidth sharing basis.
The speed of the redundant bus can differ from the speed of the primary bus.
There is no need of synchronization, i.e., the secondary bus need not have time
slots. The Secondary bus can be a normal CAN bus.
25
26
This scheme also assumes that it is necessary to accept the graceful
degeneration of the system for the sake of fault tolerance.
Figure 9: TTCAN system with redundant bus
3.4.1 Proposed Architecture
This architecture considers two controllers at a node; a TTCAN controller for the
primary bus and a CAN controller for the secondary bus. Each controller has a
transceiver connected to the transfer medium. These two separate channels are
made available to handle errors. Controller 1 takes care of the normal
communication through the primary bus and Controller 2 takes care of the
communication in case of a failure on the primary bus. It does so by using the
secondary bus. This architecture is shown in Figure 10.
26
27
Figure 10: Proposed TTCAN architecture with asynchronous redundant bus
SUPERVISOR Software
CAN CONTROLLER 2
TTCAN CONTROLLER 1
Tx Rx Tx Rx Physical layer
TRANSCIEVER 2 TRANSCIEVER 1
Primary Bus
Secondary Bus
3.4.2 Algorithm
During normal operations, the frame synchronization entity keeps scheduling
messages for their respective windows and transmits them through Controller 1.
In case of an unsuccessful transmission, the supervisor immediately sends an
error flag on the primary bus. At the same time it reroutes the message that
encountered error through the Controller 2 onto the secondary bus. As
mentioned earlier, the secondary bus may not have the same speed as that of
the primary bus. The algorithm is shown in Figure 11.
27
28
Start of System Matrix
Flush Secondary queue
Send message to specified
node through secondary bus
Send next scheduled message to specified node
through primary bus
Error YES
NO
NO
If the msg was last msg
of matrix
YES
Finish
Figure 11: Algorithm for Proposed Asynchronous Redundant Bus method
28
29
3.4.3 Analysis
The main area of interest here is the average latency in case of errors. Suppose
in a basic cycle an error occurs in time window n at time t . The supervisor will
immediately redirect this message to the secondary bus using the second
controller. The best-case scenario is that the secondary bus is fault free and is
not pre-occupied. Thus, the secondary bus is readily available for transmission of
the message that encountered an error during its first attempt. Depending upon
the speed of the secondary bus the message will be received at time by
receivers. If the expected time of delivery without the occurrence of an error
was t , then
1
2t
3
Delay =D= - ………………………………………………. (13) 2t 3t
This delay is illustrated in Figure 12. In the worst-case scenario it is considered
that a message has delay due to the fact that it has to wait because of the reason
like other messages in the secondary queue, successive network faults (burst),
jitter, etc.
3t Rx complete t
Point of error
n
through secondary bus
2
1t 2 3t
Start of message29
D= -t
30
Figure 12: Calculation of delay
Another parameter for performance analysis is probability of unsuccessful
delivery. In this case each message gets two attempts of delivery, one through
primary and second through primary bus. For the attempt made through the
primary bus this probability can be driven from Equation (6).
)1( ipcp eP λ−−= ……………………………………………………… (14)
Where pλ is the average failure rate per second on the primary bus.
Similarly for the attempt made through the secondary bus the probability of
unsuccessful delivery can be calculated as shown below
)1( is cs eP λ−−= ……………………………………………………… (15)
Where sλ is the average failure rate per second on the secondary bus.
The probability of unsuccessful failure for the overall system is the intersection of
and which represents the occurrence of error on both primary and
secondary busses simultaneously. In such a case the message tries to take the
primary bus but faces an unsuccessful delivery and as per algorithm supervisor
sends this message onto the secondary bus. On secondary bus also this
message faces an unsuccessful delivery. Our objective is to find the probability of
occurrence of such a completely unsuccessful delivery. This probability is shown
below.
pP sP
)1)(1())(( isip ccspsp eePPPPP λλ −− −−==∩= …………………… (16)
For studying the worst case let
30
31
λ = pλ = sλ ……………………………………………………….……………….. (17)
Substituting equation (17) in equation (18),
2)1( iceP λ−−= ……………………………………………………………….. (18)
The comparison of the equation (9) and equation (18) show that these two
equations are identical. This is true since in the asynchronous redundant method
as well as arbitration window method a message encountering error gets a
complete extra chance for retransmission.
31
32
4 PERFORMANCE ANALYSIS
4.1 Performance Analysis of the Mailbox Method
The main objective of the work presented in this thesis is to increase the fault
tolerance in TTCAN. To measure the performance of the proposed Mailbox
Window technique, the probability of an unsuccessful delivery and delay in
delivery of message that countered an error, have been considered as the
metrics for fault tolerance. Lower the probability of an unsuccessful delivery
better the fault tolerance of the TTCAN system. The performance of the Mailbox
Window method has been analyzed by varying the parameters like average
failure rate per second, the number of windows in a basic cycle and the number
of data bytes in a message. The effect of all these parameters on the probability
of an unsuccessful delivery is measured. The basic tools used for these
measurements are Equations (6) and (7) derived in Section 3.2.3. The analysis
presented here is generic and is independent of the system matrix design. For
the purpose of calculation of the worst-case length of a message ( ), shown in
Equations (6) and (7), the speed of the bus is assumed to be 250 kbps.
ic
4.1.1 Results
Figure 13 compares the effect of two cases, one with the Mailbox Window and
the other one without the Mailbox Window. The value of the probability of an
32
33
unsuccessful delivery is calculated for various message lengths ( ) in each
case. The number of windows in each basic cycle is considered to be ten.
ic
Effect of Mailbox Window
0
0.00005
0.0001
0.00015
0.0002
0.00025
0.0003
0.00035
0.0004
0.00045
0.0005
1 2 3 4 5 6 7 8
Number of Data Bytes in Message
Pro
bab
ility
of
Un
succ
essf
ul D
eliv
ery
Without Mailbox WindowWith Mailbox Window
Figure 13: Effect of mailbox window
Figure 14 manifests the effect of number of time windows, N, in a basic cycle on
the probability of an unsuccessful delivery. Here also, the probability of an
unsuccessful delivery is calculated for the various message lengths ( ) in each ic
33
34
case. All messages in a basic cycle are assumed to have time windows of the
same length.
Effect of Number of Windows in a Basic Cycle
0
0.00005
0.0001
0.00015
0.0002
0.00025
1 2 3 4 5 6 7 8
Number of Data Bytes in a Message
Pro
babi
lity
of U
nsuc
cess
ful
Del
iver
y 10 Windows8 Windows6 Windows5 Windows
Figure 14: Effect of number of time windows in a basic cycle
Figure 15 depicts the effect of average failure rate ( λ ) on the probability of an
unsuccessful delivery. As in the previous case, here also probability values are
found over a range of . The values of ic λ that are used in the figure are from
Table III and represent the best (λ =0.00079), normal ( λ =0.08) and worst (λ =0.
7) cases.
In this case also, the number of windows in a basic cycle is assumed to be ten.
The figure has been sketched on a logarithmic scale.
34
35
Effect of Average Failure Rate
0.00000001
0.0000001
0.000001
0.00001
0.0001
0.001
0.01
0.1
11 2 3 4 5 6 7 8
Number of Data Bytes in a Message
Prob
abilit
y of
Uns
ucce
ssfu
l Del
iver
λ=0.7
λ=0.08
λ=0.00079
Figure 15: Effect of average failure rate per second (λ )
For analyzing the effects of this message on delay, a TTCAN system matrix with
three basic cycles is considered. Each basic cycle consists of five windows, the
fifth window being the mailbox window. The length of each time window is 400
microseconds. If a message faces an error it will transmitted during the mailbox
window. The delay can be calculated using equation (8). For the present case
delay is calculated for each window. This is shown in Table IV.
35
36
Table IV: Delay in the case of Mailbox Window Method
Position (in the basic cycle) of exclusive window encountering
error
Delay (in microseconds)
1 1600
2 1200
3 800
4 400
4.1.2 Analysis
From Figure 13, it is clear that the probability of an unsuccessful delivery
decreases with the use of the Mailbox Window. This can be explained from
comparison of Equations (6) and (7). By attaching a mailbox window at the end
of each basic cycle, this method provides an extra 1/N chance (where N is the
number of windows in a basic cycle) to the message that encounters an error
and is unable to be transmitted during its regular time slot. Thus, it reduces the
probability of an unsuccessful delivery. The probability of an unsuccessful
delivery increases with the increase in message length ( ). This happens
because of the fact that with the increase in message length, more time is
available for an error to occur.
ic
After concluding form Figure 13 that the Mailbox Window method certainly
provides better fault tolerance in a TTCAN system, it becomes essential to study
other system matrix design criteria such as the number of windows in a basic
36
37
cycle. Figure 14 provides an insight to this aspect. Figure 14 shows that the
probability of a failed delivery decreases with the decrease in number of windows
in a basic cycle. A mailbox window can be viewed as an extra chance of delivery
for failed messages. This chance of delivery is distributed over N windows. Thus
this chance is inversely proportional to N. If N is decreased, then the chance of
delivery is increased and vice versa. Hence, the probability of an unsuccessful
delivery will decrease as N decreases. This explains the nature of curves in
Figure 14. The figure helps in choosing the size of a basic cycle in a system
matrix for a given message size and an acceptable probability of an unsuccessful
delivery.
The effect of the best, worst and normal values of λ on the probability of a failed
message is shown in Figure 15. These values were collected during a study [16]
conducted in various conditions like laboratory conditions (best value ofλ ),
factory conditions (normal value ofλ ) and in a factory with arc welding taking
place at a two-meter distance (worst case value of λ ). Form the abovementioned
figure it is clear that in the worst-case scenario the probability of an unsuccessful
delivery increases.
The delay in the delivery of the message depends on the length of basic cycle
and position of the exclusive window in which the error occurred; this is evident
from Table IV. The earlier the error occurs in the basic cycle more it has to wait
for the mailbox time slot. The effect of the length of basic cycle can be shown
from the following consideration. In the case mentioned in section 4.1.1 there are
five windows in a basic cycle of length 400 microseconds each. If an error occurs
37
38
in the first exclusive window then message is received after 1600 microseconds
of deadline as shown in Table IV. However if there were ten windows in the basic
cycle everything else remaining same this message would have reached 3600
microseconds after its deadline. Thus it is better to have shorter basic cycles in
case of Mailbox Window method.
4.2 Performance Analysis of Arbitration Window Method
The chosen performance metrics for the Arbitration Window method are the
probability of an unsuccessful delivery of a message and delay in delivery of
message. The probability can be analyzed with the help of Equation (6) derived
in Section 3.2.3 and (10) derived in Section 3.3.3 of this work.
The performance of the proposed system has been analyzed by varying the
average rate of failure per second (λ ) and the worst-case length of a message
( ). The values for ic λ have been chosen as discussed in the case of the Mailbox
Window method. The bus speed is assumed to be 250 kbps
4.2.1 Results
Figure 16 shows two cases, one where the TTCAN is implemented according to
the Arbitration Window technique and the other where normal TTCAN is
implemented. The effect of the Arbitration Window method is noted on the
probability of an unsuccessful delivery. This effect is calculated for various
message lengths. The average rate of failure per second considered for the set
of results shown in Figure 16 is 0.7. The figure is drawn on a logarithmic scale.
38
39
Figure 17 shows the effect of average failure rate per second ( λ ) on the
probability of an unsuccessful delivery when the Arbitration Window method is
implemented. The values for λ have been chosen as discussed in the case of
the Mailbox Window method. This figure is also drawn on logarithmic scale.
Effect of Arbitration Window Technique
0.00000001
0.0000001
0.000001
0.00001
0.0001
0.001
0.01
0.1
11 2 3 4 5 6 7 8
Number of Data Bytes in Message
Prob
abili
ty o
f Uns
ucce
ssfu
lDe
liver
y WithoutAritrationWindowTechniqueWith ArbitrationWindowTechnique
Figure 16: Effect of Arbitration Window method
Effect of Average Failure Rate
1E-14
1E-12
1E-10
1E-08
1E-06
0.0001
0.01
11 2 3 4 5 6 7 8
Number of Data Bytes in Message
Prob
abili
ty o
f Uns
ucce
ssfu
lDe
liver
y λ=0.7λ=0.08λ=0.00079
39
40
Figure 17: Effect of the average failure rate per second
The delay in the delivery of the message that encountered the error can be
calculated using equations (10), (11) and (12) and Ref [26]. Table V shows the
delay for various lengths of exclusive windows (different number of data bytes in
the message).
Table V: Delay in case of Arbitration Window Method
Number of Data Bytes in a Message Worst case delay from deadline
(in microseconds)
1 342
2 382
3 422
4 462
5 502
6 542
7 582
8 622
4.2.2 Analysis
Figure 16 makes it clear that with the use of the Arbitration Window method, the
probability of an unsuccessful delivery falls significantly thus increasing the fault
tolerance of the system. This occurs because attaching an arbitrary window at
the end of each exclusive window provides an extra chance of delivery for every
message that could possibly face an error during the transmission. The increase
40
41
of probability of an unsuccessful delivery with the increase in the message size
can be attributed to the extended exposure of the message to error occurrence.
Figure 17 shows the effect of the average rate of failure per second ( λ ) on the
probability of an unsuccessful delivery when the Arbitrary Window method is
used. As expected in the worst-case scenario, that is, when λ is equal to 0.7,
the probability of an unsuccessful delivery is more than what it is in the best-case
(λ equal to 0.0079). These values of λ were collected in conditions stated in
Section 4.1.3 (analysis of mailbox window method)
Table V shows the worst case delay for the message encountering error based
on the length of exclusive window for which it was scheduled. The delay is
directly proportional to the length of exclusive window facing error since this is
amount of time for which a message has to wait in order to be transmitted in the
next time slot which is an arbitrating window, as per proposed system matrix. As
can be seen from the Table V maximum possible delay is 622 microseconds.
Unlike mailbox window method, delay in this case is independent of length of
basic cycle.
4.3 Performance Analysis of the Asynchronous Redundant Bus Method
The Asynchronous Redundant Bus method has been analyzed by software
simulation. The purpose of the simulation is to collect results that can help in
finding the average latency when the redundant bus is used. The simulation
setup is described as follows. There are two buses and five nodes. The primary
bus is the one on which all the time triggered communications are carried out and
41
42
the dedicated redundant secondary bus is the one which handles
communications in case of a transmission failure on the primary bus. The speed
of the primary bus is 250 kbps. The speed of the secondary bus is varied as a
percentage of the primary bus speed. The system matrix consists of three basic
cycles. Each basis cycle contains five exclusive time windows of 400
microseconds each. Bus loading is 75 percent. The average failure rate per
second ( λ ) is chosen as 0.7 (the worst-case scenario). The percentage error
according to the chosen value of λ and configuration of the system matrix is
0.00029. The time window in which an error occurs and the point of occurrence
of the error within that time window, both are chosen randomly. As mentioned
earlier, it is being assumed that for any practical fault tolerant system, occasional
delivery failures are acceptable and expected and that the hard deadline cannot
be met in any form of electrical communications that are subject to unpredictable
faults [23]. In this work, the acceptable delay past the deadline is taken as 1200
microseconds. Any message with a delay exceeding this limit should be rejected.
4.3.1 Results
Figure 18 shows the variation of the maximum, minimum and average latencies
of messages that faced failure during transmission through the primary bus. This
variation is shown for various secondary bus speeds. The secondary bus speed
is shown as a percentage of the primary bus speed, which is 250 kbps.
Table IV shows the number of messages per hour (taking the secondary bus)
that exceed the maximum delay beyond the deadline. The value of this
42
43
acceptable delay from deadline is varied between 800 microseconds to 1200
microseconds.
Average Latency Vs Secondary Bus Speed
0
500
1000
1500
2000
2500
3000
3500
10 20 30 40 50 60 70 80 90 100
Secondary Bus Speed (percentage of primary bus)
Time i
n micr
osec
onds
AverageLatency
MinLatency
maxLatency
Figure 18: Average latency versus Secondary bus speed
Table VI: Number of messages per hour missing deadlines
Speed of Secondary
bus (percentage of Primary Bus )
Number of messages per hour missing Deadline of
800 sµ
Number of messages per hour missing Deadline of
1000 sµ
Number of messages per hour
missing Deadline of 1200
sµ
10 2691 2691 2594 20 2691 2691 2594 30 1863 0 0
43
44
40 0 0 0 50 0 0 0
The probability of unsuccessful delivery in this case can be calculated using
equation (18). The results are shown in Figure 19 below.
Effect of Asynchronous Redundant Bus Technique
0.00000001
0.0000001
0.000001
0.00001
0.0001
0.001
0.01
0.1
11 2 3 4 5 6 7 8
Number of Data Bytes in Message
Prob
abili
ty o
f Uns
ucce
ssfu
lD
eliv
ery Without
AsynchronousRedundant bus
WithAsynchronousRedundant Bus
Figure 19: Effect of Asynchronous Redundant Bus Technique
4.3.2 Analysis
As is clear from Figure 18, the average, maximum and minimum latencies of
messages using the secondary bus increase with decrease in the speed of the
secondary bus. This can be explained by the fact that because of the slower
speed of the secondary bus, these messages spend more time on the secondary
bus. The average latency varies from 200 microseconds when the speed of the
44
45
secondary bus is the same as that of the primary bus to 2828 microseconds
when the speed of the secondary bus is ten percent of the speed of primary bus.
It becomes essential to determine the speed of the secondary bus for a good
fault tolerant TTCAN system design. In order to determine the speed of the
secondary bus to be used, it is necessary to consider the acceptable delay after
the deadline of the message. Table IV shows the number of messages per hour
that miss the extended deadline. Three cases have been studied where the
extension of the deadline is considered as 800 microseconds, 1000
microseconds and 1200 microseconds. From Table IV it becomes clear that
when the secondary bus speed is approximately equal to 30 to 40 percent of the
primary bus speed almost all messages are able to make it to the receivers
within 1200 microseconds of deadline. Consider a car moving at 50kmph.
Suppose on the application of the brake, the ABS message encounters a
transmission error. This message is then sent through the secondary bus with
the bus speed equal to 40 percent of the primary bus speed. This message will
definitely reach the intended receiver within 1200 microseconds. In these 1200
microseconds, the car will move only 1.6 centimeters. This distance is
insignificant, especially when compared to the case when there is no secondary
bus and this message is completely lost leading to break failure or delayed by
one period. Thus it can be concluded that it is practical to use an asynchronous
redundant bus for introducing fault tolerance in a TTCAN system.
From Figure 19 it is clear that the probability of unsuccessful failure decreases
with use of asynchronous redundant bus. The results shown in Figure 19 and the
45
46
results shown in Figure 16 for effect of Arbitration Window Method are identical
since both methods are effectively providing the message facing an error two
attempts for delivery.
4.4 Comparison of Proposed Techniques
To compare the effect of all three methods proposed and analyzed in this work a
TTCAN system matrix with three basic cycles is considered. Each basic cycle
consists of five windows each of which is 400_microsecond (two data byte and
maximum number of stuff bits as well as error flags and error delimiter). The
value of the average failure rate per second is considered to be 0.7 (worst case
scenario). The speed of the primary bus is considered to be 250 kbps. The speed
of the secondary bus in case of Asynchronous Redundant Bus method is
considered to be 30 percent of the primary bus speed. As it is clear from Table VI
at this secondary bus speed most of the messages are able to meet the 1200_
microsecond deadline. A comparison of the worst case delay is shown in Table
VII. The results shown in Table VII have been taken from Table IV, Table V and
Figure 18.
Table VII: Comparison of Worst Case Delay
Proposed Method Worst Case Delay(in microseconds)
Mailbox Window Method 1600
Arbitration Window Method 382
Asynchronous Redundant Bus 987
46
47
Thus we see that for the given system configuration the Arbitration Window
method has the least delay which is equal to 382 microseconds. Even when a
message with eight data bytes is used this delay does not exceed 622
microseconds, which is still lesser than the delay in the other two cases.
For the same system configuration the probability of unsuccessful delivery can
be compared for the three cases. This comparison is shown in Table VIII. The
results shown in Table VIII have been taken from Figure 13, Figure 16 and
Figure 19.
Table VIII: Comparison of probability of unsuccessful delivery
Proposed Method Probability of unsuccessful delivery
Mailbox Window Method 5.15851X 10-5
Arbitration Window Method 7.14836 X 10-8
Asynchronous Redundant Bus 7.14836 X 10-8
From Table VIII it is clear that the Arbitration Window method and the
Asynchronous Redundant Bus method have equal probability of unsuccessful
failure for the given system configurations and this probability is around 1000
times lesser than the Mailbox Window method.
47
48
5 CONCLUSION
The work presented in this thesis is an effort to increase fault tolerance of
TTCAN systems. In case of an error, retransmission of a message is not allowed
in an exclusive window of TTCAN. This can cause loss of safety critical
messages, thus putting life of passengers at risk. The thesis work presented here
proposes three solutions to address this issue. The work provides a detailed
description of architectures and algorithms required to implement these
schemes. The parameters considered for measuring the effects of these
schemes on the fault tolerance of the system are probability of unsuccessful
delivery and the delay in the delivery of the message. The results of a given
system configuration are compared at the end of work. The first method known
as the Mailbox Window Method is useful only when the length of the basic cycle
is small, messages have short periods and no hardware solution is available. The
second method called the Arbitration Window Method has the least delay in all
cases and requires no hardware changes. It has the lower probability of
unsuccessful delivery than the Mailbox Window Method but equal to the
probability of unsuccessful delivery in case of third method (known as the
Asynchronous Redundant Bus Method). Also, here the delay is not dependent on
the length of a basic cycle. The only drawback of this method is that it requires
periods of messages to be long enough to incorporate an arbitration window
following every exclusive window. The third method is known as Asynchronous
Redundant bus method requires hardware changes in the system as it requires a
secondary bus. This method is suitable for the cases where multiple number of
48
49
low speed busses exist that are used for carrying out communication of non
safety critical methods. For this method the probability of unsuccessful delivery is
equal to that of the Arbitration Window method but lesser that the Mailbox
Window method. The delay in delivery of messages can be reduced significantly
by using a high speed dedicated secondary bus. All three proposed methods
show improvement in fault tolerance of TTCAN system and are cost effective
when compared to existing techniques.
49
50
6 FUTURE WORK
The increase of drive-by-wire systems in the vehicles will give rise to extensive
use of real time in-vehicle networks. FlexRay is a communication system that will
support the needs of future in-car control applications. FlexRay will provide
flexibility and determinism by combining a scalable static and dynamic message
transmission, incorporating the advantages of familiar synchronous and
asynchronous protocols.
FlexRay Consortium (Core members consist of BMW, DaimlerChrysler, Motorola,
Philips, GM and Bosch) have been working together in developing the
requirements for an advanced communication system for future automotive
applications. These six companies have brought together their respective areas
of expertise to define a communication system that is targeted to support the
needs of future in-car control applications.
However, FlexRay is still in specification development phase. It will take four to
five years to implement it completely and to start manufacturing components
supporting FlexRay commercially. Till then, Time Triggered CAN can be used as
protocol for supporting real time applications. It is essential to ensure fault
tolerance in TTCAN. The work presented in this thesis can be used introduce
fault tolerance in TTCAN.
Any real time safety critical application of CAN in the field of Industrial
Automation can also utilize these schemes for increasing fault tolerance.
50
51
REFERENCES
1. Karen Parnell, “Automotive Electronics Digital Convergence-How to Cope
with Emerging Standards and Protocols”, AMAA 2004, Berlin.
2. Stephen Channon and Peter Miller,” The Requirements of Future In-
Vehicle Networks and an Example Implementation”, SAE Technical Paper
Series 2004-01-0206.
3. Rienhard Maier, et al,” Time Triggered Architecture: A Consistent
Computing Platform “, IEEEmicro July/Aug 2002.
4. Patrick Leteinturier, et al,” TTCAN from applications to products in
automotive”, SAE Technical Paper Series 2003-01-0114.
5. Maria Bruce,” Distributed Brake–By-Wire Based on TTP/C”, ISSN 0280-
5316 ISRN LUTFD2/TFRT-5668 SE.
6. http://sciencedaily.com/releases/1998/119811031415.htm
7. S Shaheen, D Heffernan and G Leen,” A Comparison of Emerging Time
Triggered Protocols for X-by-wire Control Networks”, Proc. Instn Mech.
Engrs Vol 217 PartD: J.Automobile Engineering.
8. Christopher A. Lupini,”Multiplex Bus Progression 2003”, SAE Technical
Paper Series 2003-01-0111.
9. G Leen and D Heffernan,” Expanding Automotive Electronic Systems”,
IEEE Computer Jan 2002 P.88.
10. Naill Murphy,” A Short trip on the CAN bus”, Embedded System
Programming (8/11/03), embedded.com.
51
52
11. M.Farsi, et al,” An overview of CAN”, Computing and Control Engineering
Journal, June 1999.
12. Florian Hartwich, et al,” Integration of Time Triggered CAN (TTCAN_TC)”,
SAE Technical Paper Series, 2002-01.
13. http://www.can.bosch.com/content/TTCAN.html
14. www.tttech.com/technology/docs/protocol_comparisons/TTTech-
comparison_TTP-TTCAN-FlexRay.pdf
15. Thomas Fuehrer, et al,” Time Triggered Can (TTCAN)”, SAE Technical
Paper Series 2001-01-0073.
16. Joaquin Ferreira , Arnaldo Oliveria ,et al,” An Experiment to Assess Bit
Error Rate in CAN”, RTN 2004 - 3rd Int. Workshop on Real-Time Networks,
Catania, Italy.
17. Cedric Wilwert, et al, “Impact of Fault Tolerance Mechanisms on X-by-wire
System Dependability “,TRIO report 2003.
18. Guillermo Rodriguez-Navas, et al, “Harmonizing Dependability and Real
Time in CAN Networks”, RTLIA2003 - 2nd International Workshop on
Real-Time LANs in the Internet Age.
19. http://www.tttech.com/technology/docs/fault_handling/TTTech-Fault-
Handling-TTA.pdf
20. Holger Zeltwanger,”Time-Triggered communication on CAN”, SAE
Technical Paper Series 2002-01-0437.
21. B.Mϋller, et al,”Fault Tolerant TTCAN networks”, 8th iCC Las Vegas,
2002.
52
53
22. Matjaz Colnaric, Domen Verber,” Communication Infrastructure for IFATIS
Distributed Embedded Control Application”, RTN 2004 - 3rd Int. Workshop
on Real-Time Networks, Catania, Italy.
23. Ian Boster, Alan Burns, et al,”Comparing Real-Time Communication under
Electromagnetic Interference”, 16th Euromicro Conference on Real-Time
Systems (ECRTS'04), 2004 Catania, Italy
24. http://www.engin.umd.umich.edu/ceep/reports/200MidYearRichardson01.
html
25. Jose Rufino, “An Overview of Controller Area Network”, Proceedings of
CiA Forum- CAN for Newcomers, January 1997, Braga, Portugal.
26. http://www.esacademy.com/faq/calc/can.htm
53
54
ABSTRACT
DESIGN AND PERFORMANCE ANALYSIS OF FAULT TOLERANT TTCAN SYSTEM
by
AAKASH ARORA
MAY 2005
Advisor: Dr. Syed Masud Mahmud
Major: Computer Engineering
Degree: Master of Science
Continuous demand for fuel efficiency mandate “Drive-by-Wire” systems. The
goal of Drive-by-wire is to replace nearly every automotive hydraulic/mechanical
system with electronics. Drive-by-Wire and active collision avoidance systems
need fault tolerant networks with time triggered protocols, to guarantee
deterministic latencies. CAN is an event triggered protocol which has features
like high bandwidth, error detection, fault confinement and collision avoidance
based on message priority. However, CAN do not ensure message latency,
which is critical for real time application. TTCAN (Time Triggered CAN) removes
this fallacy of CAN by providing exclusive time windows for those messages that
need deterministic latencies. In addition to the exclusive windows, there are
arbitration windows too, which make way for event triggered communications. In
TTCAN, if an error occurs within an exclusive or arbitration window,
retransmission of the message is not allowed. If the message that encountered
54
55
the error is a safety critical message, then the transmission error can
compromise the safety of the vehicles. The thesis work presented here proposes
three techniques to increase fault tolerance of TTCAN systems. The proposed
techniques increase fault tolerance of TTCAN systems by improving system
matrix design and incorporating redundant bus. A detailed description of
architectures and algorithms required to implement these techniques has been
presented. These techniques have been studied analytically and by using
simulation. The results show significant improvement in the fault tolerance of
TTCAN.
55
56
AUTOBIOGRAPHICAL STATEMENT
AAKASH ARORA
I received my Bachelors degree in Mechanical Engineering from Punjab Engineering College, Chandigarh, India. My pursuit for challenging career in the field of automobiles helped me in making the decision to come to Detroit and to pursue higher degree in the field of Computer Engineering. I was lucky to be member of Dr. Syed M Mahmud’s IVTS research group at Wayne State University. Under able guidance of Dr. Mahmud, I was able to acquire knowledge and skills in the field of CAN and TTCAN. I have published papers mainly in the area of fault tolerant in-vehicle networks. I have designed and analyzed various methods to introduce fault tolerance in TTCAN. The details of algorithms, architectures and analysis of these methods are the major contribution of my thesis work. I have done internships at Suzuki India Limited and at Siemens Energy and Automation, Automotive Business Unit, Troy, Michigan. I have been selected for Engineering Rotation Program at Motorola Automotive, Deer Park, Illinois. In future also, I hope to contribute to the automotive world to the best my capabilities. My leisure time activities include playing basketball, running, listening to music, reading inspirational books, visiting new places and watching movies.
Publications: 1. Aakash Arora and Syed Masud Mahmud “Performance Analysis of a
Fault Tolerant TTCAN System ”, Proc. of the SAE 2005 World Congress, April 11-14, 2005, Detroit, Michigan, USA, Paper Number 2005-01-1538.
2. Aakash Arora, Praveen Ramteke and Syed Masud Mahmud, “A Fault Tolerant Time Triggered Protocol for Drive-by-Wire Systems,” proceedings
of the 4th
Annual Intelligent Vehicle Systems Symposium of National Defense Industries Association (NDIA), National Automotive Center and Vectronics Technology, June 22 –24, 2004, Traverse City, Michigan.
3. Praveen Ramteke, Aakash Arora and Syed Masud Mahmud “Feasibility of using Vehicle’s Power Line as a Communication Bus”, Proceedings
of the 4th
Annual Intelligent Vehicle Systems Symposium of National Defense Industries Association (NDIA), National Automotive Center and Vectronics Technology, June 22 –24, 2004, Traverse City, Michigan.
56