
2009 7th IEEE International Conference on Industrial Informatics (INDIN), Cardiff, Wales, UK, 23-26 June 2009


Assessment of multi-bus fault-tolerant communications

Valter Filipe Silva, ESTGA - University of Aveiro

Paulo Bartolomeu, Micro I/O

Joaquim Ferreira, University of Lisbon

José Alberto Fonseca, DETI - University of Aveiro

Abstract: The Flexible Time-Triggered CAN (FTT-CAN) protocol supports synchronous and asynchronous traffic over a CAN fieldbus. The use of bus and master redundancy is an evolution of the original FTT definition. A protocol supporting multiple buses and masters has been proposed. This paper presents experimental results that allow the timeliness of the communication system to be assessed. The conclusion is that the system consistently reacts according to its theoretical formalization.

I. INTRODUCTION

In the last few decades, distributed embedded systems (DES) have been widely used in several application fields, ranging from industrial machinery to avionics, automotive systems and robotics. These applications have timeliness requirements that can only be met using synchronization protocols, including, in many cases, mechanisms for online reconfiguration and fault tolerance. However, supporting such features increases the computing and network overhead, thus increasing the complexity of nodes and reducing the available bandwidth.

FTT-CAN [1] is a synchronous time-triggered protocol operating over a CAN network. It allows the coexistence of synchronous and asynchronous traffic and provides fault-tolerance and reconfiguration mechanisms using a single CAN fieldbus. Besides the control message overhead introduced at the network level, FTT-CAN adds computational overhead in the middleware, which limits the maximum achievable bit rate when nodes based on low-power microcontrollers are used, i.e. the bandwidth is not fully exploited unless faster processors are applied. Hence, FTT-CAN is unsuitable for applications with high bandwidth requirements, for example video traffic. Currently, the maximum bit rate attainable in FTT-CAN implementations using PIC microcontrollers is 250 kbps [2].

Mobile robotics is one of the application fields where FTT-CAN can be employed [2]. Robots encompass sensors, sometimes with high bandwidth requirements (e.g. video cameras), actuators and controller units often with low processing capabilities (e.g. microcontrollers). A preliminary study of sensor bandwidth usage in autonomous mobile robots is presented in [3], where examples of robots integrating high bandwidth sensors are described. For example, Mendes [4] proposes a sonar-based navigation system that exploits almost all the available bandwidth in the CAN fieldbus.

X-by-wire applications also require dependable networks; examples of such systems using redundant transmission media can be found in [5][6].

Redundant buses are also used in Ethernet networks to improve both the redundancy and the bandwidth of the system [7]. Cisco Systems uses the so-called EtherChannel [8] to improve the bandwidth of a connection by using more than one Ethernet channel.

Systems like TTP/C [9] and FlexRay [10] provide dependability for safety-critical applications by using a redundant communication medium where message replicas are transmitted. In FlexRay the redundant bus can also be used to increase the available bandwidth by enabling the transmission of non-redundant traffic in the redundant buses. These systems deal with a single redundant bus and are not flexible with respect to message scheduling.

Pimentel [11] proposes redundant CAN-based systems with redundant buses, redundant interfaces and ECUs (Electronic Control Units), and redundant message transmission. A simplified middleware interfaces with the application, taking care of the redundant transmission and of selecting one of the redundant messages received. An example of a steer-by-wire system is used for demonstration in [12]. However, this solution does not increase the available bandwidth.

Other CAN-based networks employ more than one CAN bus, but do not improve the overall available bandwidth. Examples are TTCAN [13] networks and those based on the Columbus' egg idea [14].

In previous work [15], [16], changes to the first version of Flexible Time-Triggered CAN (FTT-CAN) [1] were made. These changes enable fault tolerance at the physical level in the FTT-CAN protocol by adding extra buses, which also allows the available bandwidth to be increased. The master redundancy mechanism and the corresponding modifications in the FTT-CAN protocol architecture, which also address master errors, have been described in [17]. This paper presents a timeliness assessment of the multi-bus and multi-master FTT-CAN protocol.

978-1-4244-3760-3/09/$25.00 © 2009 IEEE

This paper is organized as follows: Section II provides a brief description of the FTT-CAN protocol, focusing on its architecture, operation and multi-bus extension. Section III presents the fault hypothesis, while Section IV describes the bus and master error detection mechanisms. The setup used for assessing the timeliness of the multi-bus multi-master FTT-CAN protocol is described in Section V, with emphasis on the system architecture. Results are addressed in Section VI, highlighting and discussing both bus and master failures. Finally, the paper is concluded in Section VII with a review of its findings and topics for further work.

II. FTT-CAN: A BRIEF INTRODUCTION

The FTT-CAN protocol (Flexible Time-Triggered communication on CAN) [1] has been developed with the main purpose of combining a high level of operational flexibility with timeliness guarantees. It uses the dual-phase elementary cycle concept to isolate time-triggered and event-triggered communication. The time-triggered traffic is scheduled on-line and centrally in a particular node called the master, which facilitates on-line admission control of requests, so that this traffic is managed in a flexible way under guaranteed timeliness. The protocol relies on a relaxed master-slave medium access control in which the same master message triggers the transmission of messages in several slaves simultaneously (master/multi-slave). Possible collisions between slave messages are handled by the native distributed arbitration of CAN.

Figure 1 depicts the general architecture of the system. Note that more than one master and more than one station can be used to provide redundancy.

Figure 1. FTT-CAN General architecture

FTT-CAN slots the bus time in consecutive Elementary Cycles (ECs) with fixed duration. All nodes are synchronized at the start of each EC by the reception of a particular message known as the EC trigger message (TM), sent by the master node. Within each EC, the protocol defines two consecutive windows, asynchronous and synchronous, corresponding to two separate phases (see Figure 2). The first is used to convey event-triggered traffic, here called asynchronous because the transmission requests can be issued at any instant. The second is used to convey time-triggered traffic, herein called synchronous because its transmission occurs synchronously with the ECs. The synchronous window of the nth EC has a duration that is set according to the traffic scheduled for it. The schedule for each EC is conveyed by the respective EC trigger message (see Figure 3). Since this window is placed at the end of the EC, its starting instant is variable and encoded in the respective EC trigger message.

Figure 2. The Elementary Cycle

Figure 3. Master/multislave access control and EC schedule coding scheme

The FTT-CAN protocol communication requirements are held in a database located in the master node [18], the System Requirements Database (SRDB). This database holds several components, one of which is the Synchronous Requirements Table (SRT), which contains the description of the periodic message streams. Based on the SRT, an on-line scheduler builds the synchronous schedules for each EC (EC schedules). These schedules are then inserted in the data area of the appropriate EC trigger message (see Figure 3) and broadcast with it. Due to the on-line nature of the scheduling function, changes performed in the SRT at run time are reflected in the bus traffic within a bounded delay, resulting in flexible behaviour.

It should also be noted that, in FTT-based systems, the slave nodes are not allowed to retransmit synchronous messages affected by errors, in order to keep room for the messages scheduled for the EC where the error occurred. This issue was addressed in [19].
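The way an on-line scheduler might derive EC schedules from the SRT can be illustrated with a minimal Python sketch. The `SrtEntry` fields, the `build_ec_schedule` function, the shortest-period-first ordering and the `max_msgs` capacity limit are all assumptions made for illustration; they are not taken from the FTT-CAN implementation, which may use a different scheduling policy. In particular, this sketch simply leaves out messages that do not fit, which a real scheduler would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrtEntry:
    """One periodic message stream (hypothetical SRT row)."""
    msg_id: int          # identifier of the synchronous message
    period_ecs: int      # period, expressed in Elementary Cycles
    offset_ecs: int = 0  # first EC in which the message is due


def build_ec_schedule(srt, ec_index, max_msgs):
    """Return the ids of the messages to be triggered in EC number ec_index."""
    ready = [
        e for e in srt
        if ec_index >= e.offset_ecs
        and (ec_index - e.offset_ecs) % e.period_ecs == 0
    ]
    # Assumed policy: shorter period first, ties broken by message id.
    ready.sort(key=lambda e: (e.period_ecs, e.msg_id))
    # Assumed capacity limit of the synchronous window, in messages.
    return [e.msg_id for e in ready[:max_msgs]]


srt = [SrtEntry(1, 1), SrtEntry(2, 2), SrtEntry(3, 4)]
for ec in range(4):
    print(ec, build_ec_schedule(srt, ec, max_msgs=8))
```

Because the schedule is rebuilt for every EC from the current table, a change to `srt` between two calls shows up in the next schedule, which mirrors the bounded-delay reconfiguration described above.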

A. FTT-CAN with multiple buses

One recent improvement to FTT-CAN, presented by Silva and Fonseca, is the use of the master to control more than one bus in the system [16][15]. The use of more than one CAN bus (at least two) improves the bandwidth availability and fault tolerance of the system by issuing messages in parallel on different communication channels (buses). This approach overcomes the CAN bandwidth limitation and the single point of failure problem affecting the CAN bus [16]. Besides, this architecture inherits the dispatching flexibility of FTT, which enables on-line modifications in the conveyed traffic. This is useful to fulfil application requirements or to react to bus failures. The architecture of the FTT-CAN protocol using more than one bus is presented in Figure 4.

Figure 4. FTT-CAN using multiple buses

Note that the slaves can be connected to a single CAN bus or to a set of buses, depending on the tasks they have to perform and the associated fault tolerance and bandwidth requirements.

A trigger message encoding information about the Elementary Cycle (size of the synchronous window, messages to be transmitted, etc.) must be transmitted in all communication channels (buses) using an approach similar to the one applied in single-bus FTT-CAN systems. Currently, the Elementary Cycle is identical and synchronized in all buses. Thus, trigger messages are issued simultaneously and the bus time is slotted in a similar way. Figure 5 presents an example of a timeline for a communication system using two FTT-CAN buses.

Figure 5. Bus timing with two buses

In the example shown in Figure 5, Synchronous message 1 is replicated in both buses, which yields an improved fault tolerance. Synchronous messages 2 and 3 do not require a high level of reliability and, as such, are not sent in both buses, thus saving bandwidth for other messages.

As synchronous messages are scheduled by the FTT-CAN master, modifications had to be implemented in order to enable the message scheduling in more than one bus. The Synchronous Requirements Database has also been revised to encompass information about the bus where messages are to be issued. This is described in [17].

III. FAULT HYPOTHESIS

The multi-bus FTT-CAN architecture presented in this paper is able to detect partitions in one or more buses, but not simultaneously.

• Only transient errors that affect the bus for more than one Elementary Cycle plus a guard time (henceforth denoted as TMRW, Trigger Message Retransmission Window) can be detected and treated by the system.

• The system must have a minimum of one error-free bus at any time. This ensures that masters have at least one bus to communicate.

• Electromagnetic interference does not occur in all the buses simultaneously. If buses are deployed in different locations this is a reasonable assumption.

• There is only one active master per elementary cycle.

• If a message is transmitted between masters in a given bus, then it is not partitioned.

• Master nodes have a fail-silent behavior both in time and value domains.

• Besides detecting bus partitions, it is assumed that the proposed architecture also detects stuck-at faults.

IV. BUS AND MASTER ERROR DETECTION

A. Bus error

As presented, besides redundant buses, the system encompasses redundant masters, which implies that the envisaged error detection mechanism must recognize both bus and master errors, according to the defined fault hypothesis. Therefore, the system uses the systolic nature of the Trigger Message and the position of the masters (at the ends of the buses) in order to detect bus and master errors.

A bus is assumed faulty if the Trigger Message issued by the active master is not received in the backup master within a given amount of time (TMRW). If a backup master detects a faulty bus, a high-priority asynchronous message is transmitted to the active master (in all buses) to indicate that a reconfiguration is required. The reconfiguration corresponds to the rescheduling of all the messages from the faulty bus to the remaining ones. This procedure enables a (degraded) working mode in which the fault is reported to the maintenance staff. Figure 6 presents the timeline for a bus error detection.

Figure 6. Bus error timeline

In Figure 6, the asynchronous message issued by the backup master to indicate an error in Bus B is identified as “BEAM” (Bus Error Asynchronous Message). The time elapsed between the transmission of the last Trigger Message (TM0A in Figure 6) and the instant of the error is denoted $t_{phase}$. After missing TM1B in Bus B, the backup master waits for the retransmission window (TMRW) before issuing the “BEAM” message. The delay from the error to the reception of this message is identified as $t_{BEAM}$. After its reception, the active master moves the Bus B messages to Bus A (indicated as Bus-Changing) and their schedule appears in TM4A (after $t_{re\_scheduling}$ has elapsed).

Note that the new schedule cannot be present in TM3A, since the scheduling for this EC is initiated during TM2x and terminates before the reception of the request message (to guarantee mutually exclusive access to the internal tables, as presented in [17]). This is also shown in Figure 6.
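The reconfiguration step, in which the messages of the faulty bus are moved to the remaining buses, can be sketched as follows. This is an illustrative Python fragment only: the function name, the dictionary-based representation of the bus assignment and the "least loaded bus" placement rule are assumptions, not the policy implemented in the FTT-CAN master described in [17].

```python
def reschedule_after_bus_fault(assignment, faulty_bus, buses):
    """Return a new message-to-bus assignment that avoids faulty_bus.

    assignment maps a message id to the set of buses on which it is sent.
    The input mapping is left untouched.
    """
    healthy = [b for b in buses if b != faulty_bus]
    if not healthy:
        # The fault hypothesis requires at least one error-free bus.
        raise ValueError("at least one error-free bus is required")

    # Number of messages currently carried by each healthy bus.
    load = {b: sum(b in used for used in assignment.values()) for b in healthy}

    new_assignment = {}
    for msg_id, used in sorted(assignment.items()):
        remaining = set(used) - {faulty_bus}
        if not remaining:
            # The message was only on the faulty bus: move it to the
            # least loaded healthy bus (assumed placement rule).
            target = min(healthy, key=lambda b: (load[b], b))
            remaining = {target}
            load[target] += 1
        new_assignment[msg_id] = remaining
    return new_assignment


before = {1: {"A", "B"}, 2: {"A"}, 3: {"B"}}
after = reschedule_after_bus_fault(before, "B", ["A", "B"])
print(after)
```

In this example, message 1 loses its replica on Bus B, while message 3, which was only on Bus B, is moved to Bus A.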

B. Master error

Concerning master errors, as defined in the fault hypothesis, at least one bus will be working correctly at any time. As such, if the backup master does not receive the Trigger Message in any bus, the active master is faulty. The backup master waits for the retransmission window (TMRW) before gaining control of the buses and starting the Trigger Message transmission in degraded mode. In this mode, if the backup master is the only working master, bus error detection is unsupported [17]. Figure 7 depicts the timeline of an error in the active master.

Figure 7. Master error timeline

In Figure 7, $t_{phase}$ indicates the time elapsed from the last Trigger Message (TM0x) to the error occurrence in the FTT-CAN master. Since the TMs can have a different number of bits (e.g. due to bit stuffing), two different delays were defined: $t_{TM\_first}$ and $t_{TM\_second}$. The former is the time elapsed from the error in the active master to the first TM reception instant. The latter is similar, but regarding the second TM reception.

In order to evaluate these delays, a measurement system was developed; it is described in the following section.
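The detection rules of Sections IV-A and IV-B can be summarized in a short Python sketch of the decision taken by the backup master at the end of each retransmission window. The function names, the representation of reception instants and the notion of an `expected_us` nominal TM instant are assumptions introduced for illustration; only the TMRW value of 1.125 ms comes from the experiments reported in this paper.

```python
TMRW_US = 1125  # Trigger Message Retransmission Window used in the experiments


def buses_with_tm(rx_times_us, expected_us, tmrw_us=TMRW_US):
    """Buses whose TM arrived no later than expected_us + tmrw_us.

    rx_times_us maps a bus name to the TM reception instant in
    microseconds, or to None if no TM was received on that bus.
    """
    deadline = expected_us + tmrw_us
    return {bus for bus, t in rx_times_us.items()
            if t is not None and t <= deadline}


def classify_ec(tm_seen, buses):
    """Classify the situation observed by the backup master in one EC."""
    missing = [b for b in buses if b not in tm_seen]
    if not missing:
        return ("ok", [])
    if len(missing) == len(buses):
        # No TM on any bus: the active master is assumed faulty and the
        # backup master takes control of the buses.
        return ("master_error", [])
    # TM missing only on some buses: a BEAM message is sent on all buses.
    return ("bus_error", missing)


buses = ["A", "B"]
seen = buses_with_tm({"A": 5010, "B": None}, expected_us=5000)
print(classify_ec(seen, buses))
```

The master-error branch relies on the fault hypothesis that at least one bus is error-free at any time; without it, a missing TM on every bus would be ambiguous.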

V. MEASUREMENT SETUP

A measurement setup was developed to evaluate the timeliness of a multi-bus multi-master FTT-CAN communication system. This setup aims at providing the basic features required for the assessment of bus and master redundancy and, as such, it includes an active master and a backup master connected by two CAN buses (Bus A and Bus B). The evaluation of the delays associated with reconfiguration as a result of bus or master errors is conducted using an extended Delay Measurement System (xDMS) [20] specifically designed for this application.

The xDMS has the ability to impose either bus or active master errors and to measure the time elapsed from the error to the event instants described in the previous section. The xDMS allows multiple trials to be conducted in an automated fashion, i.e. users can program a set of parameters (e.g. minimum and maximum error phase, $t_{phase}$) and perform a batch of trials using a common configuration. The minimum, maximum and average delay, together with the standard deviation and the delay histogram, are updated at the end of each trial and become available for download at the end of the trial batch. The xDMS interfaces with a PC using EIA-232 communications for configuration procedures and for the download of statistical data at the end of a trial batch.

The xDMS employs a microcontroller (dsPIC30F6012A) commanding two multiplexers (based on 4066 ICs), which are able to force errors either by disrupting the bus or by setting the bus to a dominant or recessive state. Besides bus errors, the xDMS can induce an error condition in the active master by holding it in a reset state. Furthermore, the xDMS also has reset control over the backup master, thus allowing the start of new “fresh” trials. Figure 8 presents the general architecture of the measurement setup including the xDMS. Figure 9 displays a picture of the measurement setup, excluding the PC presented in Figure 8.

Figure 8. Measurement system architecture

Figure 9. Developed System

VI. RESULTS

The results presented in this section were obtained using the measurement setup previously described. For the following discussion, a bus error consists of either the segmentation of the bus or its being forced to a dominant state. As a proof of concept, the recessive state is also evaluated.

Results are divided into two delay categories: absolute and relative. Absolute delays are measured from the last Trigger Message before the occurrence of the error, while relative delays are measured from the error instant itself.

A measurement (trial) is conducted using the following sequence of operations: reset the masters (active and backup), wait for a random number of Trigger Messages, wait for a random amount of time (less than an Elementary Cycle) and apply the specified error. The random amount of time elapsed since the last Trigger Message ($t_{phase}$ in Figures 6 and 7) is limited by the reception of the next Trigger Message.

Figure 10. Maximum of $t_{phase}$

Figure 10 describes the delay components of the maximum permitted phase for error injection. The delay between the dispatching of two messages in two buses results from processing overhead on the master, represented in Figure 10 by $t_1$. The duration of a Trigger Message (TM) is not constant, i.e. it depends on the payload and on the associated stuffing bits. The worst-case scenario occurs when one TM has the maximum length (e.g. TM0B) and the other has the minimum length (e.g. TM1A). In Figure 10, the duration of the Trigger Message and of the Elementary Cycle are represented by $t_{TM}$ and $t_{EC}$, respectively. The extended Delay Measurement System handles Trigger Messages in interrupt service routines. The time elapsing from the end of a TM to the instant of its processing in the interrupt service routine is represented by $t_2$.

Thus:

$$t_{phase\_max} = t_{EC} - t_1 - t_2 - (t_{TM\_max} - t_{TM\_min})$$

where $t_{TM\_max}$ is the maximum length of a TM and $t_{TM\_min}$ is the minimum TM length. Thus, $t_{TM\_max} - t_{TM\_min}$ is the maximum delay associated with bit stuffing in a TM, and $t_{phase\_max}$ is the maximum value that the phase ($t_{phase}$) can take. This value ensures that the error is never triggered in the time window elapsing between the end of TM transmissions, which could result in an error injection in the next EC, thus invalidating the trial due to error triggering in the wrong EC.

Provided that each EC has $t_{EC} = 5$ ms, $t_1 = 28$ μs and $t_2 = 89$ μs (both measured with an oscilloscope), and $t_{TM\_max} - t_{TM\_min} = 16 \times 1/(250 \times 10^3)$ s $= 64$ μs (transmission of 16 bits, the maximum number of stuffing bits, at a bit rate of 250 kbps), the phase of an error ($t_{phase}$) can take the maximum value of 4.819 ms.

| Parameter | Value |
|---|---|
| Number of TMs | 0 to 1000 (uniformly distributed) |
| $t_{phase}$ | 0 to 4.819 ms (uniformly distributed) |
| Number of trials per batch | 1000 |

Table I. xDMS parameters
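The arithmetic behind the 4.819 ms bound can be reproduced with a few lines of Python. The numeric values are the ones reported above; the function and constant names are merely illustrative.

```python
BIT_RATE_BPS = 250_000   # CAN bit rate used in the experiments
MAX_STUFF_BITS = 16      # maximum number of stuffing bits considered for a TM


def bit_time_us(bit_rate_bps):
    """Duration of one bit, in microseconds."""
    return 1_000_000 / bit_rate_bps


def t_phase_max_us(t_ec_us, t1_us, t2_us, stuffing_us):
    """Maximum error phase: t_EC - t_1 - t_2 - (t_TM_max - t_TM_min)."""
    return t_ec_us - t1_us - t2_us - stuffing_us


stuffing_us = MAX_STUFF_BITS * bit_time_us(BIT_RATE_BPS)  # 16 x 4 us = 64 us
print(t_phase_max_us(5000, 28, 89, stuffing_us))  # prints 4819.0
```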

The results presented in the following subsections were obtained with the xDMS configured with the parameters presented in Table I.
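As an illustration of how the parameters of Table I translate into a batch of trials, the following Python sketch draws the two random quantities of each trial. It is not the xDMS firmware: the function names and the use of Python's pseudo-random generator are assumptions, and only the parameter ranges are taken from Table I.

```python
import random

MAX_TMS = 1000            # upper bound on the number of TMs awaited (Table I)
T_PHASE_MAX_US = 4819.0   # maximum error phase, in microseconds (Table I)
TRIALS_PER_BATCH = 1000   # number of trials per batch (Table I)


def draw_trial(rng):
    """Draw the parameters of one trial.

    Returns the number of Trigger Messages to wait for and the phase,
    in microseconds, between the last of them and the error injection.
    """
    n_tms = rng.randint(0, MAX_TMS)               # uniform, bounds included
    t_phase_us = rng.uniform(0.0, T_PHASE_MAX_US)  # uniform phase
    return n_tms, t_phase_us


def draw_batch(seed=0):
    """Draw the parameters of a whole batch of trials."""
    rng = random.Random(seed)  # fixed seed so that a batch can be repeated
    return [draw_trial(rng) for _ in range(TRIALS_PER_BATCH)]


batch = draw_batch()
print(len(batch), batch[0])
```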

A. Bus error

As defined, bus errors are triggered at random instants uniformly distributed along the duration of an Elementary Cycle, with the phase of an error ($t_{phase}$) bounded by the maximum value of 4.819 ms derived at the beginning of this section.

Figure 11 presents a histogram of the bus error asynchronous message delay ($t_{BEAM}$) measured relative to the instant of the error injection. As shown in Figure 6, this delay is random (it depends on the instant at which the error was injected, $t_{phase}$) and takes values between 5 ms and 10 ms (1 and 2 ECs, respectively). However, if $t_{BEAM}$ is measured from the last TM, it is reasonably constant, as shown in Figure 12. Note that each $t_{BEAM}$ represented there is the sum of a relative $t_{BEAM}$ and the corresponding $t_{phase}$.

Figure 11. $t_{BEAM}$ histogram (relative delay)

As can be seen in Figure 12, the time elapsing from the last TM to the transmission of the rescheduling request message is almost constant in all trials, which suggests that the backup master recognizes the bus error and requests a bus rescheduling in a consistent and timely fashion. The small difference between the maximum and minimum values observed (24 μs) results from the stuffing bits imposed in the Trigger Message. The average $t_{BEAM}$ is 10.1818 ms with a standard deviation of 6.68 μs. Note that $t_{BEAM}$ includes the duration of the associated TM and, as such, can take values higher than 10 ms (2 ECs).
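The relation between the two delay categories (absolute delay = relative delay + phase) explains why the relative histogram is spread over almost one EC while the absolute one is narrow. The Python sketch below illustrates this under a simplifying assumption that is not part of the measurements: the absolute delay is taken as exactly constant and equal to the reported average of 10.1818 ms, ignoring the few microseconds of jitter caused by bit stuffing.

```python
import random

T_BEAM_ABS_US = 10_181.8   # average absolute t_BEAM reported in the text
T_PHASE_MAX_US = 4_819.0   # maximum error phase


def relative_delay_us(absolute_us, t_phase_us):
    """Relative delay, measured from the error instant."""
    return absolute_us - t_phase_us


rng = random.Random(1)
phases = [rng.uniform(0.0, T_PHASE_MAX_US) for _ in range(1000)]
relative = [relative_delay_us(T_BEAM_ABS_US, p) for p in phases]

# The relative delays inherit the spread of the phase.
spread_relative = max(relative) - min(relative)
print(min(relative), max(relative), spread_relative)
```

Under this assumption the relative delays stay between 10.1818 ms minus the maximum phase and 10.1818 ms, which is in line with the range of roughly one to two ECs observed in Figure 11.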

Figure 12. $t_{BEAM}$ histogram (absolute delay)

After the reception of the request message, the active master reschedules the faulty bus traffic in the remaining (operational) buses. The new schedule appears two Trigger Messages later (TM4 in Figure 6). This time is denoted by $t_{re\_scheduling}$ and was measured using the xDMS with the set of parameters specified in Table I. The corresponding results are shown in relative and absolute terms in Figures 13 and 14, respectively. Relative delays are measured from the occurrence of the error, while absolute delays are measured from the TM before the error.

Figure 13. $t_{re\_scheduling}$ histogram (relative delay)

Figure 14. $t_{re\_scheduling}$ histogram (absolute delay)

These histograms show that the time elapsed from an error to the rescheduling of the synchronous messages has a maximum value of 19.92 ms, i.e. the system takes a maximum of 3 Elementary Cycles to recover from the error. Note also that the standard deviation for the absolute $t_{re\_scheduling}$ delays is 4.36 μs.

B. Master error

As with bus errors, the xDMS waits for a random number of ECs (uniformly distributed) before starting the phase waiting procedure that precedes the (active master) error injection. After this error, the backup master waits for a period of time denoted by TMRW before gaining control of the buses and transmitting the next Trigger Message. As such, the transitional Elementary Cycle has an increased duration (see Figure 7) that depends on the defined TMRW (1.125 ms).

As illustrated in Figure 7, there is a small difference in the reception instants of the Trigger Message in the two buses. This occurs because the backup master has a single-processor architecture that hinders simultaneous transmissions. Therefore, TM transmissions are offset by a small amount of time that, together with the TM length variation introduced by stuffing bits, results in different reception instants. The histogram of the relative delays associated with the first Trigger Message reception after the active master error ($t_{TM\_first}$) is presented in Figure 15. Figure 16 shows the absolute delays, corresponding to the sum of $t_{phase}$ and the relative $t_{TM\_first}$ delays.

Figure 15. $t_{TM\_first}$ histogram (relative delay)

As documented in Figure 16, the first Trigger Message is received after an average delay of 6.927 ms with a standard deviation of 3.12 μs.

The absolute delay for the reception of the second Trigger Message ($t_{TM\_second}$) is shown in Figure 17. It can be seen that $t_{TM\_second}$ and $t_{TM\_first}$ are similar, despite having different dispatching instants and numbers of stuffing bits.

Figure 16. $t_{TM\_first}$ histogram (absolute delay)

Figure 17. $t_{TM\_second}$ histogram (absolute delay)

For the set of measures presented in Figure 17, the average delay is 6.965 ms with a standard deviation of 3.12 μs. Note that the difference between the average delays of $t_{TM\_first}$ and $t_{TM\_second}$ is approximately 40 μs (with equal standard deviation). As stated before, this difference results from the delayed trigger of the TM ($t_1 = 28$ μs) plus the delays associated with stuffing bits.

VII. CONCLUSION

In previous work, FTT-CAN has been upgraded to use more than one bus in order to increase the available bandwidth and fault tolerance. In addition, employing more than one master allows the system to react to master errors, which also improves its fault tolerance.

This work has experimentally assessed the FTT-CAN fault-tolerant capabilities resulting from a multi-bus multi-master architecture. More specifically, it has addressed the protocols that react to a bus error (partition or setting of a dominant state) or a master error. In both scenarios, the delay elapsing from the error to the instant of the system reconfiguration has been verified. Concerning bus errors, the system takes a maximum of 20 ms to recover, while in a master error condition the recovery time depends on the EC length and the specified Trigger Message Retransmission Window.

Several histograms and values of average delay and standard deviation have been presented for the key measurements, showing that the system behaves as expected and consistently with the timelines presented in Figures 6 and 7.

A strong influence of bit stuffing was noted in the absolute measures for $t_{BEAM}$, $t_{re\_scheduling}$, $t_{TM\_first}$ and $t_{TM\_second}$, since the observed values cluster at instants that are multiples of 4 μs (the bit time at the communication bit rate).

The xDMS is an improved version of the Delay Measurement System developed by the authors, which has been updated to fulfill the requirements of this system.

The tested system and the xDMS can be applied to other fieldbuses or even to wireless communication systems. This is an important direction for future work.

ACKNOWLEDGMENT

Valter Silva is supported by Fundação para a Ciência e Tecnologia under grant PRODEP 2001 - Formação Avançada de Docentes do Ensino Superior Nº 200.019.

Paulo Bartolomeu is supported by Fundação para a Ciência e a Tecnologia under grant POSI 2004 Medida 6.1, Ref. SFRH/BDE/15602/2006 and by Micro I/O.

This work was partially supported by ARTIST2, NoE on Embedded Systems Design (EC-IST - IST-004527), and NITEC/114/20/06 (71/00101), TIGELA, Programa PRIME, funded by ADI (Agência de Inovação).

REFERENCES

[1] L. Almeida, P. Pedreiras, and J. A. Fonseca, “The FTT-CAN Protocol: Why and How,” IEEE Transactions on Industrial Electronics, vol. 49, no. 6, pp. 1189–1201, December 2002.

[2] V. Silva, R. Marau, L. Almeida, J. Ferreira, M. Calha, P. Pedreiras, and J. Fonseca, “Implementing a distributed sensing and actuation system: The CAMBADA robots case study,” in Proceedings of the 10th IEEE Conference on Emerging Technologies and Factory Automation (ETFA 2005), vol. 2, September 2005, pp. 781–788.

[3] V. Silva, J. Fonseca, U. Nunes, and R. Maia, “Communications Requirements for Autonomous Mobile Robots: Analysis and Examples,” in Proceedings of FeT 2005, Elsevier, Ed., Mexico, November 2005, pp. 91–98.

[4] A. Mendes, “Detecção e seguimento de alvos com Laser Range Finder,” Master’s thesis, University of Coimbra, 2004.

[5] P. Yih and J. Gerdes, “Modification of vehicle handling characteristics via steer-by-wire,” IEEE Transactions on Control Systems Technology, vol. 13, pp. 965–976, 2005.

[6] B. Zheng, C. Altemare, and S. Anwar, “Fault tolerant steer-by-wire road wheel control system,” in Proceedings of the 2005 American Control Conference, vol. 3, June 2005, pp. 1619–1924.

[7] F. Koch and K. T. Hansen, “Redundancy performance of virtual network solutions,” in Proceedings of the IEEE Conference on Emerging Technologies and Factory Automation ETFA06, Czech Republic, September 2006, pp. 328–332.

[8] D. Hucaby, CCNP BCMSN Official Exam Certification Guide. Cisco Press, 2007.

[9] H. Kopetz and G. Grünsteidl, “TTP – A Protocol for Fault-Tolerant Real-Time Systems,” IEEE Computer, vol. 27, no. 1, pp. 14–23, January 1994.

[10] FlexRay Consortium, “FlexRay Communications System - Protocol Specification, v2.0,” FlexRay Consortium, Tech. Rep., 2004.

[11] J. Pimentel, “Safety-Reliability of Distributed Embedded Systems Fault Tolerant Units,” in Proceedings of IECON 2003, vol. 1, November 2003, pp. 945–950.

[12] J. Pimentel, “An Architecture for a Safety-Critical Steer-by-Wire System,” Proceedings of the SAE World Congress 2004, 2004.

[13] B. Müller, T. Führer, F. Hartwich, R. Hugel, and H. Weiler, “Fault tolerant TTCAN networks,” in Proceedings of the 8th International CAN Conference. CAN in Automation GmbH, October 2002.

[14] J. Rufino, P. Veríssimo, and G. Arroz, “A Columbus’ egg idea for CAN media redundancy,” in Digest of Papers, The 29th International Symposium on Fault-Tolerant Computing Systems. Madison, Wisconsin, USA: IEEE, June 1999, pp. 286–293.

[15] V. Silva, J. Fonseca, and J. Ferreira, “Using FTT-CAN to the Flexible Control of Bus Redundancy and Bandwidth Usage,” in Proceedings of the 11th International CAN Conference iCC 2006, Sweden, September 2006, pp. 5.9–5.15.

[16] V. Silva and J. Fonseca, “Using FTT-CAN to Combine Redundancy with Increased Bandwidth,” in Proceedings of the 2006 IEEE International Workshop on Factory Communication Systems, June 2006, pp. 54–62.

[17] V. Silva, J. Fonseca, and J. Ferreira, “Adapting the FTT-CAN Master for Multiple-bus Operation,” in Proceedings of the 5th IEEE International Conference on Industrial Informatics, Vienna, Austria, July 2007.

[18] P. Pedreiras, “Supporting flexible real-time communication on distributed systems,” Ph.D. dissertation, University of Aveiro, Portugal, July 2003.

[19] J. C. Ferreira, “Fault-tolerance in flexible real-time communication systems,” Ph.D. dissertation, University of Aveiro, 2005.

[20] P. Bartolomeu, V. Silva, and J. Fonseca, “Delay measurement system for real-time serial data streams,” in Proceedings of the 12th IEEE Conference on Emerging Technologies and Factory Automation (ETFA), September 2007, pp. 516–523.