Heteogenous Real-Time Rapid Prototyping Environmentpapers/compendium94-03/papers/1998/codes98/pdffiles/codes98_035.pdfHeteogenous Real-Time Rapid Prototyping Environment, j I I,

[-

Towards Interprocess Communication and Interface Synthesis for aHeteogenous Real-Time Rapid Prototyping Environment*

,

j

I

I

,

I

I

Franz Fischer Annette Muth Georg FarberLaboratory for Process Control and Real-Time Systems, Prof. Dr.-Ing. Georg Farber

Technische Universitat Mtinchen, Germany{Franz.Fischer,Annette.Muth,Georg.Fiirber} @Ipr.e-technik.tu-muenchen.de

Abstract

Rapid Prototyping has been proposed as a means to re-duce development time and costs of real-time systems. Our

approach uses a heterogeneous, tightly coupled multipro-cessor system based on off-the-shelf components as targetarchitecture for an executable prototype, which is gener-

atedfrom the specification in an automated design process.Here, too, we aim to use existing tools and languages. But

interface and communication synthesis, while being the keyrequirement of an automated translation of a abstract spec-ification to a distributed system, is not yet state-of-the-art.The sensitivity of the overall performance of multiproces-sor systems to overhead and latency introduced by com-munication on the other hand calls for an efficient inter-

process communication (IPC). This paper presents conceptand implementation of IPC functions which, implementingthe message queue semantics of the specification languageSDL, links the standard components of our multiprocessorsystem in an efficient manner, while at the same time pro-viding the interface synthesis needed by the automated gen-eration of a rapid prototype. The experiences gained whenimplementing a non-trivial, real-world CAN controller andmonitor application on our rapid prototyping environment,are described as a first proof of concept.

Keywords: communication synthesis, rapid prototyping,SDL,hard real-time, task classification model;

1. Introduction

The aim of rapid prototyping real-time applications is tosubstantially reduce development times by confirming thefunctional and timely requirements of the application at avery early stage of development with the help of an exe-cutable prototype.

'The work presented in this paper is supported by the DeutscheForschungsgemeinschaftas part of a research programme on "Rapid Proto-typing for Embedded Hard Real-Time Systems" under Grant Fa 109111-1.

1092-6100/98 $10.00 @ 1998 IEEE

mu' m...""'..

"""1-],"."

,., '0000'"./ /">-

,,-'

,,'

,,< 1/

m Cla_.,S"..I.""","'dlon-

~c"'-"C_"."'_'_Ta..~ Cla_U""""_a"...T....

[gjjCia- " ~_apo'"T.."[illjcla_"H., Tau'0'

'0',,' ,,' ,,'

,,10 ~=.,,"'::..],,',,'

Figure 1. Classification of application tasks

When the real-time application has hard deadlines, thefocus of the development lies not only on the functional cor-rectness of the system, but also on the proof that it will meetall its deadlines. A real-time analysis that is able to deliverthis proof needs the worst case execution times (WCETs) ofthe implementation.

Our rapid prototyping target architecture REAR(Rapid Prototyping Environment for Advanced Real-TimeSystems) was designed to support real-time analysis inguaranteeing realistic, not-too-pessimistic worst-case ex-ecution times. The basis for this is the task classifica-tion model presented in [3], where each type of real-timetask corresponds to a best suited processor type, in termsof performance and deterministic execution times. It isa configurable and scalable heterogeneous multiprocessorsystem consisting of standard off-the-shelf components,which are tightly coupled by a global PCI-Bus. The pro-cessing units' basic difference is the loss of predictable per-formance caused by interrupts and context switches:

High Performance Unit HPU (Tasks Class 3): based onstandard computer architectures to benefit from tech-nological advances, it is in our case a PCI slot CPU

35

HPU ,.~",~r.!'!"."!:!'!!~..

Figure 2. REAR hardware architecture

with Intel Pentium processor, large L2-cache andmemory.The impact of interrupts and context switcheson predictabilityis limited by software means.

Real-Time Unit RTU (Tasks Class 2,1): optimized forsmall tasks with short response times, which do notallow predictions of the cache behaviour. Instead,the much slower RAM-access times have to be usedfor the computation of WCETs. Our current RTUis a MIPS R4600 based single board computer withpel-Interface and fast static RAM, running the freelyavailable light-weight kernel RTEMS.

Configurable I/O Processor ClOP (Tasks Class 0):FPGA-based unit for tasks with deadlines too short tobe met in software. The implementation in hardwareallows a straightforward computation of the WCETs.In our rapid prototyping application, the ClOP consistsof one Xilinx FPGA and additional Dual Ported RAM.It serves two purposes: It acts as a "event-filter" forevents with short deadlines, and it links the rapidprototyping platform to the embedding process in aconfigurable manner.

Design Process To allow a schedulability analysis of theresulting generated system in addition to the automatedtransformation, our rapid prototyping design process re-quires both, the specification of the systems behaviour aswell as a specification of the timing properties and con-straints of the system's embedding environment. The latteris done with the help of event streams, their associated max-imum response time and event interdependences [5], whilewe use SOL for the functional specification. The specifica-tion is followed by the allocation of the SOL processes tothe available processing units, according to the task classi-fication model and based on the maximum response timesand an estimated computational complexity. After that, highlevel language code is generated for the software and hard-ware parts and communication and HW/SW interfaces aresynthesized, before the software parts are compiled andlinked for the respective processing unit and the hardwareparts synthesized for the FPGA-based CIOP(s).

Communication synthesis for distributed real-time sys-tems has recently received considerable interest, e. g. [7,1,2]. While our approach does not claim the generalityofthese systems - system architecture, bus topologyandpro-tocol are known after all - we need to achievean efficientsolution for this particular environment and aim to reachthehigher degree of automation required by a rapid prototypingsystem. T

For code generation we decided to use existing tools:SOT's "C advanced" code generator for the SW side(map-ping one SOL process in our case to an RTEMStask)andthe SOL-to-VHOL translator originally intendedto supporthardware design on a high abstraction level [4]. As bothtools currently do not support mixed HW/SW systems,wedeveloped a concept to integrate them to a synthesissystemfor our heterogeneous rapid prototyping target architectureREAR.

The following section describes this approach and theimplementation of the underlying inter unit communicationmethods. Section 3 summarizes the results of a case studybefore the paper closes with conclusions and an outlookonfuture work.

2. An Approach to Interprocess Communica-tion and Interface Synthesis

A SOL specificatioQ consists of a set of processescommunicating via asynchronous Signals which optionallycarry arbitrary data and imported or viewed variables. Pro-cesses are specified as extended finite state machines. Astate transition is triggered and the according SOL state-ments executed by the reception of a signal (INPUT) ora boolean expression including imported/viewed variables(so called continuous signals) or both (signal with enablingcondition). For the reception of signals, each process hasa private FIFO input queue, where signals may be sent byother processes (OUTPUT).

2.1. SDT generated code

SOT supports code generation for the so called tight in-tegration with a real-time operating system: A SOL pro-cess and its input queue are implemented as one RTOSthread and one RTOS FIFO message queue. INPUT andOUTPUT statements are translated to macro calls which fi-nally expand to the RTOS's message queue receiveand sendcalls. Shared (exported/revealed) variables are mapped to(buffered) global variables.

As stated above SOT does not directly support mixedHW/SW systems: The usual way to handle HW parts is touse one or more external tasks to do all the dirty work andcommunicate with the SOL processes by sending/receiving"hand coded" SOL signal structures. However, this andthe

36

Figure 3. Integrating available code genera-tors to a synthesis system

macro based code fonns the path to an integrated HW/SWsynthesis system (see below).

2.2. SDL-to-VHDL translator

The tool translates each SDL process to a VHDL entitybasedon configurable VHDL code fragments. For generat-ing synthesizable VHDL code for asynchronous SDL sig-nals, the translator needs as inputs a set of signal imple-mentation descriptions (VHDL procedures) and a library ofprotocol descriptions (preprocessed to a VHDL package) inaddition to the SDL processes to generate code for. The pro-tocol descriptions can be used to connect external compo-nents or - as in our case - to implement the necessary in-terfaces (signal queues, registers, . . . ) to the software side.

2.3. Integration

Figure 3 gives an overview of the integrated synthesissystem. In addition to the set of application SDL processesit requires infonnation about the process mapping (HW/SWpartitioning), the mapping of signals at the HW/SW bound-ary to a particular HW/SW queue, an implementation of(optional) signal converter functions and the signal imple-mentationand protocoldescriptionsrequiredby the SDL-to- VHDL translator. The signal mapping infonnation isnecessary to generate for the software side

1. C macros to map local and remote SDL OUTPUTstatements to RTOS calls or the library routines (andoptionally signal converter) for inter unit communica-tion, respectively,

2. an ISR to handle communication interrupts from re-mote processing units, and

I

3. an initialization procedure for all necessary inter unitqueues.

For the hardware side the signal mapping infonnation isused to configure the HW/SW queue entities while the otherinfonnation is passed to the SDL-to-VHDL translator.

2.4. Inter-Unit Queue Implementation

The inter unit queue implementation is based on thetight coupling of the target architecture's processing units,i.e. that communicating units share a common memory re-gion. On REAR this can be e.g. a memory region accessiblethrough the PCI bus by another master capable processingunit or the ClOP's dual ported RAM. This common memoryregion is used to implement FIFO queues (for SDL signals)or for exported/revealed variables.

2.4.1 Software - Software

For communication between RTEMS threads on the RTUand Linux processes on the HPU, a simple module for mes-sage based IPC provides services to establish a communi-cation queue and to send messages to and receive messagesfrom the queue.The queue descriptor encapsulates infonna-tion concerning the sender and receiver threads or processesand a pointer to a FIFO queue for the messages. The queuesare located within the RTU's DRAM, which is accessibleby the HPU via PCI bus. A queue consists of the in andout indices and, in the current implementation, a config-urable fixed length message buffer array. A more flexiblealternative that requires less copy operations, but needs onespinlock or semaphore for each thread, is a buffer pool withlinked lists queue, which will replace the fixed arrays in thefuture.

The basic algorithm for receiving messages is as fol-lows: queue_getmsg() checks whether a message isavailable and if so, copies the message from the queue's tothe thread's message buffer and returns TRUE. A receivingthread will wait for a message to arrive and then notify theremote side of the receive queue being not full by trigger-ing an interrupt. queue-putmsg() perfonns the samefunction vice versa. On the RTU, the queue ISR handlesthe notifications from the HPU side and propagates themas RTEMS events to any waiting thread, which in turn isunblocked and re-checks its receive or send queue. On theHPU, access to the queue memory area as well as triggeringand handling notification interrupts is supported by a UNIXdevice driver.

2.4.2 Software - Hardware

A target architecture like REAR allows different implemen-tation alternatives for FIFO queues for SDL signals:

37

I. for signals with no or only little data, short queues canbe realized as FPGA internal FIFO, with the head (ortail) being accessible by a register;

2. larger FIFO queues make use of the ClOP's dualported RAM and implement the in and out indices aswell as additional control and status flags in a queuecontrol and status register within the FPGA; (this al-ternative is very similar to the software queue imple-mentation described in Section 2.4.1);

3. implementing only one (set of) SDL signal data regis-ter(s) is sufficient if either the interrupt latency on thesoftware side is short enough to avoid loss of a signal,or signal data are transferred to a queue in main mem-ory by bus mastering or direct memory access;

PROCEDURE send_canmsg (

SIGNAL clock, req, abort, done,IN std_Iogic;IN std...logic_vectornSOOWNTO 0);

OUT std...logic;

IN std...logic_vectorl 6 OOWNTO 0);

BUFFER std_'ogic_vectorl 3 OOWNTO 0);

BUFFER std_'ogic_vectorl 2 OOWNTO 0);

SIGNAL data,SIGNAL ack,SIGNAL base,SIGNAL inidx,SIGNAL offset,

) IS

BEGINoffset <= '000'; _ite <= '0';send, LOOP

WHILE NOT Ireq='l' OR abort='l' OR done='l') LOOPWAIT UNTIL clock' EVENT AND clock = '1';

ENO LOOP;EXIT send WHENabort= 'I' ;IF done = 'I' THEN

inidx <= inidx + '000,.;ENO IF;EXIT sendWHENdone='1 ' ;daddr(13 OOWNTO1) <= base;daddr( 6 OOWNTO3) <= inidx;doddr! 2 IJOWNTO0) <= offset;ddata <= data; dwrite <= '1'; -- write data

WAIT UNTIL clock'EVENT AND clock = '1';dwdte <= '0'; ack <= '1'; -- data writtenWHILE NOT req=' 0' LOOP

WAITUNTIL clock'EVENT AND clock = '1';ENO LOOP;ack <= '0';EXIT send WHEN reset= '1';offset<= offset+ '00,.;

ENO LOOP;ENO send...conmsg;

Figure 4. VHDL procedure to "send" a CANmessage to a DPRAM queue

For the case study described in Section 3 the second al-

ternative has been chosen to implement a transmit and a

receive queue for CAN messages. Central elements on theHW side are the queue control and status register (QCSR)and the send_carunsg () and recei ve_carunsg ( )

VHDL procedures (Fig. 4). The QCSR holds the two queue

indices, a DPRAM base address, not-full and not-emptystatus flags and clear, start and interrupt enable controlbits. The VHDL procedures were coded according to theprotocol descriptions required later for the SDL-to- VHDLtranslator.

For the SW side, library routines for enqueuing I de-queuing messages (SDL signals) in I from DPRAM queueshave been implemented. The dequeue routine checks thenot-emptyflagand if it is set, copiesthe messagefrom the

,-ro',up- ....

Figure 5. Task allocation and communicationof the CAN application

DPRAM address given in the QCSR and increments the outindex and is typically called from the ISR. In contrast, theenqueue routine is called by the SDL processes sending asignal to the HW side.

3. Case Study: a CAN Monitor Application

As a test bed for the application to be implemented onREAR, we built a CAN bus environment consisting of twoexemplary SUO-based IIO-cards emulating sensors andactors and two commercial CAN participants with moni-tor and analyzing software. Details on the realized CANenvironment can be found in [6].

From the user's point of view, the application performstwo functions: The CAN bus monitor allows the user tosend, receive and filter CAN messages, to monitor activityon the CAN bus. The SUO controller provides an interfaceto the SUO cards. The lower levels of the application haveto provide the distribution of CAN messages to and from theCAN monitor and the SUO controller (message routing)and all functions of a fully functional CAN bus participant[8]. The task classification model presented in I appliedto the identified functions yielded the task allocation shownin Figure 5. The CAN application was modeled in SDL- except for the graphical user interfaces implemented onthe HPU. The SDL process diagram is shown in Figure 6.All software processes were implemented as RTEMS-tasksusing the described mechanism to include the appropriateIPC-calls respectively RTEMS library functions. As theSDL-to-VHDL translator was not available for this exam-ple, the hardwareprocess(can_controller) was spec-ified in Statemate and translated to VHDL using Statem-ate's VHDL-generator. Together with the VHDL-part ofthe HW-SW-interface, it was fitted to the FPGA.

The implemented system met all the requirements posedby the planned CAN controller and monitor application, inparticular met all the deadlines of the CAN protocol up tothe maximum I MHz without message loss.

For a first evaluation of IPC performance, the queuefunctions described above were instrumented to write time

stamps to a memory buffer. The timing test applicationfor the SW/SW communication included one Linux pro-

38

Block can_application from...j>llnel 1(1)

[:::::Jto-l"'no'

[",,"-10_""']

Can...controUo,

Figure 6. SDL process diagram of the CANapplication

80.RTEMS roC ~ !

70~ ~-=:::::::: i ! '

f : ~~=~--~~~~=-fCJ::+=~-'5 f f I :! 30 ---tml- 1-- --- __j_m_~m_--i ----

20 11~III!III!III!III-

10~ t i iff Hf f H t f Iii-f t. . ~ . I .i I . .:. . .'. . .o.j. : ' !

0 50 100 150 200-1ongIh (byIoo)

250

Figure 7. performance of the simple messagequeue implementation

cess to send a message to a RTEMS thread, which af-ter being unblocked and receiving the message, sent theunmodified message back to the now blocked Linux pro-cess. I.e. the receive operations on both sides were blockingand involved processing an interrupt and a context switch,while the send could be performed without the sender beingblocked. Fig. 7 shows the measured (average) executiontimes of the queue_sndmsg () and queue_rcvmsg ( )

calls on RTEMS and Linux, respectively.Due to the similar implementation, the send opera-

tion from SW to HW is as fast as queue_sndmsg( )

on RTEMS, i.e. below 5 jJS. The opposite direction,however, involves the ISR overhead and the call ofrtems_message_queue_send () and therefore takesapproximately 20 jJS.

4. Conclusionsand Future Work

In this contribution we described concept and implemen-tation of efficient IPC functions which are integrated inthe automated design process for a rapid prototyping sys-

tern. It is now possible to map a SDL specification of areal-time system to a heterogeneous multiprocessor plat-form without regarding the details of the underlying hard-ware. Latency and overhead caused by the communicationare very low: The message queues between HPU and RTUand between RTU and ClOP introduce latencies in the be-low 20 jJs-range. The next steps include integrating thehardware part of the HWISW queue interface functions withthe SDL-to- VHDL translator, and at the same time imple-menting the HW/SW interface alternatives outlined in Sec-tion 2 (DMA, internal FIFO). The degree of automation canbe further increased by an automatic configuration of themessage queues on the hardware side, which includes allo-cation of SDL-signals to message queues and defining theirsize depending on the data types to be transmitted.

References

[1] I. Bolsens, H. J. De Man, B. Lin, K. v. Rompaey, S. Ver-cauteren, and D. Verkest. Hardware/software co-designof digital telecommunication systems. Proceedings of theIEEE, 85(3):391-418, Mar. 1997. Special Issue on Hard-ware/Software Co-Design.

[2] J. Daveau, G. Marchioro, C. A. Valderrama, and A. A. Jer-raya. Vhdl generation from sdl specifications. In Proceedingsof the XlIIIFfP Conferenceon Computer Hardware Descrip-tion Languages, CHDL'97, Toledo, Spain, Apr. 1997.

[3] G. Farber, F. Fischer, T. Kolloch, and A. Muth. Improvingprocessor utilization with a task classification model basedapplication specific hard real-time architecture. In Pro-ceedings of the 1997 International Workshop on Real-TimeComputing Systems and Applications (RTCSA '97), AcademiaSinica, Taipei, Taiwan, ROC, Oct. 27-29 1997.

[4] W. Glunz. Hardware-Entwurf auf abstrakten Ebenen unterVerwendung von Methoden aus dem Software-Entwurf. Dis-sertation, Fachbereich MathematiklInforrnatik, Universitat-Gesamthochschule Paderbom, Paderbom, Munchen, Apr.1994.

[5] K. Gresser. An event model for deadline verification of hardreal-time systems. In Proc. Fifth Euromicro Workshop onReal TImeSystems, pages 118-123, Oulu, Finland, June 1993.IEEE.

[6] A. Muth, F. Fischer, T. Hopfner, T. Kolloch, S. Petters, andS. Rudolph. Implementation of a CAN controller and monitorapplication on the rapid prototyping platform REAR. Techni-cal report, Lehrstuhl fUrProzessrechner, Technische Univer-sitat Munchen, Oct. 1997.

[7] R. B. Ortega and G. Borriello. Communication synthesisfor embedded systems with global considerations. In FifthInternational Workshop on Hardware/Software Codesign-Codes/Cashe '97, pages 69-73, Braunschweig, Germany,24-26 Mar. 1997. IEEE, IEEE Computer Society Press.

[8] Philips Semiconductors, Eindhoven, The Netherlands.PCA82C200, Stand-alone CAN Controller, Product Specifi-cation, 1992.

39

Documents

Heteogenous Real-Time Rapid Prototyping Environment*papers/compendium94-03/papers/1998/codes98/pdffiles/codes98_035.pdfHeteogenous Real-Time Rapid Prototyping Environment*, j I I,

Heteogenous Real-Time Rapid Prototyping Environmentpapers/compendium94-03/papers/1998/codes98/pdffiles/codes98_035.pdfHeteogenous Real-Time Rapid Prototyping Environment, j I I,