
Porting ROME to a new architecture

May 1, 2001

Leslie J. French
Distributed Systems Software Group

CCRL, NEC USA
4 Independence Way

Princeton, NJ 08540-6634

You can get the current version of this document at http://rome.sourceforge.net. For questions and comments mailto:[email protected]

ROME and the ROME utilities are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the license, or (at your option) any later version.

They are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307



Contents

1 Introduction
2 Gather Documentation
3 Generate Compiler
    3.1 Configuration
    3.2 Installation
4 Target-Specific Tools
    4.1 Changes to getsym.c
    4.2 The xtip program
5 General Architecture
    5.1 Address Map
    5.2 Endianness
    5.3 Caching
    5.4 Criticality
    5.5 Interrupts and Faults
    5.6 Contexts
    5.7 Devices
    5.8 Process Initialisation
    5.9 Optimisations
    5.10 Machine-Dependent Library Support
    5.11 Design Strategy
    5.12 Initialisation Flow
6 The Target File
    6.1 Creating a new target file
    6.2 CPU Settings
    6.3 Compiler and Linker Directives
        6.3.1 Linker Input File
    6.4 The Make and Install Rules
    6.5 The Hardware File
        6.5.1 Initial Processor State
        6.5.2 Memory Region Definitions
        6.5.3 Interrupt Map
        6.5.4 Cachable Pointer Conversions
        6.5.5 SCV64 Device Definitions
        6.5.6 ROME Definitions
        6.5.7 Serial 16650 UART Definitions
        6.5.8 On-chip Timer definitions
        6.5.9 Endianness Macros
7 The CPU plug-in
    7.1 limits.h
    7.2 stdargs.h
    7.3 stdtypes.h
    7.4 cpu_plugin.h
    7.5 _link_first.s
        7.5.1 Entry-point Code
        7.5.2 Interrupt Handler
        7.5.3 Scheduler Support
        7.5.4 Debugger Support
        7.5.5 Special Support Routines
    7.6 k960.c
    7.7 debug.c
    7.8 disassembler.c
8 The ICU plug-in
    8.1 icu.h
    8.2 icu.c
    8.3 icu_asm.s
9 The Serial Interface
10 Testing the idle process
11 The Timer
    11.1 timer.c
    11.2 timerlib.c and timerlib.h
    11.3 Timer Testing
12 Interrupt Timing
13 Running the perf process
14 The SCV64 VMEbus controller
    14.1 Additions to the Target File
    14.2 The vme.h header file
    14.3 The svc64.c source file
        14.3.1 The init routine
        14.3.2 The interrupt handlers
        14.3.3 The shared library
        14.3.4 Tracing and Debugging
    14.4 The SCV64 Module in RTB
15 Tuning the System
16 What if it doesn’t work?
    16.1 New Hardware
    16.2 Loader Problems
    16.3 Initialisation Problems
    16.4 Serial-Line Problems
    16.5 Context Switching
    16.6 Interrupt Handling
17 And Finally...


1 Introduction

This document describes how to write a ROME system from scratch for a new architecture. In our example system the basic procedure is divided into 10 steps:

1. Gather documentation

2. Generate compiler/linker for target CPU

3. Port or write any target-specific tools

4. Design overall implementation architecture

5. Construct the target file

6. Write the cpu plugin and interrupt controller

7. Write the polled-mode I/O routines (the uart module)

8. Load and test the idle process

9. Write the timer module

10. Load and execute the perf process (as our example application)

Each of these steps is described in its own chapter. As a concrete example, ROME will be ported to the Cyclone CVME965 development board for the Intel I960HD CPU. Brief mention will be made of the design considerations for other architectures.

2 Gather Documentation

Building a ROME system for an embedded application requires detailed knowledge of the hardware platform on which the system is to run. The first stage of contemplating such a project is to gather all the required information. At a minimum, this comprises the motherboard manual, processor manual and specifications for the device chips. In this case, we have:

• “Cyclone MicroSystems CVME965 Single Board Computer User’s Manual” (supplied with the board)

• “I960HD User Manual” (available on the Web from developer.intel.com)

• “SCV64” (sufficient description is in the User Manual, additional data are available from www.tundra.com)

• “16550 UART” (any version of this spec. is good, for example the National Semiconductor PC16550D available on the Web).

The I960HD has an onboard timer, so there is no external timer chip needed for this system. As this is a commercial development system, it is supplied with its own boot ROM. In the case of a new hardware platform, you may have to write a boot ROM from scratch. In general, this is not straightforward and a design will be driven by many factors not directly related to ROME. Writing the boot ROM is not covered as part of this document. It is assumed that the boot ROM already exists, together with its documentation. This adds one more component to the list:

• “MON960 Debug Monitor User’s Guide”

3 Generate Compiler

Wherever possible, ROME uses the gcc toolkit, which is available for a wide range of target processors. Assuming that the host environment is a 386-based Linux system, for most targets a cross-compilation environment must be created.

3.1 Configuration

From the boot ROM manual, the target file downloaded to the system must be in COFF format, so the cross-compilation system should be built for an i960 generating COFF output. This saves an extra step in converting from a second format (such as ELF) to COFF before downloading. Apart from retrieving and un-tar’ing the installation (we use the egcs version of gcc), configuring and building the gcc cross-compilation system is usually as easy as:

    ./configure --target=i960-coff
    make

For full details on this procedure you should consult the gcc cross-compiling guide.

3.2 Installation

Once built, the various tools must be installed into the appropriate directory within the ROME tree so that the RTB system can generate the correct paths. The Tools directory is arranged first by manufacturer (in this case Intel) and secondly by individual cpu (i960). The cross-compiler toolkit (gcc, gas, etc.) is then installed in:

<ROME-ROOT>/Tools/Intel/i960/bin

4 Target-Specific Tools

Although the gcc toolkit is quite complete, it is always possible that additional toolkit support is required. In this case, as this is the first ROME system to use COFF output, the symbol-extraction program needs to be modified. Also, an XMODEM control program is needed to download the target through the boot ROM monitor.

4.1 Changes to getsym.c

The getsym program extracts all the global symbols defined in all object files and creates the ROME symbol table as used by the debugger and the PISA environment. Unless you plan never to use the debugger or the interpreter, you will need a version of getsym for your environment. The initial implementation operated on ELF files, but the I960 port generated COFF files. In order to provide symbol-table support for I960 systems, either a new version of getsym could be implemented, or the current version extended. Writing a new program from scratch might have been simpler, but as the output of getsym is closely tied to the ROME core, any changes to the symbol table format would break existing versions of getsym.

As it turned out, in this case, most of the logical structure of the code was unchanged when a new object-file format was added, so we were able to extend the existing program. This was possible because the different file formats can easily be distinguished. A set of COFF structures was added to represent COFF file formats, and a new procedure dofile_coff_i960 was added to populate the symbol structure. A test was added to the dofile routine to check the format of each object file as it was processed and to call the appropriate handler. In this way the getsym program was extended, to the extent that it will cope even with mixed-format object files in a single build.
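As an illustration of how easily the formats can be told apart, the following sketch classifies an object file from its first few bytes. It is not the getsym source: the function and enum names are hypothetical, and the COFF magic values are assumptions that should be checked against the headers shipped with your toolchain.

```c
#include <stddef.h>
#include <string.h>

/* Object-file formats the symbol extractor might meet in one build. */
enum objfmt { OBJ_UNKNOWN, OBJ_ELF, OBJ_COFF_I960 };

/* Classify an object file from the first bytes of its header.
 * ELF files start with 0x7f 'E' 'L' 'F'.  The COFF magic values
 * (0x160 and 0x161, stored little-endian) are assumed here for
 * i960 COFF; verify them against your own toolchain headers. */
enum objfmt objfmt_detect(const unsigned char *hdr, size_t len)
{
    if (len >= 4 && memcmp(hdr, "\177ELF", 4) == 0)
        return OBJ_ELF;
    if (len >= 2) {
        unsigned magic = hdr[0] | ((unsigned)hdr[1] << 8);
        if (magic == 0x160 || magic == 0x161)
            return OBJ_COFF_I960;
    }
    return OBJ_UNKNOWN;
}
```

A routine such as dofile could call a test like this once per file and then hand the file to the matching handler, which is what allows mixed-format builds.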

4.2 The xtip program

Mon960, as supplied on the CVME965 board, uses the XMODEM protocol to download (COFF) files into the processor’s memory for execution, over the serial interface. This requires a terminal program on the other end of the serial line which can handle both the regular keyboard/display operation and the xmodem download. This was provided by a variant of the tip program, xtip, running on Linux, to which a download command was added. The details of this program are not relevant to the main porting issues, but the program provided the link between the cross-compilation system and the target system. In other cases this link has been bridged through the tftp protocol, or by writing bootable floppy disks or other removable media.

5 General Architecture

From the assembled documentation, a number of design decisions can now be made. Although many aspects of the implementation will be fixed by the motherboard and CPU, there is still some scope for choice, and for making the right (or wrong) decisions. The general approach is to “go with the flow” and adapt the architecture to the system, rather than force a particular solution from a previous version.

5.1 Address Map

The I960 addressing scheme divides the address space into 16 regions, some with properties fixed by the architecture and others dynamically configurable, for example for main RAM and VMEbus memory. Chapter 2 of the User’s Manual describes the use of the front-panel switches to set the VMEbus slave address. On this particular board, the switches are set to ‘00100’ giving a VMEbus address at 2000.0000h (from table 2-1).

Chapter 3 gives further details of the addressing. Locations 000h–800h are on-chip SRAM which can be used as a local register stack. Region a000.0000h is the main DRAM bank (which is where the program and data will be loaded) and the other memory regions are used for I/O space accesses as described in table 3-1.

Within the DRAM area, space must be allocated for system tables, in particular the Interrupt Vectors, and the stack for the interrupt handler. A good approach is to reserve stack spaces at known locations out of the way of the main code. In this case, we will start the main code partway into the DRAM space, and allocate stack above. We will set the code entry point at a00a.0000h, the supervisor stack at a005.c000h and the interrupt stack at a005.8000h. This gives plenty of room for the interrupt stack and some space for other fixed data structures, if needed. These decisions are reflected in the contents of sections 6.3.1 and 6.5.1 below.
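The address choices can be written down as constants and sanity-checked before they are copied into the target file. This is only a sketch: the macro and function names are hypothetical, and it assumes that both stacks grow toward higher addresses, so that the room available to each stack is the gap up to the next fixed address.

```c
#include <stdint.h>

/* Addresses chosen in the text (written there as a00a.0000h etc.). */
#define DRAM_BASE        0xa0000000u
#define INTR_STACK_BASE  0xa0058000u
#define SUPER_STACK_BASE 0xa005c000u
#define CODE_ENTRY       0xa00a0000u

/* Room for the interrupt stack before it would run into the
 * supervisor stack (assumes upward-growing stacks). */
uint32_t intr_stack_room(void)
{
    return SUPER_STACK_BASE - INTR_STACK_BASE;
}

/* Room for the supervisor stack before it would reach the code. */
uint32_t super_stack_room(void)
{
    return CODE_ENTRY - SUPER_STACK_BASE;
}

/* Non-zero when fixed data, both stacks and the code appear in order. */
int layout_is_ordered(void)
{
    return DRAM_BASE < INTR_STACK_BASE
        && INTR_STACK_BASE < SUPER_STACK_BASE
        && SUPER_STACK_BASE < CODE_ENTRY;
}
```

Under these assumptions the interrupt stack has 4000h bytes available and the supervisor stack 4.4000h bytes.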

5.2 Endianness

Most systems, particularly if they arrive with a boot ROM, have their endianness fixed by the architecture. In some cases, trying to force the ‘wrong’ endianness is almost impossible, for example PCI bus really does not work properly in a big-endian addressing regime. Usually, the only place where this is directly exposed to software is converting byte streams to integers and vice-versa. These conversions should properly be handled through the ntohl family of macros.

From table 3-2 in the User Manual, the recommended memory configuration places most of the memory in little-endian regions with the exception of the SCV64 device area in region F. Thus, the ntohl macros must be defined to byte-swap data in main memory.

Endianness is also a concern when it comes to compiling and linking, to ensure that strings are compiled in the correct sequence of characters. Most gcc/as/ld versions allow flags to define the endianness.
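For a little-endian main memory, the byte-swapping definitions might look like the sketch below. The PORT_ prefix is hypothetical and is used only to avoid clashing with any ntohl already provided by a host C library; the real macro names and their home in the target file are described in section 6.5.9.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8)
         | ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/* Reverse the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)(((x & 0x00ffu) << 8) | ((x & 0xff00u) >> 8));
}

/* On a little-endian target, network order differs from host order,
 * so every conversion is a swap.  On a big-endian target these
 * macros would simply expand to their argument. */
#define PORT_NTOHL(x) swap32(x)
#define PORT_HTONL(x) swap32(x)
#define PORT_NTOHS(x) swap16(x)
#define PORT_HTONS(x) swap16(x)
```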

5.3 Caching

Handling the instruction and data caches is crucial to achieving reasonable processing on any modern CPU. The instruction cache is relatively easy: the memory region containing the executable code should be marked cacheable, and dynamic self-modifying code is out. Depending on the architecture, it may be necessary to perform an explicit cache flush during initialisation, for example if jump tables for interrupts are built in memory.

Handling the data cache is generally more difficult, especially for applications that expect heavy interactions with external devices. Any regions of the address space used to contain memory-mapped registers for devices must be marked uncachable. Also, any data in main memory that is read from or written by external devices must not be retained in the cache. For the CVME965 board, the private RAM area can be marked cachable, and can contain the local data and stack spaces for the ROME processes. All external DMA data will appear in the shared memory region (2000.0000h as defined above) which should therefore be marked uncachable. This leaves the implementation of the individual devices with two choices, either to access the data directly and pay the access-time penalty, or to copy into cached memory and have the data-copy overhead. The ‘right’ answer will depend on the data. For example data read from a disk will likely be accessed sequentially, so copying into the cache will win, whereas the first stage of filtering IP packets for addresses and ports will access only a small portion of the data; the copy can be deferred until it is known that the data will be needed.

Different approaches will apply to different architectures. In the MIPS environment, main memory is mapped into two different addressing ranges, one going first to the cache, the other bypassing the cache and accessing the memory directly. In this situation, it is possible to have close control over the caching algorithm, for example the data portion of all mblks is uncached, but the control portion is cached. For the I386 architecture, where the data cache is very large, a cache flush is an expensive operation. In this case, a small area of memory is marked as uncached and used as a buffer pool for interacting with devices, and the data are copied (exploiting the fast data-movement assembler instructions) as needed.
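The MIPS arrangement can be illustrated with a pair of address conversions. The sketch assumes the conventional 32-bit MIPS segments, with the cached window (kseg0) at 8000.0000h and the uncached window (kseg1) at a000.0000h, both mapping the same low 512 MB of physical memory; the function names are hypothetical.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u   /* cached view of physical memory   */
#define KSEG1_BASE 0xa0000000u   /* uncached view of the same memory */
#define PHYS_MASK  0x1fffffffu   /* low 512 MB of physical address   */

/* Address of the same location as seen through the uncached window. */
static inline uint32_t to_uncached(uint32_t addr)
{
    return (addr & PHYS_MASK) | KSEG1_BASE;
}

/* Address of the same location as seen through the cached window. */
static inline uint32_t to_cached(uint32_t addr)
{
    return (addr & PHYS_MASK) | KSEG0_BASE;
}
```

A driver could then touch the data portion of a buffer through to_uncached while the control portion is used through its cached address.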


5.4 Criticality

The ROME system relies heavily on the ‘critical section’ to implement interlocks between processes and between contexts. There are two requirements for implementing critical sections:

1. Interrupt service routines must execute in critical sections. Also, sending a message (usually a reply) from within an interrupt service routine must not cause an immediate context switch; that must be deferred until the routine has completed.

2. The rome_start_critical routine must enter a critical section and yield some form of token indicating the previous criticality. That is, critical sections may be nested, and leaving a critical section (through rome_end_critical) must restore the previous state. The form of the token is not specified, to allow maximum flexibility in the implementation.

The usual approach is to disable all interrupts in a critical section, either through a ‘global interrupt’ flag or special machine instruction. Since this is usually a privileged (kernel or supervisor mode) operation, this alone dictates that ROME code runs in a privileged state. It is necessary that the selected method also permits a way to determine current state so that it can be restored.
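The token requirement is easiest to see in a small model. The sketch below replaces the processor’s interrupt-enable state with an ordinary variable so that it runs anywhere; the names and the choice of an int token are assumptions, not the ROME definitions of rome_start_critical and rome_end_critical.

```c
/* Simulated global interrupt-enable state; on real hardware this is
 * a processor register changed by a privileged instruction. */
static int sim_interrupts_enabled = 1;

/* Hypothetical token type: the enable state found on entry. */
typedef int crit_token;

/* Enter a critical section and report the previous state. */
crit_token sketch_start_critical(void)
{
    crit_token prev = sim_interrupts_enabled;
    sim_interrupts_enabled = 0;
    return prev;
}

/* Leave a critical section by restoring the state found on entry,
 * rather than by unconditionally enabling interrupts. */
void sketch_end_critical(crit_token prev)
{
    sim_interrupts_enabled = prev;
}
```

Because the inner call returns a token that says interrupts were already disabled, leaving the inner section does not re-enable them; only the outermost end call does.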

Chapter 11 in the I960Hx microprocessor manual explains how the interrupt state can be altered. In this case, there are six possible ways to prevent further interrupts:

• by moving 0 to the interrupt mask register (sf1);

• by storing 0 in ff00.8504h, the memory-mapped location of sf1;

• by moving 1 to the gie bit in the interrupt control register (sf3);

• by writing 1 to the gie bit in ff00.8510h, the memory-mapped location of sf3;

• by executing the intdis machine instruction;

• by executing the intctl instruction with 0 as the first argument.

Since critical sections are used extensively within ROME, it is worth spending a little time to find the best option once the system is operating correctly. An important consideration is if the operations can be written as inline code, thereby saving the overhead of a procedure call and return. This is especially important if a procedure call is relatively expensive, for example if it causes a frame-spill on the I960. In this implementation, we will use the intdis and inten instructions to explicitly disable and enable interrupts, and the intctl routine to implement critical sections, since it conveniently returns the previous state.

For the I386, the only way to change interrupt state is through the assembler ‘sti’ and ‘cli’ instructions. The current state is reflected in the 0200h bit in the flags register.

5.5 Interrupts and Faults

Each individual architecture has its own way of handling external interrupts. For the I960, each interrupt indexes an entry of an interrupt table and causes a context switch to the address at that location. The stack pointer is set to the Interrupt Stack (unless the processor was already processing an interrupt), the pfp register is set to the interrupted frame pointer, and the bottom 3 bits are set to 7, to show an interrupt-return. An interrupt record is pushed onto the stack giving the interrupt vector and the old interrupt mask. As this counts as a new procedure call, a new set of local registers is allocated for the interrupt call. The global registers retain the values they had before the interrupt. An oddity of the I960 is that each external interrupt (e.g. XINT7) is mapped to an internal interrupt vector through the IMAP registers which then determines the interrupt’s priority. However, it is the interrupt vector (e.g. 0x82) that is presented to the interrupt service routine, but the interrupt number (e.g. 7) that must be used to clear the pending-interrupt flag.

As well as supplying a method for calling registered handlers for device interrupts, the system should also handle interrupts for which there is no driver-defined handler. At a minimum, this should report the ‘unhandled interrupt’ in some way, and possibly enter the system debugger.
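One common way to meet this requirement is to fill the whole handler table with a default routine before any driver registers itself. The sketch below shows the idea in portable C; the table size, the handler signature and all names are assumptions, and the default routine merely records the vector where a real system would print a report or enter the debugger.

```c
#include <stddef.h>

#define NVECTORS 256

typedef void (*intr_handler)(unsigned vector);

static intr_handler handler_table[NVECTORS];
static unsigned unhandled_count;   /* number of unclaimed interrupts */
static unsigned last_unhandled;    /* most recent unclaimed vector   */
static unsigned demo_hits;         /* calls seen by the demo handler */

/* Default action: record the vector (stand-in for a report). */
static void default_handler(unsigned vector)
{
    unhandled_count++;
    last_unhandled = vector;
}

/* Example driver handler, used only to exercise the table. */
static void demo_handler(unsigned vector)
{
    (void)vector;
    demo_hits++;
}

/* Point every vector at the default handler. */
void sketch_setup_def_handlers(void)
{
    for (unsigned v = 0; v < NVECTORS; v++)
        handler_table[v] = default_handler;
}

/* Register a driver handler; returns 0 on success, -1 on bad input. */
int sketch_add_handler(unsigned vector, intr_handler h)
{
    if (vector >= NVECTORS || h == NULL)
        return -1;
    handler_table[vector] = h;
    return 0;
}

/* Call whatever handler is installed for the vector. */
void sketch_dispatch(unsigned vector)
{
    if (vector < NVECTORS) {
        intr_handler h = handler_table[vector];
        (h != NULL ? h : default_handler)(vector);
    }
}
```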

Similarly, the system should provide a means of handling system faults. Depending on the architecture these may appear as interrupts, or there may be a separate fault system. On the I960, there is a separate table for faults (for example addressing or arithmetic exceptions). Since ROME does not provide application-level fault handlers, the usual approach is to treat faults in the same way as unhandled interrupts, and enter the debugger. The only difference is that the system supplies a fault-record on the stack instead of an interrupt vector.

5.6 Contexts

Every ROME process has associated with it a set of context information that must be saved when the process is suspended, and restored when the process is restarted. Typically, this will include the processor registers, the condition code and other per-process flags and the criticality. The context information must be stored under two conditions. The first is when there is an explicit context switch due to message passing within the ROME system itself (by a call to cpu_suspend). The second is when an interrupt is scheduled during a process’ normal execution which causes a higher-priority process to become runnable. Often, it is possible to handle the two cases in a similar way, so that it does not matter in what way a process was suspended.

There are two basic strategies for storing the process information. The first is to use an area pointed to by the first word of the current process control block, and the second is to use the process’ stack. Which option makes most sense depends on the architecture of the CPU. In the I386 case, the process stack is already used by the interrupt dispatcher to store the return information, and there are assembler routines to push and pop the register set onto the stack. In this case, the process control block field is used only to save the current stack pointer for a suspended process.

For the I960, interrupts are handled on a different stack and it is not particularly straightforward to manipulate the interrupted process’ stack. In this case, the data will be stored in an area located off the current process control block. When the interrupt is scheduled, the routine is allocated a new local-register frame. This gives 12 ‘spare’ registers (r4–r15) which can be used as temporary storage for the global registers; other data can be placed on the interrupt stack for the duration of the interrupt. Obviously this data must be preserved if the return is not to the interrupted process.

Since an interrupt causes a context switch (to the interrupt handler), processors must provide a way to restore the original context back to the executing process. Understanding how this works is a good start to designing the ROME context-switcher. Often, it is possible to use this code and to combine the ‘return-from-interrupt’ state with the reschedule code in order to change the context. For the I960, the interrupted state is indicated by the bottom 3 bits in the previous-frame-pointer register being set to ‘7’, whereas a ‘normal’ return has ‘0’ in this position. The same machine instruction (ret) is used in both cases. In other architectures, there are separate instructions (ret and iret) and the switching code must use the correct one. In this architecture, we can use the ‘normal’ return to effect a context switch even from the interrupt handler, which means that when the interrupt handler requires a context switch (instead of returning to the interrupted process), the pfp return type must be set to ‘0’.

When a context is restored, the criticality of the context must also be restored. Contexts may be saved in either criticality: a context switch from within the interrupt handler implies that the original process must have been outside a critical section (otherwise it couldn’t have been interrupted); context switches while waiting for messages arise from inside a critical section, in this case inside rome_await_message. This state must be stored as part of the context information.
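The bookkeeping described in this section can be modelled in C even though the real code is written in assembler. The structure layout and all names below are hypothetical; the sketch only shows the two facts taken from the text, that the return type lives in the bottom 3 bits of the pfp and that the criticality is saved with the rest of the context.

```c
#include <stdint.h>

#define PFP_RT_MASK      0x7u   /* bottom 3 bits hold the return type */
#define PFP_RT_NORMAL    0x0u
#define PFP_RT_INTERRUPT 0x7u

/* Hypothetical save area located off the process control block. */
struct saved_context {
    uint32_t globals[16];   /* global registers when suspended        */
    uint32_t pfp;           /* previous frame pointer and return type */
    int      critical;      /* non-zero if saved in a critical section */
};

/* Extract the return type from a pfp value. */
unsigned pfp_return_type(uint32_t pfp)
{
    return pfp & PFP_RT_MASK;
}

/* Same frame address, but marked as a 'normal' return. */
uint32_t pfp_force_normal_return(uint32_t pfp)
{
    return (pfp & ~(uint32_t)PFP_RT_MASK) | PFP_RT_NORMAL;
}

/* Save a context when the interrupt handler decides to reschedule. */
void context_save_from_interrupt(struct saved_context *ctx,
                                 const uint32_t globals[16],
                                 uint32_t pfp)
{
    for (int i = 0; i < 16; i++)
        ctx->globals[i] = globals[i];
    ctx->pfp = pfp_force_normal_return(pfp);
    ctx->critical = 0;   /* an interrupted process was not critical */
}
```

A context saved from a message wait would instead record critical as non-zero, since that switch happens inside a critical section.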

5.7 Devices

There are two, or three, basic strategies for accessing devices. They are:

1. Using special ‘IOspace’ instructions to move bytes or words from special locations representing devices, as found in the I386 architecture.

2. Using normal data movement instructions operating on special memory locations (memory-mapped devices).

3. Using a ‘controller’ to access multiple devices on a single bus (for example a SCSI controller).

The third of these access methods is not a different architecture, but requires a different style of driver. For the I960, all the devices are mapped into memory. Figure 3-1 gives the overall device layout. For example, the area at b000.0000h is used to access the registers of the onboard UART. Figure 4-4 shows the detailed memory map for this device, with one 8-bit register being decoded in each 32-bit address range. This means that the machine-dependent accesses to IOspace are simply accesses to regular memory, unlike the I386 where special machine instructions must be used. These differences are mostly hidden from drivers by defining generic macros, for example CPU_IORD1, to access device registers. This allows the same driver to be compiled for either memory-mapped operation or for special IOspace accesses.
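For the memory-mapped case, generic access macros of this kind reduce to volatile loads and stores. The sketch below uses hypothetical SKETCH_ names because the actual definition and argument list of CPU_IORD1 are not reproduced here; the register spacing of one byte-wide register per 32-bit slot follows the UART layout described above.

```c
#include <stdint.h>

/* Hypothetical memory-mapped flavour of the byte access macros.
 * An IOspace flavour would expand to in/out instructions instead. */
#define SKETCH_IORD1(addr)     (*(volatile uint8_t *)(addr))
#define SKETCH_IOWR1(addr, v)  (*(volatile uint8_t *)(addr) = (uint8_t)(v))

/* One 8-bit register per 32-bit slot: register n is 4*n bytes
 * above the base address of the device. */
#define DEV_REG(base, n)  ((uintptr_t)(base) + 4u * (uintptr_t)(n))

/* Read device register 'reg' of the device at 'base'. */
uint8_t dev_read(uintptr_t base, unsigned reg)
{
    return SKETCH_IORD1(DEV_REG(base, reg));
}

/* Write 'value' to device register 'reg' of the device at 'base'. */
void dev_write(uintptr_t base, unsigned reg, uint8_t value)
{
    SKETCH_IOWR1(DEV_REG(base, reg), value);
}
```

On the target the base would be the UART area; for experiments on a host, an ordinary array of 32-bit words can stand in for the device window.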

Section 3.6 of the User Manual describes the interrupt sources for the processor; for example XINT7 is connected to the UART interrupt. From the interrupt list, in this configuration each interrupt line is connected to a unique interrupt source, so the interrupt dispatcher can operate in ‘dedicated’ mode. In conjunction with section 5.5 above, the simplest approach maps XINT0 to dedicated interrupt 1 and vector 12h etc.
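The two data points given in the text (XINT0 with vector 12h, and XINT7 with vector 0x82 in section 5.5) fit a simple rule: the dedicated interrupt number is the line number plus one, and the vector is that number in the upper four bits with 2 in the lower four. The sketch below assumes that this rule holds for all eight lines; the function names are hypothetical.

```c
#define XINT_COUNT 8

/* Vector presented to the service routine for line XINTn,
 * assuming dedicated interrupt n+1 and vectors of the form x2h. */
unsigned xint_to_vector(unsigned xint)
{
    return ((xint + 1u) << 4) | 0x2u;
}

/* Recover the line number needed to clear the pending flag;
 * returns -1 if the vector is not a dedicated-mode vector. */
int vector_to_xint(unsigned vector)
{
    unsigned dedicated = vector >> 4;

    if ((vector & 0xfu) != 0x2u)
        return -1;
    if (dedicated < 1u || dedicated > XINT_COUNT)
        return -1;
    return (int)dedicated - 1;
}
```

The reverse mapping matters because the handler is given the vector but must use the line number to clear the pending-interrupt flag.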

5.8 Process Initialisation

Each process in the system must be given a suitable set of initial values for the context-specific information so that it executes correctly when first started. Space must be allocated for the process’ stack, and the stack frame register positioned correctly according to whether the stack increments or decrements. For the I960, the stack frame must contain the initial entry point of the process at the ‘return’ address and the pfp register must point within the stack. The process must also be started with interrupts enabled on its first context switch.


5.9 Optimisations

Some CPU architectures provide particular assembler assists for optimising data movement operations. For example the I386 ‘rep stosw’ is an efficient means of clearing a block of memory to zero. The standard ROME C library provides implementations of the usual routines for these purposes (e.g. memset) but if faster versions are needed they can be implemented.

5.10 Machine-Dependent Library Support

There are two further types of routines that need to be considered as part of the machine-dependent design for the system. The first is architecture-specific support for C runtimes. Usually this means supplying the cpu_longjmp and cpu_setjmp routines, plus the data definition of the jmp_buf for the standard C library longjmp and setjmp calls. For the I960, these routines must preserve and restore a stack-frame environment that permits longjmp to execute a form of non-local-return.

The second type of routine is a true architecture-specific routine found only on this particular CPU (or family). In the case of the I960, it is necessary to be able to clear individual bits in the interrupt-pending register from within interrupt handlers. On the I386, device drivers need to read and write the machine-specific-registers (MSRs) as well as generate the IOspace instructions.
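To make the setjmp and longjmp requirement concrete, the following portable example shows the behaviour that the machine-dependent routines must support. It uses the standard C interfaces from a hosted library rather than cpu_setjmp and cpu_longjmp, and the function names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf recover_point;   /* saved stack-frame environment */
static int last_error;          /* code passed across the jump   */

/* Abandon the current call chain and return to the recovery point. */
static void fail_deep(int code)
{
    last_error = code;
    longjmp(recover_point, 1);
}

/* Returns 0 if no failure occurred, otherwise the failure code. */
int run_with_recovery(int code)
{
    if (setjmp(recover_point) == 0) {
        /* First return from setjmp: do the normal work. */
        if (code != 0)
            fail_deep(code);    /* does not return here */
        return 0;
    }
    /* Second return from setjmp, reached through longjmp. */
    return last_error;
}
```

Whatever is stored in the jmp_buf must be enough for the second return to find a valid frame, which is why the I960 version has to capture the stack-frame environment and not just a few registers.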

5.11 Design Strategy

The information gathered above must be incorporated into the design of the final ROME system. Mostly, this means either writing code, or configuring existing code through pre-processor definitions. There are five main ways of supplying configuration information:

1. by adding definitions to the ‘Hardware.h’ file section in the target file. This option should be used for definitions that are specific to the particular motherboard, for example the location of the UART registers;

2. by setting options associated with one or more modules in the system. Module options should be used for configuration data that are specific to a particular project, or to a particular build of that project, for example enabling the interrupt-trace records;

3. by creating an external header file for a module. This should contain type definitions, data and procedure declarations that will need to be accessed from outside the module, for example cpu-specific routines that may be required by drivers;

4. by creating an internal header file for a module. These definitions will not be available outside the module, but provide the means to share data and definitions within a module, for example the layout of a register-set for a memory-mapped device;

5. by adding pre-processor definitions directly in a C, or assembler file. These definitions will have the most restricted scope, which is appropriate for truly local data.

As a general principle, ‘extern’ declarations should appear only in header files, not directly in source files. Except for specific standard-library examples (like printf) all external (non-static) definitions should contain the module prefix as the initial component. The module prefix is either the full name of the module (e.g. tcp) or the class part of a compound name (e.g. serial in serial_uart16550). All variables and routines should be properly declared and type-checked. The ptr type should only be used when a truly opaque pointer reference is being generated or when an unknown 32-bit quantity is passed as an argument (for example in the different possible trace types).

Most of the porting effort is directed towards the central core of the ROME system. It comprises three interlinked modules:

• the machine independent ‘rome’ module

• the plug-in for the particular CPU in the system

• the plug-in for the interrupt controller in the system (ICU)

The functionality of the CPU and ICU modules is deliberately separated as some systems sharing the same CPU may have different interrupt controllers (for example when dispatching interrupts to PCI or VME buses). As a rule, the rome module should require no changes between architectures, but the other two modules will need to be written almost from scratch. Almost, because some files may be derived from existing implementations, for example many of the routines used in the debugger are machine-independent code.

The naming rules for the core of the system relax the rules slightly. In general, exports from the core use the rome prefix for routines provided across all architectures (e.g. rome_start_critical) irrespective of which actual component contains the definition. Machine-specific implementations use cpu or icu to export routines (for example to device drivers). Also, the idle prefix is reserved by the core to identify routines and data associated with the idle process.

5.12 Initialisation Flow

In order for the ROME module to be machine-independent, it makes certain assumptions about the structure of the cpu and icu plugins. The following diagram shows the flow of control during initialisation, and the routines that must be supplied for the ROME module to function:

cpu                   rome                        icu                       serial

_start
  clear BSS
  set processor state
  setup C environment
                  ->  rome_start
cpu_prologue      <-
                  ->                              icu_setup_def_handlers
  set cpu_freemem
                      ->                                                    serial_initp
                      setup memory chain
                      setup trace buffer
                      print copyright         ->                            serial_out
                      call init functions     [->] rome_add_handler
                      create process structure
cpu_setup_process <-  (for each process)
cpu_epilogue      <-
                      print “Starting...”     ->                            serial_out
cpu_scheduler     <-
[idle_process]

The ‘rome’ column of this table is fixed by the module. The table also shows the serial routines called to provide character-by-character output to a display during system initialisation. The calls to rome_add_handler come indirectly from the individual process initialisation routines. The final line represents the first context switch; in the absence of any other processes (which is not the normal state) this will be to the idle process.

6 The Target File

Once the architectural analysis described above has been followed through for a particular system, enough of the details of the motherboard and its configuration should be known that the target file can be generated. The target file describes the architecture and layout of a particular system. It allows the ROME Target Builder to generate the appropriate makefiles and the code to locate machine-specific resources.

6.1 Creating a new target file

Start rtb and create a new project (cvmetest) attached to your default repository. Open the project and create a new Target named cvme965LE with a suitable description (e.g. “Cyclone VME 965 board in little-endian mode”). The following definitions are entered through the various rtb dialogs. See the rtb manual for further details.

6.2 CPU Settings

Based on the installation directory for the tool chain chosen above, the CPU Class is “Intel” and the type is “i960”.

6.3 Compiler and Linker Directives

On the compiler/linker screen, select “Enable Compiler Warnings” and “Optimization 3”. The following additional flags set the assembler instruction set and listing files and the map file for the target:

CFlags: -I. -fno-builtin -fpack-struct "-Wa,-al=$*.al" "-Wa,-AJX"
AsFlags: -al=$*.al
LdFlags: -Map=target.map

The current egcs version of gcc does not support the Hx family explicitly, so the closest architecture (JX) is used, to enable the generation of the extended instructions (notably the inten family).

6.3.1 Linker Input File

In addition to the ld flags, you must also supply the input file that ld will use to control the layout of the image. This file does not normally need to be written explicitly when linking executables to run under a traditional operating system, as it is present in the gcc directory. It is needed here to control the placement of the various sections. The text section starts at a00a.0000h to leave room for the supervisor and interrupt stacks as described above.

OUTPUT_FORMAT(coff-Intel-little)
SECTIONS {
    .text 0xa00a0000 : { *(.text) }
    .rodata : { *(.rodata) }
    .data : { *(.data) }
    .bss : { _bssstart = . ; *(.bss) *(COMMON) ; _bssend = . ; }
}

This sequence also defines symbols for the start and end of the blank-storage section, so that the system initialisation code can locate and clear the area to zeroes.
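As an illustration of how such linker-defined bounds are typically consumed, the following is a minimal C sketch of a clearing loop. It is not the ROME code (ROME does this in assembler, as shown in the CPU plug-in chapter); the array fake_bss and the names bss_start, bss_end and clear_region are hypothetical stand-ins so that the fragment can be compiled and run on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the linker-defined symbols. On a real target
 * the bounds would come from the linker script rather than from an array.
 */
static uint32_t fake_bss[64];
static uint32_t *const bss_start = &fake_bss[0];
static uint32_t *const bss_end   = &fake_bss[64];

/* Zero every word in [start, end) and return the number of words cleared. */
size_t clear_region(uint32_t *start, uint32_t *end)
{
    size_t n = 0;

    assert(start <= end);            /* bounds must be ordered */
    while (start < end) {
        *start++ = 0;
        n++;
    }
    return n;
}
```

Calling clear_region(bss_start, bss_end) clears the whole simulated area. The half-open range mirrors the way the end symbol is defined just past the last byte of the section.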

6.4 The Make and Install Rules

The rules for converting C and assembler files to object files are mostly standard across all architectures. The ‘m’ rule converts modules written in the preprocessor for Finite State Machines and message dispatchers into standard C. The assembler is called only after the source file has been passed through cpp to incorporate the definitions from the target file:

%.c: %.m
	state $*.m > $@

%.o: %.s
	cpp $(INCLUDES) $*.s >/tmp/$*.t; $(AS) $(ASFLAGS) -o $@ /tmp/$*.t; rm /tmp/$*.t

%.o: %.c
	$(CC) $(CCFLAGS) -o $@ $*.c

%.o: %.C
	$(CXX) $(CXXFLAGS) -o $@ $*.C

There are no special requirements to post-process this target for installation, so no install rules are needed. For the I386 system, though, the target must be placed on a bootable floppy disk with a loader sector at the front. This is achieved with a special ‘install’ rule in the make file which is entered into that target’s definition:

install:
	getprog target target.b
	(dd if=/rome/Tools/bin-386/boot1.b bs=512 conv=sync; dd if=target.b) > boot.b
	dd if=boot.b of=/dev/fd0 bs=512 conv=sync

The getprog utility is part of the target-specific tools for the I386 environment, as is the boot1 loader.

6.5 The Hardware File

The largest part of the target description is the definition of the hardware file. This is a pre-processor include file (Hardware.h) which may be used by C and assembler code to fix details of the particular system. The file generated by rtb is a combination of the hardware description defined here and the individual options selected when the system is configured. Here, the hardware file contains definitions for the initial processor state, the memory layout and the location of the devices.

6.5.1 Initial Processor State

The following values will be used by the assembler code as the initial values for the Arithmetic Controls, Fault Control, Interrupt Control, Register-cache and Instruction-cache control registers:

#define CPU_INIT_AC 0x00001000
#define CPU_INIT_FC 0x40000000
#define CPU_INIT_IC 0x0
#define CPU_INIT_RC 0x0f
#define CPU_INIT_CACHE 0x00000000

The interrupt stack value is used to handle external interrupts, and the supervisor stack is a (temporary) stack used only during system initialisation. The size of the accessible RAM is fixed here, although the system may wish to obtain this value dynamically.

#define CPU_INTERRUPTSTACK 0xa0058000
#define CPU_SUPERVISORSTACK 0xa005c000
#define CPU_RAMSIZE (4 << 20)
#define CPU_PRIV_RAM_BASE 0xa0000000

In this implementation, the size of main RAM is fixed in the target file to 4M, even if the actual machine has more memory. Another approach is to define CPU_RAMSIZE as a variable initialised by the system at start time. The approach taken for any given system will depend on the flexibility of the CPU plugin, and the possibility of determining such values at run time. One advantage of allowing dynamic values to be overridden is that it allows a board to emulate another (perhaps development) board with less real memory.
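One possible shape for the dynamic approach is sketched below. This is an assumption-laden illustration rather than anything taken from the ROME sources: the function cpu_ramsize, its two parameters and the CPU_RAMSIZE_DEFAULT name are hypothetical, and the way a real plugin would probe the memory size is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_RAMSIZE_DEFAULT (4u << 20)   /* the 4M value fixed in the target file */

/*
 * Hypothetical helper: pick the RAM size to use.
 *   probed   - size found at start time, or 0 if it could not be determined
 *   override - size requested by the target file, or 0 for "no override"
 * An override can only reduce the size, which is what allows a board to
 * emulate another board with less real memory.
 */
uint32_t cpu_ramsize(uint32_t probed, uint32_t override)
{
    if (override != 0 && (probed == 0 || override < probed))
        return override;
    return probed != 0 ? probed : CPU_RAMSIZE_DEFAULT;
}

/* Usage example: a 16M board configured to behave like a 4M board. */
void cpu_ramsize_example(void)
{
    assert(cpu_ramsize(16u << 20, 4u << 20) == (4u << 20));
}
```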

6.5.2 Memory Region Definitions

As the same cpu plugin may be used with many different memory configurations, the initial values of the physical memory registers are supplied as definitions in the target file, allowing different targets to have different memory layouts without changing the code. In this case, the values are taken from the CVME965 User Guide.

#define CPU_REGION0 0x30800000
#define CPU_REGION1 0x30800000
. . .
#define CPU_REGIONC 0x20800000
#define CPU_REGIOND 0x20800000
#define CPU_REGIONE 0x30800000
#define CPU_REGIONF 0x30800000

6.5.3 Interrupt Map

Each (dedicated) interrupt must be mapped to a unique interrupt vector, including the special on-chip timers. These definitions allow the mappings to be set on a per-target basis.

#define CPU_IMAP0 0x00004321
#define CPU_IMAP1 0x00008765
#define CPU_IMAP2 0x00a90000
#define CPU_ICON 0x40c0
#define CPU_MANUAL_INTERRUPT 248

These are the values that map XINT0 into bit 1 and vector 12h, as described in chapter 5.

6.5.4 Cachable Pointer Conversions

The design for the caching architecture uses different regions for cached and uncached memory, so there are no in-place conversions possible.

#define CPU_CACHED_PTR(_a) (_a)
#define CPU_UNCACHED_PTR(_a) (_a)

6.5.5 SCV64 Device Definitions

This section contains the CVME965 layout of the VME bus controller.

#define SCC_BASE 0xf0000000

6.5.6 ROME Definitions

The following definitions are used when pointer-checking is enabled in the ROME module to validate process addresses:

#define ROME_MIN_PPTR ((ROME_PROCESS *)0xa00a0000)
#define ROME_MAX_PPTR ((ROME_PROCESS *)0xb0000000)

6.5.7 Serial 16550 UART Definitions

The 16550 UART has a generic driver (see below). These definitions configure the driver for this particular motherboard in the default setting of 9600 baud.

#define SERIAL_UART16550_BASE0 0xb0000000
#define SERIAL_UART16550_VEC_INT0 0x82
#define SERIAL_UART16550_ADD_HANDLER rome_add_handler
#define SERIAL_UART16550_CLEAR_INT

The last definition defines an ‘empty’ line in the source, as the interrupt does not need to be explicitly cleared.

6.5.8 On-chip Timer definitions

The timer uses a set of memory-mapped addresses. The system is configured for the default operation of 1,000 timer interrupts per second:

#define TIMER_TRR0 0xff000300
#define TIMER_TCR0 0xff000304
#define TIMER_TMR0 0xff000308
#define TIMER_INT 12
#define TIMER_VEC_INT 0x92

#define TIMER_TICKS2SEC 1000
#define CPU_FREQ_REGISTER 0xb4100000

Because of the vector/pin numbering scheme, both forms are defined.

6.5.9 Endianness Macros

The main memory is little-endian, so the htonl family of macros must swap bytes.

#define htonl(_a) ((((uint)(_a) & 0xff) << 24) | \
                   (((uint)(_a) & 0xff00) << 8) | \
                   (((uint)(_a) & 0xff0000) >> 8) | \
                   (((uint)(_a) & 0xff000000) >> 24))

#define ntohl(_a) htonl(_a)
#define htons(_a) ((ushort)(((ushort)(_a) << 8) | ((ushort)(_a) >> 8)))
#define ntohs(_a) htons(_a)
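The effect of these swaps can be checked on a host machine with function equivalents. The sketch below is only an illustration: swap32, swap16 and swap_example are hypothetical names, and the fixed-width types from stdint.h stand in for the uint and ushort types of stdtypes.h.

```c
#include <assert.h>
#include <stdint.h>

/* Function form of the 32-bit swap performed by htonl on a little-endian host. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Function form of the 16-bit swap; the cast discards the promoted high bits. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Usage example: swapping twice gives back the original value. */
void swap_example(void)
{
    assert(swap32(0x11223344u) == 0x44332211u);
    assert(swap16(0x1234u) == 0x3412u);
    assert(swap32(swap32(0xa00a0000u)) == 0xa00a0000u);
}
```

Because each swap is its own inverse, ntohl and ntohs can simply be defined in terms of htonl and htons.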

Most of the cpu-dependent contents of these sections can be deduced from the overall architectural decisions made above. Other definitions (for example the UART) are needed to configure pre-existing modules.

7 The CPU plug-in

All cpu plugins follow the same structure. They comprise the assembler routines for the initialisation of the system and low-level operations, C routines supporting the machine-independent part of the core, and the debugger environment.

7.1 limits.h

The limits.h header file contains the various maximum and minimum integer values allowed for char, short, int and long variables, signed and unsigned.

7.2 stdargs.h

The stdargs.h header file contains the definitions for variable-count arguments. The implementation of these macros depends on how the gcc compiler passes arguments. This file is supplied for each architecture as part of the gcc distribution and the ROME file is derived from that distribution. In this case, the file is copied from va-i960.h with no changes.

7.3 stdtypes.h

The stdtypes.h header file contains definitions of extended types used throughout ROME. For most 32-bit architectures, the file can be used unchanged from one of the other distributed versions.

7.4 cpu_plugin.h

The cpu_plugin.h header file contains definitions for any cpu-specific structures and routines, as well as the standard routines provided as part of the ROME core.

typedef struct
{
    uint gregs[16];   /* saved global registers */
    uint pc;          /* saved process controls */
    uint ac;          /* saved arithmetic controls */
    uint pfp;         /* saved pfp (for ret) */
    uint imsk;        /* save/restore imsk value */
} CPU_I960_REGISTERS;

This structure defines space for the 16 global registers and the process control registers. The imsk value is used to flag whether or not the process was inside a critical section when its context was saved.
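The assembler routines in _link_first.s address this area by numeric offset (for example 64 and 68 for the saved controls), so the C layout and the assembler must agree. A minimal host-side check of that layout is sketched below; it assumes 32-bit fields with no padding, uses uint32_t in place of the ROME uint type, and the function name check_register_layout is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order as CPU_I960_REGISTERS, with explicit 32-bit fields. */
typedef struct
{
    uint32_t gregs[16];   /* saved global registers */
    uint32_t pc;          /* saved process controls */
    uint32_t ac;          /* saved arithmetic controls */
    uint32_t pfp;         /* saved pfp (for ret) */
    uint32_t imsk;        /* save/restore imsk value */
} CPU_I960_REGISTERS;

/* Confirm the byte offsets that hand-written assembler would rely on. */
void check_register_layout(void)
{
    assert(offsetof(CPU_I960_REGISTERS, gregs) == 0);
    assert(offsetof(CPU_I960_REGISTERS, pc) == 64);
    assert(offsetof(CPU_I960_REGISTERS, ac) == 68);
    assert(offsetof(CPU_I960_REGISTERS, pfp) == 72);
    assert(offsetof(CPU_I960_REGISTERS, imsk) == 76);
    assert(sizeof(CPU_I960_REGISTERS) == 80);
}
```

A check of this kind is cheap to keep in a port, since a reordered field would otherwise only show up as a corrupted context switch.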

The header file also contains the machine-dependent definition of the jump buffer used to implement longjmp and setjmp:

typedef struct _jmp_buf
{
    uint pfp;   /* saved pfp */
    uint rip;   /* saved rip */
} *jmp_buf;

The header file must also define the I/O space access macros used by devices. In this case, the macros simply reference explicit memory locations:

#define CPU_IORD1(_a) *(volatile uchar *)(_a)
#define CPU_IORD2(_a) *(volatile ushort *)(_a)
#define CPU_IORD4(_a) *(volatile uint *)(_a)
#define CPU_IOWR1(_a, _v) *(volatile uchar *)(_a) = (uchar)(_v)
#define CPU_IOWR2(_a, _v) *(volatile ushort *)(_a) = (ushort)(_v)
#define CPU_IOWR4(_a, _v) *(volatile uint *)(_a) = (uint)(_v)
#define CPU_IOSET1(_a, _v) *(volatile uchar *)(_a) |= (uchar)(_v)
#define CPU_IOSET2(_a, _v) *(volatile ushort *)(_a) |= (ushort)(_v)
#define CPU_IOSET4(_a, _v) *(volatile uint *)(_a) |= (uint)(_v)
#define CPU_IOCLEAR1(_a, _v) *(volatile uchar *)(_a) &= (uchar)~(_v)
#define CPU_IOCLEAR2(_a, _v) *(volatile ushort *)(_a) &= (ushort)~(_v)
#define CPU_IOCLEAR4(_a, _v) *(volatile uint *)(_a) &= (uint)~(_v)

These macros explicitly cast the addresses to the appropriate size, so that the same address can be used to access more than one width, and the target file can contain only the values, without needing explicit casts there.
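To show how a driver might use macros of this shape, the sketch below applies the byte-wide forms to a simulated register bank. Everything outside the macro bodies is an assumption made for illustration: the fake_uart array, the FAKE_UART_* names, the register offset and the bit value do not come from the ROME sources, and uint8_t replaces uchar so the fragment compiles on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wide access macros in the same style as cpu_plugin.h. */
#define CPU_IORD1(_a)        (*(volatile uint8_t *)(_a))
#define CPU_IOWR1(_a, _v)    (*(volatile uint8_t *)(_a) = (uint8_t)(_v))
#define CPU_IOSET1(_a, _v)   (*(volatile uint8_t *)(_a) |= (uint8_t)(_v))
#define CPU_IOCLEAR1(_a, _v) (*(volatile uint8_t *)(_a) &= (uint8_t)~(_v))

/* Hypothetical register bank standing in for a memory-mapped device. */
static uint8_t fake_uart[8];
#define FAKE_UART_BASE ((uintptr_t)fake_uart)
#define FAKE_UART_IER  (FAKE_UART_BASE + 1)   /* plain value, no cast needed */
#define FAKE_RX_ENABLE 0x01

/* Set one bit in the simulated register and return the new contents. */
uint8_t fake_uart_enable_rx(void)
{
    CPU_IOWR1(FAKE_UART_IER, 0x00);
    CPU_IOSET1(FAKE_UART_IER, FAKE_RX_ENABLE);
    return CPU_IORD1(FAKE_UART_IER);
}

/* Clear the same bit again and return the new contents. */
uint8_t fake_uart_disable_rx(void)
{
    CPU_IOCLEAR1(FAKE_UART_IER, FAKE_RX_ENABLE);
    return CPU_IORD1(FAKE_UART_IER);
}

/* Usage example. */
void fake_uart_example(void)
{
    assert(fake_uart_enable_rx() == 0x01);
    assert(fake_uart_disable_rx() == 0x00);
}
```

The register address is written as a plain integer expression, and the macro supplies the pointer cast, which is the division of labour the text describes.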

7.5 _link_first.s

The assembler initialisation file is probably the most complicated part of any ROME port. The file has the special name _link_first.s, which causes the ROME Target Builder to place it first in the constructed image. In this way, the entry point of the system is known, and fixed, at the start of the image. The file implements the following functions:

1. The initial entry point code which prepares the memory and the processor to run ROME

2. The first-level interrupt dispatcher

3. The context-switching part of the scheduler

4. Fault-handling entry to the debugger

5. Low-level cpu-dependent routines

The following code fragments do not contain the full source of the files. Rather, they focus on particular aspects relevant to the porting process.

7.5.1 Entry-point Code

The entry-point code contains a second entry-point, at offset 4 from the main code, which branches to the debugger. This can be used to restart a halted system if it traps back to the boot ROM, and then to use the ROME debugger to trace the fault:

        b       L001                      # real startup code
;#
;# Enter the debugger
;#
        ldconst CPU_SUPERVISORSTACK,fp    # Set frame pointer
        lda     0x40(fp),sp               # and stack pointer
        callx   _serial_initp
        callx   _rome_debug
        fmark

The main entry sequence is determined mainly by the “Initialisation and System Requirements” chapter (chapter 13) of the I960 Hx Processor manual. First, the registers are set to good initial values:

L001:
        movq    0, g0                     # Clear out globals
        movq    0, g4
        movq    0, g8
        movq    0, g12
        mov     0, sf0                    # Clear IPND register
        mov     0, sf1                    # Mask all interrupts
        mov     0, sf2                    # Clear DMA/Cache register

Then the code clears the supervisor stack, interrupt stack and BSS area. Although clearing the stacks is not a necessary part of the operation of the system, it is only done once per reset and helps in tracing faults early in the initialisation process. The location of the stacks is picked up from the target file definition and the BSS area from the ld input script created above.

        lda     CPU_INTERRUPTSTACK, g0    # clear interrupt ...
        ldconst 0x8000, g1                # ... and supv stack
        addo    g0, g1, g1

zloop0:
        stq     g4, (g0)
        addo    16, g0, g0
        cmpobl  g0, g1, zloop0
        lda     _bssstart, g0             # clear out zerovars
        lda     _bssend, g1               # End

zloop1:
        stq     g4, (g0)
        addo    16, g0, g0
        cmpobl  g0, g1, zloop1

In order to move the system control tables from the boot ROM versions to those for ROME operation, the CPU must be given a software reset instruction.

        ldconst 0x300, r9                 # r9 <- RE-INIT message
        lda     continue1, r10            # r10 <- Next IP
        lda     _ProcessControlBlock, r11 # r11 <- PRCB
        sysctl  r9, r10, r11              # Software reset
        fmark                             # If here, we are dead

continue1:
        and     r14, r14, r14             # Flush out the pipeline
        and     r14, r14, r14             # in case it is not already
        and     r14, r14, r14

The reset instruction loads a new process control block value which points to a data structure defined further down the file as part of the data segment. The three lines following the continuation label are no-ops to ensure the instruction pipeline operates correctly. Such additional code is defined by the architecture manual for the CPU, and should not be omitted. The data tables are constructed using the CPU_XXX values picked up from the target file hardware definitions:

_ProcessControlBlock:

.word _FaultTable # Fault Table


.word _ControlTable # Control Table

.word CPU_INIT_AC # Initial AC register value

.word CPU_INIT_FC # Initial Fault mask

.word _InterruptTable # Interrupt Table

.word _SystemProcedureTable # SysProc Table

.word 0 # Reserved

.word CPU_INTERRUPTSTACK # Interrupt Stack Pointer

.word CPU_INIT_CACHE # Instr. Cache Cfg Word

.word CPU_INIT_RC # Register Cache Cfg Word

The control table contains the interrupt map and region configuration parameters, also defined in the target file:

_ControlTable:
        .word 0                 # IP breakpoint register 0
        .word 0                 # IP breakpoint register 1
        .word 0                 # Data Addr bpt register 0
        .word 0                 # Data Addr bpt register 1
        .word CPU_IMAP0         # Interrupt Map register 0
        .word CPU_IMAP1         # Interrupt Map register 1
        .word CPU_IMAP2         # Interrupt map register 2
        .word CPU_ICON          # ICON register
        .word CPU_REGION0       # Region 0 Memory Config
        . . .
        .word CPU_REGIONF       # Region F Memory Config
        .word 0                 # Reserved
        .word 0                 # Reserved
        .word 0                 # Trace Control Register
        .word 0x00000001        # BPCON Register

The interrupt table contains 256 entries, the first 9 of which are reserved. The remainder all point to the entry point of the first-level interrupt handler:

_InterruptTable:

.word 0 # Interrupt table #1

. . .

.word 0 # Interrupt table #9

.rep 247 # Entries #10 thru #256

.word _cpu_interrupt_handler # Interrupt table entry

.endr

The fault table directs all fault entries but one to the default fault handler. The exception is the ‘trace’ fault, which is used to provide additional trace information for ROME processes.

_FaultTable:

.word _cpu_def_fault_handler # Parallel Fault

.word 0x00000000

.word _cpu_trace_handler # Trace Fault

.word 0x00000000

.word _cpu_def_fault_handler # Operation Fault

.word 0x00000000

.word _cpu_def_fault_handler # Arithmetic Fault

.word 0x00000000

.word 0x00000000 # Empty entry


.word 0x00000000

.word _cpu_def_fault_handler # Constraint Fault

.word 0x00000000

.word 0x00000000 # Empty entry

.word 0x00000000

.word _cpu_def_fault_handler # Protection Fault

.word 0x00000000

.word _cpu_def_fault_handler # Machine Fault

.word 0x00000000


.word 0x00000000

.word _cpu_def_fault_handler # Type Fault

.word 0x00000000


.word 0x00000000


.word 0x00000000


.word 0x00000000


.word 0x00000000


.word 0x00000000

.word _cpu_def_fault_handler # Override Fault

.word 0x00000000

.rep 30 # Rest of table empty


.endr

ROME does not use system calls, so the system procedure table is empty:

_SystemProcedureTable:
        .word 0x00000000        # Reserved entries
        .word 0x00000000
        .word 0x00000000
        .word 0x00000001        # Superv stack base + trace
        .word 0x00000000        # Reserved entries
        .word 0x00000000
        .word 0x00000000
        .word 0x00000000
        .word 0x00000000
        .word 0x00000000
        .word 0x00000000
        .word 0x00000000
        .rep 260                # There are 260 entries
        .word 0x00000000        # Procedure slot
        .endr

Finally, the initialisation code sets up the registers to make the routine call to rome_start running at processor priority zero.

        ldconst CPU_SUPERVISORSTACK,fp    # Set frame pointer
        lda     0x40(fp),sp               # and stack pointer
        lda     0x001f2002, g1            # PC mask
        mov     2, g2                     # reset to priority zero
        modpc   0, g1, g2
        and     g1, g1, g1                # Wait for modpc ...
        and     g1, g1, g1
        callx   _rome_start
        fmark

The fmark instruction following the call causes the processor to fault should rome_start ever execute a return instruction.

7.5.2 Interrupt Handler

The next major component of the assembler file is the first-level interrupt handler, pointed to by all the interrupt vectors in the table above. You may be wondering why a two-level interrupt handler is necessary: it should be possible just to jump directly to the appropriate handler routine, since all the interrupts come from unique sources. There are two reasons for writing the interrupt dispatcher at two levels. The first is to provide a consistent entry environment for the handlers, with a C stack, further interrupts disabled, and the ROME scheduler disabled (to prevent context switches inside interrupt handlers). The second is to handle context switches generated by messaging events inside the handler but deferred by the scheduler.

The first part of the interrupt handler saves the global registers and disables the scheduler:

_rome_interrupt_handler:
        intdis                            # Prevent other ext. ints
        movq    g0, r4                    # Store globals in locals
        movq    g4, r8                    # (can only stash 12 regs
        movq    g8, r12                   # this way)
        stt     g12,(sp)
        lda     12(sp), sp                # Carve out room on stack
        mov     0, g14                    # prevent cx switches
        st      g14, _rome_allow_reschedule

The second part loads the interrupt vector number and calls the appropriate handler:

#ifdef CPU_BIG_ENDIAN
        ldob    -5(fp), g0                # which intr vector (BE)
#else
        ldob    -8(fp), g0                # which intr vector (LE)
#endif
        ld      _rome_exception_handlers[g0*4], g1
        callx   (g1)                      # Call the handler; ino in g0

Note that the byte address of the interrupt number depends on the endianness of the machine. The code is designed to work with either endianness. In fact, if you look at the actual _link_first.s file, you will see that it will work across a range of I960-class CPUs, with other per-CPU tests being applied as needed. This is one reason why many of the values are set through the target file (such as the REGION descriptors) rather than being hard-coded into the source. You will also see that the actual code does not follow this order. The reordering is described in the chapter on optimisations below.

On return from the handler, the next runnable process will be at the head of the rome_run_queue chain. If this is the same as the current process (rome_this_ptr), a context switch is not needed, and the normal return-from-interrupt sequence can be followed:

        mov     1, g2                     # allow cx switches again
        st      g2, _rome_allow_reschedule
        ld      _rome_this_ptr, g1        # reschedule?
        ld      _rome_run_queue, g0
        cmpobne g0, g1, IRSched
        ldt     -12(sp), g12              # Restore g12 from stack
        movq    r4, g0                    # Restore rest from locals
        movq    r8, g4                    #
        movt    r12, g8                   #
        inten                             # restore interrupts
        ret

Otherwise, the current interrupted process context must be saved, as though it had called cpu_suspend, and the system mode changed to simulate an interrupt return:

IRSched:
        ld      0(g1), g1                 # -> register area
        stq     r8, 16(g1)                # out of the way early
        stq     r4, 0(g1)
        stq     r12, 32(g1)
        mov     pfp, r10                  # r10 is PFP, old G15
        mov     1, r11                    # process was enabled
        ldt     -12(sp), r4
        ldl     -16(fp), r8               # saved PC and AC -> R8,R9
        modify  7, 0, r10                 # set PFP to normal return
        stt     r4, 48(g1)
        stq     r8, 64(g1)
        ldconst 0x001f2403, r9            # PC modification mask
        ld      0(g0), r4                 # -> register area
        ld      64(r4), r4                # R4 PC of NEW process
        modpc   r9, r9, r4                # emulate an IRET
        b       _cpu_scheduler

Because this is a context switch caused by an interrupt, the original process must have been outside a critical section, so the imsk flag is set to 1 (through r11) in the saved context. Since the new process may not have been saved out of an interrupt handler, all process context switches are done as ‘normal’ returns, so the return type in the pfp register is set to zero. However, as this is still within an interrupt handler, the ‘interrupted’ bit in the Process Controls register must be explicitly cleared, which is done by loading the PC from the new process.

7.5.3 Scheduler Support

The third element of the assembler file concerns the context switching mechanism. cpu_suspend is the routine called by the message system to save the current process’ context and switch to the next runnable process:

_cpu_suspend:
        ld      _rome_this_ptr, r8        # current process
        modac   0, 0, r4                  # get AC value
        ld      0(r8), r8                 # -> register area
        mov     pfp, r5
        mov     0, r6
        stq     g0, 0(r8)                 # save globals
        stq     g4, 16(r8)
        stq     g8, 32(r8)
        stt     g12, 48(r8)
        stt     r4, 68(r8)                # AC, PFP, imask flag

Since cpu_suspend is always called from within a critical section, the imsk flag is set to 0, via r6. This code falls through to the cpu_scheduler routine, which is also called by the startup code to perform a context switch to the head of the run queue and by the interrupt handler above:

_cpu_scheduler:
        flushreg
        ld      _rome_run_queue, r3
        subo    1, 0, r8                  # AC modification mask
        st      r3, _rome_this_ptr
        ld      0(r3), r3                 # -> register area
        ldt     68(r3), r4                # R4 AC, R5 pfp, R6 imsk
        ldq     0(r3), g0
        ldq     16(r3), g4
        ldq     32(r3), g8
        mov     r5, pfp
        ldt     48(r3), g12
        modac   r8, r4, r4
        intctl  r6, r6
        ret

This is probably the hardest piece of code to follow in the whole of the ROME system. Firstly, any registers held in the on-chip RAM area are flushed to the stack of the current process. Then, rome_this_ptr is updated to point to the new process at the head of the run queue. The cpu-dependent register area addressed off the first word of the process control block is loaded and from it the global registers are restored. The condition code and other flags are restored into the Arithmetic Controls register, and the previous frame pointer is set ready for a normal procedural return. The imsk flag, loaded into r6, is used to control how the interrupt state is restored. If ‘1’, the process will be restored to enabled state; if ‘0’, it will (remain) disabled.

7.5.4 Debugger Support

The fourth element of the assembler file handles entry to the debugging system. The reason there is an assembler interface to the debugger is to provide the C code with additional information. In this case the frame pointer for the calling process’ stack is passed to the debugger as an argument:

_rome_debug:
        flushreg
        mov     pfp, g0                   # FPointer from call stack
        callx   _cpu_idebug
        ret

A similar routine is used to enter the debugger on an unhandled interrupt. In this case the interrupt number is already in the first argument position, so the frame pointer is passed as the second parameter:

_cpu_pre_debug_int:
        flushreg
        lda     0(pfp), g1                # FPointer after int. record
        callx   _cpu_debug_int
        ret

A third variant of this code enters the debugger on a fault, in this case through the C cpu_fault_routine code:

_cpu_def_fault_handler:
        flushreg
        lda     -8(fp), g0                # Load the fault record into g0
        mov     pfp, g1
        modify  7, 0, g1                  # Load the faulting routines fp
        callx   _cpu_fault_routine
        ret

The assembler code is needed to extract pointers to the fault record and fault routine’s frame pointer. While this can be done in C by manipulating the addresses of elements on the stack, it is less error-prone to pick it up from assembler.

7.5.5 Special Support Routines

The final part of this file contains any special assembler routines particular to this CPU. In this case, the architecture supports hardware trace events, which are handled by a special fault handler, storing the records in the standard ROME trace circular buffer:

_cpu_trace_handler:
        ld      _rome_trace, r4           # trace buffer
        ld      _rome_itrace, r6          # itrace variable
        ld      -8(fp), r8                # load trace type
        ld      _rome_this_ptr, r9        # current process
        stl     r8, 4(r4)[r6*16]          # save type & process
        st      rip, (r4)[r6*16]          # Address of called func.
        addo    r6, 1, r6
        ldconst 255, r5
        and     r6, r5, r6
        st      r6, _rome_itrace
        ret

This section also contains the support routines for the C library longjmp and setjmp routines. A longjmp is implemented as a procedure return to a different return address from the one called, and with a different stack frame.

_cpu_setjmp:
        flushreg                          # no cached registers
        st      pfp, 0(g0)                # save pfp
        ld      8(pfp), r4
        st      r4, 4(g0)                 # save RIP
        mov     0, g0
        ret

_cpu_longjmp:
        flushreg                          # cached is unsafe
        ld      0(g0), pfp                # restore pfp
        ld      4(g0), r4
        st      r4, 8(pfp)                # restore RIP
        mov     g1, g0                    # return code
        ret

Because the call chain is broken in longjmp, the register cache must be flushed. Similarly, the cache must also be flushed in setjmp to ensure that the RIP is in the stack frame. The longjmp code effectively moves the stack back to the point when setjmp was called, and returns to the point from which setjmp was called, but with the return code supplied to longjmp.
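The behaviour that these two routines have to provide can be seen from the caller's side with the standard C library versions. The sketch below uses the host's setjmp.h rather than the ROME cpu_setjmp and cpu_longjmp routines; the names recovery_point, last_code, fail_with and run_guarded are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf recovery_point;      /* saved context of the setjmp caller */
static volatile int last_code;      /* code passed to the most recent longjmp */

/* Abandon the current call chain and resume at the saved context. */
static void fail_with(int code)
{
    last_code = code;
    longjmp(recovery_point, code);  /* does not return */
}

/*
 * setjmp appears to return twice: first with 0 when the context is saved,
 * then with a non-zero value when longjmp resumes it.
 */
int run_guarded(int code)
{
    if (setjmp(recovery_point) != 0)
        return last_code;           /* second return, reached via longjmp */
    fail_with(code);                /* first return: setjmp gave 0 */
    return 0;                       /* not reached */
}

/* Usage example. */
void run_guarded_example(void)
{
    assert(run_guarded(7) == 7);
}
```

The second return is the "procedure return to a different return address" described above: control comes back out of setjmp even though the call that is returning was made to longjmp.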

This completes the contents of the _link_first.s file.

7.6 k960.c

The ‘kernel’ file contains the C routines used to complete initialisation of the system. First come some data definitions:

ROME_TRACE *rome_trace;   /* ROME event tracing array */
int rome_itrace = 0;      /* current trace index */
uint cpu_freemem;         /* start of free memory */

The rome_trace array is shared with the assembler trace fault handler to provide a circular buffer of trace events. The API to the trace system is through the rome_add_trace routine:

void rome_add_trace(ptr a0, int a1, ptr a2)
{
    rome_trace[rome_itrace].address = a0;
    rome_trace[rome_itrace].type = a1;
    rome_trace[rome_itrace].current = rome_this_ptr;
    rome_trace[rome_itrace].spare = a2;

#ifdef CPU_KPRINTF_TRACES
    rome_kprintf("@%x proc %x code %02x arg %08x\n", a0,
                 rome_this_ptr, a1, a2);
#endif

    rome_itrace = (rome_itrace + 1) & 255;
}

The CPU_KPRINTF_TRACES option prints trace requests as they are generated from explicit calls within the code (but not from the trace-fault handler). This is a useful tool for early system debugging.
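The wrap-around in rome_add_trace relies on the buffer holding a power-of-two number of entries, so that the index can be masked instead of compared. A reduced, host-runnable sketch of the same indexing follows; the record layout and the names TRACE_ENTRIES, trace_record, trace_buf, trace_index and trace_add are hypothetical simplifications, not the ROME definitions.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_ENTRIES 256u              /* must be a power of two */

/* Hypothetical, reduced trace record. */
typedef struct
{
    uintptr_t address;
    int       type;
} trace_record;

static trace_record trace_buf[TRACE_ENTRIES];
static unsigned     trace_index;        /* next slot to be written */

/* Store one record, advance the index with a mask, return the slot used. */
unsigned trace_add(uintptr_t address, int type)
{
    unsigned slot = trace_index;

    /* The mask only works as a modulus for power-of-two sizes. */
    assert((TRACE_ENTRIES & (TRACE_ENTRIES - 1u)) == 0u);

    trace_buf[slot].address = address;
    trace_buf[slot].type = type;
    trace_index = (trace_index + 1u) & (TRACE_ENTRIES - 1u);
    return slot;
}
```

After 256 calls the index returns to zero and the oldest record is overwritten, which is the circular behaviour shared with the assembler trace handler.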

The machine-independent core calls the cpu_prologue routine to perform any early cpu-specific initialisation. One requirement is that the cpu_freemem variable must be initialised on return from this routine. Because this routine is called very early in the system, no memory may be allocated here, and the rome_kprintf routine is not available. The prologue routine completes the initialisation of the memory manager by enabling the data cache on main memory, and setting the SCV64 region big-endian, as described in the User Manual.

void cpu_prologue(void)
{
    extern int bssend;

    /* default uncached, little-endian */
    CPU_IOWR4(CPU_DLMCON, 0);

    /* region A, cached */
    CPU_IOWR4(CPU_LMAR0, 0xa0000002);
    CPU_IOWR4(CPU_LMMR0, 0xf0000001);

    /* region F uncached, big endian */
    CPU_IOWR4(CPU_LMAR14, 0xf0000001);
    CPU_IOWR4(CPU_LMMR14, 0xf0000001);

    icu_setup_default_handlers();
    cpu_freemem = (uint)&bssend;
}

The cpu_epilogue routine is called after all the processes have been initialised, just before the scheduler is called for the first time:

void cpu_epilogue(void)
{
    int i;

    rome_this_ptr = (ROME_PROCESS *)&i;
}

The cpu_setup_process routine is called once for each main process in a module. It is not called for processes that only have initialisation routines and do not then receive messages (‘init only’ processes). The routine must perform all the machine-dependent initialisation of the process structure. In this case, that means allocating and setting up the stack and the process’ register area:

void cpu_setup_process(ROME_PROCESS *here, ROME_INIT_PROC *proc)
{
    CPU_I960_REGISTERS *regs = (CPU_I960_REGISTERS *)
        rome_alloc(sizeof(CPU_I960_REGISTERS), 3, 1);

    here->rome_stack = (int *)rome_alloc(proc->stksize, 2, 0);
    here->rome_stack[1] = ((uint)here->rome_stack)+64;
    here->cpu_dep = regs;

The process entry point is either to the process’ main routine, or to _main to initialise the standard I/O files:

    if (proc->main_flag)
    {
#ifdef ROME_NO_STDIO
        rome_fatal("Stdio process in this system!");
#else
        here->rome_stack[2] = (int)_main;
        regs->gregs[0] = (int)proc->main;
        regs->gregs[1] = (int)proc->name;
#endif
    }
    else
    {
        here->rome_stack[2] = (int)proc->main;
    }

The return frame pointer is set to the frame containing the start address, and the rest of the registers are set from the process’ initialisation record:

    here->rome_stack[0] = (int)here->rome_stack;
    regs->gregs[15] = (int)here->rome_stack + 64;
    regs->ac = 0x00001000;
    regs->pc = (2 | proc->trace_flag);
    regs->pfp = (int)here->rome_stack;
    regs->imsk = 1;   /* restore to global */
}

The routine also checks if the project has disabled standard-I/O processing, and if so raises a fatal error if any process is declared as requiring stdio. This is mainly for extremely small ROME systems that are using a minimal C library.

7.7 debug.c

Most of the debugger is commoncode, for example to display memoryor to format messages. Themachine-dependent partof thedebuggeris concernedwith printing machineregistersandformatting thetraceoutput. First comesa list of themachine-specific faults andtraces:

#define TRACE 0x01 /* Trace */#define OPERATION 0x02 /* Operation */#define ARITHMETIC 0x03 /* Arithmetic */#define CONSTRAINT 0x05 /* Constraint */#define PROTECTION 0x07 /* Protection */#define TYPE 0x0A /* Type */

/* Fault Sub-Types */

#define ITRACE 0x02 /* Instruction Trace */#define BRTRACE 0x04 /* Branch Trace */#define CTRACE 0x08 /* Call Trace */#define RTRACE 0x10 /* Return Trace */#define PTRACE 0x20 /* Prereturn Trace */#define STRACE 0x40 /* Supervisor Trace*/#define BPTRACE 0x80 /* Breakpoint Trace*/#define INVOPCODE 0x01 /* Invalid Opcode */#define UNIMP 0x02 /* Unimplemented */#define UNALIGN 0x03 /* Unaligned */#define INVOPERAND 0x04 /* Invalid Operand */#define INTOVER 0x01 /* Integer Overflow*/#define DIVZERO 0x02 /* Divide by Zero */#define CRANGE 0x01 /* Constraint Range*/#define PRIV 0x02 /* Privileged */#define LENGTH 0x01 /* Length */#define TYPEMIS 0x01 /* Type Mismatch */

The print_cpu_registers routine is called from the process display code to format the machine-specific part of the process structure. This prints the global registers and the top of the stack:

static void print_cpu_registers()
{
    uint *proc_fp;

    rome_kprintf("G0: %x %x %x %x\n", regs->gregs[0], regs->gregs[1],
                 regs->gregs[2], regs->gregs[3]);

    rome_kprintf("PC: %x AC: %x PFP: %x IMSK% %x\n", regs->pc,
                 regs->ac, regs->pfp, regs->imsk);

    proc_fp = (uint *)(regs->gregs[15] & 0xfffffff0);
    rome_kprintf("Stack-- fp: %x ", proc_fp);
    rome_kprintf("pfp: %x ", proc_fp[0]);
    rome_kprintf("sp: %x ", proc_fp[1]);
    rome_kprintf("rip: %x\n", proc_fp[2]);
}

The traceback routine gives a call-by-call traceback of a process’ stack, where possible:

static void traceback(char *args)
{
    uint *cfp = (uint *)((uint)fp & (~15));
    int i = 0;
    int res;
    char *tmp;

    rome_kprintf("Starting traceback at %x\n", cfp);
    while (cfp > (uint *)CPU_PRIV_RAM_BASE && i < 20)
    {
        rome_kprintf("pfp %x sp %x rip %x ", cfp[0], cfp[1], cfp[2]);

        if ((tmp = find_symbol(cfp[2], &res)) == NULL)
        {
            rome_kprintf("\n");
        }
        else if (res == 0)
        {
            rome_kprintf(" = %s\n", tmp);
        }
        else
        {
            rome_kprintf(" = %s+%x\n", tmp, res);
        }
        i++;
        cfp = (uint *)(cfp[0] & (~15));
    }
}

The traceback routine relies on the pfp register linkage to move to the caller’s stack frame. On architectures that do not support such linked frames, where each procedure ‘knows’ how much stack is allocated, this may be impossible to implement.

The display_local_reg routine displays the local registers on the stack of the current process:

static void display_local_reg(char *args)
{
    uint *ptr = fp;

    rome_kprintf("frame pointer: %x\n", fp);
    rome_kprintf("pfp: %x sp: %x rip: %x r3: %x\n",
                 ptr[0], ptr[1], ptr[2], ptr[3]);
    rome_kprintf(" r4: %x r5: %x r6: %x r7: %x\n",
                 ptr[4], ptr[5], ptr[6], ptr[7]);
    rome_kprintf(" r8: %x r9: %x r10: %x r11: %x\n",
                 ptr[8], ptr[9], ptr[10], ptr[11]);
    rome_kprintf("r12: %x r13: %x r14: %x r15: %x\n",
                 ptr[12], ptr[13], ptr[14], ptr[15]);
}

On the I960, this routine relies on the flushreg instruction in the various fault handler support routines to move all the registers from the on-chip SRAM into the main stack.

The trace_it routine formats the trace records stored in the trace structure. This is a long routine, so only a sample is given here:

static void trace_it(char *args)
{
    ROME_TRACE *ptr;
    int res;
    int i;
    int j = 256;

    i = rome_itrace;
    while (j--)
    {
        char *rn;

        ptr = &rome_trace[i];
        if (ptr->type & 255)
        {
            rome_kprintf("%s: ", ptr->current->name);

            switch (ptr->type & 255)
            {
            case ITRACE:
                rome_kprintf("Itrace at %x", ptr->address);
                break;
            . . . .
            case ROME_TT_STARTINT:
                rome_kprintf("Start interrupt %x at %x", ptr->spare, ptr->address);
                break;
            . . .
            default:
                rome_kprintf("Event %d at %x", (ptr->type & 255), ptr->address);
                break;
            }

The switch statement must handle both the architecture-specific trace entries (such as ITRACE) and the generic entries common to all implementations, such as ROME_TT_STARTINT. The routine also tries to make use of the symbol table to interpret the address field in the record:

            if ((rn = find_symbol(ptr->address, &res)) != NULL)
            {
                if (res == 0)
                {
                    rome_kprintf(" (%s)", rn);
                }
                else
                {
                    rome_kprintf(" (%s+%x)", rn, res);
                }
            }
            rome_kprintf("\n");
        }
        i = (i+1) & (ROME_TRACE_COUNT-1);
    }
    return;
}

The debug.c file also contains the code for the unhandled interrupt handler. The first part of the code does a sanity check that the interrupt vector and interrupt stack record are consistent:

void cpu_debug_int(int vector, uint *frame)
{
    if (vector != (frame[-2] & 0x000000ff))
    {
        rome_kprintf("\nVector (%x) and record (%x) do not match\n",
                     vector, frame[-2]);
        rome_kprintf("Interrupt Stack may be invalid \n");
    }

It then prints out the unhandled interrupt frame:

    rome_kprintf("Frame at %x\n", frame);
    rome_kprintf("\n Vector Number (from interrupt record): %x\n",
                 frame[-2] & 0x000000ff);
    rome_kprintf(" IP: %x\n", frame[2]);
    rome_kprintf(" PC: %x\n", frame[-4]);
    rome_kprintf(" AC: %x\n", frame[-3]);
    rome_kprintf(" FP: %x\n", frame);
    rome_kprintf("Note: The interrupt stack is the current stack.\n");

The remainder of this routine handles the entry to the debugger:

    if (vector == I_MANUAL)
    {
        rome_kprintf("Non-maskable interrupt occurred.\n");
        rome_idebug(frame);
    }
    else
    {
        rome_kprintf("Unhandled interrupt has occurred.\n");
        . . .
    }
}


The fault handler is called whenever a machine fault (as opposed to an interrupt) is detected. This routine is called from the assembler routine described above; it prints out the fault information and then calls the debugger.

void cpu_fault_routine(uint *fault_record, uint *fp)
{
    int f_type;
    int fs_type;
    int old = rome_start_critical();

    f_type = ((*fault_record >> 16) & 0x000000ff);
    fs_type = (*fault_record & 0x000000ff);

    rome_kprintf("****FAULT RECORD at %x: ", fault_record);
    rome_kprintf("address %x, type %x = ", fault_record[1], fault_record[0]);

    switch(f_type)
    {
    case OPERATION:
        rome_kprintf("Operation-");
        switch(fs_type)
        {
        case INVOPCODE:
            rome_kprintf("Invalid Opcode\n");
            break;
        . . . .
        }
        break;
    . . . .
    default:
        rome_kprintf("Unknown Fault Type %d.\n", f_type);
        break;
    }

    rome_kprintf("Entering the debugger with frame %x\n", fp);
    cpu_idebug(fp);
    rome_end_critical(old);
}

These are the only machine-dependent routines in the debugger.
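The field extraction at the top of cpu_fault_routine can be tried out on a host machine. The following is a minimal sketch only: the helper names fault_type and fault_subtype are hypothetical (they are not part of ROME), and the two constants are copied from the fault list earlier in this section.

```c
#include <stdint.h>

/* Major fault types, copied from the list earlier in this section. */
#define OPERATION  0x02
#define ARITHMETIC 0x03

/* Hypothetical helper: the fault type sits in bits 16-23 of the
 * first word of the fault record. */
unsigned fault_type(uint32_t record_word)
{
    return (record_word >> 16) & 0xffu;
}

/* Hypothetical helper: the fault sub-type sits in bits 0-7. */
unsigned fault_subtype(uint32_t record_word)
{
    return record_word & 0xffu;
}
```

With these definitions, a record word of 0x00020001 decodes as type OPERATION with sub-type 0x01, which the sub-type list above names INVOPCODE.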

7.8 disassembler.c

The disassembler routine formats memory as instructions when called from the debugger dis command or directly from within application code. The routine is prototyped as:

void cpu_disassembler(uint **instr)

It should format the data at address *instr as a machine instruction (if possible, else print out a 32-bit hex number), and update the instr pointer to point beyond the instruction (for machines with variable-length instructions, such as the I386). All output should be done through rome_kprintf. There is little point in describing how to write a disassembler here; I’m sure you can find one or adapt one of the supplied versions to suit your needs.
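As a starting point before a real decoder exists, the fallback behaviour described above (print a 32-bit hex number and step the pointer) might look like the sketch below. It is an illustration under assumptions, not one of the supplied versions: sketch_disassembler and sketch_kprintf are hypothetical names, printf stands in for rome_kprintf so the code builds on a host, and uint32_t replaces the ROME uint type.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for rome_kprintf so the sketch builds on a host machine. */
#define sketch_kprintf printf

/*
 * Fallback path only: print the word at *instr as a 32-bit hex number
 * and step the caller's pointer past it. A real port would try to
 * decode the opcode first and advance by the true instruction length.
 */
void sketch_disassembler(uint32_t **instr)
{
    uint32_t word = **instr;

    sketch_kprintf(".word 0x%08lx\n", (unsigned long)word);
    *instr += 1;    /* fixed 4-byte step; variable-length ISAs differ */
}
```

The pointer-to-pointer argument is what lets the debugger call the routine repeatedly to walk through memory without knowing the instruction length itself.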


8 The ICU plug-in

The ICU (Interrupt Control Unit) plug-in is the component of the ROME core that dispatches interrupts to device drivers.

8.1 icu.h

The icu header file contains the standard exported data and routines common to all interrupt controllers, plus any local additions. For the I960, four additional routines are defined, to enable and disable individual interrupts and to handle the pending interrupt flags. icu.h also prototypes the three standard rome routines rome_add_handler, rome_end_critical and rome_start_critical. The other routines in the module are machine-dependent and use the icu prefix.

8.2 icu.c

The main interrupt-control unit code contains the storage definition for the handlers vector, exported through the header file:

int icu_exception_handlers[260]; /* shared with cpu plugin */

The routines to enable and disable interrupts set or clear the appropriate bits in the memory-mapped IMSK register:

void icu_enable_interrupt(int ino)
{
    CPU_IOSET4(CPU_IMASK_ADDR, BIT(ino));
}

void icu_disable_interrupt(int ino)
{
    CPU_IOCLEAR4(CPU_IMASK_ADDR, BIT(ino));
}

The rome_add_handler routine adds a routine pointer into the table and enables the corresponding interrupt. Because the interrupts are configured in dedicated mode, XINT0 corresponds to vector 12h etc., so the mask bit can be calculated from the vector number:

void rome_add_handler(int where, void (*rtn)(int))
{
    int bitno = (where >> 4) - 1;

    /* the table is declared as int, so the pointer is cast as in the default setup */
    icu_exception_handlers[where] = (int)rtn;
    icu_enable_interrupt(bitno <= 7 ? bitno : bitno+4);
}

The test in the call to icu_enable_interrupt handles the case of the timer interrupts, which occupy bit positions 12 and 13, but generate vectors 92h and a2h.
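The vector-to-mask-bit calculation is easy to check in isolation. The sketch below pulls it out into a helper; the name icu_vector_to_mask_bit is hypothetical and does not appear in the ROME sources, but the arithmetic is the same as in rome_add_handler.

```c
/*
 * Hypothetical helper mirroring the calculation in rome_add_handler.
 * Dedicated-mode vectors are 0x12, 0x22, ... so the high nibble minus
 * one gives the source number. Sources above 7 are the timers, whose
 * mask bits sit four positions higher.
 */
int icu_vector_to_mask_bit(int vector)
{
    int bitno = (vector >> 4) - 1;

    return (bitno <= 7) ? bitno : bitno + 4;
}
```

So vector 12h maps to mask bit 0, vector 82h to bit 7, and the two timer vectors 92h and a2h to bits 12 and 13.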

The icu_setup_default_handlers routine is called from the cpu_prologue to initialise the interrupt handlers to a known state before any of the driver initialisation routines are called (which may modify the table by calling rome_add_handler):


void icu_setup_default_handlers()
{
    int i;

    for (i = 0; i < 256; i++)
    {
        icu_exception_handlers[i] = (int)cpu_pre_debug_int;
    }
}

8.3 icu_asm.s

The assembler portion of the ICU module contains routines that cannot easily be expressed in C. This includes the two routines for critical section support:

_rome_start_critical:
        intctl  0, g0
        ret

_rome_end_critical:
        intctl  g0, g0
        ret

The start routine uses the intctl instruction to disable interrupts while returning the previous interrupt state into g0 (and so back to the caller). The end routine uses the supplied value in g0 to restore the state to that level of criticality.

The file also contains two routines for examining and clearing the pending-interrupt flags:

_icu_clear_ipend:
        ldconst CPU_PRIV_RAM_BASE, r5
        atmod   r5, 0, r5           # Clear out the bus unit
        ldconst CPU_IPND_ADDR, r5   # r5 -> IPND register
        movl    0, r6               # Mask and data registers
        setbit  g0, r6, r6          # Set bit in mask register
        atmod   r5, r6, r7          # Do it
        ret

_icu_check_ipend:
        ld      CPU_IPND_ADDR, r6   # get IPND contents
        bbc     g0, r6, clr         # check and branch if bit is clear
        mov     1, g0               # bit not clear return 1
        ret

clr:    mov     0, g0
        ret

The clear routine is slightly complicated because of the need to ensure that clearing a bit does not interfere with any other bits being set asynchronously from external sources, hence the use of the atomic-modify operations.

9 The Serial Interface

At a minimum, the system needs some basic character-mode input and output, particularly during initial development. Usually, this is provided through a UART connected to a serial line. In this case, the board has a built-in TI16C550 UART chip. As this is a very common chipset, there is already a ROME driver for this UART, in the serial_uart16550 module. The module was written to be quite generic. From the module documentation, a number of parameters need to be configured to use the driver. These are:

name                            description                     default   T/O

SERIAL_UART16550_ADD_INCLUDE    additional include file         notdef    T
SERIAL_UART16550_REG_SPACING    addressing step                 4         T
SERIAL_UART16550_BAUD_SHIFT     baud rate multiplier            4         T
SERIAL_UART16550_BAUD_RATE      line rate                       9600      T
SERIAL_UART16550_XTAL_FREQ      crystal frequency               1843200   T
SERIAL_UART16550_BASE0          base of registers               none      T
SERIAL_UART16550_NO_POLLEDIO    don’t define polled routines    unset     O
SERIAL_UART16550_SKIP_INIT      do not process init             unset     O
SERIAL_UART16550_ENTER_DEBUG    character to trigger debugger   unset     O
SERIAL_UART16550_CLEAR_INT0     code to clear interrupt         notdef    T
SERIAL_UART16550_VEC_INT0       interrupt vector number         none      T
SERIAL_UART16550_ADD_HANDLER    routine to add handler          none      T

The final column indicates a value in the Target file or a module Option. The three values labelled ‘none’ in the table must be supplied in order that the module will build. The others need only be defined or changed if required. From the motherboard manual the base address and register spacing of the UART are known, and these values have already been entered into the target file above.
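To see how the default rate parameters fit together, here is a rough sketch of the usual 16550 divisor calculation (divisor = clock / (16 x baud)). It rests on an assumption that is not confirmed by the module documentation quoted here: that SERIAL_UART16550_BAUD_SHIFT is the log2 of the 16x multiplier and is applied as a left shift to the baud rate. The function name uart_divisor is hypothetical.

```c
/* Defaults taken from the parameter table above. */
#define SERIAL_UART16550_XTAL_FREQ  1843200L
#define SERIAL_UART16550_BAUD_RATE  9600L
#define SERIAL_UART16550_BAUD_SHIFT 4

/*
 * ASSUMPTION: BAUD_SHIFT is the log2 of the UART's clocks-per-bit
 * multiplier (16 for a standard 16550), so the divisor latch value is
 * xtal / (baud << shift). The real module may compute it differently.
 */
long uart_divisor(long xtal, long baud, int shift)
{
    return xtal / (baud << shift);
}
```

Under that assumption the defaults give a divisor of 12, which is the familiar 16550 setting for 9600 baud with a 1.8432 MHz crystal.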

If you are writing your own serial interface driver you will need to define your own set of values. All serial modules should support the three options: NO_POLLEDIO, to stop generation of the serial_poll etc. routines (in the case where a system has multiple serial interfaces); SKIP_INIT, to disable the actions of serial_initp during system debugging; and ENTER_DEBUG, which should be set to a character constant to cause entry to the debugger from the serial interrupt handler. The module should otherwise provide the four polled-mode interface routines: serial_in, serial_initp, serial_out and serial_poll in addition to handling all the messages from the Standard message-set.

It is also a good idea to provide a routine that can be called from the debugger to display the state of the serial interface.

10 Testing the idle process

At this point enough of the system is in place to build and test it. Using RTB, load the cvmetest project that you used to enter the target file above and create new modules for CPU_I960, of class Kernel.PlugIn and ICU_I960, of class Driver.Interrupt. Add the source and header files to the modules and mark the header files as exported. Use CVS to check out the four remaining modules for a minimal system: ROME; rome_if; Clib; and serial_uart16550. If you have had to write your own serial driver, then enter that module into the project and use it instead of the serial_uart16550 one. Also check out the Standard message-set and mark it in-use. Build the system from within RTB and then ‘make’ it in the Modules directory. If all goes well, you should get a target file, and a target.map file (and no warnings on the compilations). If you then download the system onto your target board and start it, you should see something like this:

ROME Initialising.
Copyright 1997 NEC USA Inc.
Built by leslie on pc-rome.ccrl.nj.nec.com at Thu Jan 11 17:30:06 2001
Starting the Scheduler.....

Followed by nothing. If you do, congratulations, you’ve got quite a long way already. If you don’t, well, my systems don’t usually work the first time either, so skip to the “What if it doesn’t work?” section below.

If you got this far, then go back to the project view in RTB and select the options screen. Enable the option named SERIAL_UART16550_ENTER_DEBUG and rebuild the system. Now, when the system is running, the ‘!’ key should drop you into the debugger, like this:

!rome debug>

Now you can try out some of the debugger commands, and make sure the system is really there; for example you can list the (two) processes in the system:

rome debug> lp
Process serial located at 0xa03fede0.
Process idle located at 0xa03fdbd0.

Process pointer is currently pointing to idle.

Try disassembling the code at the entry point and comparing it to the _link_first.s file:

rome debug> dis a00a0000
0xa00a0000 0x08000024            b     0x00000024
0xa00a0004 0x8cf83000 0xa005c000 lda   0xa005c000, fp
0xa00a000c 0x8c0fe040            lda   0x00000040(fp), sp
0xa00a0010 0x86003000 0xa00aa0a0 callx 0xa00aa0a0
0xa00a0018 0x86003000 0xa00a0158 callx 0xa00a0158
0xa00a0020 0x66003e00            fmark
0xa00a0024 0x5f801e00            movq  0, g0
0xa00a0028 0x5fa01e00            movq  0, g4
0xa00a002c 0x5fc01e00            movq  0, g8

Then check the register values in one of the processes:

rome debug> cp serial
Switching process pointer to process serial.
rome debug> pinfo
Process serial at 0xa03fede0.
G0: 0x00000000 0x00000000 0x00000000 0x00000000
G4: 0xa03fede0 0x00000002 0xa03fdbd0 0x00000000
G8: 0x00000000 0x00000000 0x00000000 0x00000000
G12: 0x00000000 0x00000000 0x00000000 0xa03fddb0
PC: 0x00000002 AC: 0x00001002 PFP: 0xa03fddb0 IMSK 0x00000000
Stack-- fp: 0xa03fddb0 pfp: 0xa03fdd70 sp: 0xa03fddf0 rip: 0xa00a96f0
State: 2, Priority: 5
No messages on queue.

You can go on to look at the stack, other memory, tracebacks etc., but by now, this is a good indication that your debugger is working properly (you’re sure to need it later) and that there really is a ROME system running on this machine.


11 The Timer

The second essential component for most systems is a timer process, to generate timeouts at programmable intervals for applications. The I960 Hx processor has an on-chip timer accessed through memory-mapped registers. The addresses for these registers have already been entered into the target file above. Since the clock frequency is determined by the bus clock rate, it is necessary to know the processor speed in order to determine the clock counter interval. The CVME965 board has another memory-mapped location which gives an index value into a table of CPU frequencies. This is the CPU_FREQ_REGISTER location defined in the target file. The usual configuration for the timer generates one interrupt every millisecond, giving a HZ value of 1,000. Under some circumstances (see below, and the next section) it is interesting to run different clock values. The standard TIMER_DEBUG option allows the default values to be overridden with inlined values in the source, and to print the idle loop counter after a number of ticks have elapsed.

11.1 timer.c

This is the main driver source file. As with the other files, only extracts of the code will be used to describe particular points. There is a ‘standard’ means of implementing ROME timer queues, where the TIMER messages are linked in a single chain in expiration order, with the ticks values giving the number of ticks between each message on the queue. In this scheme, only the top message of the queue is examined on each tick, and all messages at the head of the queue with a ticks value of zero expire at the same time. This slows down the addition of messages to the queue, since the delta values must be subtracted for each message, but makes the interrupt code faster. Of course, you can do it a different way (and tell the rest of us if it’s better), but this example uses the ‘standard’ code.
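The insertion side of such a delta queue can be sketched as follows. This is an illustration of the general technique, not the common ROME queue handler: struct tmsg and tq_insert are hypothetical names, and the real code works on ROME_MESSAGE and ROME_T_TIMEOUT structures instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for a TIMER message; not the ROME structure. */
struct tmsg {
    struct tmsg *link;   /* next message in expiration order */
    int ticks;           /* ticks after the previous message expires */
};

/*
 * Insert m, which should expire 'ticks' ticks from now, into the delta
 * queue headed by *q. Walking the list subtracts each earlier delta
 * from the new message; the follower's delta is then reduced so that
 * its absolute expiry time is unchanged.
 */
void tq_insert(struct tmsg **q, struct tmsg *m, int ticks)
{
    while (*q != NULL && (*q)->ticks <= ticks) {
        ticks -= (*q)->ticks;
        q = &(*q)->link;
    }
    m->ticks = ticks;
    m->link = *q;
    if (*q != NULL)
        (*q)->ticks -= ticks;
    *q = m;
}
```

For example, inserting timeouts of 5, 3 and 8 ticks in that order leaves a chain with deltas 3, 2 and 3, so the interrupt handler only ever has to decrement the first entry.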

Another ‘standard’ feature of the timer module is the TIMER_DEBUG option, which overrides the standard values as follows:

#ifdef TIMER_DEBUG
#define TIMER_DEBUG_TIX 3000
#define TIMER_DEBUG_CNT 300000
static int tick_ending = TIMER_DEBUG_CNT + 1;
static uint cit1, cit2;
#endif

In normal operation the routine counts seconds and sub-seconds:

static int secs = 0;
static int tick_counter;
static ulong tc;

The timer initialisation routine programs the values into the timer to interrupt the system TIMER_TICKS2SEC times a second:

void timer_init(void)
{
    tick_counter = 0;
    timerq = NULL;
    rome_add_handler(TIMER_VEC_INT, timer_isr);

#ifdef TIMER_DEBUG
    tc = clock_table[CPU_FREQ] / TIMER_DEBUG_TIX;
#else
    tc = clock_table[CPU_FREQ] / TIMER_TICKS2SEC;
#endif

    CPU_IOWR4(TIMER_TMR0, 0);
    CPU_IOWR4(TIMER_TRR0, tc);
    CPU_IOWR4(TIMER_TMR0, TMR_AUTO_RELOAD | TMR_SUP_WRITE | TMR_1_X_CLOCK);
    icu_clear_ipend(TIMER_INT);
    CPU_IOSET4(TIMER_TMR0, TMR_ENABLE);
}

The tc variable is the number of bus clock cycles in each timer interval. The timer is programmed for continuous operation with automatic reloading of the counter when it expires.

Messages are added to the timer queue from the queue handler, so no context switch is needed into the main process. For the default linking strategy, the timer queue handler routine is common to all implementations and is not described in detail here. The routine first returns any messages with zero or negative time. Otherwise the message is linked into the timer queue in expiration order. If the timer queue is empty, this becomes the only message on the queue. Note that as the queue handler runs as a critical section, the queue manipulation is automatically protected from the asynchronous accesses made by the interrupt routine.

The main process itself does nothing; it exists only to give a process control block with the name “timer” for the shared library.

void timer_process(void)
{
    while (1==1)
    {
        ROME_MESSAGE *m = rome_await_message(0, 0);

        rome_kprintf("timer message %x??\n", m);
    }
}

This code will only be invoked in the (unlikely) event that a message other than TIMEOUT is sent to the timer process.

The interrupt handler then operates only on the top element of the queue:

void timer_isr(int ino)
{
    if (timerq)
    {
        register ROME_T_TIMEOUT *tp = RCASTP(ROME_T_TIMEOUT, timerq);

        tp->ticks--;
        while (timerq && tp->ticks <= 0)
        {
            ROME_MESSAGE *head = timerq;

            timerq = head->link;
            rome_reply(head);
            tp = RCASTP(ROME_T_TIMEOUT, timerq);
        }
    }


The handler also maintains the tick counter, and clears the interrupt-pending bit:

    if (++tick_counter == TIMER_TICKS2SEC)
    {
        secs++;
        tick_counter = 0;
    }
    icu_clear_ipend(TIMER_INT);     /* Clear the IPEND bit */
}

The operation of the handler may be modified by the TIMER_DEBUG flag. In this mode, the handler first stores the values of the idle process’ loop count, then after a programmable number of ticks, displays the number of idle loops elapsed:

    if (--tick_ending == TIMER_DEBUG_CNT)
    {
        cit1 = idle_one;
        cit2 = idle_two;
    }
    else if (tick_ending == 0)
    {
        uint dt1, dt2;

        if (cit1 > idle_one)
        {
            dt1 = 0xffffffff - cit1 + 1 + idle_one;
            dt2 = idle_two - (cit2+1);
        }
        else
        {
            dt1 = idle_one - cit1;
            dt2 = idle_two - cit2;
        }
        rome_kprintf("%d %d\n", dt2, dt1);
    }

The computation is designed to handle the case where the machine idles so rapidly that more than 2^32 idle loops occur in the interval.
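The borrow handling in that fragment amounts to a 64-bit subtraction done in two 32-bit halves. The sketch below shows the same result computed with a 64-bit type on a host. It assumes, as the borrow logic implies, that idle_two counts the wrap-arounds of idle_one; the function name idle_delta is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: treat (two, one) as the high and low halves of
 * a 64-bit idle-loop count and return the number of loops between the
 * start and end samples.
 */
uint64_t idle_delta(uint32_t start_one, uint32_t start_two,
                    uint32_t end_one, uint32_t end_two)
{
    uint64_t start = ((uint64_t)start_two << 32) | start_one;
    uint64_t end   = ((uint64_t)end_two << 32) | end_one;

    return end - start;
}
```

A low word that wraps from 0xfffffff0 to 0x10 while the high word goes from 0 to 1 gives a delta of 0x20, which matches the dt1 and dt2 values the handler would print for the same inputs.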

Finally, the timer module contains a routine that can be called from the debugger to dump the contents of the timer queue. In this manner, the debugger is largely independent of the particular arrangement or implementation details of the individual modules, and the programmer can still get useful information in the event of a crash or strange operation:

void timer_show_timerq()
{
    ROME_MESSAGE *head = timerq;

    while (head)
    {
        register ROME_T_TIMEOUT *tp = RCASTP(ROME_T_TIMEOUT, head);

        rome_kprintf("R %d -> %s\n", tp->ticks, head->src->name);
        head = head->link;
    }
}

The other routine in the source file, timer_tag, returns the current timer count values for programs wishing to do their own interval timings.


11.2 timerlib.c and timerlib.h

The timerlib.c and timerlib.h files contain the machine-independent interface to the timer, handling the message queueing and callback routines. These two files can be copied from any existing timer module into your own module and should not require any modifications.

11.3 Timer Testing

Add the new timer module into your project, and rebuild the system. The system should run as before, except that the “timer” process should be in the system. To verify the basic interval timer, enable the TIMER_DEBUG option and modify the source with the following values:

#define TIMER_DEBUG_TIX 1000
#define TIMER_DEBUG_CNT 120000

When you next start the system, use an external timer (stopwatch or another computer) to measure the time interval between starting the scheduler and the timer code printing the idle counts. This should be almost exactly 2 minutes. This code tests that the timer values have been programmed correctly, and that the interrupt handler is capable of processing the 120,000 interrupts needed to get this far. Longer-running timing tests using Perf below can be used to tune the exact value for the timer registers; at this stage it is sufficient for it to be within a second or two of the real value.

12 Interrupt Timing

The timing debug mode can be used to estimate the processing time required to handle the timer interrupt. Given that the timer is a very simple interrupt handler, this gives a good idea of the basic interrupt dispatching overhead in the system. The idea is to note the consumption of extra idle loops as more and more timer interrupts are processed per second. Run the timer debug code with the TIX value set to 1000, 2000 .. 5000 and the CNT value set to 100 × TIX (i.e. to make the measurements over 100 seconds independently of the TIX value). This should produce a set of results with the following trend (but obviously with different numbers for a different motherboard):

ticks/sec   idles/sec   delta

1,000       2,750,740
2,000       2,741,674   9,066
3,000       2,732,415   9,259
4,000       2,723,391   9,024
5,000       2,714,362   9,029

This shows that each interrupt is equivalent to about 9 idle loops (roughly 9,000 fewer idle loops for each extra 1,000 interrupts), and that an uninterrupted system would run 2,759,800 idle loops/second, which is equivalent to 306,600 timer interrupts. Thus each timer interrupt takes approximately 3.25 µs. This calculation is very approximate, but it gives a ‘ballpark’ figure for working out how the system will perform under real load.
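The arithmetic behind that estimate can be written out as a few small functions, which makes it easy to redo with the numbers from your own board. This is a sketch only and the function names are hypothetical. Note that it uses the unrounded loop count (9.066 rather than 9), so the result comes out slightly higher than the rounded figure in the text, at a little over 3 µs either way.

```c
/* Idle loops consumed by one interrupt, from two measurements taken
 * at different tick rates (a is the slower rate, b the faster). */
double loops_per_interrupt(double idles_a, double idles_b,
                           double ticks_a, double ticks_b)
{
    return (idles_a - idles_b) / (ticks_b - ticks_a);
}

/* Idle rate the system would show with no timer interrupts at all. */
double uninterrupted_rate(double idles_a, double ticks_a, double loops)
{
    return idles_a + ticks_a * loops;
}

/* Time for one interrupt in microseconds: the loops it displaces
 * divided by the number of loops that fit into one second. */
double interrupt_time_us(double loops, double rate)
{
    return 1.0e6 * loops / rate;
}
```

Feeding in the first two rows of the table (2,750,740 idles at 1,000 ticks and 2,741,674 idles at 2,000 ticks) reproduces the uninterrupted rate of about 2,759,800 loops per second quoted above.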


13 Running the perf process

By now you are in a position to make some useful performance measurements on your system in addition to the interrupt handling time measured above. In the RTB project, unset the TIMER_DEBUG option, add the Console module and the Perf module to the project, and rebuild. Perf measures the performance of the basic rome messaging core by sending and replying to messages in the various configurations possible, testing both Queue Handling and full Context switching. Running Perf produces four numbers in a continuous loop, with one measurement every 10 seconds. The numbers are:

Queue Handling  The time taken to send a single message which is replied to from within a queue handler and to receive the reply at the sending process.

Context Switching  The time taken to pass a message through a Queue Handler (unhandled) into a main process, generate the reply (not passing through a Queue Handler) and switch back to the sending process to receive the reply. That is, this number represents 1 Queue-Handler plus 2 individual context switches.

Routine Call  The time to call a routine, which just increments a counter and returns.

Protected Routine Call  The time to call a routine which starts a critical section, increments a counter, ends the critical section, and returns.

The time for an individual context switch is (Context Switching − Queue Handling) / 2. Running Perf on the system as built above gives the following numbers:

field                     value (µs)

Queue Handling            6.93
Context Switch            37.10
Routine Call              0.41
Protected Routine Call    0.99
Single Context Switch     15.09

These are not bad numbers for a 33MHz system. These numbers serve as a useful comparison against other operating systems, as long as the other figures are measuring the same values. The Perf values represent real user–user messaging, not the ‘kernel-only’ part or some carefully-crafted subset of the real functionality. The interrupt dispatching value measured above (3.25 µs) is in the same range as the Queue Handler figure, which is a good indication that this implementation is internally consistent and similar to other ports of ROME.

The Perf program can be left running over a much longer interval. Every 15 minutes it prints out the elapsed time, which can be used as a more accurate measure of the real-time timer programming, for example by running the code over 24 hours. This is also a good test that there are no spurious ‘glitches’ in the system that manifest themselves only infrequently.

14 The SCV64 VMEbus controller

The preceding sections are enough to get you to the stage where you can start writing device drivers and applications for most systems. On some machines, there is still one ‘missing link’ to accessing devices, and that is the bus controller. Often, this is a PCI bus controller which gives access to the optional plug-in cards on the motherboard. For the Cyclone boards, it is the SCV64 VMEbus controller, giving access to the external interfaces connected through the VME backplane. This controller is neither a real device driver, nor part of the ROME core, but understanding how to write the ROME code for it is a key to porting ROME to such systems, so I will describe it here.

The main feature that distinguishes a bus controller from other drivers is that there is usually no main process (or queue handler) since it does not itself handle any messages. The module consists of an initialisation routine, an interrupt handler, and a shared library used by device drivers on the bus. It is an ‘init only’ process. Also, the initialisation routine must be called before the initialisation routines of any devices connected to the bus, so the module information in RTB must reflect this.

14.1 Additions to the Target File

The SCV64 chip has a number of memory-mapped registers in the f000.0000h region. The region has already been set to big-endian and uncached in the cpu_prologue routine above, in preparation for this code. The target file contains the usual definitions of where to find the relevant registers, taken from the Cyclone Board User Manual:

#define VME_SCV64_BASE   0xf0000000
#define VME_IACK0        0
#define VME_IACK1        0xf4000013
#define VME_IACK2        0xf4000007
#define VME_IACK3        0xf4000017
#define VME_IACK4        0xf400000b
#define VME_IACK5        0xf400001b
#define VME_IACK6        0xf400000f
#define VME_IACK7        0xf400001f
#define VME_IPL_REG      0xb4200000
#define VME_XINT0        0
#define VME_XINT1        1
#define VME_VEC_XINT0    0x12
#define VME_VEC_XINT1    0x22
#define VME_SIZE_SHARED  SHARED_MEM_SIZE_4M

The IACK addresses are used to handle the different levels of VMEbus interrupt. As is common with bus controllers, multiple device interrupts are combined into a few processor interrupts. In this case, the SCV64 is connected only to external interrupt pins 0 and 1 on the processor. Finally, the amount of shared memory on the system is defined here.

14.2 The vme.h header file

The VME bus controller is a combination of interrupt dispatcher, shared-memory manager and DMA controller. The external interface contains routines similar to the ICU module: vme_add_handler, vme_enable_interrupt and vme_disable_interrupt. It also supports dynamic plugging of modules, so has a vme_remove_handler routine, and dynamic interrupt generation through the vme_generate_interrupt routine. The header file also exports the interface to the DMA controller ‘done’ handler, through vme_add_done and vme_remove_done, and the ability to request memory in the shared space, through vme_shared_malloc.


14.3 The svc64.c source file

Most of the source code for the controller driver is very specific to the VMEbus architecture, so this description will concentrate on the principles, rather than the details.

14.3.1 The init routine

The purpose of the init routine is to prepare the bus so that devices on the remote side (from the CPU) can be accessed and identified. The base address of the VME shared memory can be set by the switches on the front panel of the board. Rather than write a separate target file for each possible setting, the address is calculated dynamically by reading the switch settings:

    int addr = CPU_IORD1(VME_IPL_REG);

    /* setup shared memory */

    vme_shared_mem_size = VME_SIZE_SHARED;
    vme_shared_mem_base = ((addr >> 3) << 27);
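The shift pair in the last line moves the top five bits of the switch byte into address bits 27 to 31. A host-side sketch of the same calculation follows. The helper name vme_base_from_switches is hypothetical, and unsigned 32-bit arithmetic is used here (unlike the int in the extract) so that switch settings with the top bit set cannot overflow.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the top five bits of the switch byte become
 * address bits 27-31 of the shared-memory base. The low three bits
 * of the byte are discarded.
 */
uint32_t vme_base_from_switches(uint8_t switches)
{
    return ((uint32_t)switches >> 3) << 27;
}
```

A switch byte of 0x08 therefore selects a base of 0x08000000, and any setting places the shared memory on a 128 MB boundary.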

Most of the remainder of the routine is writing values into the memory-mapped registers using the various CPU_IO macros.

The ROME-specific part at the end of the routine sets up the default VME interrupt handlers and clears the ‘DMA done’ user-specifiable routine:

    for (i = 0; i < VME_MAX_VECTORS; i++)
    {
        vme_proc[i].handler = vme_default_handler;
        vme_proc[i].parameter = i;
    }
    done_routine_ptr = done_def_routine;

It also calls the standard ROME routines to add handlers for the two interrupts connected to the VMEbus, and clears any pending interrupt (caused by the initialisation code or left over from the boot ROM):

    rome_add_handler(VME_VEC_XINT0, vme_scv64_isr0);
    icu_clear_ipend(VME_XINT0);
    rome_add_handler(VME_VEC_XINT1, vme_scv64_isr1);
    icu_clear_ipend(VME_XINT1);

At this point, other devices on the VMEbus should be able to access their register sets and data areas.

14.3.2 The interrupt handlers

The SCV64 controller uses 2 interrupts. The higher priority (isr0) is used for critical bus errors and faults while the lower priority (isr1) is used for other faults and device interrupts. Interrupts come at one of seven levels and are accompanied by a ‘vector’ (or device code) written to the VMEbus, which must be read from a level-specific location. For external device interrupts, the handler reads the level from the VME_IPL_REG location, then reads the vector from the corresponding VME_IACKn address as defined in the target file. It then uses the vector to index the interrupt table, in the same way as the ICU module sets up its array.
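The final step, indexing the handler table with the vector, can be sketched on a host as below. Several things here are assumptions made for illustration: the layout of struct vme_entry, the value 256 for VME_MAX_VECTORS, the default_hits counter, and the vme_table_init and vme_dispatch names. Only vme_proc, vme_default_handler and the handler and parameter fields appear in the init code shown earlier, and reading the level and vector from the hardware is left out entirely.

```c
#include <stddef.h>

#define VME_MAX_VECTORS 256   /* assumed size; the real value is not shown */

/* Assumed layout: one handler and one parameter per vector. */
struct vme_entry {
    void (*handler)(int);
    int parameter;
};

struct vme_entry vme_proc[VME_MAX_VECTORS];
int default_hits;             /* counts calls that found no real handler */

void vme_default_handler(int vector)
{
    (void)vector;
    default_hits++;
}

/* Fill every slot with the default handler, as the init routine does. */
void vme_table_init(void)
{
    int i;

    for (i = 0; i < VME_MAX_VECTORS; i++) {
        vme_proc[i].handler = vme_default_handler;
        vme_proc[i].parameter = i;
    }
}

/* Final dispatch step: index the table with the vector read from the
 * bus. The mask keeps a bad vector inside the table (an assumption). */
void vme_dispatch(int vector)
{
    struct vme_entry *e = &vme_proc[vector & (VME_MAX_VECTORS - 1)];

    e->handler(e->parameter);
}
```

Keeping a parameter alongside each handler means one routine can serve several vectors and still tell which device interrupted.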

One disadvantage of this approach is that there is now a three-level interrupt dispatcher: first through the interrupt handler in the CPU module, then to the VMEbus handler and finally to the device handler. However, the alternative is to try to build the VMEbus handler directly into the CPU plug-in, which makes it much harder to separate the processor from the motherboard. Also, since most of the code of the VMEbus handler is directed towards error processing, this is best left outside the main core of the system.

14.3.3 The shared library

The implementation of the shared library is straightforward, based on the description given in the header file section above.

14.3.4 Tracing and Debugging

The VMEbus is not the world’s most forgiving bus to program. It is quite easy to lock the bus, or generate bus timeouts, which will cause error-level interrupts to the SCV64 handlers. Debugging these problems usually requires a dump of the registers of the controller, and a brief history of the bus activity. The module implements a trace record similar to the standard ROME trace, but for VMEbus events. The reason this was separated from the main trace is partly history and partly so that the VME trace can be viewed separately from the main ROME tracing (which might have progressed beyond the point where the useful VME event was stored).

The trace records are added from the interrupt handlers as they occur, and the display routine, vme_view_errors, is written so that it can be called directly from within the debugger (assuming the target was compiled with symbol table support). Although the register settings for the SCV64 chip can be examined from within the debugger (using the dm.w command), decoding the various bit patterns to find the error indications is tedious, and prone to error. The module defines another debug-callable routine scv64_debug which formats and interprets the registers. This is a very long routine, and it might be possible to conditionally compile it out of a system (using a module option) if the space it takes was needed for other code. On the other hand, VMEbus errors tend to be unpredictable, and having support for this level of information is often valuable. Writing such debug-callable routines is a useful way of keeping the system as modular as possible, since the core debugger knows nothing about the specifics of the VMEbus controller chip.

14.4 The SCV64 Module in RTB

There are a few points to note when creating this module in RTB. Adding the description and source files is straightforward. You must also add a process to the process list. The process name does not matter (but "vme" seems like a good choice). After entering the name, the fields for the queue handler and the main process must be cleared to zero, and the initialisation routine set to vme_scv64_init to correspond with the actual file.

The other entry that must be modified is the ‘link order’ field, which defaults to 99 (unset). The value must be smaller than that for any device using the VMEbus. My preference is to use 0–9 for system-critical link positions, 10–19 for bus controllers and 20–29 for early-link devices, which makes ‘10’ a good number for the link order.

At this point, it should be possible to build the module into the system, and try it out.


15 Tuning the System

The first rule of tuning is only tune a working system. If it doesn’t work reliably, you’re not tuning it, you’re fixing it, and this is the subject of the next section, not this one. If you’re here, I assume that you can run Perf for days on end (or overnight at least) with no strange errors, that starting the system works reliably time after time after time and that you don’t get any odd ‘glitches’ (like strange characters on the serial interface) that you have promised yourself you’re going to look into ‘one day’.

The second rule of tuning is to be very suspicious of the changes you make, especially in the low-level routines. Only change one thing at a time, and back it off if it doesn’t work (you kept the old version somewhere safe, right?).

The third rule of tuning is don’t sacrifice reliability for speed. If you have a ‘speedup’ that will work 99% of the time (you know, unless the interrupt happens between those two particular instructions, or the timer is just about to roll over) you’re not tuning the system, you’re breaking the system and it’s time to go for a walk.

So what can you do to make a system go faster? At this stage, look at the implementation of the interrupt and context handling and the criticality routines. The criticality routines are really simple, with one line of assembler each, and are quite suitable for inlining. The file icu.h was modified to inline these two routines:

    #define rome_start_critical() \
    ({ int __old; \
       __asm __volatile ("intctl 0, %0" : "=r" (__old)); \
       __old; \
    })

    #define rome_end_critical(_o) \
       __asm __volatile ("intctl %0, %0" : : "r" (_o))

and the whole system was rebuilt. The assembler listing for the protected routine call code in Perf was checked to make sure the inlined code was correctly generated. The following extract shows the intctl instructions at the start and end of the routine:

    230 03c8 1E16805C    mov g14,g0
    231 03cc 001EF05C    mov 0,g14
    232 03d0 001CA865    intctl 0, g5
    233 03d4 0030A090    ld _ih,g4
    233      F4050000
    234 03dc 0108A559    addo 1,g4,g4
    235 03e0 0030A092    st g4,_ih
    235      F4050000
    236 03e8 1514A865    intctl g5, g5
    237 03ec 00100484    bx (g0)

As expected, this improved the performance figures. That the value for a regular routine call increased is not unusual, since the individual performance figures vary slightly depending on exactly how the instructions are aligned within the cache.


    field                      value (µs)

    Queue Handling              6.17
    Context Switch             35.52
    Routine Call                0.36
    Protected Routine Call      0.65
    Single Context Switch      14.68
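The protected routine in the assembler listing earlier in this section brackets a single increment between the two intctl instructions. The sketch below shows that save, work, restore pattern in C. It is an illustration only: the real macros are the i960 versions from icu.h, whereas the two functions here are host-side stand-ins, and fake_int_enabled and protected_increment are hypothetical names.

```c
/* Host-side stand-ins so the pattern can be compiled anywhere; on the
 * target these would be the inline intctl macros from icu.h. */
static int fake_int_enabled = 1;        /* hypothetical: models the interrupt state */

static int rome_start_critical(void)
{
    int old = fake_int_enabled;
    fake_int_enabled = 0;               /* interrupts off */
    return old;                         /* caller keeps the previous state */
}

static void rome_end_critical(int old)
{
    fake_int_enabled = old;             /* restore, rather than blindly enable */
}

static int ih;                          /* counter, like _ih in the listing */

/* Increment the counter with interrupts held off. */
void protected_increment(void)
{
    int old = rome_start_critical();
    ih = ih + 1;
    rome_end_critical(old);
}
```

Restoring the saved value instead of unconditionally enabling interrupts means the routine can be called from code that is already inside a critical region without ending that region early.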

Other options to tune the system involve taking a careful look at the save/restore code for context switching, for example to interleave instructions where possible for maximum parallelism or to place useful code in otherwise wasted branch-delay slots on those architectures that have them. I’ve already done this once for most of the I960 code, so there isn’t much else to do here. The one place I did not tune in this example was the first-level interrupt handler. The following code performs the same function as the version listed above (the instructions are identical) but the order is changed to maximise the pipelining:

        intdis
        stt     g12,(sp)
        movq    g0, r4          # Store globals in locals
        lda     12(sp), sp      # Carve out room on stack
        movq    g4, r8          # (can only stash 12 registers)
    #ifdef CPU_BIG_ENDIAN
        ldob    -5(fp), g0      # which intr vector (BE)
    #else
        ldob    -8(fp), g0      # which intr vector (LE)
    #endif
        mov     0, g14          # prevent cx switches
        ld      _icu_exception_handlers[g0*4], g1
        movq    g8, r12         # this way)
        st      g14, _rome_allow_reschedule
        callx   (g1)            # Call the handler ; ino in g0

This reduced the ‘delta’ value from 9,020 to 8,920. At this point, I hope you’re asking if it was worth the effort. The new code is harder to follow than the old one, and any changes to it are likely to break the optimisation. This is probably a good caution against going all-out to tune bits of the core. Given that the timer interrupts occupy under 0.5% of the CPU, this may be a waste of effort. On the other hand, improving the context switching time was worthwhile. The next useful places to look for optimisations are in the data movement areas, for example by providing fast implementations of memset and memcpy.
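As an illustration of what a faster data-movement routine might look like, here is a sketch of a memset that stores a word at a time. It is not taken from ROME: fast_memset is a hypothetical name, a 32-bit word size is assumed, and the sketch assumes the toolchain tolerates word stores into memory that other code accesses as bytes (the usual situation for a freestanding C library).

```c
#include <stddef.h>
#include <stdint.h>

/* Fill n bytes at dst with the byte value c, using 32-bit stores for the
 * aligned middle of the region and byte stores for the ragged ends. */
void *fast_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;
    uint32_t word = (uint32_t)b * 0x01010101u;  /* same byte in all 4 lanes */

    /* Leading bytes up to the next 4-byte boundary. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        *p++ = b;
        n--;
    }

    /* Aligned body, one word per store. */
    uint32_t *w = (uint32_t *)(void *)p;
    while (n >= 4) {
        *w++ = word;
        n -= 4;
    }

    /* Trailing bytes. */
    p = (unsigned char *)w;
    while (n > 0) {
        *p++ = b;
        n--;
    }
    return dst;
}
```

Whether this beats a byte loop on a given board depends on the bus width and cache behaviour, so it should be measured with Perf like any other tuning change.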

Even though the criticality routines are replaced with macros, the original routines were left in the code. This allows interpreters to use them by looking up the routine name in the symbol table. It’s slower than inlining, but interpreting is really slow anyway.
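In C a function-like macro and a real function can share a name, because the macro is only expanded where the name is followed by an opening parenthesis. The sketch below shows one way the two can coexist with a name lookup; it is an assumption about how this could be arranged, not ROME's actual symbol table code. The names fake_level, symtab and lookup_symbol are hypothetical, and the macro bodies are host-side stand-ins for the intctl versions.

```c
#include <string.h>

static int fake_level = 1;              /* hypothetical stand-in for the interrupt state */

/* Out-of-line versions. The parentheses around the names stop the macros
 * below from expanding here if they are already visible. */
int (rome_start_critical)(void) { int old = fake_level; fake_level = 0; return old; }
void (rome_end_critical)(int old) { fake_level = old; }

/* Inline versions for compiled code (stand-ins for the intctl macros). */
#define rome_start_critical()  (fake_level ? (fake_level = 0, 1) : 0)
#define rome_end_critical(_o)  ((void)(fake_level = (_o)))

/* A toy symbol table of the kind an interpreter might search. */
struct symbol { const char *name; void (*addr)(void); };

static const struct symbol symtab[] = {
    /* No '(' follows the names here, so these are the real functions. */
    { "rome_start_critical", (void (*)(void))rome_start_critical },
    { "rome_end_critical",   (void (*)(void))rome_end_critical },
};

/* Return the address registered under name, or a null pointer. */
void (*lookup_symbol(const char *name))(void)
{
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return 0;
}
```

A caller that looks up a routine by name must cast the returned pointer back to the routine's real type before calling it.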

16 What if it doesn’t work?

So you load up your code, start the system at its entry point and nothing happens, or the ‘fault’ light goes on on the motherboard, or the ROM reports some catastrophic error. Now what?

This is the place to find ‘Don’t Panic’ in large friendly letters. This section is proof that we’ve been there, and passed beyond it. By now I’ve probably seen just about everything that can go wrong with a system (but there are always new surprises), including the dreaded puff of orange smoke (which smells really terrible). Where to start really depends on what you’re working with, and how far you think you’re getting with the system.


16.1 New Hardware

Probably the hardest system to get working is a brand-new board, especially if it also contains some ‘experimental’ hardware, and you’re also writing the boot ROM yourself. If you’ve got the money, buy an ICE (In-Circuit Emulator) for your chosen CPU and debug the system with that. If that’s beyond your finances, a storage scope or logic analyser will help (to see if there’s any bus activity, or if read/write lines are going up and down). However, this manual isn’t really about debugging hardware. The easiest way to work with a new board (or an old one) is to make sure your system has some (i.e. one) LED(s) that can be controlled from software, preferably by just writing to a location in memory. It’s also useful to put an LED somewhere in the power line, or to check the +5V line with a voltmeter once in a while; it’s hard to debug a fried board.

For the most part, I hope you are working with a board that’s fairly stable, and with a boot ROM that’s been seen to work with other systems, or can talk to its UART and convince you it’s alive.

16.2 Loader Problems

It’s not uncommon that the boot ROM refuses to load the target file you’ve just made. It’s likely that the format generated is not 100% compatible with the ROM loader. If you have a file which does work, then you can compare the headers (are they both the same revision level of the selected loader-file format?) and the contents (e.g. does your file contain any ‘extended’ options such as long addresses in Motorola S-records?). If you have the source for a working file, compile and link it with the ROME toolchain and compare the output to the working version. You might also want to try compiling a tiny assembler file (say to turn on the LED) and loading that. If the file format uses checksums, you might write a little utility on Linux that verifies the checksums, then try sending a file with a bad checksum and see if the ROM detects it.
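For Motorola S-records, the checksum is the ones' complement of the low byte of the sum of the count, address and data bytes. A checking utility could be built around a routine like the following sketch, which is offered as an illustration and is not part of the ROME tools; srec_checksum_ok and its helpers are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    ch = (char)toupper((unsigned char)ch);
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Read two hex digits at s as one byte, or return -1 on a bad digit. */
static int hex_byte(const char *s)
{
    int hi = hex_digit(s[0]);
    int lo = (hi < 0) ? -1 : hex_digit(s[1]);
    return (hi < 0 || lo < 0) ? -1 : ((hi << 4) | lo);
}

/* Return 1 if line is a well-formed S-record with a correct checksum,
 * 0 otherwise. A trailing CR and/or LF is ignored. */
int srec_checksum_ok(const char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    if (len < 4 || line[0] != 'S' || !isdigit((unsigned char)line[1]))
        return 0;

    /* The count covers the address, data and checksum bytes. */
    int count = hex_byte(line + 2);
    if (count < 3 || len != 4 + 2 * (size_t)count)
        return 0;

    unsigned sum = (unsigned)count;
    for (int i = 0; i < count - 1; i++) {
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0)
            return 0;
        sum += (unsigned)b;
    }

    int check = hex_byte(line + 4 + 2 * (count - 1));
    return check >= 0 && (unsigned)check == (~sum & 0xFFu);
}
```

Running each line of a target file through such a check, then deliberately corrupting one checksum before sending the file, gives a quick test of whether the ROM loader validates its input.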

If the boot ROM loads the file, but fails to run it, then you might want to check the following. Has the file loaded at the right place? If the ROM can display data, does the data at the entry point correspond to the compiled file? Is the file compiled for the right endianness? If the ROM has a disassembler, does the instruction at the entry point look reasonable? Has the target been linked in the right order? You can check with the target.map file that _link_first.o is at the start of the target. Does the loader report the correct entry point (if it supports target file formats where the entry point can be dynamically varied)? If not, you may need to add a line to the linker file to force the correct entry point.

If you can load a small file (turning on the LED) but not a ROME system, does the loader have some inbuilt limits on file size, or is it timing out over the link (e.g. the serial line) used to download the file? If so, you need to get (or write) a better boot ROM. You might also check that your system does fit into the memory on the board; you may need to reduce the size of data structures or stacks to make it fit.

16.3 Initialisation Problems

The most common problems with new systems lie in the assembler code between system startup and calling rome_start. This is especially true of systems which have an ‘all-or-nothing’ mode change in this part of the code (for example the I960 sysctl to move the processor tables, or the change to protected mode on the I386). Most often, you will start the system and see nothing at all: no output on the serial line, no flashing lights, nothing. There are two or three strategies to cope with this, all of which I’ve used at one time or another.


1. If you can persuade the serial interface to work from within ROME, you may be able to use the debugger to trace what’s going on.

2. If you can control the LED from within the code, you can locate faults by moving the on/off LED position around.

3. If you have an external instruction-set simulator for the CPU, you can try running the initialisation code on it and see if it detects the fault.

It might be surprising, but option 3 once solved an ‘impossible’ problem with context switching where an error occurred very rarely, in a way that could never easily be reproduced, and destroyed most of the system after it happened.

Option 1 is what lies behind that ‘alternative entry point’ at offset 4 in the startup code above. If you have a system with a working serial interface, then set the SKIP_INIT option (so you don’t break a working configuration) and try starting the system at the alternate entry point (or re-compile with the first instruction as a no-op). It’s actually quite likely that the debugger will run, and you can use it to check memory and disassemble code. If this works, you can move the debug entry point further down the code and check what happens at each stage (for example after clearing BSS to zero).

However, it’s more likely that you’ve broken the serial line too, so the LED is your only option. Even if there isn’t an obvious indicator light, sometimes you can be creative to get an indication of what’s happening. I debugged the I386 ROME system using the ‘motor’ light on the floppy drive, which the standard ROM boot code left on after booting the system from a floppy disk, and which I could switch off using an outbyte instruction. On the Cyclone board, the ROM set the ‘run’ light on, but when the machine faulted the ‘fault’ light came on, so at least I could see something was happening.

By moving the LED code around, you can narrow down the range of failing code, or at least detect the point after which everything is bad. That doesn’t necessarily mean that it is the preceding instruction that causes the problem; for example, clearing the wrong area of memory instead of the BSS may not affect the system until much later, depending on just what was cleared by mistake. The hardest problem is usually the ‘all-or-nothing’ instruction, in the example above the sysctl reset. If any of the tables are incorrect, you won’t get very far beyond that code.
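The idea of moving the LED checkpoint can be sketched in C as follows. In real startup code the LED instructions would be a couple of lines of assembler that you move by hand and rebuild; the checkpoint parameter here merely stands in for that editing step. Everything named in the sketch (led_reg, LED_BIT, the stage routines) is hypothetical, and the register address and bit must come from your board's documentation.

```c
#include <stdint.h>

/* Hypothetical: where the LED control register lives and which bit lights
 * it. The pointer is a variable so the sketch can also be tried on a host. */
static volatile uint8_t *led_reg;
#define LED_BIT 0x01u

static void led_on(void)  { *led_reg = (uint8_t)(*led_reg | LED_BIT); }
static void led_off(void) { *led_reg = (uint8_t)(*led_reg & ~LED_BIT); }

/* Stand-ins for three stages of startup code. */
static void stage_a(void) {}
static void stage_b(void) {}
static void stage_c(void) {}

/* Run the stages with the LED switched on at one chosen point.
 * If the light comes on, everything before that checkpoint completed. */
void startup_with_checkpoint(int checkpoint)
{
    led_off();
    stage_a();
    if (checkpoint == 1) led_on();
    stage_b();
    if (checkpoint == 2) led_on();
    stage_c();
    if (checkpoint == 3) led_on();
}
```

If the light comes on with the checkpoint after stage_a but not after stage_b, the failure lies somewhere in stage_b or, as noted above, in damage done earlier that only shows up there.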

The first version of the CVME965 system I wrote used the wrong value for the REGION descriptor for the main RAM (because I looked at a manual for an older board by mistake). The result was that the code could not proceed once the new tables were loaded. I solved it by examining the memory-mapped locations for the system registers that contained the versions as set by the ROM, and compared each entry with the value I was about to use in my new tables. It may be that you really want to alter some of the values (for example to change processor mode). If you can, you might be able to start by duplicating the ROM values, getting the ‘null’ mode change to work, and then making the required changes slowly until the system breaks again. Ultimately it comes down to staring at the code and looking for something that shouldn’t be there, or is missing.
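The comparison described here is easy to automate once some form of output is available. The following sketch assumes both sets of values can be viewed as arrays of 32-bit words; compare_tables is a hypothetical name, and printf stands in for whatever output routine works on your target at that stage.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare the values currently in force (as left by the ROM) against the
 * values about to be loaded, reporting each entry that differs.
 * Returns the number of differences. */
size_t compare_tables(const volatile uint32_t *rom_values,
                      const uint32_t *new_values, size_t entries)
{
    size_t diffs = 0;
    for (size_t i = 0; i < entries; i++) {
        uint32_t current = rom_values[i];   /* read the register once */
        if (current != new_values[i]) {
            printf("entry %lu: ROM 0x%08lx, new 0x%08lx\n",
                   (unsigned long)i, (unsigned long)current,
                   (unsigned long)new_values[i]);
            diffs++;
        }
    }
    return diffs;
}
```

Every reported difference should be one you intended to make; anything else is a candidate for the fault.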

There is still one little point that might go wrong. One system I had started perfectly and printed the following string:

EMORinI laitnisi.gn


followed by similar ‘garbage’. This is clearly not the usual baud-rate problem on the serial line as the characters were all correct ASCII. If you haven’t worked it out yet, it’s “ROME Initialising\n” in reverse-endianness! The processor fetched instructions in 32-bit wide endian-independent loads, so the code worked fine, but all the character pointer accesses were back-to-front. Changing the endianness fixed that one.
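The effect is easy to reproduce on a host by reversing the bytes within each 32-bit group of a string. The routine below is a small illustration written for this purpose (swap_words_in_place is a hypothetical name); it is handy for recognising such output when you see it.

```c
#include <stddef.h>

/* Reverse the bytes inside each complete 4-byte group of buf, in place.
 * Any 1 to 3 bytes left over at the end are not touched. */
void swap_words_in_place(char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i += 4) {
        char t;
        t = buf[i];     buf[i]     = buf[i + 3]; buf[i + 3] = t;
        t = buf[i + 1]; buf[i + 1] = buf[i + 2]; buf[i + 2] = t;
    }
}
```

Applied to the text "ROME Initialising", this yields a string beginning "EMORinI laitnisi", matching the start of the output quoted above.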

16.4 Serial-Line Problems

If entering the debugger directly doesn’t work, or the LED shows you returning from the first rome_kprintf in the startup code, but there’s no output, or the system appears to have translated itself into Icelandic, then you’ve got problems with the polled-mode serial line.

If disabling the serial_initp routine doesn’t help, then it’s likely one of two things has happened: either the boot ROM reset the serial interface before starting your code (yes, I’ve seen one ROM that did that), or the code to access the serial interface is corrupted, perhaps because of stores into the wrong locations during initialisation, or is just plain wrong, for example by addressing the wrong memory areas, or the wrong register spacing. A really simple (i.e. ‘dumb’) example of this happened with my I386 system. The serial interface was configured to use the standard serial port I/O address at 2f8h. However, the BIOS had COM1 set to the ‘alternate’ address of 3f8h and COM2 (at 2f8h) was disabled. Since I had the serial cable plugged in to COM1 on the back of the motherboard, I didn’t get any output. It also helps to try putting a NULL modem in the line in case the system and the display have different ideas about which is the DTE end.

If the serial line works in the boot ROM, and with no serial_initp, but fails after re-initialising, at least you know it’s the init code. The same is true of the ‘Icelandic’ output (so called because of the Thorns and Eths that come in the garbage), which means the baud rate is wrong. Some machines use UART crystals that are not at the usual 1.8MHz. If you can get the interface to work without the init code, on some UARTs you can read out the values programmed into the baud-rate, parity and flow control flags, and see how they match up with the values you are about to put in.
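To see why the crystal matters, consider a UART that divides its input clock by 16 and then by a programmable divisor, as the 16450/16550 family does (other parts may work differently, so treat this as an assumption about your hardware). The helper names below are hypothetical.

```c
/* Divisor for a UART that divides its input clock by 16 and then by a
 * programmable value, rounded to the nearest integer. */
unsigned long uart_divisor(unsigned long clock_hz, unsigned long baud)
{
    return (clock_hz + 8ul * baud) / (16ul * baud);
}

/* Baud rate actually produced by a given divisor and clock. */
unsigned long uart_actual_baud(unsigned long clock_hz, unsigned long divisor)
{
    return clock_hz / (16ul * divisor);
}
```

With the common 1.8432 MHz crystal, 9600 baud needs a divisor of 12. Program that same divisor on a board fitted with, say, a 3.6864 MHz crystal and the line runs at 19200 baud, which is exactly the kind of mismatch that produces garbage on the terminal.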

If changing the link order makes the problem go away (try editing the order of the object files in the ld.input file) then you have got data or code corruption. Obvious places to look are the data clearing portions of the initialisation code, particularly if you clear memory ‘backwards’ (starting at the end and working downwards). In one case I remember, I calculated the number of bytes to clear using the bss symbols, and then cleared memory using word operations, so clearing four times as much space as I expected. This wiped out a chunk of the initialised data in the system, including, of course, the place where the base address of the serial interface was stored.
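The bug described here comes down to using a byte count as a word count. A C rendering of the correct calculation is sketched below; the real code would normally be assembler using the linker's bss symbols, and clear_bss_words is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Clear the region [start, end) one 32-bit word at a time.
 * Both addresses are assumed to be 4-byte aligned. */
void clear_bss_words(uint32_t *start, uint32_t *end)
{
    size_t bytes = (size_t)((char *)end - (char *)start);
    size_t words = bytes / sizeof(uint32_t);    /* the step that was missed */

    for (size_t i = 0; i < words; i++)
        start[i] = 0;
}
```

Looping `bytes` times with word stores instead of `words` times would run four times too far, which is how the initialised data after the BSS got wiped.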

If the boot ROM doesn’t use the serial port, and you can’t get the port to work at all, try putting a breakout box on the line and looking for the lights changing to indicate data. If all else fails, treat it as a ‘hardware’ problem and attack it with a ’scope.

16.5 Context Switching

The next most likely place to fail, once you’ve got through init, is the first context switch in cpu_scheduler. There are two separate reasons why this is going to fail. The first is the context switch itself, which will transfer your program into hyperspace if it goes at all wrong. The second is that this is when interrupts are enabled for the first time, and you will handle any pending interrupts that might be sitting around. These problems will manifest themselves in one of two ways. If you’re lucky, you will enter the debugger with a fault message at some strange location. If you are unlucky, when you try to enter the debugger with the ‘!’ key to test the idle process, nothing will happen.

The good news is that you should be able to use some of the facilities built in to ROME to help from this point on. As a first step, you can turn on the ROME_TRACE_CXSWITCH and CPU_KPRINTF_TRACES options and rerun the system. This should tell you if you are getting past the first context switch, since it should print two lines similar to the following:

    @0x00000000 proc 0xa005c0c0 code 83 arg a03fec80
    @0x00000000 proc 0xa03fec80 code 83 arg a03fda70

The first line is the trace entry from rome_start just before it calls cpu_scheduler. The second line comes from within rome_await_message, called by the serial process to switch to the idle process. You probably get the first line, but not the second.

Here are some strategies for tackling this. First, you can edit a call to rome_debug into the ROME module just before the call to cpu_scheduler and use it to check the data areas for the processes (you should be able to use the ‘lp’ command in the debugger at this point). Second, you can bypass the code in the context switch to enable interrupts, so the serial process, and then idle, are entered with interrupts still disabled; this will catch the problem of the bad pending interrupt. Third, you can replace the ret instruction (or equivalent) at the end of the context switching code with a call to the debugger. The system should appear to be in the serial process at this point, with a stack frame and register set appropriate to that process. Fourth, you can insert a call to rome_kprintf at the start of the serial code to check that it is being entered, and one at the start of rome_await_message.

If you get the entry into the fault handler instead, you should try to match up the faulting address with one of the values near the process control block. For example, has the system tried to branch to the stack pointer instead of the entry point? Does the process stack look plausible for initial entry to the process? If the fault address is ‘close’ to the process entry point, try disassembling from the entry point onwards and look for corrupted code. If you find some, then work backwards, by putting calls to rome_debug into the code, until you trace the point at which it is corrupted. If the fault appears to be in valid code, then look more closely around that area. One problem I had with another I960 system was a fault just inside rome_await_message. All the code looked valid, but the instruction pointer was consistently set just after a callx instruction. The call was to (a broken version of) rome_start_critical, which was generating a program fault, delayed until after the following ret.

Faults just after a call to rome_end_critical usually indicate a problem in disabled code or in an interrupt handler. The TRACE_INTERRUPTS option may help show which interrupt handler is being called just before the fault. In one case I was getting VMEbus system errors showing up at a rome_end_critical. Tracing the interrupt showed the most recent handler was always a VMEbus ethernet card, which had a very long interrupt handler. I added a VMESTATUS trace record to the system and recorded the state of the bus on entry and exit to the handler. Indeed it was generating a bus fault. By adding further trace calls within the handler it was possible to narrow the problem down to an incorrectly-initialised pointer. It is, of course, not necessary to run such tests with the KPRINTF_TRACES option set; indeed that alone may mask some problems. The usual approach is to use the debugger trace command once the fault has been detected to review the events leading up to the crash.

16.6 Interrupt Handling

If context switching works with interrupts disabled, but not otherwise, or the ‘!’ character does not enter the debugger, there is a problem with the interrupt code. This is almost as bad as debugging the initialisation code, but not quite. The first step is to find out if the interrupt handler is getting called at all, either by putting code at the start to turn on an LED or by inserting a call to rome_debug. The chances are there is an alignment or offset problem in the interrupt table, so external interrupts are not being vectored correctly. Basically, if enabling interrupts causes the system to fail, and the first-level interrupt handler is not being entered, then there is a problem with the interrupt table configuration. The only solution is to stare at the code until enlightenment dawns, aided perhaps by looking at the actual system with the debugger to spot discrepancies between what you thought you wrote and what is in the machine.

If you’ve made it into idle, but ‘!’ does nothing, then the problem is either the same as above, or the UART interrupt is not correctly configured. Unless you’re really desperate, don’t put a rome_kprintf or equivalent into the serial interrupt handler; it’s a good way to lock up the system with infinite interrupts (I know, because I tried it). You might want to put some output (or a call to rome_debug) after the serial initialisation code, to check that the handler is installed in the correct vector and the interrupt mask value is set to allow the interrupt through. You might also check in the Build (or in Hardware.h) that the ENTER_DEBUG option has the right value of ‘!’. If you still have the LED code in the interrupt handler, you should see the light go on when you press any key; if not, the character-received interrupt is not enabled on the UART. You can try to enable ROME_TRACE_INTS and remake the serial module (and set CPU_KPRINTF_TRACES if you’re brave) and see if it really makes it into the handler. If it doesn’t, then the interrupt vector is being decoded incorrectly. If it does, then the input character is being read or tested incorrectly.

Once you’re past this point, the only other problem is likely to come in Perf after the first round of testing, when the “QueueHandler” message is printed. This will be the first time that a process context switch will occur as a result of an interrupt: when the final character of the string is printed, the buffer is returned to the Perf process which is waiting for it, during which time the idle process has been scheduled. This will test if the IRSched code, or its equivalent, correctly saved the context information such that it can be restored. Hitting a problem in this code was one of the more surprising bugs I found while porting the ROME core, since I really wasn’t expecting it this ‘late’ in the program. It just goes to show, the code doesn’t work until it’s actually been run.

17 And Finally...

Everything works, the system is tuned and the performance figures look good. All you’re going to do now is document the new modules and submit them to the server, right? If, like me, you’ll admit to the mistakes as well, you might want to add some more hints to the previous chapter, about how you got your ROME system working.
