
Evaluation of some proposed name-space architectures using ISPS


J. Djordjevic, M.Sc., Ph.D., R.N. Ibbett, M.Sc., Ph.D., F.B.C.S., and F.H. Sumner, B.Sc., Ph.D., F.B.C.S.

Indexing terms: Digital computers and computation

Abstract: In name-space architectures, the mapping of names onto fast registers is a hardware, rather than a software, function. The MU5 computer is an example of such an architecture, having a single-address instruction format with some stacking facilities, and this paper introduces proposed two-store-address and three-store-address architectures developed from MU5 concepts. ISPS descriptions of all three architectures have been written, verified and used in a series of experiments conducted at Carnegie-Mellon University, Pittsburgh, from Manchester University, England, using the ARPA Network. Results are presented of measurements of static and dynamic code usage for a number of benchmark programs run on the ISPS simulation models of these systems, and comparisons between the three architectures are made on the basis of these results.

1 Introduction

An important characteristic of the architecture of a computer is the number of addresses contained in its instruction format. Decisions about this number are generally based on the intuitive feelings of the designer(s) in relation to economic considerations, the expected nature of the implementation, and the choice of operand address size. For example, if an instruction contains only register addresses, so that the main store is addressed indirectly through registers, then several addresses can be accommodated (e.g. the three addresses of the CDC 7600). Furthermore, it is generally assumed that machines in which instructions have several addresses will require fewer instructions to carry out a given computation. However, such machines inevitably involve greater hardware cost and complexity, and where full store addresses are used, multiple-address instructions have generally been regarded as prohibitively expensive both in these terms and in terms of static and dynamic code requirements. Thus, one store address per instruction is usually the limit, although there are exceptions; for instance, some register machines (e.g. the PDP-11) have variable-sized instructions and allow up to two full store addresses to occur in their long-instruction format.

Arguments in favour of register machines are the compactness of code obtained because of the short register addresses, and the short access time to operands held in the fast-programmable registers. Against these arguments must be set the need for complex optimising compilers to allow efficient use of these registers by high-level language programs, and the need to dump and restore these registers during procedure entry and exit. An examination of the operands in high-level languages indicates that the forms commonly used are scalars, array elements, strings, literals and procedure calls. The scalar variables, identified by names, are particularly important, since studies of programs run on Atlas, confirmed by measurements made on MU5 [1], show that, over a large range of programs, 80% of operand accesses are to named operands, and that only a few of these names are in frequent use at any one time. A system which keeps these operands in fast registers should clearly be able to achieve high performance. However, to avoid the use of fast-programmable registers, the MU5 computer [2, 3] uses an associatively-addressed 'name store' which forms part of a 'one-level store' with the main store of the processor, and in which the allocation of names to registers is performed solely by hardware. Identification of names is made by operand-type information in the instruction, and the address within an instruction is a short offset (corresponding to the name of the operand) from an implicit name-base register containing the full store address of the start of the name space of the currently executing procedure.

Paper 792E, first received 8th January and in revised form 28th March 1980.
The authors are with the Department of Computer Science, University of Manchester, Manchester M13 9PL, England.

Since the number of names in use at any one time is small, a short offset can be used in the instruction, and this has allowed a 30-bit virtual address space (of 32-bit words) to be referenced by 16-bit instructions in MU5. By extension, the possibility arises of defining architectures with a two-address instruction format of 24 bits and a three-address instruction format of 32 bits, all referring to the same 30-bit virtual address space. In order to assess the static and dynamic code requirements of such machines, ISPS descriptions of a two-address architecture (MU5/2) and a three-address architecture (MU5/3), both derived from MU5, were written and evaluated using the Carnegie-Mellon University ISPS computer architecture evaluation system via the ARPA Network, as described in a companion paper [4] in this issue. The short sequences of hand-coded instructions that are generally used to compare instruction formats having different numbers of addresses tend to be misleading, and do not give any real indication of the relative performance to be expected when complete programs are considered. The synthetic scientific benchmark developed by Curnow and Wichmann [5] and a subset of the Computer Family Architecture (CFA) test programs developed at Carnegie-Mellon [6] were therefore used in this evaluation. Each of these programs was compiled on MU5, using its standard software, and hand coded for MU5/2 and MU5/3, in order to produce the necessary simulator command files. In order to avoid introducing differences in programming style, the programming practice used by MU5 compilers was closely followed in the hand coding, so that, for each program, the code for MU5/2 and MU5/3 is virtually a transliteration of the MU5 code into two-address and three-address format.

The use of a zero-address, or stacking-machine, design for MU5 has been rejected on a number of grounds [3], including, in particular, the fact that a single stack is under considerable pressure in a high-performance environment, where it has to support all the functions involved in counting, address calculation, and main calculation, and such a system was therefore not considered here.

IEE PROC., Vol. 127, Pt. E, No. 4, JULY 1980

2 MU5/2 and MU5/3 instruction sets

The MU5 architecture is described in the companion paper [4], where the summary instruction set is shown in its Fig. 1. The basic instruction formats for MU5/2 and MU5/3 are summarised in Table 1 of the present paper. An MU5/3 instruction comprises four 1-byte fields: the function-code field (F), two source fields (S1 and S2), and the destination field (D). Most MU5/3 functions take operands specified by S1 and S2, and return the result to D. An MU5/2 instruction comprises three 1-byte fields: the function-code field (F), the first source and destination field (S1/D), and the second source field (S2). Most MU5/2 functions take operands from S1/D and S2, and return the result to S1/D. The one-address instruction format is also used in the instruction sets of both MU5/3 and MU5/2 for some organisational instructions, since they fit best into this format. It contains a 1-byte function-code field (F), and only one operand field (S/D), which is used as either source or destination.

In common with MU5, long names (16 bits) and literals (up to 64 bits) are also allowed, leading to the extended instruction formats (Figs. 1d and e). For MU5/3, the extended-operand specifications follow immediately after the 4 or 2 bytes of the basic instruction. The maximum possible instruction length is thus 22 bytes (4 bytes for the basic-instruction format, 2 x 8 bytes when both source operands are 64-bit literals, and 2 bytes for a 16-bit destination name). The MU5/2 instruction length expands in the same way, and varies between 3 bytes for the basic format, and 19 bytes when both extended operands are 64-bit literals.
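The length arithmetic in this paragraph can be checked with a short sketch. It is an illustration only, not part of the ISPS descriptions: the function name `instruction_length` and the constants are hypothetical, and the only facts used are the basic-format sizes (4 and 3 bytes) and the extension sizes (8 bytes for a 64-bit literal, 2 bytes for a 16-bit name) quoted above.

```python
# Illustrative sketch: byte lengths implied by the text, not an ISPS description.
BASIC_BYTES = {"MU5/3": 4, "MU5/2": 3}  # basic multi-address formats

LITERAL_64 = 8  # a 64-bit literal needs 8 extension bytes
LONG_NAME = 2   # a 16-bit name needs 2 extension bytes


def instruction_length(arch, extension_bytes):
    """Basic format plus the extension bytes of each extended operand."""
    return BASIC_BYTES[arch] + sum(extension_bytes)


# Longest MU5/3 case: two 64-bit literal sources and a 16-bit destination name.
print(instruction_length("MU5/3", [LITERAL_64, LITERAL_64, LONG_NAME]))
# Longest MU5/2 case quoted in the text: both extended operands are 64-bit literals.
print(instruction_length("MU5/2", [LITERAL_64, LITERAL_64]))
# Shortest MU5/2 case: no extended operands.
print(instruction_length("MU5/2", []))
```

The three calls reproduce the 22-byte, 19-byte and 3-byte figures given in the paragraph.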

Addresses in MU5/2 and MU5/3 are identical with those in MU5, and the same name-segment layout is assumed. Registers NB, XNB and SF are therefore provided in both systems, together with a set of one-address instructions that operate on them. These are the only visible registers in either system, since all computational functions return a result to the destination operand specified in the instruction. These functions are also assumed to be identical with those in MU5. Unconditional and conditional control transfers in MU5/2 are one-address instructions, and conditional transfers in MU5/3 are combined with the COMP function in a three-address instruction. Instructions involved in procedure-calling are identical for all three architectures.

Table 1: Instruction formats in MU5/2 and MU5/3

(a) MU5/3: 3-address basic instruction format
    F | S1 | S2 | D

(b) MU5/2: 2-address basic instruction format
    F | S1/D | S2

(c) MU5/3 and MU5/2: 1-address basic instruction format
    F | S/D

(d) MU5/3: 3-address extended instruction format
    F | S1 | S2 | D | EXS1 | EXS2 | EXD

(e) MU5/2: 2-address extended instruction format
    F | S1/D | S2 | EXS1/EXD | EXS2

(f) MU5/3 and MU5/2: 1-address extended instruction format
    F | S/D | EXS/EXD

Primary operand forms in MU5/2 and MU5/3 are similar to those in MU5 and are encoded for both source and destination as shown in Table 2. In short mode, the 2-bit KSH field specifies a 6-bit literal, a 32-bit or 64-bit variable, or extended mode, while the 6-bit NSH field is treated as the literal or name to be added to NB. In extended mode, NSH is split into a 2-bit KXT field and a 4-bit NXT field. KXT specifies the kind of operand, while NXT defines the length of literal or addressing register, as appropriate. When SF is used, the effect is to unstack a source operand and stack a destination operand, so that if SF is used for all operands in one instruction, the effect is equivalent to that of a stacking machine.
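The field splitting described above can be sketched as a small decoder. Several details are assumptions, not taken from the paper: that KSH occupies the two most significant bits of the operand byte (and KXT the two most significant bits of NSH), and that the 6-bit signed literal is in two's-complement form. The function names are hypothetical. Because the individual NXT encodings are not listed, the sketch returns NXT as a raw number instead of interpreting it.

```python
KINDS = {0: "literal", 1: "32-bit variable", 2: "64-bit variable"}


def sign_extend_6(value):
    """Read a 6-bit field as a two's-complement integer (assumed encoding)."""
    return value - 64 if value & 0x20 else value


def decode_operand(byte):
    """Split one operand byte into the fields named in Table 2."""
    ksh = (byte >> 6) & 0b11   # assumed: KSH is the top two bits
    nsh = byte & 0b111111      # remaining six bits
    if ksh == 0:
        return {"mode": "short", "kind": "literal", "value": sign_extend_6(nsh)}
    if ksh in (1, 2):
        # NSH is the name, i.e. the offset added to NB.
        return {"mode": "short", "kind": KINDS[ksh], "offset_from_NB": nsh}
    # KSH = 3: extended mode, NSH splits into KXT (2 bits) and NXT (4 bits).
    kxt = (nsh >> 4) & 0b11    # assumed: KXT is the upper two bits of NSH
    nxt = nsh & 0b1111
    kind = KINDS.get(kxt, "system registers")
    return {"mode": "extended", "kind": kind, "nxt": nxt}


print(decode_operand(0b01000101))  # short 32-bit variable, name 5
print(decode_operand(0b11100001))  # extended 64-bit variable, NXT = 1
```

A real decoder would go on to read the extension bytes that follow the basic instruction; that step is omitted here because it depends on the NXT encodings.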

Data-structure elements are accessed by functions in MU5/2 and MU5/3, using a descriptor mechanism similar to that in MU5, but in the absence of visible registers, the two source operands are used to specify explicitly the descriptor and modifier. In MU5/3, the element itself is returned to/taken from the destination operand for a read/write operation, whereas in MU5/2 the top-of-stack location is used implicitly.

3 Evaluation of results

Criteria by which different computer architectures may be compared have proved difficult to establish, particularly when the architectures have widely different characteristics and where differences can arise from implementation features rather than from only the instruction set [4, 7]. Criteria which have gained acceptance are the numbers of obeyed instructions, and the memory space occupied by the compiled code for the same programs run on each architecture [6, 8]. Memory space is taken here to mean the number of bytes required for the static code, and has been obtained in each case by examination of the program listings. The numbers of instructions obeyed were obtained from the ISPS simulation runs. These simulation runs also allowed an additional measurement to be taken, the numbers of instruction bytes supplied from memory during execution, i.e. the number of instruction bytes executed.

Table 2: Operand types in MU5/2 and MU5/3

Short operands (KSH ≠ 3):

    Field layout: KSH (2 bits) | NSH (6 bits)

    KSH = 0   literal (NSH = 6-bit signed integer)
    KSH = 1   32-bit variable (at address NSH + NB)
    KSH = 2   64-bit variable (at address NSH + NB)
    KSH = 3   extended-operand format

Extended operands (KSH = 3):

    Field layout: KSH (2 bits) | KXT (2 bits) | NXT (4 bits) in the S/D field, followed by the extension EXS/EXD

    KXT = 0   literal (NXT specifies 16, 32, 64-bit signed and unsigned literals)
    KXT = 1   32-bit variable (NXT specifies the addressing register used: SF, NB, XNB, 0)
    KXT = 2   64-bit variable (NXT specifies the addressing register used: SF, NB, XNB, 0)
    KXT = 3   system registers


3.1 CFA test program results

The static code space requirements for each of the CFA programs run on the MU5, MU5/2 and MU5/3 ISPS simulation models are shown in Table 3, together with the ratios of the MU5/2 and MU5/3 figures relative to MU5. The average values for both MU5/2 and MU5/3 are not significantly different from the MU5 average (with ratios of 1.05 and 0.94 respectively), although the ratios for individual programs range from 0.91 to 1.21 for MU5/2 and from 0.74 to 1.06 for MU5/3. More significant are the variations in the numbers of obeyed instructions, shown in Table 4. The average ratio for MU5/2 is 0.74, with ratios for individual programs ranging from 0.57 to 0.88; whereas, for MU5/3, the average ratio is 0.51, with individual ratios of between 0.43 and 0.59.

Thus, for this set of programs, the two-address architecture requires, on average, three-quarters of the number of instructions obeyed by the one-address architecture, and the three-address architecture needs only about half the one-address number. However, the two-address and three-address instructions are each proportionately longer than the one-address instructions, and counts of the numbers of instruction bytes executed produce the values shown in Table 5. Here it can be seen that, on average, MU5/2 requires 1.04 times as many bytes as MU5 and MU5/3 0.97 times as many, i.e. not significantly different in either case, and in close agreement with the static code requirements.
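The ratios quoted for the CFA programs follow directly from the totals in Tables 3 to 5, as the sketch below shows. The dictionary layout and function names are illustrative choices; the numbers are the published totals. The average bytes per obeyed instruction is a derived quantity that the paper does not tabulate, included here only to show why fewer instructions do not mean fewer instruction bytes.

```python
# Totals for the CFA programs, taken from Tables 3, 4 and 5.
TOTALS = {
    "MU5":   {"static_bytes": 1914, "instructions": 9633, "bytes_executed": 21282},
    "MU5/2": {"static_bytes": 2013, "instructions": 7121, "bytes_executed": 22220},
    "MU5/3": {"static_bytes": 1804, "instructions": 4934, "bytes_executed": 20592},
}


def ratio_to_mu5(arch, metric):
    """Ratio of one architecture's total to the MU5 total for the same metric."""
    return TOTALS[arch][metric] / TOTALS["MU5"][metric]


def bytes_per_instruction(arch):
    """Average instruction bytes fetched per obeyed instruction (derived, not in the paper)."""
    return TOTALS[arch]["bytes_executed"] / TOTALS[arch]["instructions"]


for arch in ("MU5/2", "MU5/3"):
    print(
        arch,
        "static", round(ratio_to_mu5(arch, "static_bytes"), 2),
        "obeyed", round(ratio_to_mu5(arch, "instructions"), 2),
        "bytes executed", round(ratio_to_mu5(arch, "bytes_executed"), 2),
        "bytes/instruction", round(bytes_per_instruction(arch), 2),
    )
```

The instruction count falls while the average instruction length rises, so the two effects largely cancel in the byte totals.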

3.2 Wichmann Algol benchmark results

The corresponding figures for the Wichmann Algol benchmarks are shown in Tables 6-8. Here, the memory-space requirements are higher for both MU5/2 and MU5/3, with ratios of 1.18 and 1.22, respectively, although some individual modules require less (e.g. 0.77 for module 10 on MU5/2 and 0.91 for module 4 on MU5/3). The worst figures are for module 2, which requires 1.48 times as much space on MU5/2 and 1.82 times as much on MU5/3. Module 2 (and module 3) exercises the array-element accessing mechanisms, which involve additional instructions in the two-address and three-address schemes proposed here, rather than the simple use of an addressing mode, as in MU5. The numbers of instruction bytes executed are also relatively greater than for the CFA programs, being 1.14 times higher than MU5 for MU5/2 and 1.16 times higher for MU5/3.

Table 5: Instruction bytes executed by CFA programs

           Instruction bytes   Ratio to MU5
MU5              21282
MU5/2            22220             1.04
MU5/3            20592             0.97

The numbers of obeyed instructions for the two proposed architectures again show significant decreases relative to MU5, however, with ratios of 0.84 for MU5/2 and 0.70 for MU5/3. The reasons for the less dramatic decreases, as compared with the CFA programs, become apparent when the Wichmann benchmark is analysed. Many of the program statements contained within it are long ones involving, typically, four arithmetic operators plus an assignment operator. These statements can be coded efficiently in one-address instructions, but not in two-address or three-address instructions. The CFA programs are more typical of programs generally, which contain much higher proportions of short statements [9]. In addition, modules 7 and 11 test mathematical functions and, for the sake of compatibility, these have been transliterated from MU5

Table 3: Memory space occupied by CFA compiled code

                                    Memory space in bytes      Ratio to MU5
Program                             MU5     MU5/2   MU5/3      MU5/2   MU5/3
Character search                      98     100      82       1.02    0.84
Boolean matrix transpose              80      97      82       1.21    1.02
Record unpack                        184     180     164       0.98    0.89
Vector-to-scan line conversion       404     451     430       1.12    1.06
Hash table search                    106     123      92       1.16    0.87
Linked list insertion                332     302     274       0.91    0.83
Presort on large address space       100      95      74       0.95    0.74
Autocorrelation                       86      85      78       0.99    0.91
Array manipulation                   140     155     142       1.11    1.01
Target tracking                      384     425     386       1.11    1.01
Total                               1914    2013    1804       1.05    0.94

Table 4: Numbers of obeyed instructions for CFA programs

                                    Instructions obeyed        Ratio to MU5
Program                             MU5     MU5/2   MU5/3      MU5/2   MU5/3
Character search                     493     388     272       0.79    0.55
Boolean matrix transpose             175     154     104       0.88    0.59
Record unpack                       1462     830     631       0.57    0.43
Vector-to-scan line conversion      3130    2364    1586       0.76    0.51
Hash table search                    399     346     230       0.87    0.58
Linked list insertion                669     506     379       0.76    0.57
Presort on large address space       441     302     190       0.68    0.43
Autocorrelation                     1209     878     624       0.73    0.52
Array manipulation                   551     413     301       0.75    0.55
Target tracking                     1104     940     617       0.85    0.56
Total                               9633    7121    4934       0.74    0.51


Table 6: Memory space occupied by Wichmann Algol compiled code

                                Memory space in bytes      Ratio to MU5
Module                          MU5     MU5/2   MU5/3      MU5/2   MU5/3
1  Simple identifiers             50      74      70       1.48    1.40
2  Array elements                 90     134     164       1.48    1.82
3  Array as parameter             17      16      18       0.94    1.06
   Procedure pa                   99     144     143       1.45    1.44
4  Conditional jumps              44      37      40       0.84    0.91
6  Integer arithmetic             56      66      64       1.18    1.14
7  Trig. functions               137     148     138       1.08    1.01
8  Procedure calls                34      42      45       1.24    1.32
   Procedure p3                   35      40      35       1.14    1.00
9  Array references               42      43      45       1.02    1.07
   Procedure p0                   41      46      52       1.12    1.27
10 Integer arithmetic             26      20      25       0.77    0.96
11 Standard functions             41      44      43       1.07    1.05
Total for modules                712     854     882       1.20    1.24
Mathematical functions           699     817     838       1.17    1.20
Total                           1411    1671    1720       1.18    1.22

Table 7: Numbers of obeyed instructions for Wichmann Algol benchmark

            Instructions obeyed           Ratio to MU5
Module      MU5      MU5/2    MU5/3       MU5/2   MU5/3
1              78       60       41       0.77    0.53
2             710      729      725       1.03    1.02
3            6631     7062     5298       1.06    0.80
4            7083     4319     3284       0.61    0.46
6            8835     7569     5259       0.86    0.60
7           14956    13736    11432       0.92    0.76
8           33276    27878    22484       0.84    0.68
9           20352    14186    14183       0.70    0.70
10             13        8        8       0.62    0.62
11          15728    14980    12469       0.95    0.79
Total      107662    90527    75183       0.84    0.70

Table 8: Instruction bytes executed by Wichmann Algol benchmark

           Instruction bytes   Ratio to MU5
MU5             338854
MU5/2           386900             1.14
MU5/3           392082             1.16

code into MU5/2 and MU5/3 code without consideration of more suitable algorithms. Thus, the six mathematical functions tested by the Wichmann benchmark are calculated in MU5 by the evaluation of the appropriate polynomial. This method involves successive accumulative additions and multiplications, which can be coded very elegantly in one-address instructions using the accumulator, but not in two-address or three-address instructions.
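The pattern of successive accumulative multiplications and additions described above resembles Horner's rule, sketched below. This is an assumption made for illustration: the paper does not name the scheme or give the polynomials used by the MU5 functions, and the coefficients in the example are arbitrary. The single running value `acc` plays the part of the accumulator, so each step needs only one explicit operand.

```python
def horner(coefficients, x):
    """Evaluate a polynomial with coefficients listed highest order first."""
    acc = 0.0
    for c in coefficients:
        # One multiply and one add per coefficient, both applied to the accumulator.
        acc = acc * x + c
    return acc


# Arbitrary example: 2x^2 + 1 evaluated at x = 3.
print(horner([2, 0, 1], 3))
```

In a two-address or three-address format each of these steps must also name a destination, which is one way to read the paper's remark that the method does not code as elegantly there.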

3.3 Comparisons with other architectures

In terms of overall architecture, the only significant difference between MU5 and the proposed MU5/2 and MU5/3 systems is the number of addresses per instruction. Furthermore, since the MU5/2 and MU5/3 programs are transliterations of those produced by the MU5 compilers, it is the effects of this difference alone which are being evaluated. However, it was also felt that a comparison with other two-address and three-address architectures should be attempted. Mainly because of their importance, but also because of their availability at Manchester University, the PDP-11 and CDC 7600 were chosen for this purpose. However, in the absence of an ISPS description of the 7600 and the Algol compiler for the PDP-11 concerned, a different technique was adopted for obtaining the numbers of obeyed instructions. The first step was the generation of assembly listings of the Wichmann Fortran benchmark for the PDP-11, CDC 7600 and MU5 by compiling this program on each machine, and the production of MU5/2 and MU5/3 listings by hand coding from the MU5 listing. Examination of these listings then allowed the number of instructions inside the main loop of each module to be counted, so that by multiplying these numbers by the weights assigned to the modules (i.e. the number of times the main loop in the module is executed) and adding the results, the total number of instructions obeyed in the program could be estimated. This technique was validated in two ways, by applying it to the Wichmann Algol benchmark and comparing the values obtained with the ISPS results of MU5, MU5/2 and MU5/3, and by similarly applying it to the Wichmann Fortran benchmark for MU5. Agreement between these figures was extremely close in all cases, thus giving confidence in the method.
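The estimation technique amounts to a weighted sum over modules, which can be sketched as follows. The function name and all the numbers in the example are hypothetical, chosen only to show the arithmetic; they are not loop counts or module weights taken from the listings.

```python
def estimate_obeyed(loop_instructions, weights):
    """Sum of (instructions in a module's main loop) x (times that loop runs)."""
    return sum(count * weights[module] for module, count in loop_instructions.items())


# Hypothetical inputs for two modules, for illustration only.
loop_instructions = {2: 40, 3: 250}  # instructions counted in each main loop
weights = {2: 10, 3: 20}             # number of times each main loop is executed

print(estimate_obeyed(loop_instructions, weights))
```

Instructions outside the main loops are ignored by this estimate, which is presumably why the authors checked it against the ISPS simulation figures before relying on it.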

The static code measurements for the Wichmann Fortran benchmark on MU5, MU5/2, MU5/3, the PDP-11 and CDC 7600 are shown in Table 9, and the calculated numbers of obeyed instructions for modules 2, 3, 4, 6, 8, and 9 in Table 10. The static code figures for MU5, MU5/2 and MU5/3 are closely comparable with the Algol figures, and differences which do arise can be accounted for very accurately by examination of the compiled code. (Differences in the code arise from the different techniques used in these languages for array-bound checking, parameter passing, display updating etc.)

In terms of both code space and numbers of instructions obeyed, the figures for MU5/2 are very close to those for


Table 9: Memory space occupied by Wichmann Fortran compiled code

                                Memory space in bytes
Module                          MU5     MU5/2   MU5/3   PDP-11   CDC 7600
1  Simple identifiers            120     142     142     156       163
2  Array elements                160     209     240     170       172
3  Array as parameter             40      32      34      40        75
   Procedure pa                  130     144     182     122       150
4  Conditional jumps              96      77      86     108       165
6  Integer arithmetic            116     134     122     154       150
7  Trig. functions               296     331     310     214       382
8  Procedure calls               100     115     108      70        90
   Procedure p3                   54      65      54      66        67
9  Array references              102      98      98      88       120
   Procedure p0                   56      52      58      62        52
10 Integer arithmetic             60      44      52      78       105
11 Standard functions             92      95      94      68       120
Total                           1422    1538    1580    1396      1811
Ratio to MU5                            1.08    1.11    0.98      1.27

Table 10: Numbers of obeyed instructions for the Wichmann Fortran benchmark

                  Instructions obeyed
Module            MU5      MU5/2    MU5/3    PDP-11   CDC 7600
2                   540      468      468      336       468
3                  3976     3346     3346     2590      3976
4                  7380     4182     3444     5166      6888
6                  8610     7560     4830     7560      6090
8                 35061    28768    21576    22475     28768
9                 13552     8008     8008    12936     12936
Total             69119    52332    41672    51063     59126
Ratio to MU5                0.76     0.60     0.74      0.86

the PDP-11, although there are noticeable differences for individual modules; the static code figures given in the companion paper for the CFA programs [4] also show close agreement between MU5/2 and the PDP-11. Thus, the performance of the two-address version of MU5 is comparable in these terms with a conventional two-address register machine, even though all MU5/2 addresses refer to store locations which form part of a very much larger address space than that of the PDP-11.

The three-address version of MU5 again performs better than the two-address version, but it also performs considerably better than the CDC 7600, which, despite having a three-address instruction format, requires more instructions to be obeyed and more code space than any of the other multiple-address systems considered here. (Since the CDC 7600 has a 60-bit word, the number of bytes was obtained by dividing the total number of bits required by eight.) In terms of static code, the 7600 suffers because its long-format instruction (30 bits) cannot cross 60-bit word boundaries, and labels must always point to the start of a 60-bit word. Both these factors lead to the planting of extra 'PASS' instructions in the code. Dynamically, the 7600 suffers from the conflicting requirements of hardware performance and compiler uniformity. For instance, in the interests of hardware performance, the floating-point units do not implicitly normalise their results, but the multiply unit has to have normalised-input operands for the single-precision multiply instruction. The compiler, therefore, has to plant a normalise instruction after an addition or subtraction.

4 Conclusions

The most important conclusion which can be drawn from this evaluation is that traditional arguments against two-store-address and three-store-address instruction formats do not apply to name-space architecture. The systems proposed here require approximately the same amount of code to be stored in memory and to be supplied to the processor as the one-address architecture from which they are derived. Both require significantly fewer instructions to be obeyed than the one-address system; the performance of the two-address system is comparable with that of the PDP-11, a two-address register machine, and the three-address system performs significantly better than the CDC 7600, a three-address register machine.

The important question which remains to be answered is whether such systems can give significant performance improvements over the corresponding one-address system. MU5 is a highly pipelined asynchronously controlled processor incorporating associatively addressed buffer stores. Assuming that the two-address and three-address systems have comparable floating-point execution units, and that this is a limiting factor on performance, then the important differences between the three systems are the instruction and operand accessing rates required to keep this unit occupied. For the Wichmann Algol benchmark, the total numbers of instruction bytes executed are approximately 15% higher for the multiple-address architectures, so that in order to sustain the same overall instruction execution rate (and hence obtain the maximum improvement in program execution rate), the rate at which instruction bytes are supplied must be increased not only by 15%, but also in inverse proportion to the numbers of instructions being obeyed (i.e. by a total of 152% in the two-address system and 192% in the three-address system). Furthermore, one three-address instruction clearly requires three operands, and there is a significant engineering problem in providing three operands at a rate sufficient to maintain the same instruction execution rate as that of the one-address system. Some of the multiple-port integrated-circuit stores now available would appear to offer a possible solution to this problem, but the problem of operand interdependency between instructions also becomes more important with multiple-address instructions. Thus, a detailed design study would be needed in order to assess how much of the benefit accruing from the use of a two- or three-address name-space architecture could be translated into overall system-performance improvement.

5 Acknowledgments

The authors wish to acknowledge the co-operation of M.R. Barbacci of Carnegie-Mellon University and the support of the Defense Advanced Research Projects Agency (DARPA) of the Department of Defense, USA. Access to the ARPA Network was made possible through the good offices of the INDRA group at University College, London. J. Djordjevic acknowledges financial support from the Institute M. Pupin, Belgrade, Yugoslavia.

Jovan Djordjevic graduated from the Faculty of Electrical Engineering at the University of Belgrade, Yugoslavia, in 1970. In the period between 1970 and 1975 he worked on the design of a hybrid computer system in the Computer Engineering Department of the Institute 'Mihailo Pupin', Belgrade. He received M.Sc. and Ph.D. degrees in computer science from the University of Manchester in 1977 and 1979, respectively. He is currently a Visiting Research Associate with the Department of Computer Science at Carnegie-Mellon University. His research interests include computer architecture, design automation, and computer performance evaluation.

Frank H. Sumner received his B.Sc. degree with honours in Chemistry from the University of Manchester in 1951 and started working with computers at Manchester in 1952 as a user of the Mark 1. He later wrote systems programs for the Mercury computer and was involved in the logical design of the Atlas and MU5 computers. He was awarded his Ph.D. in 1954 and became a Professor of Computer Science at Manchester in 1967. His continuing research interests are in computer architecture, mainly of high-performance systems. Professor Sumner has been a member of the UK Science Research Council's Computer Science Committee, and chairman of that Committee's Education and Training Panel. He was a member of the Computer Board from 1974 to 1978 and President of the British Computer Society for the year 1978-79. He is chairman of the Program Committee for the 1980 IFIP Congress.

6 References

1 IBBETT, R.N., and HUSBAND, M.A.: 'The MU5 name store', Comput. J., 1977, 20, pp. 227-231

2 IBBETT, R.N., and CAPON, P.C.: 'The development of the MU5 computer system', Commun. ACM, 1978, 21, pp. 13-24

3 MORRIS, D., and IBBETT, R.N.: 'The MU5 computer system' (Macmillan, 1979)

4 DJORDJEVIC, J., IBBETT, R.N., and BARBACCI, M.R.: 'Evaluation of computer architecture using ISPS' (see pp. 126-135)

5 CURNOW, H.J., and WICHMANN, B.A.: 'A synthetic benchmark', Comput. J., 1976, 19, pp. 43-49

6 FULLER, S.H., SHAMAN, R., and LAMB, D.: 'Evaluation of computer architecture via test programs', AFIPS NCC Conf. Proc., 1977, 46, pp. 147-160

7 FULLER, S.H., STONE, H.S., and BURR, W.E.: 'Initial selection and screening of the CFA candidate computer architectures', AFIPS NCC Conf. Proc., 1977, 46, pp. 139-146

8 LAVINGTON, S.H., and KNOWLES, A.E.: 'Assessing the power of an order code', Proceedings of IFIP congress, 1977, pp. 477-480

9 KNUTH, D.E.: 'An empirical study of Fortran programs', Software-Pract. & Exper., 1971, 1, pp. 105-133

Roland N. Ibbett received his B.Sc. degree with honours in Physics from the University of Manchester in 1962, and his M.Sc. degree in 1964. From 1962 to 1966 he worked mainly at Hull University on computer techniques in astronomical spectrometry, and was awarded his Ph.D. in 1967. In 1966 he joined the staff of the Computer Science Department at Manchester University, where he became involved in the design and implementation of MU5. Since then he has published numerous papers on the design and performance of MU5, and is co-author of a book on this subject. In 1976 he spent the fall semester at Carnegie-Mellon University, Pittsburgh, as Visiting Associate Professor in Computer Science, and currently holds the position of Senior Lecturer in Computer Science at Manchester University.

Mario R. Barbacci is a research computer scientist with the Department of Computer Science at Carnegie-Mellon University. His current research interests include computer architecture, archival memories and design automation. He received the B.S.E.E. and Engineer degrees from the Universidad Nacional de Ingenieria, Lima, Peru, in 1966 and 1968, respectively. Dr. Barbacci received a Ph.D. degree from Carnegie-Mellon University in 1974. While completing his doctoral studies at CMU, he held a research assistantship, working in design automation and computer description languages. During the summers of 1970 and 1971 he worked at the Thomas J. Watson Research Center, International Business Machines Corporation, doing studies of performance evaluation of time-sharing systems. During the 1977-78 academic year he was on leave of absence from CMU, working with the Research and Development Group at Digital Equipment Corporation. He is a Senior Member of the IEEE, and a member of the ACM and Sigma Xi. In addition, he serves as Chairman of the International Federation for Information Processing (IFIP) Working Group 10.2 (Digital Design Descriptions and Tools). He is the author of numerous articles and technical reports and serves as consultant to several government and industrial organisations.

IEE PROC., Vol. 127, Pt. E, No. 4, JULY 1980