INDEX

ACORN ARM See VLSI VL86C010

Addresses address field, 52, 53, 55, 56 address path, 9 breakdown with octal notation, 52 disk controller translator, 59 dynamic-address-translation facility, 59 index field, 52, 53, 54 logical address, 43, 57 physical address, 43 virtual address, 57

Addressing modes as delineating instruction set architecture, 10 as delineating program architecture, 12 complexity, 24, 61-62, 63 defined by mode bits, 24 design concern, 19, 24 extension word, 24 high-level language support, 24 in advanced microprocessors, generally, 9, 23-24 in reduced-instruction-set computers, generally, 63, 66 substituting indexed addressing for direct addressing, 66 See also Motorola 68XXX family

Advanced Micro Devices, Inc. AMD2900, 152 AM29~, 93-94 AM29334, 94 AM29337, 94 AM29332, 94 development tools for AM29000

ADAPT29K, 173

architecture simulator, 174-177 ASM29K, 174 High C29K ANSI C compiler, 174 PCEB29K, 171-173 XRAY29K, 174

See also AM29000

Algorithm affecting program locality, 49 for calculating factorial, 136, 138, 139 for digital-signal processing, 201 for hierarchical decomposition in Occam2, 183, 194 granularity, 17 numerical algorithm in data-flow computer, 156 routing algorithm, 223, 224, 225, 228

Altered bit, 56

AM29000 (AMD RISC) addressing

base-plus-offset, 162 burst mode accesses, 164 relative addressing, 162

arithmetic operations, 170 array processing, 177 bounds checker, 94, 99 branch target cache (btc), 161, 166-167 clock cycles per instruction, 161 CMOS technology, 159 constant generator, 95 constants, 95, 96, 161 control signals, 95 control unit, 95 current window pointer, 98-99 data flow, 160 delayed branching, 94 development tools, 171-177

embedded applications, 159 exceptions, 94 execution unit, 159, 161

address unit, 169 arithmetic-logic unit, 94, 95, 96, 169, 170

barrel shifter, 94, 96 funnel shifter, 94, 96

decode circuit, 161 register file, 95, 96, 169

absolute register number, 162 general-purpose registers, 161, 162, 163 global registers, 161, 162 local registers, 161, 162 register banks, 162 special-purpose registers, 161, 162, 163, 164, 165 updating dynamically, 164

external instruction memory, 161 graphics, 177 instruction prefetch buffer (ipb), 161, 166 instructions

branch target instructions, 166, 169 call instruction, 94 fixed-instruction format, 94 immediate instructions, 169 instruction set, 95 number of, 159 operands, 161, 170 orthogonal, 96 prefetching, 165

instruction fetch unit, 159, 161, 167-169 interrupts, 166 logic operations, 170 memory management unit, 159, 161

least-recently-used hardware, 171 pipelined address translation, 171 software tlb reload, 171 translation look-aside buffer (tlb), 171 virtual memory, 171

mentioned, 16, 93-94 mips, 159 multiplexer, 94, 95 nested procedures, 162 operating system values, 162 overlapping register windows, 96, 97 passing parameters, 162 pipeline, 41, 94, 161, 169, 171 program counter unit, 94, 167

program counter, 167, 168 program counter buffer, 168 program counter multiplex, 167 return address latch, 167, 169

programmable logic array (PLA), 95 relational operations, 170 similarity to Berkeley RISC, 94 stack pointer, 162 task switching, 162 trap, 166 32-bit device, 159 uses for, 177 variables, 162 write-back stage, 161

Application program, 10

Architecture application architecture, 10 as subset of computer system organization, 10 classes of, 11-12 control flow, 8 data-flow architecture

actors, 129, 130, 131, 135 centralized control unit absent, 129 conditionals, 132, 135, 136 concurrency, 129, 156 dynamic data flow, 140-141 interconnection schema, 134 iterations, 132 iterative schema, 135, 137 mentioned, 8, 13 nodes, 140, 141 operands, 129, 156 programs

apply function, 135-136

arcs, 130 activity store, 137, 139 activity templates, 134, 135, 137, 141 conditional graph, 131, 132, 135 copy function, 135-136 cyclic graph, 131, 132 deadly embrace, 132, 133, 134 data-driven type, 130, 131 decision diamond, 130-131 demand-driven type, 130 directed graph form, 130, 131 fetch unit, 139 firing rules, 130 instruction, 137 instruction queue, 137, 139, 140 loop graph, 131, 132 merge symbol, 130, 131, 132, 135 operation packet, 137, 139 operation unit, 139-140 program counter absent, 129 race condition, 132-133 result packet, 137, 140, 141 switch symbol, 130, 131, 132, 135 tokens, 130, 131, 133, 156 update unit, 140

static data-flow, 140-141 See also DDM1 computer, EDFG data-flow computer, LAU data-flow computer, Manchester data-flow computer, MIT dynamic and static data-flow computers, NEC PD7281, Texas Instruments DDP

definition of term, 10-11 diagram of, 10 generic architecture, 12 Harvard architecture

bandwidth, 14 data memory as module of, 13 diagram of, 13 instruction memory as module of, 13 mentioned, 12, 202

instruction set architecture, 10 internal architecture, 10

language-directed in reduced-instruction-set computer, 66 LOAD/STORE architecture, 80 of 8008, 2, 3 of 4004, 2 program architecture

definition of term, 12 effect on cost-performance ratio, 12 semantic gap, 23

programming perspective, 10, 12 semantic gap, 67 system architecture

definition of term, 12 effect on cost-performance ratio, 12 parallel processing as describing, 14 multimicroprocessing as describing, 14 multiprogramming as describing, 14 multitasking as describing, 14 semantic gap, 23

von Neumann architecture bandwidth, 13 bottlenecking, 8, 12, 13 control-flow architecture, 8, 13, 129 control unit, 12-13 data unit, 12-13 decision diamond, 130-131 diagram of, 12 program execution in, 129 word size, 13

See also reduced-instruction-set computer, very-long-instruction-word computer, writable-instruction-set computer

Arithmetic-logic unit, 13, 94

Arithmetic routines, 2 in 8-bit devices, generally, 3 in 8008, 3 in 8080, 4, 5

sign extension for, 19

Array application of strength reduction method to, 78 of data structure, 22 manipulation of, 27, 29 MIT dynamic data-flow computer, 149 multiplier/accumulator array, 213 page register array, 46-47 processing, 177 programmable logic array, 95 systolic array, 196

Assembly language compatibility of 8080/8086 at, 6 effect of assembler on program locality, 49 execution-time tools, 77 in reduced-instruction-set computer, generally, 63-64, 73, 77 instruction set architecture, 10 library procedure, 77 modules, 77 pseudocode (P code), 77

AT&T WE DSP32 arithmetic-logic unit, 210 control arithmetic unit, 210 data arithmetic unit

accumulators, 207, 210 control register, 207 floating point adder, 207 floating point multiplier, 207 throughput, 210

data type conversion, 210 data width, 211 diagram of, 209 double buffering, 211 instructions, 210 i/o, 207, 211 memory, 207, 211 multiply/accumulate operations, 210 pipeline, 210 program counter, 210

registers, 210, 211 32-bit device, 207 uses for, 207

Bandwidth definition of term, 13 effect of von Neumann architecture, 13 in advanced microprocessors, generally, 17 in Harvard architecture, 14 in reduced-instruction-set computer, generally, 116, 117

Bank switching, 42

BCD arithmetic, 2, 4, 5

Berkeley RISC absence of floating-point capability, 71 addressing

indexed addressing for direct addressing, 67, 69 indirect addressing, 69 self-relative, 72

C language, 66, 72 cycles per instruction (cpi), 71 diagram of RISC I, 68 diagram of RISC II, 69 fixed-instruction format, 94 global/local variables, 72 instructions, 66 integer, 71 memory, 67 mentioned, 64, 94 passed parameters, 72 registers

global/local, 70, 72 high/low, 70 overlapping windows, 67, 72-73

scc flag, 71 symbolic-processing language, 99 32-bit device, 67

Binary buddy system buddies, 45-46

list of memory segments, 45 queue, 45

Bit-slice device, 7, 152

Bottlenecking diagram of, 36 frequent or sequential memory accessing as cause, 13 in 8080, 5 in von Neumann devices, generally, 8 Occam2, 182-183

Branches branch latency slot, 75 branch target cache in AM29000, 161, 166-167 branch unit of 88100, 82 canceling branches, 75, 76, 77 delayed branches, 39, 75-76, 80, 82 delay slot, 76 mentioned, 37 minimum accesses for, 39-40 nullifying branches, 76 optimizing compiler, 37-38 register mode with 16-bit displacement/immediate, 91 register R1 in 88100, 82 squashing branches, 76 subset of change-of-flow instruction, 39 to close loop, 76 26-branch displacement mode, 93

Buffer, 50

Buses address bus, 93 as delineating internal architecture, 10 bus error as precise exception, 83 data bus width defining word, 20 effect of instruction queue on, 39 in cache-memory system, 50 larger address buses, 42 memory bus, 3

Cache memory advantages in cost and speed, 48 as organizational issue, 12 block fetch, 49-50 direct-mapping method, 52-54 fully associative method, 50-55 essential to program locality, 49 external and on-chip option, 48 for page table pointer, 48 interface between processor and RAM, 48 mentioned, 9 organization affecting program locality, 49 organization as implementation issue, 12 set-associative method, 12, 54-55 size affecting program locality, 49 size of block, 50 word choice, 49 write-back cache, 56-57 write-through cache, 55-56

Central processing unit (CPU) by Intel, 2 by MIPS Computer Systems, Inc., 65 8008 as, 2 4004 as, 2 register file, 66, 67 stack machine, 117 throughput in Cray systems, 64 with complex instruction set, 64 See also pipeline

Character handling, 2

Character string, 49

Clones, 6

Code access to segments in 8086, 43 ascending order, 50 protection of, 42 reference to segments, 50 usual execution of, 49

Compiler Bulldog, 119, 120 High C29K ANSI C compiler, 174 Occam2 compilers, 185-186 optimizing compiler

activation table, 73 assembly language pseudocode (P code), 77 execution-time tools, 77 global optimization, 77 in reduced-instruction-set computers, 65, 73, 74, 76 linker, 77 local optimization, 77 object module, 77 optimization of loop, 38, 77 overflow with register, 80 pipeline, 67, 77 pipeline reorganizer, 101-102 reduction of branch latency, 38, 75-77, 78 reduction of load latency, 74-75 reduction of pipeline latency, 74 redundancy elimination, 38, 77-78 register allocation, 77 register file, 67 replacement of memory access, 38 speedup, 38, 77 scheduling of program flow, 38

orthogonality as aid to, 17 percolation scheduling, 122 silicon implementation of addressing modes and instructions, 63-64 trace scheduling, 120-122

Complex-instruction-set computer (CISC)

instruction set for central processing unit, 64 overlapping register windows, 72 procedure calls, 70

Computer Terminal Corporation, 2

Concurrency in control-flow processor, 129 mentioned, 8, 62

on multiple and single processors, 14, 172 simulation of, 182 See also Inmos transputer, Motorola 88100 family

Connection machine, 7

Constants, 58, 95, 96, 117, 161

Context switches, 49

Control Data Corporation See gallium arsenide risc

Control function, 2

Control program, 2

Control signals, 13, 35, 129

Control unit function of, 13, 35 in reduced-instruction-set computer

hardwired, 66, 82 layout, 64 size, 72

microprogrammed, 9 module in von Neumann architecture, 12-13 with complex instruction set, 63

Cost cost-performance ratio in reduced-instruction-set computer, 64, 117 decline for microprocessors, generally, 1 disk storage, 57 effect of block size in cache-memory system, 50 effect of program and system architectures on, 12 in paging scheme, 48 in write-back cache, 57 of memory chips, 4, 57 microcode versus program memories, 116 pipeline latency, 73, 79 with complex instruction set, 63

Cray, Seymour, 64, 65

Data access in instruction queue, 39 big-endian format, 124, 125 cache-memory circuit, 48 dependencies, 39, 111 effect on bandwidth, 13 little-endian format, 124, 125 module in Harvard architecture, 13 overlapping of accesses, 14

Data element, 19

Data path, 9, 63

Datapoint Corporation, 2

Data size, 24

Data stream, 11, 156

Data structure array, 22 behavior of, 49 effect on reduced-instruction-set computer, 67 in 8080, 6 in 8086, 6 record, 22-23

Data types array, 22 as delineating architecture, 10 Boolean, 20, 191 byte, 191 design concern, 19 effect on reduced-instruction-set computer, 67 formats of, 10 function, 19 in 8080, 4 in 68020, 63 integer, 20, 21, 71, 105, 115, 191, 210 literal, 22 notation, 20 ordinal, 21 pixel, 22, 124, 218-219

predefined, 19 primitive, 22 real operand, 21 record, 22-23 string, 22 user defined, 19

Data unit, 12, 13

DDM1 computer communication bus, 152 eight-ary tree structure, 151 GPL language, 151-152 processing element (pse), 151, 152 program graph, 151-152 recursively structured, 151 subgraph (task level node), 152 switch element, 151

Defense Advanced Research Projects Agency (DARPA), 109

Design considerations addressing modes, 19, 24 block size for cache-memory system, 50 branch delays, 78 data type, 19 data word size, 13 input/output in 8008, 3 instruction set, 19, 24 instruction word length, 13 load latency, 37 memory management, 42 operand size, 13 page size, 47 pipeline depth, 41 pipeline latency, 37, 73-74 pipeline timing hazard, 37 reduced-instruction-set computer, generally, 8, 23, 37, 63-63, 66 semantic gap, 19 write-through cache, 56

Digital signal processing mentioned, 8 types, 201 uses for, 201

See also AT&T WE DSP32, INMOS A100 and A110, Texas Instruments TMS320 family

Direct mapping, 12, 52

Disk storage cost advantage, 57 in virtual memory system, 57, 60 of inactive programs, 45

DOS, 174

Dynamic-address-translation facility,

EDFG dynamic data-flow computer bit-slice version, 152 data formats, 152 instruction formats, 152 tagging scheme, 152

Effective address, 28, 31 calculation of, 35 combinations of, 26 definition of, 24 double, 24 mode bits, 24 register bits, 24 single, 24, 26 table of combinations, 26

Ethernet, 164

Exception condition, 61

Execution unit, 10, 59

Extension word, 24, 31

Fault, 66

Flags auxiliary carry flag in 8080, 5 before branch, 78 carry flag in 8008, 3 immediate mode flag in RISC, 71 in 8080, 4, 5 in 8085, 5

in 8086, 6 overflow flag in 8080, 5 parity flag in 8008, 3 page table entry, 61 sign flag in 8008, 3 SSC flag in RISC I and II, 71 zero flag in 8008, 3

Floating-point capabilities absence in RISC I and II, 71 complexity, 115 coprocessor by MIPS Computer Systems, Inc., 65 in 88100, 80, 81 in gallium arsenide risc, 107, 110, 113-114, 115 in Transputer, 71 numbers, 21, 105

Frames, 58, 61

Gallium arsenide (GaAs) risc arithmetic-logic unit, 112-113 arithmetic unit, 114, 115 buses, 110, 112 cache memory, 111 central memory control board, 111 central memory, 111 central processor, 109, 110, 111, 112, 115 concurrency, 115 control interface, 112 data cache, 110-111 data dependencies, 111 data paths, 110, 112, 114 delay slot, 112 diagram of, 110 exponent control, 114 floating-point coprocessor, 109, 110, 113-114, 115 gate count, 109 i/o processor, 111 instructions, 110

branch instructions, 112 cache, 110-111 control-transfer instructions, 113 memory instructions, 113

no-operation instruction, 112, 115 register-to-register instructions, 111 word addressed, 112

interlocks shifter, 115 interrupts, 110 memory-management unit, 109, 110, 115 operands, 110, 112, 115 paged and segmented virtual memory system, 110 pipeline, 110, 111, 113 process status register, 114 program counter, 112 program status register, 112 real and virtual address space, 110 register file, 112, 113, 114, 115 shift unit, 114 speed, 110 translator/reorganizer, 115

Gates, 10, 109

Generated code, 38, 39, 50

Graphics limit on direct addressing in 8-bit device, 41 pixel, 22, 124 See also AM29000, Intel i860, Texas Instruments TMS34010

Hard disk controller, 59

Hardware as delineating architecture, 11-12 instruction set architecture, 10 in reduced-instruction-set computer, generally, 23, 66 in writable-instruction-set computer, 117 interlock, 101

Hewlett-Packard Spectrum family, 64, 102

High-level languages application architecture, 10

array, 22 availability of, 9 C language, 10, 72

compilation as assembly language pseudocode (P code), 77 features affecting reduced-instruction-set computers, 66-67 Fortran, 10, 143 GPL language, 151-152 improvement of, 17-18 mentioned, 61-62 Occam2

algorithm, 194-196 arrays, 189, 191, 196 buffer process, 184-185 channels, 184,187,188,189, 196 compiler, 185-186, 191 constructs

ALT, 187-188, 193 IF, 187 PAR, 14, 186, 188, 193, 196 SEQ, 14, 186

design for multiple processors, 182 distributed implementation, 182, 197 guards, 187-188, 197 hierarchical decomposition for bottlenecking, 182-183, 186 inputs, 184, 186, 187 iterative, 184, 186 kernel primitives, 184 loop, 189, 190 multidimensional, 189 nonportable, 182 outputs, 184, 186 pipeline, 194, 196-197 primitive process, 183 procedures, 182 program development, 192-194 recursiveness, 191-192 replicator, 190, 191 time, 190 value parameters, 184, 191 variables, 183, 184, 186, 189, 191

optimizing compiler, 67, 74

optimum number of registers, 80 Parallel ADA, 18 Parallel C, 18, 179 Parallel Fortran, 18, 179 Parallel Pascal, 18, 179 Pascal, 10, 72, 182 pipeline, 67, 74 record, 22-23 reduced-instruction-set computer, 8, 66-67, 72, 77 register file, 67 semantic gap, 67 support for addressing modes, 24 support for instruction sets, 24 up-level local variables, 73 VAL data-flow language, 148 writable-instruction-set computer, 117

Hypercube, 7

IBM 801, 64 PC RT, 64 PC XT/AT, 171 personal computers, 6 ROMP, 64 very-long-instruction-word computer, 119

Implementation, 11-12, 182, 197

INMOS Limited A100 digital signal processor

cascading, 211, 214 configurable, 211-212 diagram of, 213 digital transversal filter, 211, 212 example of system, 214 interface, 213 multipliers, 211 registers, 213

A110 array of multiply accumulator, 214, 215, 217 configurable, 211-212

diagram of, 215 example of system, 217 image processing, 211, 215 microprocessor interface, 215 postprocessing unit, 214-215 shift register, 215, 216 transversal filters, 215, 217

transputers B003 boards

arrays of channels, 233 butterfly topology, 234, 238-239 channels as parameters, 233-234 cube-connected topology, 234, 240-241 index, 233 link address, 233 machine identifier, 233 mapping array, 233 mapping index, 233 mesh topology, 234, 236-237 ring topology, 232, 233, 234, 235

buses, 179 channels, 179 CMOS technology, 180 configuration as array or network, 179 diagram of, 180 floating-point capabilities, 71, 180 graphics, 180 high-level languages, 2, 14, 179 i/o communication links, 180 memory interface, 180 mentioned, 8-9, 16, 127, 129, 211, 241, 242 order of, 15 performance, generally, 1 process, 14-15

definition, 179, 180 descheduling points, 181-182 execution, 180 linked process list, 181 microcoded scheduler, 180-181

RAM, 180 registers for workspace, 181 robot arm, 141

16- to 32-bit devices T414, 179 T800, 179, 182

software kernel unnecessary, 181 von Neumann architecture, 179 workspaces, 181

Input/output (i/o) direct, 3 in 8008, 3 in 8080, 5 in 8086, 6 in virtual memory system, 61 memory-mapped, 3 monitoring, 2 status of, 19

Instruction format, 10, 66

Instructions branch instructions, 66, 72, 75-77, 78, 82, 91, 93 change-of-flow instruction, 39, 79-80 control signal, 13 data handling instructions, 88 definition by Flynn, 23 diagram of flow, 35 effect on bandwidth, 13 execution in registers, 80 feed-forward information technique, 82 floating-point instructions, 81 format, 23, 63 independent instruction, 75 in 68020, 63 integer-multiply instructions, 81 in VAX 11/780, 63 length, 23, 66 LOAD, 37, 64, 66, 71, 74, 75-76, 80 mentioned, 58 module in Harvard architecture, 13 next instruction location, 23 no-operation (NOP) instruction, 75, 76 operand, 19, 23 operation, 23 orthogonality with addressing modes, 17 orthogonality of data registers with, 17 overlapping of accesses, 14 pointers in 88100, 84 prefetch unit, 118 program-control instructions, 87 queue, 39, 118 reduced-instruction-set architecture, 8, 63-64, 66 STORE, 37, 64, 66, 71, 80 word size, 14

Instruction sets adaptation to data path, 63 as delineating architecture, 10, 12 branch delays, 78 complexity, 63 controversy about size of, 9 definition of, 23 design concern, 19, 78 functional classes, 63 mentioned, 61-62 of 8008, 2, 3, 4 of 8080, 4, 5, 6 of 8085, 5 of 8086, 6 of 4004, 2 reduced-instruction-set computers, 63-64, 72 theories about, 23

Instruction stream, 10

Intergraph Clipper Fairchild Semiconductors, 105 compromise between complex- and reduced-instruction set computer architectures, 105 data cache/memory-management unit, 108 data types, 105 diagrams of, 108, 109 instruction cache/memory-management unit, 108 instruction set, 105 instructions

data-handling instructions, 105 format of, 105, 106 hardwired, 105 macroinstructions, 105 memory reference instructions, 105 operands, 105

integer/floating-point processor, 108 memory addressing modes, 105, 107 mentioned, 16, 64, 116 registers, 105 virtual address, 107

Integrated circuits, 1, 64

Intel Corporation advertisement of microprocessor, 1 8008

accumulator, 3 addressing modes, 3 address stack, 3 architecture, 2, 3 arithmetic operations in, 3 as complete central processing unit, 2 compatibility constraints on, 2 data handling, 3 description of, 2 development for Datapoint, 2 experimental use of NMOS technology with, 3-4 flags, 3 input/output, 3 instructions

accumulator-specific instruction, 3 i/o instructions, 3 processor-control instructions, 3 scratchpad-register instruction, 3 transfer of control instruction, 3

instruction set, 2, 3, 4 interrupts, 3 memory space, 3

mentioned, 1, 2, 5 pins, 3 PMOS technology in, 2 register organization, 2, 3 scratchpad, 3, 4 stack, 3 stack pointer, 5

8080 accumulator, 4, 5 addressing modes, 4 arithmetic operations, 4, 5 assembly language, 6 bottlenecking, 5 comparison with 8085, 5 compatibility of 8086 to, 6 compatibility of Z80 to, 7 data structure, 6 data types, 4 flags, 4, 5 input/output, 5 instructions, 4, 5, 6

call instruction, 5 i/o instructions, 5 return instruction, 5

interrupts, 4 machine code, 5 memory accessing, 4 memory pointer, 4 mentioned, 1 opcodes, 4, 5 oscillator chip, 5 pins, 4 positive indexing, 5 power supplies, 5 program counter, 4, 5 pushdown stack, 4-5 read-only memory, 4 registers, 4, 5, 6 software, 6 stack pointer, 4, 5 symmetry, 4 system controller chip, 5 throughput, 4

8085 address latch enable, 93

data structure, 6 flags, 5 instructions

reset interrupt mask instruction, 5 set interrupt mask instruction, 5

instruction set, 5, 6 interrupts, 5 machine code, 5 mentioned, 1 oscillator, 6 power supply, 6 registers, 5 similarity to 8080 organization, 5

80X86 family 8086

compatibility with 8080, 6 data structure, 6 flags, 6 input/output, 6 instruction pointer, 6 instructions, 6 control-transfer instructions, 6 memory accessing, 6 mentioned, 1, 7 organization, 6 registers, 6 segmentation, 43-50 segmented addressing, 6 software of 8080, 6

8088, 6 80286, 1, 6 80386, 1, 43 80486, 1 segmentation

base address, 43 code segment, 43 data segment, 43 extra segment, 43 generation of physical address, 43-44 general limit on size, 43 modulo 64K addressing, 43 stack segment, 43

formation, 1

432, 8 4004, 2, 17 i860

big- and little-endian formats, 124-125 bus, 124 bus and cache control unit, 123 concurrency, 129 core execution unit, 123 cycles per instruction, 124 data cache, 123 diagram of, 123 floating-point adder unit, 123, 124, 129 floating-point control unit, 123, 124, 129 floating-point multiplier unit, 123, 124, 126, 129 graphics unit, 123, 124 instruction cache, 123 instructions

bit instructions, 123 control-transfer instructions, 123 double-instruction execution, 122, 124, 126 integer, 123 load, 123-124 store, 123-124

integer data, 123 mentioned, 1, 8, 16, 17 multiplier, 124 operands, 22 paged, virtual memory, 124, 126 paging unit, 123 pipeline, 124 pixel, 22, 124 register, 123, 124, 125, 126

load-control register, 124 store-control register, 124 vector-integer, 124

word size, 124 i960, 1, 8, 16 study of LSI for control function, 2

Interlock, 37, 101

Job, 16

Jumps, 49

LAU processor control unit

data-control unit (dcu), 145 instruction-control unit (icu), 145, 146

data-flow graphs, 145 data structure, 145 diagram of, 145 execution unit, 146 instruction format

control part, 145, 146 operation part, 145, 146

instruction queue, 146 memory unit, 146 mentioned, 141 nodes, 145, 146 prototype, 146 single-assignment language, 145

LISP, 8, 99, 148

List, 50

Loop branch to close, 76 cache-memory size, 50 in code execution, 49 in dynamic data-flow computer, 141 nested loops, 120 optimizing compiler, 38, 120

LSI chips, 2

Macroinstructions, 70, 105

Manchester data-flow computer arcs, 147 compiler, 148 data format, 148 diagram of, 147 instruction format, 148 matching unit, 147-148 node store, 148 pipeline, 148

processing unit, 148 prototype, 148 race problem, 147 result packet, 147 ring topology, 147 SISAL language, 148 tagged tokens, 147 token queue, 147 transistor-transistor-logic, 148

Memory accessing

bottlenecking in von Neumann architecture, 13 for write reference, 56 general-purpose registers, 164 in advanced microprocessors, generally, 9 in Cray systems, 64 in 8080, 4, 5 in 8086, 6 in 88100, 81 in 68020, 7 reduced-instruction-set architecture, 8, 66, 73 reduction by compiler, 38

as part of internal architecture, 10 bank switching to increase physical memory, 42 base address for segment, 43 bus, 3 direct addressing in 8-bit device, 42 logical and physical address areas, 41-42 memory management, 9, 41-42

binary buddy system, 42 cache memory, 9, 48-49, 62

block fetch, 49-50 direct mapping cache, 12, 52-54 fully associative cache, 50-55 set-associative cache, 12, 54-55 write-back cache, 56-57 write-through cache, 55-56

paging, 45, 62 segmentation, 42-45, 62 virtual memory, 57-61, 62, 116

memory pointer, 4 shared in multimicroprocessing system, 14, 15 space, 3, 13 volatile, 57

Microcontroller, 8

Micromachine, 41, 101

Minicomputers, 3, 63

MIPS Computer Systems Inc., 65, 102 See also Stanford MIPS

MIT dynamic data-flow computer arithmetic-logic unit, 150 arrays, 149 data format, 150 data memory, 149 diagram of, 149 emulated model, 148 enabled instruction queue, 150 host computer, 150 input (token receiving) module, 150 instruction format, 150 nodes, 148 operands, 150 output module, 150 processing elements, 149, 150 program memory, 149 routing network, 150 switching network, 149 tokens, 150 waiting-matching store, 149-150

MIT static data-flow computer data packet format, 143 diagram of, 142 instructions, 143 mentioned, 141 tokens, 142

Motorola Corporation 8-bit devices, 7 88100 family, 16

addressing modes control register mode, 93

register indirect with index mode, 90 register indirect with scaled index, 90 register indirect with zero-extended immediate, 88-90 register mode with nine-bit vector table index, 91 register mode with 16-bit displacement/immediate, 91 register mode with 16-bit immediate addressing, 88 register mode with 10-bit immediate addressing, 88 register mode with 26-branch displacement, 91, 93 triadic method, 88

arithmetic-logic units, 81, 82 buses, 81, 82, 83, 89, 90, 93

arbitration of, 82 data P bus, 84 external operation, 84 instruction bus, 83 parallel, 81, 82

calculation unit, 84 concurrency, 78, 82, 83 control signals, 82-82 data unit, 80, 82, 84 dedicated circuits for speed, 78 delayed branches, 80 diagram of, 79 exception handling, 82, 83

arbitration, 82 exception-time register, 86 imprecise and precise, 83 prioritizing of register writes from, 82 register mode with 9-bit vector index, 91 shadow register, 86 supervisor mode, 84

execution-instruction pointer (xip), 84,91,93 execution units, 80, 81, 82, 83 fault-tolerant applications, 84-86

feed-forward information technique, 82 fetch instruction pointer, 84, 91, 93 floating-point unit, 80, 81, 82, 83, 84, 86, 93 instruction unit, 80, 82-84 instructions

addition instruction, 81 arithmetic instructions, generally, 88 bit-field instructions, 82, 88 branch instructions, 93 compare instruction, 87 control-register instruction, 81 conversion instruction, 81 data memory access instructions, 87-88 division instructions, 87 fields of, 88 flow-control instructions, 87 length of, 88 load instruction, 89 logic instructions, 81, 82, 88 mentioned, 66 multiplication instructions, 81 number of, 80 register-to-register instructions, 87, 88 store instruction, 89-90 subtraction instruction, 81 trap instruction, 91

integer unit, 80, 81-82, 84, 86 load/store architecture, 80 memory-management unit, 84 multiple pipelines, 78, 79-80, 84 multipliers, 78 next instruction pointer (nip), 84 ports, 82 predicates for condition testing, 93 program initialization, 82 register file/sequencer, 80, 81, 82-83 registers, 80-81, 82, 83, 84, 85, 86, 88

condition-code register absent, 86 exception-time register, 86 shadow register, 86 stack pointer absent, 86

supervisor mode, 84 user mode, 84

16-bit devices, 7 68XXX family

addressing modes absolute addressing, 24, 31 address register direct, 27 address register indirect, 27-28 address register indirect with displacement, 29, 30 address register indirect with index and displacement, 30 address indirect with postincrement, 28 address register indirect with predecrement, 28-29 array manipulation with memory address mode, 27, 29 block movement with memory address mode, 27, 28 branch, 32 data register direct, 27 destination operand, 33 effective address, 24, 26, 31 extension word, 24, 31, 32 immediate addressing, 24, 31, 32 inherent mode, 33 input/output, 29 memory address modes, 27-30 mode bits of effective address, 24 opcode, 24 operand specification, 24, 26, 27, 28, 29, 30, 31, 33 operation word, 33 orthogonality of, 17 position-independent program, 32 program counter with displacement, 32

program counter with index, 32 reference pointer with memory address mode, 27, 28 register bits of effective address, 24 sequential data, 28, 29 stack pointer, 28 stacking operations with memory address mode, 27, 28-29 table of, 25 tables, 28 variables, 29 word boundary, 28

address strobe, 93 complex instruction set, 63 memory accessing, 7 mentioned, 7, 16, 63 operands, 7 pipeline in 68020, 39-40 registers, 7

32-bit devices, 7

Multimicroprocessing systems Connection machine, 7 definition of term, 14 diagram of, 15 general usage of term, 14 Hypercube, 7 mentioned, 7, 12, 16, 242 See also Motorola 68XXX family, Topology

Multiprocessing average distance, 222-223 cost advantages, 221-222 deadlock, 221 definition of term, 221 expansion, 224 fault tolerance, 221, 222, 224 flexibility, 221, 222 memory management, 221 Motorola 68XXX family, 33 program development, 192 normalized average distance, 223

reliability, 221 starvation, 221 throughput, 221

Multiprogramming definition of term, 16

dynamically reallocable program, 44-45 general usage of term, 14 ready queue, 16

Multitasking dynamically reallocable program, 44-45 general usage of term, 14 register banking, 162

NEC PD7281 address generator/flow controller, 153 buffers

function table, 152 link table, 152, 155 queue, 152

diagram of, 153 digital signal processing applications, 152, 156 identifier field, 155 input controller, 152 output controller, 152 output queue, 152-153 processing unit, 152 RAM, 153 refresh controller, 153 ring pipeline, 152, 155 tokens, 152, 153, 154, 155

NMOS, 3-4

Octal notation, 52

Opcodes decoding, 35 effective address, 24 excessive registers, 80-81 in 8085, 5 packing, 117

Operands data unit, 13 destination operand, 33 effective address, 24, 35 fetching for READ, 35 in Harvard architecture, 13 in 68XXX family, 7, 24, 26, 27, 31, 33 in von Neumann architecture, 14 referencing in advanced microprocessors, 19-20 sign extension, 19 specification of, 24, 26, 27, 28, 29, 30, 33 storing for WRITE, 35

Operating system disk operating system, 41 during multiprogramming, 16 mentioned, 10, 17 priorities in user address space, 42 resource protection, 42 UNIX, 10, 174 use of paging scheme, 46, 47

Operation word, 24, 33

Organization, 10, 11-12

Orthogonality, 17, 63-64, %

OS/2, 41

Overlapping in Harvard architecture, 14 for increased speed, 9 pipeline timing hazard, 37 register windows, 67, 72

Page table, 59-60

Paging avoidance of external fragmentation, 46 extension and remapping of address range, 46 internal fragmentation, 47, 48 lookup table, 46 mentioned, 45 page faults, 47

page register, 46, 47-48 page register array, 46-47 page size, 46, 48 page table pointer, 48 time, 47 use of larger size, 48

Parallel processing general usage of term, 14 mentioned, 16, 17 pipelining as, 34 transputer, 8-9

Performance as delineating architecture, 11-12 condition-code register as hindrance, 86 effect of clock speed, 34 in reduced-instruction-set computer, 23, 64 in set-associative cache, 54, 55 number of registers, 80 parallel buses, 81 stack pointer as hindrance, 86 with write-back cache, 57

Petri nets, 130

Pins, 1, 3, 4

Pipeline bottlenecking, 8 branches, 37, 38, 39, 75-79, 80 branch latency slot, 76 change-of-flow instruction, 39, 79 contribution to advanced microprocessors, 17 delay slot, 74-75, 77 depth, 37, 39-41 diagram of, 34 digital computer, 34 distribution in, 36 division instructions in 88100, 81 form of parallel processing, 34 in Cray systems, 64 in MIPS, 65 increased speed due to, 9, 62

interlock, 37, 65, 74, 101 load latency, 37, 74-75 optimizing compiler, 67, 74 pipeline latency, 37, 73, n, 79, 80 pipeline stalling, 76 poor design, 79 reduced-instruction-set computers, generally, 73-74 superiority to instruction queue, 39 synchronization, 36 timing hazard, 37, 74 wait state, 37, 74, 75, 76 See also Motorola 88100 and 68XXX families

PMOS technology, 2, 3

Power supply, 5, 6, 57

~ffication during processing, 16 contents of, 58 disk storage of, 45, 58 dynamically reallocable, 44 execution in von Neumann device, 129 initialization in 88100, 82 page and segment tables during execution of, 60 position independent, 44 program locality, 49-50 sequencing, 81 task, 16, 42 usual reference to buffers, code segments, lists, stacks, subroutines, 50

Program counter in 8080, 4, 5 in 68020, 7 in von Neumann device, 129 program counter relative address, 72 sequencing function, 129 updating, 35

Program flow, 38-39

Programming capabilities, 14

Pyramid 90X, 64

Random-access memory (RAM), 1-2, 48, 59, 180

Read-only memory (ROM), 1-2, 4

Ready queue, 16

Reduced-instruction-set computer (RISC)

addressing modes, 65, 66 architecture, 66 caches, 65 clocks per instruction (CPI), 66, 73-74, 116 compiler, 66, 73-74, 116-117 condition codes, 101 control unit, 66, 116 coprocessors, 65 debate on, 23 execution time, 66 evolution, 64-66 execution time, 8 floating-point operations, 65, 71 global registers, 72 hardwired sequencer in 88100, 82 high-level languages

features affecting, 66-67 language-directed, 66, 72 see also compilers

instruction format, 66, 116 instructions

complex instructions as macroinstruction or subroutine, 70 fixed length, 66 for memory accessing, 66 mentioned, 8 number of, 65, 66 semantic content, 117

instruction set, 72, 116 load latency, 37 load/store architecture, 116 local variable, 72 memory accessing, 8 memory management, 65 nesting of procedures, 70-71

Reduced-instruction-set computer (RISC) (continued)

passed parameters, 72 philosophy of, 63-64, 72 pipeline latencies, 73 procedure calls, 70 register files, 65, 66, 72 register-to-register operations, 66, 71 registers, 9, 90

global/local, 70 high/low, 70 stack for overflow, 70-71 window-based architecture, 67, 70, 72 trap, 71

Registers addressing, 24 as delineating instruction set architecture, 10 as delineating program architecture, 12 CPU register operations in Cray systems, 64 effect of dedicated registers on bottlenecking, 5 effect on performance, 80 electrical and structural limits, 80 general-purpose registers, 9, 164 in advanced microprocessors, generally, 17 in control unit, 13 in data unit, 13 in 8008, 2, 3 in 8080, 4, 5, 6 in 8085, 5 in 8086, 6 in 68020 optimum number of, 80 register file in reduced-instruction-set computers, 66, 67, 72, 80 register-to-register operations, 66, 71 subfield of effective address, 24 table-origin register, 59 use by optimizing compiler, 38 user-visible registers, 10

Ridge 32, 64

Segment table, 59-60

Semantic gap coupling of pipeline, register files, and optimizing compiler, 67 definition of, 23 design concern, 19

Slots, 58, 61

Smalltalk, 99

Software bank switching, 42 branches, 76 kernel-type, 84 segmentation, 43

Speed affecting architecture, 12 cache-memory circuit for interface, 48 cycles-per-instruction, 73 experiments with NMOS technology, 3-4 hardwired control unit, 66 loss with virtual memory system, 57-58 loss with write-through cache, 56 operand decoding, 117 optimum number of registers, 80 reduced-instruction-set computer, 64, 66 techniques for increasing, 9, 62 system clock, 9 unique features of 88100, 78, 93

Spreadsheets, 41

Stack definition of, 3 during code execution, 49 in 8008, 3 in 8080, 4-5 in 68XXX family, 27, 28-29 in writable-instruction-set computer, 117

manipulation for memory accessing, 66 pushdown stack, 2, 4-5 stack overflow handler, 73 stack pointer, 4, 5, 86 usual reference to, 50

Stanford MIPS allocation of transistors, 99 arithmetic-logic unit, 99, 101 barrel shifter, 99 buses, 99-100 cache on chip, 99 compiled code, 99 condition codes absent, 101 diagram of, 100 exceptions, 100 instruction cache, 65 instructions

compare instruction, 101 control-flow instructions, 101 jumps, 101 length of, 100 load instruction, 101 procedural linkage instructions, 101 register-to-register instructions, 101

instruction set compiler-based encoding of micromachine, 100

instruction word, interfaces, 100 interrupts, 100 pipeline, 65, 100 pipeline reorganizer for interlocks, 101 program counter unit, 100 registers, 99, 100 speed, 99 32-bit device, 65 word-addressing device, 101

Subroutines activation table, 73 effect of linking on reduced-instruction-set computer, 66 nested, 4

register R1 in 88100, 82 register window, 67, 73 return address, 117 sharing of, 203 treatment of complex instruction as, 70 usual reference to, 50

Symbolics Corporation, 148

Symbol table, 49

System clock, 9, 17, 34

Tables, 58

Tags, 57

Task, 16, 42, 151

Technological improvements, 8

Texas Instruments custom integrated circuits, 2 data-driven processor (DDP)

arithmetic-logic unit, 143 buses

E-bus interconnection network, 143 maintenance bus, 143-144

diagram of, 144 Fortran, 143 host processor, 143 instruction format, 144 mentioned, 141 nodes, 143 packets, 143073 pending instruction queue, 143 prototype, 143

microprocessor with LISP, 8 TMS320 family

accumulator, 201 addressing modes, 201-202 bit extraction, 202 interrupt, 202 instruction set

Texas Instruments (continued) Boolean instruction, 202 branch instruction, 201 general purpose, 201 special instructions for digital signal processing, 201

lack of floating-point multiplier/adder, 210 list of, 202 modified Harvard architecture, 202 pipeline, 201 software, 202 throughput, 201, 205, 207 TMS34010

diagram of, 218 graphics, 217, 218 instruction set, 218-219 special hardware, 218 32-bit device, 217

uses for, 217-218 TMS320C10, 203 TMS32011

central processing unit, 203, 205 external address bus absent, 203 ports, 203 pulse code modulation companding function, 203 ROM, 203 timer, 203

TMS32010 arithmetic-logic unit, 203 auxiliary register, 203 barrel shifter, 203 diagram of, 204 masked, programmed ROM, 203 multiplier, 203 off-chip program memory, 203

TMS32010-25, 205 TMS32020

arithmetic unit, 205 auxiliary register, 205 diagram of, 206 floating-point operations, 207

global data memory interface, 207 i/o, 205 instructions, 205, 207 memory, 205 ports, 205 software compatibility, 207 throughput, 205

TMS320C35 auxiliary register, 207 diagram of, 208 ROM, 207 stack, 207 throughput, 207

See also gallium arsenide RISC

Throughput calculation of, 35-36 design goal for 8080, 4 feed-forward information technique, 82 improvement of, generally, 1 improvement with pipelining, 35-36 multiprocessing, 221 reduced-instruction-set computer, generally, 64

Topology bus-oriented topologies

beta topology, 231-232 butterfly topology, 232, 234, 238-239 mesh with wraparound connections in same row or column, 232, 234, 236-237 spanning bus hypercube

average distance, 230 buses, 230 diagram of, 231 expansion, 230 mesh structure, 230 nodes, 230

communication links, 223 definition of term, 16 enhancement of in advanced microprocessors, 9

link-oriented networks alpha network

average distance, 228, 230 communication links, 228 diagram of, 227 expansion, 228 fault tolerance, 228 generalized hypercube, 226 nodes, 226, 227, 231 ports, 228 routing algorithm, 228

cube-connected cycles average distance, 225 communication links, 225 diagram of, 226 expansion, 226 fault tolerance, 226 interconnection rules, 225 normalized average distance, 225 ports, 225 routing algorithm, 225 with B003, 234, 240-241

hypertree average distance, 228 binary tree structure, 228 communication links, 229 diagram of, 229 expansion, 229 fault tolerance, 228 ports, 228 sibling nodes, 228 table of distances, 229, 230

ring topology, 147 average distance, 224 communication links, 225 diagram of, 224 fault tolerance, 225 local-area networks, 224 normalized average distance, 224, 225 routing, 224, 225 with B003, 232-234

routing algorithm, 223, 225, 228

Transistor, 1, 2, 148

Trap, 66, 71, 91

Uniprocessors, 5

VAX microinstructions, 8 8600 mentioned, 1 11/780 mentioned, 63

Variables, 29 effect on reduced-instruction-set computer, 67 global variables, 70 local variables, 72 storage of, 67-70 up-level local variables, 73

Vectors, 49

Very-large-scale integration (VLSI), 8, 64

Very-long-instruction-word computer (VLIW)

arithmetic-logic unit, 119 compilers, 119 concurrency, 122 instruction word, 119 mentioned, 8, 127 percolation scheduling, 122 register file, 119, 123-124 trace scheduling, 120-122

Virtual memory cost, 57 disk storage, 57, 58 displacement, 59 dynamic-address-translation facility, 59 exception condition, 61 frame, 58, 61 hard disk controller, 59 i/o operation, 61 load/store architecture, 116 logical address as virtual address, 57 mapping to random-access memory, 57-58, 59 page, 58

Virtual memory (continued) page in/page out, 61 page number, 59, 61 page table, 59-60, 61 real memory, 58 segment number, 59, 61 segment table, 59, 61 slot, 58, 61 speed, 57

VLSI Technology, Inc. mentioned, 16, 102 VL86C010

addressing mode, 105 triadic method, 103 ALVEY, 102

Booth's multiplier, 102 data bus, 102 diagram of, 103

instructions, 66 block data-transfer instructions, 103, 104 branch instructions, 103, 104 condition execution field, 103 data processing, 103 data transfer instructions, 103 software interrupt, 103, 104 speed, 105

load/store architecture, 102 mentioned, 16 program counter, 104 queues, 105 registers, 102

allocation of dedicated and general-purpose, 103 block movement, 104 diagram of, 103 overlapping, 102 program counter/processor status register, 103, 104 stacks, 105

speed, 102-103

Word as default data length, 20 choice

in block fetch, 49-50 in cache-memory system, 49 in direct mapping system, 53 in fully associative method, 50-51

size in advanced microprocessors, generally, 61-62 in Harvard and von Neumann architectures, 14 in writable-instruction-set computers, 117

Work areas, 58

Workstations AM29000, 159, 177 MIPS Computer Systems, Inc., 65 Sun SPARC, 64 TMS34010, 217-218

Writable-instruction-set computer (WISC)

arithmetic-logic unit, 117, 118 clock-memory reference cycle, 117 condition-code testing, 119 data bus, 118 data path, 117 diagram of, 118 i/o, 118 instructions

access time, 119 decoding path, 118 delayed branches, 119 fixed-length format, 116 instruction set, 116

memory bandwidth, 116, 117, 118 memory references, 117 microcoded processor, 116, 117, 118 pipeline, 119 program counter, 117 program memory, 118, 119 ~,117 registers, 117, 118 stacks, 117, 118, 119

Zilog Z80, 7
