INDEX
ACORN ARM See VLSI VL86C010
Addresses address field, 52, 53, 55, 56 address path, 9 breakdown with octal notation, 52 disk controller translator, 59 dynamic-address-translation facility, 59 index field, 52, 53, 54 logical address, 43, 57 physical address, 43 virtual address, 57
Addressing modes as delineating instruction set architecture, 10 as delineating program architecture, 12 complexity, 24, 61-62, 63 defined by mode bits, 24 design concern, 19, 24 extension word, 24 high-level language support, 24 in advanced microprocessors, generally, 9, 23-24 in reduced-instruction-set computers, generally, 63, 66 substituting indexed addressing for direct addressing, 66 See also Motorola 68XXX family
Advanced Micro Devices, Inc. AMD2900, 152 AM29000, 93-94 AM29334, 94 AM29337, 94 AM29332, 94 development tools for AM29000
ADAPT29K, 173
architecture simulator, 174-177 ASM29K, 174 High C29K ANSI C compiler, 174 PCEB29K, 171-173 XRAY29K, 174
See also AM29000
Algorithm affecting program locality, 49 for calculating factorial, 136, 138, 139 for digital-signal processing, 201 for hierarchical decomposition in Occam2, 183, 194 granularity, 17 numerical algorithm in data-flow computer, 156 routing algorithm, 223, 224, 225, 228
Altered bit, 56
AM29000 (AMD RISC) addressing
base-plus-offset, 162 burst mode accesses, 164 relative addressing, 162
arithmetic operations, 170 array processing, 177 bounds checker, 94, 99 branch target cache (btc), 161, 166-167 clock cycles per instruction, 161 CMOS technology, 159 constant generator, 95 constants, 95, 96, 161 control signals, 95 control unit, 95 current window pointer, 98-99 data flow, 160 delayed branching, 94 development tools, 171-177
embedded applications, 159 exceptions, 94 execution unit, 159, 161
address unit, 169 arithmetic-logic unit, 94, 95, 96, 169, 170
barrel shifter, 94, 96 funnel shifter, 94, 96
decode circuit, 161 register file, 95, 96, 169
absolute register number, 162 general-purpose registers, 161, 162, 163 global registers, 161, 162 local registers, 161, 162 register banks, 162 special-purpose registers, 161, 162, 163, 164, 165 updating dynamically, 164
external instruction memory, 161 graphics, 177 instruction prefetch buffer (ipb), 161, 166 instructions
branch target instructions, 166, 169 call instruction, 94 fixed-instruction format, 94 immediate instructions, 169 instruction set, 95 number of, 159 operands, 161, 170 orthogonal, 96 prefetching, 165
instruction fetch unit, 159, 161, 167-169 interrupts, 166 logic operations, 170 memory management unit, 159, 161
least-recently-used hardware, 171 pipelined address translation, 171 software tlb reload, 171 translation look-aside buffer (tlb), 171 virtual memory, 171
mentioned, 16, 93-94 mips, 159 multiplexer, 94, 95 nested procedures, 162 operating system values, 162 overlapping register windows, 96, 97 passing parameters, 162 pipeline, 41, 94, 161, 169, 171 program counter unit, 94, 167
program counter, 167, 168 program counter buffer, 168 program counter multiplex, 167 return address latch, 167, 169
programmable logic array (PLA), 95 relational operations, 170 similarity to Berkeley RISC, 94 stack pointer, 162 task switching, 162 trap, 166 32-bit device, 159 uses for, 177 variables, 162 write-back stage, 161
Application program, 10
Architecture application architecture, 10 as subset of computer system organization, 10 classes of, 11-12 control flow, 8 data-flow architecture
actors, 129, 130, 131, 135 centralized control unit absent, 129 conditionals, 132, 135, 136 concurrency, 129, 156 dynamic data flow, 140-141 interconnection schema, 134 iterations, 132 iterative schema, 135, 137 mentioned, 8, 13 nodes, 140, 141 operands, 129, 156 programs
apply function, 135-136
arcs, 130 activity store, 137, 139 activity templates, 134, 135, 137, 141 conditional graph, 131, 132, 135 copy function, 135-136 cyclic graph, 131, 132 deadly embrace, 132, 133, 134 data-driven type, 130, 131 decision diamond, 130-131 demand-driven type, 130 directed graph form, 130, 131 fetch unit, 139 firing rules, 130 instruction, 137 instruction queue, 137, 139, 140 loop graph, 131, 132 merge symbol, 130, 131, 132, 135 operation packet, 137, 139 operation unit, 139-140 program counter absent, 129 race condition, 132-133 result packet, 137, 140, 141 switch symbol, 130, 131, 132, 135 tokens, 130, 131, 133, 156 update unit, 140
static data-flow, 140-141 See also DDM1 computer, EDFG data-flow computer, LAU data-flow computer, Manchester data-flow computer, MIT dynamic and static data-flow computers, NEC PD7281, Texas Instruments DDP
definition of term, 10-11 diagram of, 10 generic architecture, 12 Harvard architecture
bandwidth, 14 data memory as module of, 13 diagram of, 13 instruction memory as module of, 13 mentioned, 12, 202
instruction set architecture, 10 internal architecture, 10
language-directed in reduced-instruction-set computer, 66 LOAD/STORE architecture, 80 of 8008, 2, 3 of 4004, 2 program architecture
definition of term, 12 effect on cost-performance ratio, 12 semantic gap, 23
programming perspective, 10, 12 semantic gap, 67 system architecture
definition of term, 12 effect on cost-performance ratio, 12 parallel processing as describing, 14 multimicroprocessing as describing, 14 multiprogramming as describing, 14 multitasking as describing, 14 semantic gap, 23
von Neumann architecture bandwidth, 13 bottlenecking, 8, 12, 13 control-flow architecture, 8, 13, 129 control unit, 12-13 data unit, 12-13 decision diamond, 130-131 diagram of, 12 program execution in, 129 word size, 13
See also reduced-instruction-set computer, very-long-instruction-word computer, writable-instruction-set computer
Arithmetic-logic unit, 13, 94
Arithmetic routines, 2 in 8-bit devices, generally, 3 in 8008, 3 in 8080, 4, 5
sign extension for, 19
Array application of strength reduction method to, 78 of data structure, 22 manipulation of, 27, 29 MIT dynamic data-flow computer, 149 multiplier/accumulator array, 213 page register array, 46-47 processing, 177 programmable logic array, 95 systolic array, 196
Assembly language compatibility of 8080/8086 at, 6 effect of assembler on program locality, 49 execution-time tools, 77 in reduced-instruction-set computer, generally, 63-64, 73, 77 instruction set architecture, 10 library procedure, 77 modules, 77 pseudocode (P code), 77
AT&T WE DSP32 arithmetic-logic unit, 210 control arithmetic unit, 210 data arithmetic unit
accumulators, 207, 210 control register, 207 floating point adder, 207 floating point multiplier, 207 throughput, 210
data type conversion, 210 data width, 211 diagram of, 209 double buffering, 211 instructions, 210 i/o, 207, 211 memory, 207, 211 multiply/accumulate operations, 210 pipeline, 210 program counter, 210
registers, 210, 211 32-bit device, 207 uses for, 207
Bandwidth definition of term, 13 effect of von Neumann architecture, 13 in advanced microprocessors, generally, 17 in Harvard architecture, 14 in reduced-instruction-set computer, generally, 116, 117
Bank switching, 42
BCD arithmetic, 2, 4, 5
Berkeley RISC absence of floating-point capability, 71 addressing
indexed addressing for direct addressing, 67, 69 indirect addressing, 69 self-relative, 72
C language, 66, 72 cycles per instruction (cpi), 71 diagram of RISC I, 68 diagram of RISC II, 69 fixed-instruction format, 94 global/local variables, 72 instructions, 66 integer, 71 memory, 67 mentioned, 64, 94 passed parameters, 72 registers
global/local, 70, 72 high/low, 70 overlapping windows, 67, 72-73
scc flag, 71 symbolic-processing language, 99 32-bit device, 67
Binary buddy system buddies, 45-46
list of memory segments, 45 queue, 45
Bit-slice device, 7, 152
Bottlenecking diagram of, 36 frequent or sequential memory accessing as cause, 13 in 8080, 5 in von Neumann devices, generally, 8 Occam2, 182-183
Branches branch latency slot, 75 branch target cache in AM29000, 161, 166-167 branch unit of 88100, 82 canceling branches, 75, 76, 77 delayed branches, 39, 75-76, 80, 82 delay slot, 76 mentioned, 37 minimum accesses for, 39-40 nullifying branches, 76 optimizing compiler, 37-38 register mode with 16-bit displacement/immediate, 91 register Rl in 88100, 82 squashing branches, 76 subset of change-of-flow instruction, 39 to close loop, 76 26-branch displacement mode, 93
Buffer, 50
Buses address bus, 93 as delineating internal architecture, 10 bus error as precise exception, 83 data bus width defining word, 20 effect of instruction queue on, 39 in cache-memory system, 50 larger address buses, 42 memory bus, 3
Cache memory advantages in cost and speed, 48 as organizational issue, 12 block fetch, 49-50 direct-mapping method, 52-54 fully associative method, 50-55 essential to program locality, 49 external and on-chip option, 48 for page table pointer, 48 interface between processor and RAM, 48 mentioned, 9 organization affecting program locality, 49 organization as implementation issue, 12 set-associative method, 12, 54-55 size affecting program locality, 49 size of block, 50 word choice, 49 write-back cache, 56-57 write-through cache, 55-56
Central processing unit (CPU) by Intel, 2 by MIPS Computer Systems, Inc., 65 8008 as, 2 4004 as, 2 register file, 66, 67 stack machine, 117 throughput in CRA Y systems, 64 with complex instruction set, 64 See also pipeline
Character handling, 2
Character string, 49
Clones, 6
Code access to segments in 8086, 43 ascending order, 50 protection of, 42 reference to segments, 50 usual execution of, 49
Compiler Bulldog, 119, 120 High C29K ANSI C compiler, 174 Occam2 compilers, 185-186 optimizing compiler
activation table, 73 assembly language pseudocode (P code), 77 execution-time tools, 77 global optimization, 77 in reduced-instruction-set computers, 65, 73, 74, 76 linker, 77 local optimization, 77 object module, 77 optimization of loop, 38, 77 overflow with register, 80 pipeline, 67, 77 pipeline reorganizer, 101-102 reduction of branch latency, 38, 75-77, 78 reduction of load latency, 74-75 reduction of pipeline latency, 74 redundancy elimination, 38, 77-78 register allocation, 77 register file, 67 replacement of memory access, 38 speedup, 38, 77 scheduling of program flow, 38
orthogonality as aid to, 17 percolation scheduling, 122 silicon implementation of addressing modes and instructions, 63-64 trace scheduling, 120-122
Complex-instruction-set computer (CISC)
instruction set for central processing unit, 64 overlapping register windows, 72 procedure calls, 70
Computer Terminal Corporation, 2
Concurrency in control-flow processor, 129 mentioned, 8, 62
on multiple and single processors, 14, 172 simulation of, 182 See also Inmos transputer, Motorola 88100 family
Connection machine, 7
Constants, 58, 95, 96, 117, 161
Context switches, 49
Control Data Corporation See gallium arsenide risc
Control function, 2
Control program, 2
Control signals, 13, 35, 129
Control unit function of, 13, 35 in reduced-instruction-set computer
hardwired, 66, 82 layout, 64 size, 72
microprogrammed, 9 module in von Neumann architecture, 12-13 with complex instruction set, 63
Cost cost-performance ratio in reduced-instruction-set computer, 64, 117 decline for microprocessors, generally, 1 disk storage, 57 effect of block size in cache-memory system, 50 effect of program and system architectures on, 12 in paging scheme, 48 in write-back cache, 57 of memory chips, 4, 57 microcode versus program memories, 116 pipeline latency, 73, 79 with complex instruction set, 63
Cray, Seymour, 64, 65
Data access in instruction queue, 39 big-endian format, 124, 125 cache-memory circuit, 48 dependencies, 39, 111 effect on bandwidth, 13 little-endian format, 124, 125 module in Harvard architecture, 13 overlapping of accesses, 14
Data element, 19
Data path, 9, 63
Datapoint Corporation, 2
Data size, 24
Data stream, 11, 156
Data structure array, 22 behavior of, 49 effect on reduced-instruction-set computer, 67 in 8080, 6 in 8086, 6 record, 22-23
Data types array, 22 as delineating architecture, 10 Boolean, 20, 191 byte, 191 design concern, 19 effect on reduced-instruction-set computer, 67 formats of, 10 function, 19 in 8080, 4 in 68020, 63 integer, 20, 21, 71, 105, 115, 191, 210 literal, 22 notation, 20 ordinal, 21 pixel, 22, 124, 218-219
predefined, 19 primitive, 22 real operand, 21 record, 22-23 string, 22 user defined, 19
Data unit, 12, 13
DDM1 computer communication bus, 152 eight-ary tree structure, 151 GPL language, 151-152 processing element (pse), 151, 152 program graph, 151-152 recursively structured, 151 subgraph (task level node), 152 switch element, 151
Defense Advanced Research Projects Agency (DARPA), 109
Design considerations addressing modes, 19, 24 block size for cache-memory system, 50 branch delays, 78 data type, 19 data word size, 13 input/output in 8008, 3 instruction set, 19, 24 instruction word length, 13 load latency, 37 memory management, 42 operand size, 13 page size, 47 pipeline depth, 41 pipeline latency, 37, 73-74 pipeline timing hazard, 37 reduced-instruction-set computer, generally, 8, 23, 37, 63-63, 66 semantic gap, 19 write-through cache, 56
Digital signal processing mentioned, 8 types, 201 uses for, 201
See also AT&T WE DSP32, INMOS A100 and A110, Texas Instruments TMS320 family
Direct mapping, 12, 52
Disk storage cost advantage, 57 in virtual memory system, 57, 60 of inactive programs, 45
DOS, 174
Dynamic-address-translation facility,
EDFG dynamic data-flow computer bit-slice version, 152 data formats, 152 instruction formats, 152 tagging scheme, 152
Effective address, 28, 31 calculation of, 35 combinations of, 26 definition of, 24 double, 24 mode bits, 24 register bits, 24 single, 24, 26 table of combinations, 26
Ethernet, 164
Exception condition, 61
Execution unit, 10, 59
Extension word, 24, 31
Fault, 66
Flags auxiliary carry flag in 8080, 5 before branch, 78 carry flag in 8008, 3 immediate mode flag in RISC, 71 in 8080, 4, 5 in 8085, 5
in 8086, 6 overflow flag in 8080, 5 parity flag in 8008, 3 page table entry, 61 sign flag in 8008, 3 SSC flag in RISC I and II, 71 zero flag in 8008, 3
Floating-point capabilities absence in RISC I and II, 71 complexity, 115 coprocessor by MIPS Computer Systems, Inc., 65 in 88100, 80, 81 in gallium arsenide risc, 107, 110, 113-114, 115 in Transputer, 71 numbers, 21, 105
Frames, 58, 61
Gallium arsenide (GaAs) risc arithmetic-logic unit, 112-113 arithmetic unit, 114, 115 buses, 110, 112 cache memory, 111 central memory control board, 111 central memory, 111 central processor, 109, 110, 111, 112, 115 concurrency, 115 control interface, 112 data cache, 110-111 data dependencies, 111 data paths, 110, 112, 114 delay slot, 112 diagram of, 110 exponent control, 114 floating-point coprocessor, 109, 110, 113-114, 115 gate count, 109 i/o processor, 111 instructions, 110
branch instructions, 112 cache, 110-111 control-transfer instructions, 113 memory instructions, 113
no-operation instruction, 112, 115 register-to-register instructions, 111 word addressed, 112
interlocks shifter, 115 interrupts, 110 memory-management unit, 109, 110, 115 operands, 110, 112, 115 paged and segmented virtual memory system, 110 pipeline, 110, 111, 113 process status register, 114 program counter, 112 program status register, 112 real and virtual address space, 110 register file, 112, 113, 114, 115 shift unit, 114 speed, 110 translator/reorganizer, 115
Gates, 10, 109
Generated code, 38, 39, 50
Graphics limit on direct addressing in 8-bit device, 41 pixel, 22, 124 See also AM29000, Intel i860, Texas Instruments TMS34010
Hard disk controller, 59
Hardware as delineating architecture, 11-12 instruction set architecture, 10 in reduced-instruction-set computer, generally, 23, 66 in writable-instruction-set computer, 117 interlock, 101
Hewlett-Packard Spectrum family, 64, 102
High-level languages application architecture, 10
array, 22 availability of, 9 C language, 10, 72
compilation as assembly language pseudocode (P code), 77 features affecting reduced-instruction-set computers, 66-67 Fortran, 10, 143 GPL language, 151-152 improvement of, 17-18 mentioned, 61-62 Occam2
algorithm, 194-196 arrays, 189, 191, 196 buffer process, 184-185 channels, 184, 187, 188, 189, 196 compiler, 185-186, 191 constructs
ALT, 187-188, 193 IF, 187 PAR, 14, 186, 188, 193, 196 SEQ, 14, 186
design for multiple processors, 182 distributed implementation, 182, 197 guards, 187-188, 197 hierarchical decomposition for bottlenecking, 182-183, 186 inputs, 184, 186, 187 iterative, 184, 186 kernel primitives, 184 loop, 189, 190 multidimensional, 189 nonportable, 182 outputs, 184, 186 pipeline, 194, 196-197 primitive process, 183 procedures, 182 program development, 192-194 recursiveness, 191-192 replicator, 190, 191 time, 190 value parameters, 184, 191 variables, 183, 184, 186, 189, 191
optimizing compiler, 67, 74
optimum number of registers, 80 Parallel ADA, 18 Parallel C, 18, 179 Parallel Fortran, 18, 179 Parallel Pascal, 18, 179 Pascal, 10, 72, 182 pipeline, 67, 74 record, 22-23 reduced-instruction-set computer, 8, 66-67, 72, 77 register file, 67 semantic gap, 67 support for addressing modes, 24 support for instruction sets, 24 up-level local variables, 73 VAL data-flow language, 148 writable-instruction-set computer, 117
Hypercube, 7
IBM 801, 64 PC RT, 64 PC/XT/AT, 171 personal computers, 6 ROMP, 64 very-long-instruction-word computer, 119
Implementation, 11-12, 182, 197
INMOS Limited A100 digital signal processor
cascading, 211, 214 configurable, 211-212 diagram of, 213 digital transversal filter, 211, 212 example of system, 214 interface, 213 multipliers, 211 registers, 213
A110 array of multiply accumulator, 214, 215, 217 configurable, 211-212
diagram of, 215 example of system, 217 image processing, 211, 215 microprocessor interface, 215 postprocessing unit, 214-215 shift register, 215, 216 transversal filters, 215, 217
transputers B003 boards
arrays of channels, 233 butterfly topology, 234, 238-239 channels as parameters, 233-234 cube-connected topology, 234, 240-241 index, 233 link address, 233 machine identifier, 233 mapping array, 233 mapping index, 233 mesh topology, 234, 236-237 ring topology, 232, 233, 234, 235
buses, 179 channels, 179 CMOS technology, 180 configuration as array or network, 179 diagram of, 180 floating-point capabilities, 71, 180 graphics, 180 high-level languages, 2, 14, 179 i/o communication links, 180 memory interface, 180 mentioned, 8-9, 16, 127, 129, 211, 241,242 order of, 15 performance, generally, 1 process, 14-15
definition, 179, 180 descheduling points, 181-182 execution, 180 linked process list, 181 microcoded scheduler, 180-181
RAM, 180 registers for workspace, 181 robot arm, 141
16- to 32-bit devices T414, 179 T800, 179, 182
software kernel unnecessary, 181 von Neumann architecture, 179 workspaces, 181
Input/output (i/o) direct, 3 in 8008, 3 in 8080, 5 in 8086, 6 in virtual memory system, 61 memory-mapped, 3 monitoring, 2 status of, 19
Instruction format, 10, 66
Instructions branch instructions, 66, 72, 75-77, 78, 82, 91, 93 change-of-flow instruction, 39, 79-80 control signal, 13 data handling instructions, 88 definition by Flynn, 23 diagram of flow, 35 effect on bandwidth, 13 execution in registers, 80 feed-forward information technique, 82 floating-point instructions, 81 format, 23, 63 independent instruction, 75 in 68020, 63 integer-multiply instructions, 81 in VAX 11/780, 63 length, 23, 66 LOAD, 37, 64, 66, 71, 74, 75-76, 80 mentioned, 58 module in Harvard architecture, 13 next instruction location, 23 no-operation (NOP) instruction, 75, 76 operand, 19, 23 operation, 23 orthogonality with addressing modes, 17 orthogonality of data registers with, 17 overlapping of accesses, 14 pointers in 88100, 84 prefetch unit, 118 program-control instructions, 87 queue, 39, 118 reduced-instruction-set architecture, 8, 63-64, 66 STORE, 37, 64, 66, 71, 80 word size, 14
Instruction sets adaptation to data path, 63 as delineating architecture, 10, 12 branch delays, 78 complexity, 63 controversy about size of, 9 definition of, 23 design concern, 19, 78 functional classes, 63 mentioned, 61-62 of 8008, 2, 3, 4 of 8080, 4, 5, 6 of 8085, 5 of 8086, 6 of 4004, 2 reduced-instruction-set computers, 63-64, 72 theories about, 23
Instruction stream, 10
Intergraph Clipper Fairchild Semiconductors, 105 compromise between complex- and reduced-instruction set computer architectures, 105 data cache/memory-management unit, 108 data types, 105 diagrams of, 108, 109 instruction cache/memory-management unit, 108 instruction set, 105 instructions
data-handling instructions, 105 format of, 105, 106 hardwired, 105 macroinstructions, 105 memory reference instructions, 105 operands, 105
integer/floating-point processor, 108 memory addressing modes, 105, 107 mentioned, 16, 64, 116 registers, 105 virtual address, 107
Integrated circuits, 1, 64
Intel Corporation advertisement of microprocessor, 1 8008
accumulator, 3 addressing modes, 3 address stack, 3 architecture, 2, 3 arithmetic operations in, 3 as complete central processing unit, 2 compatibility constraints on, 2 data handling, 3 description of, 2 development for Datapoint, 2 experimental use of NMOS technology with, 3-4 flags, 3 input/output, 3 instructions
accumulator-specific instruction, 3 i/o instructions, 3 processor-control instructions, 3 scratchpad-register instruction, 3 transfer of control instruction, 3
instruction set, 2, 3, 4 interrupts, 3 memory space, 3
mentioned, 1, 2, 5 pins, 3 PMOS technology in, 2 register organization, 2, 3 scratchpad, 3, 4 stack, 3 stack pointer, 5
8080 accumulator, 4, 5 addressing modes, 4 arithmetic operations, 4, 5 assembly language, 6 bottlenecking, 5 comparison with 8085, 5 compatibility of 8086 to, 6 compatibility of Z80 to, 7 data structure, 6 data types, 4 flags, 4, 5 input/output, 5 instructions, 4, 5, 6
call instruction, 5 i/o instructions, 5 return instruction, 5
interrupts, 4 machine code, 5 memory accessing, 4 memory pointer, 4 mentioned, 1 opcodes, 4, 5 oscillator chip, 5 pins, 4 positive indexing, 5 power supplies, 5 program counter, 4, 5 pushdown stack, 4-5 read-only memory, 4 registers, 4, 5, 6 software, 6 stack pointer, 4, 5 symmetry, 4 system controller chip, 5 throughput, 4
8085 address latch enable, 93
data structure, 6 flags, 5 instructions
reset interrupt mask instruction, 5 set interrupt mask instruction, 5
instruction set, 5, 6 interrupts, 5 machine code, 5 mentioned, 1 oscillator, 6 power supply, 6 registers, 5 similarity to 8080 organization, 5
80X86 family 8086
compatibility with 8080, 6 data structure, 6 flags, 6 input/output, 6 instruction pointer, 6 instructions, 6 control-transfer instructions, 6 memory accessing, 6 mentioned, 1, 7 organization, 6 registers, 6 segmentation, 43-50 segmented addressing, 6 software of 8080, 6
8088, 6 80286, 1, 6 80386, 1, 43 80486, 1 segmentation
base address, 43 code segment, 43 data segment, 43 extra segment, 43 generation of physical address, 43-44 general limit on size, 43 modulo 64K addressing, 43 stack segment, 43
formation, 1
432, 8 4004, 2, 17 i860
big- and little-endian formats, 124-125 bus, 124 bus and cache control unit, 123 concurrency, 129 core execution unit, 123 cycles per instruction, 124 data cache, 123 diagram of, 123 floating-point adder unit, 123, 124, 129 floating-point control unit, 123, 124, 129 floating-point multiplier unit, 123, 124, 126, 129 graphics unit, 123, 124 instruction cache, 123 instructions
bit instructions, 123 control-transfer instructions, 123 double-instruction execution, 122, 124, 126 integer, 123 load, 123-124 store, 123-124
integer data, 123 mentioned, 1, 8, 16, 17 multiplier, 124 operands, 22 paged, virtual memory, 124, 126 paging unit, 123 pipeline, 124 pixel, 22, 124 register, 123, 124, 125, 126
load-control register, 124 store-control register, 124 vector-integer, 124
word size, 124 i960, 1, 8, 16 study of LSI for control function, 2
Interlock, 37, 101
Job, 16
Jumps, 49
LAU processor control unit
data-control unit (dcu), 145 instruction-control unit (icu), 145, 146
data-flow graphs, 145 data structure, 145 diagram of, 145 execution unit, 146 instruction format
control part, 145, 146 operation part, 145, 146
instruction queue, 146 memory unit, 146 mentioned, 141 nodes, 145, 146 prototype, 146 single-assignment language, 145
LISP, 8, 99, 148
List, 50
Loop branch to close, 76 cache-memory size, 50 in code execution, 49 in dynamic data-flow computer, 141 nested loops, 120 optimizing compiler, 38, 120
LSI chips, 2
Macroinstructions, 70, 105
Manchester data-flow computer arcs, 147 compiler, 148 data format, 148 diagram of, 147 instruction format, 148 matching unit, 147-148 node store, 148 pipeline, 148
processing unit, 148 prototype, 148 race problem, 147 result packet, 147 ring topology, 147 SISAL language, 148 tagged tokens, 147 token queue, 147 transistor-transistor-logic, 148
Memory accessing
bottlenecking in von Neumann architecture, 13 for write reference, 56 general-purpose registers, 164 in advanced microprocessors, generally, 9 in Cray systems, 64 in 8080, 4, 5 in 8086, 6 in 88100, 81 in 68020, 7 reduced-instruction-set architecture, 8, 66, 73 reduction by compiler, 38
as part of internal architecture, 10 bank switching to increase physical memory, 42 base address for segment, 43 bus, 3 direct addressing in 8-bit device, 42 logical and physical address areas, 41-42 memory management, 9, 41-42
binary buddy system, 42 cache memory, 9, 48-49, 62
block fetch, 49-50 direct mapping cache, 12, 52-54 fully associative cache, 50-55 set-associative cache, 12, 54-55 write-back cache, 56-57 write-through cache, 55-56
paging, 45, 62 segmentation, 42-45, 62 virtual memory, 57-61, 62, 116
memory pointer, 4 shared in multimicroprocessing system, 14, 15 space, 3, 13 volatile, 57
Microcontroller, 8
Micromachine, 41, 101
Minicomputers, 3, 63
MIPS Computer Systems Inc., 65, 102 See also Stanford MIPS
MIT dynamic data-flow computer arithmetic-logic unit, 150 arrays, 149 data format, 150 data memory, 149 diagram of, 149 emulated model, 148 enabled instruction queue, 150 host computer, 150 input (token receiving) module, 150 instruction format, 150 nodes, 148 operands, 150 output module, 150 processing elements, 149, 150 program memory, 149 routing network, 150 switching network, 149 tokens, 150 waiting-matching store, 149-150
MIT static data-flow computer data packet format, 143 diagram of, 142 instructions, 143 mentioned, 141 tokens, 142
Motorola Corporation 8-bit devices, 7 88100 family, 16
addressing modes control register mode, 93
register indirect with index mode, 90 register indirect with scaled index, 90 register indirect with zero-extended immediate, 88-90 register mode with nine-bit vector table index, 91 register mode with 16-bit displacement/immediate, 91 register mode with 16-bit immediate addressing, 88 register mode with 10-bit immediate addressing, 88 register mode with 26-branch displacement, 91, 93 triadic method, 88
arithmetic-logic units, 81, 82 buses, 81, 82, 83, 89, 90, 93
arbitration of, 82 data P bus, 84 external operation, 84 instruction bus, 83 parallel, 81, 82
calculation unit, 84 concurrency, 78, 82, 83 control signals, 82-82 data unit, 80, 82, 84 dedicated circuits for speed, 78 delayed branches, 80 diagram of, 79 exception handling, 82, 83
arbitration, 82 exception-time register, 86 imprecise and precise, 83 prioritizing of register writes from, 82 register mode with 9-bit vector index, 91 shadow register, 86 supervisor mode, 84
execution-instruction pointer (xip), 84, 91, 93 execution units, 80, 81, 82, 83 fault-tolerant applications, 84-86
feed-forward information technique, 82 fetch instruction pointer, 84, 91, 93 floating-point unit, 80, 81, 82, 83, 84, 86, 93 instruction unit, 80, 82-84 instructions
addition instruction, 81 arithmetic instructions, generally, 88 bit-field instructions, 82, 88 branch instructions, 93 compare instruction, 87 control-register instruction, 81 conversion instruction, 81 data memory access instructions, 87-88 division instructions, 87 fields of, 88 flow-control instructions, 87 length of, 88 load instruction, 89 logic instructions, 81, 82, 88 mentioned, 66 multiplication instructions, 81 number of, 80 register-to-register instructions, 87, 88 store instruction, 89-90 subtraction instruction, 81 trap instruction, 91
integer unit, 80, 81-82, 84, 86 load/store architecture, 80 memory-management unit, 84 multiple pipelines, 78, 79-80, 84 multipliers, 78 next instruction pointer (nip), 84 ports, 82 predicates for condition testing, 93 program initialization, 82 register file/sequencer, 80, 81, 82-83 registers, 80-81, 82, 83, 84, 85, 86, 88
condition-code register absent, 86 exception-time register, 86 shadow register, 86 stack pointer absent, 86
supervisor mode, 84 user mode, 84
16-bit devices, 7 68XXX family
addressing modes absolute addressing, 24, 31 address register direct, 27 address register indirect, 27-28 address register indirect with displacement, 29, 30 address register indirect with index and displacement, 30 address indirect with postincrement, 28 address register indirect with predecrement, 28-29 array manipulation with memory address mode, 27, 29 block movement with memory address mode, 27, 28 branch, 32 data register direct, 27 destination operand, 33 effective address, 24, 26, 31 extension word, 24, 31, 32 immediate addressing, 24, 31, 32 inherent mode, 33 input/output, 29 memory address modes, 27-30 mode bits of effective address, 24 opcode, 24 operand specification, 24, 26, 27, 28, 29, 30, 31, 33 operation word, 33 orthogonality of, 17 position-independent program, 32 program counter with displacement, 32
program counter with index, 32 reference pointer with memory address mode, 27, 28 register bits of effective address, 24 sequential data, 28, 29 stack pointer, 28 stacking operations with memory address mode, 27, 28-29 table of, 25 tables, 28 variables, 29 word boundary, 28
address strobe, 93 complex instruction set, 63 memory accessing, 7 mentioned, 7, 16, 63 operands, 7 pipeline in 68020, 39-40 registers, 7
32-bit devices, 7
Multimicroprocessing systems Connection machine, 7 definition of term, 14 diagram of, 15 general usage of term, 14 Hypercube, 7 mentioned, 7, 12, 16, 242 See also Motorola 68XXX family, Topology
Multiprocessing average distance, 222-223 cost advantages, 221-222 deadlock, 221 definition of term, 221 expansion, 224 fault tolerance, 221, 222, 224 flexibility, 221, 222 memory management, 221 Motorola 68XXX family, 33 program development, 192 normalized average distance, 223
reliability, 221 starvation, 221 throughput, 221
Multiprogramming definition of term, 16
dynamically reallocable program, 44-45 general usage of term, 14 ready queue, 16
Multitasking dynamically reallocable program, 44-45 general usage of term, 14 register banking, 162
NEC PD7281 address generator/flow controller, 153 buffers
function table, 152 link table, 152, 155 queue, 152
diagram of, 153 digital signal processing applications, 152, 156 identifier field, 155 input controller, 152 output controller, 152 output queue, 152-153 processing unit, 152 RAM, 153 refresh controller, 153 ring pipeline, 152, 155 tokens, 152, 153, 154, 155
NMOS, 3-4
Octal notation, 52
Opcodes decoding, 35 effective address, 24 excessive registers, 80-81 in 8085, 5 packing, 117
Operands data unit, 13 destination operand, 33 effective address, 24, 35 fetching for READ, 35 in Harvard architecture, 13 in 68XXX family, 7, 24, 26, 27, 31, 33 in von Neumann architecture, 14 referencing in advanced microprocessors, 19-20 sign extension, 19 specification of, 24, 26, 27, 28, 29, 30, 33 storing for WRITE, 35
Operating system disk operating system, 41 during multiprogramming, 16 mentioned, 10, 17 priorities in user address space, 42 resource protection, 42 UNIX, 10, 174 use of paging scheme, 46, 47
Operation word, 24, 33
Organization, 10, 11-12
Orthogonality, 17, 63-64, %
OS/2, 41
Overlapping in Harvard architecture, 14 for increased speed, 9 pipeline timing hazard, 37 register windows, 67, 72
Page table, 59-60
Paging avoidance of external fragmentation, 46 extension and remapping of address range, 46 internal fragmentation, 47, 48 lookup table, 46 mentioned, 45 page faults, 47
page register, 46, 47-48 page register array, 46-47 page size, 46, 48 page table pointer, 48 time, 47 use of larger size, 48
Parallel processing general usage of term, 14 mentioned, 16, 17 pipelining as, 34 transputer, 8-9
Performance as delineating architecture, 11-12 condition-code register as hindrance, 86 effect of clock speed, 34 in reduced-instruction-set computer, 23, 64 in set-associative cache, 54, 55 number of registers, 80 parallel buses, 81 stack pointer as hindrance, 86 with write-back cache, 57
Petri nets, 130
Pins, 1, 3, 4
Pipeline bottlenecking, 8 branches, 37, 38, 39, 75-79, 80 branch latency slot, 76 change-of-flow instruction, 39, 79 contribution to advanced microprocessors, 17 delay slot, 74-75, 77 depth, 37, 39-41 diagram of, 34 digital computer, 34 distribution in, 36 division instructions in 88100, 81 form of parallel processing, 34 in Cray systems, 64 in MIPS, 65 increased speed due to, 9, 62
interlock, 37, 65, 74, 101 load latency, 37, 74-75 optimizing compiler, 67, 74 pipeline latency, 37, 73, 77, 79, 80 pipeline stalling, 76 poor design, 79 reduced-instruction-set computers, generally, 73-74 superiority to instruction queue, 39 synchronization, 36 timing hazard, 37, 74 wait state, 37, 74, 75, 76 See also Motorola 88100 and 68XXX families
PMOS technology, 2, 3
Power supply, 5, 6, 57
~ffication during processing, 16 contents of, 58 disk storage of, 45, 58 dynamically reallocable, 44 execution in von Neumann device, 129 initialization in 88100, 82 page and segment tables during execution of, 60 position independent, 44 program locality, 49-50 sequencing, 81 task, 16, 42 usual reference to buffers, code segments, lists, stacks, subroutines, 50
Program counter in 8080, 4, 5 in 68020, 7 in von Neumann device, 129 program counter relative address, 72 sequencing function, 129 updating, 35
Program flow, 38-39
Programming capabilities, 14
Pyramid 90X, 64
Random-access memory (RAM), 1-2, 48, 59, 180
Read-only memory (ROM), 1-2, 4
Ready queue, 16
Reduced-instruction-set computer (RISC)
addressing modes, 65, 66 architecture, 66 caches, 65 clocks per instruction (CPI), 66, 73-74, 116 compiler, 66, 73-74, 116-117 condition codes, 101 control unit, 66, 116 coprocessors, 65 debate on, 23 execution time, 66 evolution, 64-66 execution time, 8 floating-point operations, 65, 71 global registers, 72 hardwired sequencer in 88100, 82 high-level languages
features affecting, 66-67 language-directed, 66, 72 see also compilers
instruction format, 66, 116 instructions
complex instructions as macroinstruction or subroutine, 70 fixed length, 66 for memory accessing, 66 mentioned, 8 number of, 65, 66 semantic content, 117
instruction set, 72, 116 load latency, 37 load/store architecture, 116 local variable, 72 memory accessing, 8 memory management, 65 nesting of procedures, 70-71
passed parameters, 72 philosophy of, 63-64, 72 pipeline latencies, 73 procedure calls, 70 register files, 65, 66, 72 register-to-register operations, 66, 71 registers, 9, 90
global/local, 70 high/low, 70 stack for overflow, 70-71 window-based architecture, 67, 70, 72 trap, 71
Registers addressing, 24 as delineating instruction set architecture, 10 as delineating program architecture, 12 CPU register operations in Cray systems, 64 effect of dedicated registers on bottlenecking, 5 effect on performance, 80 electrical and structural limits, 80 general-purpose registers, 9, 164 in advanced microprocessors, generally, 17 in control unit, 13 in data unit, 13 in 8008, 2, 3 in 8080, 4, 5, 6 in 8085, 5 in 8086, 6 in 68020 optimum number of, 80 register file in reduced-instruction-set computers, 66, 67, 72, 80 register-to-register operations, 66, 71 subfield of effective address, 24 table-origin register, 59 use by optimizing compiler, 38 user-visible registers, 10
Ridge 32, 64
Segment table, 59-60
Semantic gap coupling of pipeline, register files, and optimizing compiler, 67 definition of, 23 design concern, 19
Slots, 58, 61
Smalltalk, 99
Software bank switching, 42 branches, 76 kernel-type, 84 segmentation, 43
Speed affecting architecture, 12 cache-memory circuit for interface, 48 cycles-per-instruction, 73 experiments with NMOS technology, 3-4 hardwired control unit, 66 loss with virtual memory system, 57-58 loss with write-through cache, 56 operand decoding, 117 optimum number of registers, 80 reduced-instruction-set computer, 64, 66 techniques for increasing, 9, 62 system clock, 9 unique features of 88100, 78, 93
Spreadsheets, 41
Stack definition of, 3 during code execution, 49 in 8008, 3 in 8080, 4-5 in 68XXX family, 27, 28-29 in writable-instruction-set computer, 117
manipulation for memory accessing, 66 pushdown stack, 2, 4-5 stack overflow handler, 73 stack pointer, 4, 5, 86 usual reference to, 50
Stanford MIPS allocation of transistors, 99 arithmetic-logic unit, 99, 101 barrel shifter, 99 buses, 99-100 cache on chip, 99 compiled code, 99 condition codes absent, 101 diagram of, 100 exceptions, 100 instruction cache, 65 instructions
compare instruction, 101 control-flow instructions, 101 jumps, 101 length of, 100 load instruction, 101 procedural linkage instructions, 101 register-to-register instructions, 101
instruction set compiler-based encoding of micromachine, 100
instruction word, interfaces, 100 interrupts, 100 pipeline, 65, 100 pipeline reorganizer for interlocks, 101 program counter unit, 100 registers, 99, 100 speed, 99 32-bit device, 65 word-addressing device, 101
Subroutines activation table, 73 effect of linking on reduced-instruction-set computer, 66 nested, 4
register R1 in 88100, 82 register window, 67, 73 return address, 117 sharing of, 203 treatment of complex instruction as, 70 usual reference to, 50
Symbolics Corporation, 148
Symbol table, 49
System clock, 9, 17, 34
Tables, 58
Tags, 57
Task, 16, 42, 151
Technological improvements, 8
Texas Instruments custom integrated circuits, 2 data-driven processor (DDP)
arithmetic-logic unit, 143 buses
E-bus interconnection network, 143 maintenance bus, 143-144
diagram of, 144 Fortran, 143 host processor, 143 instruction format, 144 mentioned, 141 nodes, 143 packets, 143073 pending instruction queue, 143 prototype, 143
microprocessor with LISP, 8 TMS320 family
accumulator, 201 addressing modes, 201-202 bit extraction, 202 interrupt, 202 instruction set
Boolean instruction, 202 branch instruction, 201 general purpose, 201 special instructions for digital signal processing, 201
lack of floating-point multiplier/adder, 210 list of, 202 modified Harvard architecture, 202 pipeline, 201 software, 202 throughput, 201, 205, 207 TMS34010
diagram of, 218 graphics, 217, 218 instruction set, 218-219 special hardware, 218 32-bit device, 217
uses for, 217-218 TMS320C10, 203 TMS32011
central processing unit, 203, 205 external address bus absent, 203 ports, 203 pulse code modulation companding function, 203 ROM, 203 timer, 203
TMS32010 arithmetic-logic unit, 203 auxiliary register, 203 barrel shifter, 203 diagram of, 204 masked, programmed ROM, 203 multiplier, 203 off-chip program memory, 203
TMS32010-25, 205 TMS32020
arithmetic unit, 205 auxiliary register, 205 diagram of, 206 floating-point operations, 207
global data memory interface, 207 i/o, 205 instructions, 205, 207 memory, 205 ports, 205 software compatibility, 207 throughput, 205
TMS320C35 auxiliary register, 207 diagram of, 208 ROM, 207 stack, 207 throughput, 207
See also gallium arsenide risc
Throughput calculation of, 35-36 design goal for 8080, 4 feed-forward information technique, 82 improvement of, generally, 1 improvement with pipelining, 35-36 multiprocessing, 221 reduced-instruction-set computer, generally, 64
Topology bus-oriented topologies
beta topology, 231-232 butterfly topology, 232, 234, 238-239 mesh with wraparound connections in same row or column, 232, 234, 236-237 spanning bus hypercube
average distance, 230 buses, 230 diagram of, 231 expansion, 230 mesh structure, 230 nodes, 230
communication links, 223 definition of term, 16 enhancement of in advanced microprocessors, 9
link-oriented networks alpha network
average distance, 228, 230 communication links, 228 diagram of, 227 expansion, 228 fault tolerance, 228 generalized hypercube, 226 nodes, 226, 227, 231 ports, 228 routing algorithm, 228
cube-connected cycles average distance, 225 communication links, 225 diagram of, 226 expansion, 226 fault tolerance, 226 interconnection rules, 225 normalized average distance, 225 ports, 225 routing algorithm, 225 with B003, 234, 240-241
hypertree average distance, 228 binary tree structure, 228 communication links, 229 diagram of, 229 expansion, 229 fault tolerance, 228 ports, 228 sibling nodes, 228 table of distances, 229, 230
ring topology, 147 average distance, 224 communication links, 225 diagram of, 224 fault tolerance, 225 local-area networks, 224 normalized average distance, 224, 225 routing, 224, 225 with B003, 232-234
routing algorithm, 223, 225, 228
Transistor, 1, 2, 148
Trap, 66, 71, 91
Uniprocessors, 5
VAX microinstructions, 8 8600 mentioned, 1 11/780 mentioned, 63
Variables, 29 effect on reduced-instruction-set computer, 67 global variables, 70 local variables, 72 storage of, 67-70 up-level local variables, 73
Vectors, 49
Very-large-scale integration (VLSI), 8, 64
Very-long-instruction-word computer (VLIW)
arithmetic-logic unit, 119 compilers, 119 concurrency, 122 instruction word, 119 mentioned, 8, 127 percolation scheduling, 122 register file, 119, 123-124 trace scheduling, 120-122
Virtual memory cost, 57 disk storage, 57, 58 displacement, 59 dynamic-address-translation facility, 59 exception condition, 61 frame, 58, 61 hard disk controller, 59 i/o operation, 61 load/store architecture, 116 logical address as virtual address, 57 mapping to random-access memory, 57-58, 59 page, 58
page in/page out, 61 page number, 59, 61 page table, 59-60, 61 real memory, 58 segment number, 59, 61 segment table, 59, 61 slot, 58, 61 speed, 57
VLSI Technology, Inc. mentioned, 16, 102 VL86C010
addressing mode, 105 triadic method, 103 ALVEY, 102
Booth's multiplier, 102 data bus, 102 diagram of, 103
instructions, 66 block data-transfer instructions, 103, 104 branch instructions, 103, 104 condition execution field, 103 data processing, 103 data transfer instructions, 103 software interrupt, 103, 104 speed, 105
load/store architecture, 102 mentioned, 16 program counter, 104 queues, 105 registers, 102
allocation of dedicated and general-purpose, 103 block movement, 104 diagram of, 103 overlapping, 102 program counter/processor status register, 103, 104 stacks, 105
speed, 102-103
Word as default data length, 20 choice
in block fetch, 49-50 in cache-memory system, 49 in direct mapping system, 53 in fully associative method, 50-51
size in advanced microprocessors, generally, 61-62 in Harvard and von Neumann architectures, 14 in writable-instruction-set computers, 117
Work areas, 58
Workstations AM29000, 159, 177 MIPS Computer Systems, Inc., 65 Sun SPARC, 64 TMS34010, 217-218
Writable-instruction-set computer (WISC)
arithmetic-logic unit, 117, 118 clock-memory reference cycle, 117 condition-code testing, 119 data bus, 118 data path, 117 diagram of, 118 i/o, 118 instructions
access time, 119 decoding path, 118 delayed branches, 119 fixed-length format, 116 instruction set, 116
memory bandwidth, 116, 117, 118 memory references, 117 microcoded processor, 116, 117, 118 pipeline, 119 program counter, 117 program memory, 118, 119 ~,117 registers, 117, 118 stacks, 117, 118, 119
Zilog Z80, 7