Figure: Data -> Processor -> Processed data
The CPU is the brain of the computing system, meant to carry out the intended functionality as and when needed.
Figure: Instructions and Data -> CPU -> Processed data
Simplified example of an instruction:
Opcode | Mode | Register | Address
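As a rough sketch of how such an instruction word could be packed and unpacked, the C fragment below assumes a 32-bit word with an 8-bit opcode, 4-bit mode, 4-bit register and 16-bit address. These field widths, and the names Instruction, encode_instruction and decode_instruction, are assumptions made for illustration only; real instruction sets define their own layouts.

```c
#include <stdint.h>

/* Hypothetical 32-bit layout, chosen only for illustration:
 *   bits 31..24  opcode    (8 bits)
 *   bits 23..20  mode      (4 bits)
 *   bits 19..16  register  (4 bits)
 *   bits 15..0   address   (16 bits)
 */
typedef struct {
    uint8_t  opcode;
    uint8_t  mode;
    uint8_t  reg;
    uint16_t address;
} Instruction;

/* Pack the four fields into one instruction word. */
uint32_t encode_instruction(Instruction in)
{
    return ((uint32_t)in.opcode << 24) |
           ((uint32_t)(in.mode & 0xFu) << 20) |
           ((uint32_t)(in.reg & 0xFu) << 16) |
           (uint32_t)in.address;
}

/* Split an instruction word back into its fields. */
Instruction decode_instruction(uint32_t word)
{
    Instruction in;
    in.opcode  = (uint8_t)(word >> 24);
    in.mode    = (uint8_t)((word >> 20) & 0xFu);
    in.reg     = (uint8_t)((word >> 16) & 0xFu);
    in.address = (uint16_t)(word & 0xFFFFu);
    return in;
}
```

With this assumed layout, opcode 0x12, mode 0x3, register 0x4 and address 0xABCD pack into the word 0x1234ABCD, and decoding that word returns the same four fields.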
Figure: the computer (CPU, memory, input/output, system interconnects) processes instructions and data under control, and connects to peripherals and communication lines.
CPU: Central Processing Unit
Figure: structure of the computer (CPU, Memory, I/O, system bus) and of the CPU (Arithmetic and Logic Unit, Control Unit, Registers, internal interconnects).
Status registers: a set of bits, including
- Condition codes: sign of last result, zero, carry, equal, overflow
- Interrupt enable/disable
- Supervisor
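To make the idea of "a set of bits" concrete, here is a minimal C sketch that updates the condition codes after a 32-bit addition while leaving the control bits alone. The bit positions, the FLAG_ names and the function update_flags_after_add are hypothetical; every real processor defines its own status register layout.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FLAG_CARRY    0x01u
#define FLAG_OVERFLOW 0x02u
#define FLAG_ZERO     0x04u
#define FLAG_SIGN     0x08u
#define FLAG_IRQ_EN   0x10u  /* interrupt enable/disable */
#define FLAG_SUPER    0x20u  /* supervisor */

/* Return the status register after computing a + b.
 * Only the four condition codes change; control bits are preserved. */
unsigned update_flags_after_add(unsigned sr, uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;  /* unsigned arithmetic wraps modulo 2^32 */

    sr &= ~(FLAG_CARRY | FLAG_OVERFLOW | FLAG_ZERO | FLAG_SIGN);

    if (sum < a)                                /* wrapped: unsigned carry out */
        sr |= FLAG_CARRY;
    if (~(a ^ b) & (a ^ sum) & 0x80000000u)     /* same-sign inputs, different-sign result */
        sr |= FLAG_OVERFLOW;
    if (sum == 0)
        sr |= FLAG_ZERO;
    if (sum & 0x80000000u)                      /* top bit set: negative in 2's complement */
        sr |= FLAG_SIGN;

    return sr;
}
```

For example, adding 0xFFFFFFFF and 1 sets the carry and zero bits, while adding 0x7FFFFFFF and 1 sets the overflow and sign bits.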
V1 ColdFire user-visible registers:
- 16 general-purpose 32-bit registers (D0–D7, A0–A7)
- 32-bit program counter (PC)
- 8-bit condition code register (CCR)
Internal (system) registers:
- 16-bit status register (SR)
- 32-bit supervisor stack pointer (SSP)
- 32-bit vector base register (VBR)
- 32-bit CPU configuration register (CPUCR)
Do NOT try to memorise this!
ARM has a total of 37 registers:
- 31 general-purpose / user-visible registers
- 6 internal (system) registers
- Mostly 32 bits each, depends on mode
- 16 user-visible registers at once
7 modes of operation:
- User: normal execution mode
- System: OS system-privilege mode
- FIQ: fast interrupt mode, used for high-speed data transfer
- Supervisor (SVC): protected mode for the operating system
- Abort: entered after a failed memory access
- IRQ: general-purpose interrupt services
- Undefined: entered when an undefined instruction is executed
Do NOT try to memorise this!
int main(void)
{
    int a, b, c[50];

    b = 2;
    for (a = 0; a < 50; a++)
        c[a] = a * b;
}
        mov     r3, #2
        str     r3, [fp, #-16]
        mov     r3, #0
        str     r3, [fp, #-20]
        b       .L2
.L3:
        ldr     r1, [fp, #-20]
        ldr     r2, [fp, #-20]
        ldr     r3, [fp, #-16]
        mul     r0, r3, r2
        mvn     r2, #207
        mov     r3, r1, asl #2
        sub     r1, fp, #12
        add     r3, r3, r1
        add     r3, r3, r2
        str     r0, [r3, #0]
        ldr     r3, [fp, #-20]
        add     r3, r3, #1
        str     r3, [fp, #-20]
.L2:
        ldr     r3, [fp, #-20]
        cmp     r3, #49
        ble     .L3
        sub     sp, fp, #12
        ldmfd   sp, {fp, sp, pc}
You are not expected to learn ARM assembler from this lecture, by the way! The listing simply shows how the registers (r0, r1, …) are used.
Simplified view: Fetch next instruction -> Decode instruction -> Execute instruction -> (repeat)
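The fetch, decode, execute cycle can be sketched as a loop over a toy machine. Everything about this machine is invented for illustration: the 16-bit instruction format (opcode in bits 15..12, register number in bits 9..8, operand in bits 7..0), the four opcodes, and the names ToyCpu and run. It is not ARM, ColdFire, or any real instruction set.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy machine. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_JMP = 3 };

typedef struct {
    uint8_t  pc;      /* program counter */
    uint16_t ir;      /* instruction register */
    int32_t  reg[4];  /* general-purpose registers */
} ToyCpu;

/* Run until HALT, an unknown opcode, the end of the program, or max_steps.
 * Returns the number of instructions executed (HALT included). */
int run(ToyCpu *cpu, const uint16_t *program, size_t length, int max_steps)
{
    int steps = 0;

    while (steps < max_steps && cpu->pc < length) {
        /* Fetch: read the instruction at PC, then advance PC by one. */
        cpu->ir = program[cpu->pc];
        cpu->pc++;

        /* Decode: split the instruction register into fields. */
        unsigned opcode  = cpu->ir >> 12;
        unsigned r       = (cpu->ir >> 8) & 0x3u;
        unsigned operand = cpu->ir & 0xFFu;

        /* Execute. */
        steps++;
        switch (opcode) {
        case OP_LOADI: cpu->reg[r] = (int32_t)operand;          break;
        case OP_ADD:   cpu->reg[r] += cpu->reg[operand & 0x3u]; break;
        case OP_JMP:   cpu->pc = (uint8_t)operand;              break;
        default:       return steps;  /* OP_HALT or unknown opcode */
        }
    }
    return steps;
}
```

A four-word program such as { 0x1005, 0x1107, 0x2001, 0x0000 } loads 5 into register 0, loads 7 into register 1, adds register 1 into register 0, and halts.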
Load PC addressinto Stack
Instr. N:   Fetch | Decode | Execute
Instr. N+1:         (Pre-)Fetch | Decode | Execute
Detailed data flow
Instruction 3 caused a branch to 15
Instructions 4-7 have stalls
• Prefetch the branch instructions and store somewhere non-conflicting
• Do not prefetch after branch
Figure: pipeline timing diagram. Vertical axis: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4). Horizontal axis: time in clock cycles (Cycle 1 to Cycle 7). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the previous instruction.
This needs careful processor pipeline design, with appropriate arbitration between the streams (e.g. skip cycle 4).
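The timing picture can be turned into a simple cycle count. The sketch below assumes every stage takes exactly one clock cycle and that stalls simply add cycles; the function names pipelined_cycles and unpipelined_cycles are hypothetical.

```c
#include <assert.h>

/* Cycles to run n instructions on a pipeline with `depth` stages:
 * the first instruction needs `depth` cycles to complete, each later one
 * completes one cycle after its predecessor, and every stall adds a cycle.
 * Assumes one clock cycle per stage. */
unsigned long pipelined_cycles(unsigned long n, unsigned depth,
                               unsigned long stall_cycles)
{
    assert(depth > 0);
    if (n == 0)
        return 0;
    return depth + (n - 1) + stall_cycles;
}

/* Without pipelining, each instruction occupies all stages before the
 * next one starts. */
unsigned long unpipelined_cycles(unsigned long n, unsigned depth)
{
    return n * depth;
}
```

Under these assumptions, five instructions on a five-stage pipeline take 9 cycles with no stalls and 10 with a single stall, against 25 cycles without pipelining.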
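$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For simple pipeline, ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$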
CPI: cycles per instruction
Without pipelining, CPI is equal to the number of stages in the data flow, assuming each stage requires 1 cycle (= Ideal CPI x Pipeline depth)
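As a worked illustration of the speedup relation, the following C sketch evaluates it directly. The function and parameter names are hypothetical, and the numbers in the note after the code are example inputs, not measurements.

```c
#include <assert.h>

/* CPI of the pipelined machine: ideal CPI plus average stall cycles
 * per instruction. */
double pipelined_cpi(double ideal_cpi, double stall_cycles_per_instr)
{
    return ideal_cpi + stall_cycles_per_instr;
}

/* Speedup = (Ideal CPI x Pipeline depth) / (Ideal CPI + Pipeline stall CPI)
 *           x (Cycle Time unpipelined / Cycle Time pipelined) */
double pipeline_speedup(double ideal_cpi, double depth, double stall_cpi,
                        double cycle_time_unpipelined,
                        double cycle_time_pipelined)
{
    assert(cycle_time_pipelined > 0.0);
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi)
           * (cycle_time_unpipelined / cycle_time_pipelined);
}
```

With an ideal CPI of 1, a depth of 5 and equal cycle times, the speedup is 5 when there are no stalls and drops to 4 when stalls add 0.25 cycles per instruction.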
Clock rates
Remember: LOADs are problematic.
• Would you pay for a dual-port DRAM system or not?
ARM7TDMI: 3-stage pipeline
FETCH: Instruction Fetch
DECODE: Thumb->ARM decompress, ARM decode, Reg Select
EXECUTE: Reg Read, Shift, ALU, Reg Write

ARM9TDMI: 5-stage pipeline
FETCH: Instruction Fetch
DECODE: ARM or Thumb Inst Decode, Reg Decode, Reg Read
EXECUTE: Shift + ALU
MEMORY: Memory Access
WRITE: Reg Write
Von Neumann / Harvard
CISC / RISC
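Figure (Von Neumann): Central Processing Unit (CPU) connected to a single program and data memory, with an input unit and an output unit.
Figure (Harvard): Central Processing Unit (CPU) connected to a separate program memory (program memory address, program instruction) and data memory (data memory address, data), plus input/output.
Figure (bus view): Central Processing Unit (CPU), memory (program and data) and input/output devices, linked by the address bus, data bus and control bus.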
Separate L1 data and instruction caches in front of a shared main memory are typical in modern systems.
Figure: control path and data path.
Control path: Program Counter (+1), Program Memory (program address in, program instruction out), Instruction Register, Decoder.
Data path: Data Registers, ALU (Data Source 1, Data Source 2), Workspace (Data Memory) with data address and data.
CISC: complex instruction set computer
RISC: reduced instruction set computer
The Berkeley group coined the term RISC and built a CPU called RISC I; soon after, Stanford built a similar CPU: MIPS.
SPARC also emerged from Sun.
ARM has a range of RISC architectures.
Early RISC CPUs had about 50 instructions, compared to 200-300 for CISC CPUs.
- The aim was to simplify the CPU so that it could process (and start) instructions faster.
RISC: Reduced Instruction Set Computer (e.g. ARM)
CISC: Complex Instruction Set Computer (e.g. Intel Pentium)
To make an N-bit ALU: duplicate the 1-bit ALU N times and connect each carry-out to its neighbour's carry-in.
An example: a one-bit full adder
Example: a 32-bit full adder
By placing the one-bit ALUs in parallel and making some other modifications to the logic circuits, we can create bigger ALUs.
This 32-bit adder can be used to:
1. Add two 32-bit numbers (2's complement)
2. Subtract 32-bit numbers (2's complement), using the subtract/enable input
3. Generate ZERO/NON_ZERO conditions
4. Set overflow flags
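The construction described above can be modelled in software. This is only a sketch: the struct AluResult and the functions full_adder and alu_add_sub are hypothetical names, and the loop mimics a ripple-carry chain of 32 one-bit full adders rather than describing any particular hardware design. Subtraction is done in 2's complement by inverting the second operand and feeding a 1 into the first carry-in.

```c
#include <stdint.h>

typedef struct {
    uint32_t sum;
    unsigned carry_out;  /* carry out of bit 31 */
    unsigned overflow;   /* signed (2's complement) overflow */
    unsigned zero;       /* 1 if the result is zero */
} AluResult;

/* One-bit full adder: returns the sum bit and writes the carry-out. */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
    *cout = (a & b) | (cin & (a ^ b));
    return a ^ b ^ cin;
}

/* 32-bit ripple-carry add/subtract built from 32 one-bit full adders. */
AluResult alu_add_sub(uint32_t a, uint32_t b, unsigned subtract)
{
    AluResult r = {0, 0, 0, 0};
    unsigned carry = subtract ? 1u : 0u;  /* the "+1" of 2's complement negation */
    unsigned carry_into_msb = 0;

    if (subtract)
        b = ~b;                           /* a - b == a + ~b + 1 */

    for (int i = 0; i < 32; i++) {
        if (i == 31)
            carry_into_msb = carry;
        /* Each stage's carry-out becomes the next stage's carry-in. */
        unsigned bit = full_adder((a >> i) & 1u, (b >> i) & 1u, carry, &carry);
        r.sum |= (uint32_t)bit << i;
    }

    r.carry_out = carry;
    r.overflow  = carry_into_msb ^ carry; /* carries into and out of the top bit differ */
    r.zero      = (r.sum == 0);
    return r;
}
```

For instance, subtracting 5 from 5 gives a zero result with the zero condition set, and adding 1 to 0x7FFFFFFF gives 0x80000000 with the overflow flag set.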
The decoder determines the nature of the instruction in IR, and passes this on to the PLA.
The control Programmable Logic Array (PLA) takes in information from all the other modules and generates the appropriate data path operations.