TDT4255 Computer Design
Review Lecture – First Half
Magnus Jahre
TDT4255 – Computer Design
ABOUT THE EXAM
About exam
• The exam will cover a large part of the curriculum (reading list)
• Exam properties that we seek:
– Comprehensible and unambiguous
– Correct
– Reasonable (e.g. not too easy, not too difficult, not asking about unimportant details but rather focusing on principles and understanding, etc.)
– Relevant (same as above)
– Differentiating (NTNU has decided that an 'A' should be an outstanding result, and we need some difficult questions to be able to identify potential A-candidates and to get a reasonable distribution of the students among the possible marks.)
– Unpredictable (We think students should not be given information, or answers to questions, of a kind that makes it possible for smart or pushy students to find out what the exam will or will not include. We want to influence the students so that they prepare for the exam by trying to maximize their learning of the course material rather than by speculation :-) ).
How to Answer an Exam Question
• Only answer what is asked for
– No points awarded for answers that are beside the point
• Only answer what you are reasonably sure is correct
– Norwegian saying: "It's better to keep your mouth shut and let people think you are stupid than to open your mouth and remove all doubt."
• There is a limited amount of space available to answer the questions
– Prioritize: good priorities indicate good understanding
Example Assignment (1/2)
• Explain the difference between a write-through and a write-back strategy for caches
• Good answer:
– A write-through strategy updates main memory on all cache writes
– A write-back strategy writes back dirty data when the block is evicted from the cache
• Why is this good?
– Answers the question
– Only answers the question
Example Assignment (2/2)
• Explain the difference between a write-through and a write-back strategy for caches
• Poor answer:
– A write-through strategy updates main memory on all cache writes
– A write-back strategy writes back dirty data when the block is evicted from the cache
– Set associative caches are common in current processors
– Fully associative caches are popular because they give the lowest miss rates
– (the answer continues with any possible irrelevant facts about caches where some are correct and others are wrong or at least imprecise)
• Not asked for! Imprecise!
Other Practicalities
• The exam will have no multiple choice
– Trade-off: hard to write vs. easy to grade
• MIPS fact sheet will be provided
• I will make last year's exam for TDT4160 available
– Curriculum is very different
– Introductory course: You will get harder questions
– Illustrates my exam style
Chapter 1 Review
Acknowledgement: Slides are adapted from Morgan Kaufmann companion material
Defining Performance
• Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]
Response Time
• Book definition: Time from issuing a command to its completion
– This is often referred to as the turn-around time
• More common response time definition: Time from issue to first response
• Execution time is the time the processor is busy executing the program
– Turn-around time includes the time the process waits to be executed, execution time does not
– Also: user execution time vs. system execution time
Response Time and Throughput
• Throughput
– Total work done per unit time
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
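CPI in More Detail
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{Instruction Count}_i \right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

– The ratio Instruction Count_i / Instruction Count is the relative frequency of class i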
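The weighted-average CPI calculation can be sketched in a few lines of Python. This is only an illustration: the function names and the instruction mix are made up for the example and do not come from the course material.

```python
def total_cycles(classes):
    """Clock cycles = sum over classes of CPI_i * InstructionCount_i."""
    return sum(cpi * count for cpi, count in classes)


def weighted_cpi(classes):
    """Weighted average CPI = total clock cycles / total instruction count."""
    instruction_count = sum(count for _, count in classes)
    return total_cycles(classes) / instruction_count


# Hypothetical instruction mix: (CPI, instruction count) for each class.
mix = [(1, 2), (2, 1), (3, 2)]

print(total_cycles(mix))  # 1*2 + 2*1 + 3*2 = 10
print(weighted_cpi(mix))  # 10 cycles / 5 instructions = 2.0
```

Changing the counts while keeping the per-class CPI values fixed shows how the relative frequency of each class shifts the average.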
Appendix D Review
Combinatorial logic
• Combinatorial logic only depends on current inputs
– We don't need a clock!
• There might be inputs that are irrelevant to our circuit
– Don't cares
– Room for optimization
32 Bit ALU
• Exploit the 1 bit ALU abstraction to create a wide ALU
– Called a ripple carry adder
• Ripple carry adders are slow
– Carry propagation through the circuit is the critical path
Carry Lookahead
• Idea: We can use more logic to shorten the critical path of a ripple carry adder
• Each carry bit uses all previous carries and inputs
– We can compute each carry directly by applying the formulas recursively
– But: Logic overhead grows quickly
• Two bit carry lookahead example:

$$c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$$

$$c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$$

$$c_2 = b_1 [b_0 c_0 + a_0 c_0 + a_0 b_0] + a_1 [b_0 c_0 + a_0 c_0 + a_0 b_0] + a_1 b_1$$
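As a sanity check on the two-bit example, the sketch below multiplies out the expression for c2 into a flat sum of products and compares it with the carry out of an ordinary two-bit addition for all 32 input combinations. The function names are illustrative only.

```python
from itertools import product


def lookahead_c2(a0, b0, a1, b1, c0):
    """c2 computed directly from the inputs (c1 fully substituted)."""
    return ((b1 & b0 & c0) | (b1 & a0 & c0) | (b1 & a0 & b0) |
            (a1 & b0 & c0) | (a1 & a0 & c0) | (a1 & a0 & b0) |
            (a1 & b1))


def arithmetic_c2(a0, b0, a1, b1, c0):
    """Reference: carry out of bit 1 when adding two 2-bit numbers plus c0."""
    a = (a1 << 1) | a0
    b = (b1 << 1) | b0
    return (a + b + c0) >> 2


# Exhaustively compare the two definitions.
all_match = all(
    lookahead_c2(*bits) == arithmetic_c2(*bits)
    for bits in product((0, 1), repeat=5)
)
print(all_match)  # True
```

The flattened expression already needs seven product terms for the second carry, which illustrates why the logic overhead grows quickly with the number of bits.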
Sequential Systems
• Clocking methodologies
– Edge triggered: State elements are updated on clock transitions
– Level triggered: State elements are updated continuously while the clock is either 1 or 0
– Choose one or the other
– Different methodologies may be appropriate for different production technologies
Register
• Collection of flip-flops or latches that store multi-bit values
• Register files contain multiple registers and access logic

    reg: process(clk)
    begin
        -- update the stored value on the rising clock edge
        if rising_edge(clk) then
            data_out <= data_in_1;
        end if;
    end process reg;

VHDL code is identical to latch/flip-flop except that the signals are vectors and not scalars
Register File Example
[Figure: register file with 2-port read logic and 1-port write logic]
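A behavioural sketch of a register file with two read ports and one write port is given below. It is a software model, not a hardware description, and the class name, the default of 32 registers, and the 32-bit width are assumptions made for the example.

```python
class RegisterFile:
    """Behavioural model: two read ports, one write port (sizes are assumed)."""

    def __init__(self, num_regs=32, width=32):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1

    def read(self, addr_a, addr_b):
        """Both read ports are combinational: they return the stored values."""
        return self.regs[addr_a], self.regs[addr_b]

    def clock_edge(self, write_enable, write_addr, write_data):
        """The single write port only changes state on the (modelled) clock edge."""
        if write_enable:
            self.regs[write_addr] = write_data & self.mask


rf = RegisterFile()
rf.clock_edge(True, 5, 42)   # write 42 into register 5
rf.clock_edge(False, 5, 7)   # write enable low: register 5 keeps its value
print(rf.read(5, 0))         # (42, 0)
```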
Finite State Machines
• Commonly synchronous
– Changes state on clock tick
• Two types
– Moore: Output only depends on current state
– Mealy: Output depends on current state and inputs
• Moore or Mealy? (question about the state machine shown on the slide)
• Almost all electronic systems contain a number of state machines
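The sketch below contrasts the two machine types using the usual textbook distinction: a Moore machine's output is a function of the current state only, while a Mealy machine's output is a function of the current state and the current input. Both machines detect a rising edge (0 followed by 1) on a one-bit input; the example and all names in it are illustrative and not taken from the slides.

```python
def mealy_edge_detector(inputs):
    """Mealy: output depends on the state (previous input) AND the current input."""
    state = 0  # the previous input value
    outputs = []
    for x in inputs:
        outputs.append(1 if (state == 0 and x == 1) else 0)
        state = x  # clock tick
    return outputs


def moore_edge_detector(inputs):
    """Moore: output depends on the current state only."""
    next_state = {
        ("LOW", 0): "LOW", ("LOW", 1): "EDGE",
        ("EDGE", 0): "LOW", ("EDGE", 1): "HIGH",
        ("HIGH", 0): "LOW", ("HIGH", 1): "HIGH",
    }
    state = "LOW"
    outputs = []
    for x in inputs:
        outputs.append(1 if state == "EDGE" else 0)  # output from state alone
        state = next_state[(state, x)]               # clock tick
    return outputs


stimulus = [0, 1, 1, 0, 1]
print(mealy_edge_detector(stimulus))  # [0, 1, 0, 0, 1]
print(moore_edge_detector(stimulus))  # [0, 0, 1, 0, 0]
```

The Moore version needs an extra state and reports the edge one clock cycle later, because its output can only change after a state change.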
Chapter 2 Review
Instruction Set Design
• DP1: Simplicity favors regularity
– Regularity makes implementation simpler
– Simplicity enables higher performance at lower cost
• DP2: Smaller is faster
• DP3: Make the common case fast
– Small constants are common
– Immediate operand avoids a load instruction
• DP4: Good design demands good compromises
– Different formats complicate decoding, but allow 32-bit instructions uniformly
– Keep formats as similar as possible
MIPS R-format Instructions

op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

• Instruction fields
– op: operation code (opcode)
– rs: first source register number
– rt: second source register number
– rd: destination register number
– shamt: shift amount (00000 for now)
– funct: function code (extends opcode)
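To make the field layout concrete, the sketch below packs the six R-format fields into a 32-bit word. The helper name is made up for the example; the register numbers and function code used for `add $t0, $s1, $s2` follow the standard MIPS encoding.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack the R-format fields (6, 5, 5, 5, 5, 6 bits) into one 32-bit word."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# add $t0, $s1, $s2: op = 0, rs = 17 ($s1), rt = 18 ($s2), rd = 8 ($t0),
# shamt = 0, funct = 0x20 (add)
word = encode_r_format(0, 17, 18, 8, 0, 0x20)
print(f"{word:#010x}")  # 0x02324020
```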
MIPS I-format Instructions
op      rs      rt      constant or address
6 bits  5 bits  5 bits  16 bits

• Immediate arithmetic and load/store instructions
– rt: destination or source register number
– Constant: –2^15 to +2^15 – 1
– Address: offset added to base address in rs
Branch Addressing
• Branch instructions specify
– Opcode, two registers, target address
• Most branch targets are near branch
– Forward or backward

op      rs      rt      constant or address
6 bits  5 bits  5 bits  16 bits

• PC-relative addressing
– Target address = PC + offset × 4
– PC already incremented by 4 by this time
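The PC-relative calculation can be sketched as follows. The function names and the example addresses are assumptions for illustration; the 16-bit offset field is sign-extended so that backward branches work.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc, offset_field):
    """Target = (PC of the branch + 4) + sign-extended offset * 4."""
    incremented_pc = branch_pc + 4  # PC has already been incremented
    return (incremented_pc + sign_extend_16(offset_field) * 4) & 0xFFFFFFFF


# Hypothetical branch located at address 0x00400010.
print(hex(branch_target(0x00400010, 3)))       # forward:  0x400020
print(hex(branch_target(0x00400010, 0xFFFF)))  # backward: 0x400010 (offset -1)
```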
Jump Addressing
• Jump (j and jal) targets could be anywhere in text segment
– Encode full address in instruction

op      address
6 bits  26 bits

• (Pseudo)Direct jump addressing
– Target address = PC[31…28] : (address × 4)
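A minimal sketch of the pseudo-direct calculation: the upper four bits of the (already incremented) PC are concatenated with the 26-bit address field shifted left by two. The function name and the example values are made up for illustration.

```python
def jump_target(jump_pc, address_field):
    """Target = PC[31..28] : (26-bit address field * 4)."""
    incremented_pc = (jump_pc + 4) & 0xFFFFFFFF
    upper_bits = incremented_pc & 0xF0000000
    return upper_bits | ((address_field & 0x03FFFFFF) << 2)


# Hypothetical jump at 0x00400018 with address field 0x0100004.
print(hex(jump_target(0x00400018, 0x0100004)))  # 0x400010
```

Because only 26 bits are encoded, a jump can reach any word within the current 256 MB region selected by the upper four PC bits.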
Local Data on the Stack
• Local data allocated by callee
– e.g., C automatic variables
• Procedure frame (activation record)
– Used by some compilers to manage stack storage
Memory Layout
• Text: program code
• Static data: global variables
– e.g., static variables in C, constant arrays and strings
– $gp initialized to address allowing ±offsets into this segment
• Dynamic data: heap
– e.g., malloc in C, new in Java
• Stack: automatic storage
Translation and Startup
• Many compilers produce object modules directly
• Static linking
Chapter 3 Review
Integer Addition
• Example: 7 + 6
• Overflow if result out of range
– Adding +ve and –ve operands, no overflow
– Adding two +ve operands: overflow if result sign is 1
– Adding two –ve operands: overflow if result sign is 0
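The overflow rules above can be checked with a small model of 32-bit two's complement addition. The function name and the 32-bit width are assumptions for the example.

```python
def add32(a, b):
    """Add two 32-bit two's complement values; return (result, overflow)."""
    mask = 0xFFFFFFFF
    a_bits, b_bits = a & mask, b & mask
    raw = (a_bits + b_bits) & mask

    def sign(v):
        return (v >> 31) & 1

    # Overflow only when both operands share a sign and the result's sign differs.
    overflow = sign(a_bits) == sign(b_bits) and sign(raw) != sign(a_bits)
    result = raw - (1 << 32) if sign(raw) else raw
    return result, overflow


print(add32(7, 6))            # (13, False)
print(add32(5, -3))           # (2, False): mixed signs never overflow
print(add32(0x7FFFFFFF, 1))   # (-2147483648, True): +ve + +ve gave sign 1
print(add32(-2147483648, -1)) # (2147483647, True): -ve + -ve gave sign 0
```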
Multiplication
• Start with long-multiplication approach

       1000   (multiplicand)
     × 1001   (multiplier)
       ----
       1000
      0000
     0000
    1000
    -------
    1001000   (product)

• Length of product is the sum of operand lengths
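The long-multiplication procedure corresponds to a shift-and-add loop: for every 1 bit in the multiplier, a shifted copy of the multiplicand is added to the product. The sketch below is a software illustration with made-up names, not a description of the hardware multiplier.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Unsigned multiply: add one shifted partial product per multiplier bit."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:
            product += multiplicand << i  # partial product for bit i
    return product


result = shift_add_multiply(0b1000, 0b1001)
print(bin(result))  # 0b1001000 (8 * 9 = 72)
```

Two 4-bit operands produce a result of at most 8 bits, matching the rule that the product length is the sum of the operand lengths.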
Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
– That's ok, if frequency of multiplications is low
Division
• Dividend / Divisor = Quotient
• Check for 0 divisor
• Long division approach
– If divisor ≤ dividend bits
   • 1 bit in quotient, subtract
– Otherwise
   • 0 bit in quotient, bring down next dividend bit
• Restoring division
– Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
– Divide using absolute values
– Adjust sign of quotient and remainder as required

              1001      (quotient)
    1000 ) 1001010      (divisor, dividend)
          -1000
              10
              101
              1010
             -1000
                10      (remainder)

• n-bit operands yield n-bit quotient and remainder
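The restoring-division steps can be sketched as a loop that brings down one dividend bit at a time, subtracts the divisor, and adds it back if the remainder goes negative. Names and the 8-bit default are illustrative. For the signed wrapper, the rule that the remainder takes the sign of the dividend is an assumption based on the usual textbook convention; the slide only says to adjust the signs as required.

```python
def restoring_divide(dividend, divisor, bits=8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor              # trial subtract
        if remainder < 0:
            remainder += divisor          # restore: quotient bit is 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1
    return quotient, remainder


def signed_divide(dividend, divisor, bits=8):
    """Divide absolute values, then adjust signs (assumed convention)."""
    q, r = restoring_divide(abs(dividend), abs(divisor), bits)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


print(restoring_divide(0b1001010, 0b1000))  # (9, 2), i.e. 1001 remainder 10
print(signed_divide(-7, 2))                 # (-3, -1)
```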
Representable Floating Point Numbers
IEEE Floating-Point Format

S    Exponent           Fraction
     single: 8 bits     single: 23 bits
     double: 11 bits    double: 52 bits

$$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$$

• S: sign bit (0 non-negative, 1 negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
– Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
– Significand is Fraction with the "1." restored
• Exponent: excess representation: actual exponent + Bias
– Ensures exponent is unsigned
– Single: Bias = 127; Double: Bias = 1023
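The field layout and the value formula can be tried out on a single-precision number using Python's standard `struct` module. The helper names are made up for the example, and the reconstruction only covers normalized numbers (exponent field neither all zeros nor all ones).

```python
import struct


def decode_single(x):
    """Split a single-precision float into (sign, exponent, fraction) fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF   # 8-bit biased exponent
    fraction = word & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, fraction


def value_from_fields(sign, exponent, fraction, bias=127):
    """(-1)^S * (1 + Fraction) * 2^(Exponent - Bias), normalized numbers only."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)


fields = decode_single(-0.75)
print(fields)                      # (1, 126, 4194304)
print(value_from_fields(*fields))  # -0.75
```

Here -0.75 is -1.1 (binary) × 2^-1, so the stored exponent is -1 + 127 = 126 and the fraction holds the single bit after the hidden leading 1.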
Chapter 4 Review
Single Cycle Datapath
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Datapath With Jumps Added
Multi-cycle Datapath (1/2)
• Idea: Add registers at strategic points in the datapath
• Activate only needed functional units with control signals
Multi-cycle Datapath (2/2)
• Area savings possible (but not necessary)
– Only one memory
– Only one ALU