
1

TDT4255 Computer Design

Review Lecture – First Half

Magnus Jahre


2

ABOUT THE EXAM


3

About the exam
• The exam will cover a large part of the curriculum (reading list)
• Exam properties that we seek:
– Comprehensible and unambiguous
– Correct
– Reasonable (e.g. not too easy, not too difficult, not asking about unimportant details but rather focusing on principles and understanding, etc.)
– Relevant (same as above)
– Differentiating (NTNU has decided that an 'A' should be an outstanding result, and we need some difficult questions to be able to find potential A-candidates and to get a reasonable distribution of the students among the possible marks.)
– Unpredictable (We think we should not give information or answers to questions of a kind that makes it possible for smart or pushy students to find out what the exam will or will not include. We want to influence the students so that they prepare for the exam by trying to maximize their learning of the course material rather than by speculation :-) ).

4

How to Answer an Exam Question
• Only answer what is asked for
– No points awarded for answers that are beside the point
• Only answer what you are reasonably sure is correct
– Norwegian saying: "It's better to keep your mouth shut and let people think you are stupid than to open your mouth and remove all doubt."
• There is a limited amount of space available to answer the questions
– Prioritize: good priorities indicate good understanding

5

Example Assignment (1/2)

• Explain the difference between a write-through and a write-back strategy for caches
• Good answer:
– A write-through strategy updates main memory on all cache writes
– A write-back strategy writes back dirty data when the block is evicted from the cache
• Why is this good?
– Answers the question
– Only answers the question

6

Example Assignment (2/2)
• Explain the difference between a write-through and a write-back strategy for caches
• Poor answer:
– A write-through strategy updates main memory on all cache writes
– A write-back strategy writes back dirty data when the block is evicted from the cache
– Set associative caches are common in current processors
– Fully associative caches are popular because they give the lowest miss rates
– (the answer continues with any possible irrelevant facts about caches where some are correct and others are wrong or at least imprecise)
• Not asked for! Imprecise!

7

Other Practicalities

• The exam will have no multiple choice
– Trade off: hard to write vs. easy to grade
• MIPS fact sheet will be provided
• I will make last year's exam for TDT4160 available
– Curriculum is very different
– Introductory course: You will get harder questions
– Illustrates my exam style


8

Chapter 1 Review


Acknowledgement: Slides are adapted from Morgan Kaufmann companion material

9

Defining Performance
• Which airplane has the best performance?

[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde and Douglas DC-8-50 by Passenger Capacity, Cruising Range (miles), Cruising Speed (mph), and Passengers x mph]

10

Response Time
• Book definition: Time from issuing a command to its completion
– This is often referred to as the turn-around time
• More common response time definition: Time from issue to first response
• Execution time is the time the processor is busy executing the program
– Turn-around time includes the time the process waits to be executed; execution time does not
– Also: user execution time vs. system execution time

11

Response Time and Throughput

• Throughput
– Total work done per unit time
• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?


12

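CPI in More Detail
• If different instruction classes take different numbers of cycles

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

• Weighted average CPI

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

– The fraction Instruction Count_i / Instruction Count is the relative frequency of instruction class i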
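The weighted-average CPI formula can be checked with a few lines of Python. This is only a sketch: the function name `weighted_cpi` and the instruction mix are made up for illustration and do not come from the slides.

```python
def weighted_cpi(classes):
    """classes: list of (cpi_i, instruction_count_i) pairs, one per instruction class."""
    # Clock cycles = sum over classes of CPI_i * Instruction Count_i
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    # Weighted average CPI = clock cycles / instruction count
    return total_cycles, total_cycles / total_instructions


# Hypothetical mix: 50 ALU ops (CPI 1), 30 loads/stores (CPI 2), 20 branches (CPI 3)
mix = [(1, 50), (2, 30), (3, 20)]
cycles, cpi = weighted_cpi(mix)
print(cycles, cpi)  # 170 cycles, CPI of 1.7
```

The same CPI follows from the relative frequencies: 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7.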


13

Appendix D Review



14

Combinatorial logic

• Combinatorial logic only depends on current inputs
– We don't need a clock!
• There might be inputs that are irrelevant to our circuit
– Don't cares
– Room for optimization


15

32 Bit ALU

• Exploit the 1 bit ALU abstraction to create a wide ALU
– Called a ripple carry adder
• Ripple carry adders are slow
– Carry propagation through the circuit is the critical path
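A small software model makes the carry chain explicit. The sketch below is not hardware. The names `full_adder` and `ripple_carry_add` are illustrative, and the loop simply mirrors how each 1 bit stage has to wait for the carry from the stage below it.

```python
def full_adder(a, b, carry_in):
    """1-bit adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=32):
    """Add two unsigned integers one bit at a time, least significant bit first."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        # Stage i cannot finish before the carry from stage i-1 is known
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry  # carry is the carry-out of the most significant stage


print(ripple_carry_add(7, 6))
```

In the worst case the carry travels through all `width` stages, which is why this path sets the delay of the adder.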


16

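Carry Lookahead
• Idea: We can use more logic to shorten the critical path of a ripple carry adder
• Each carry bit uses all previous carries and inputs
– We can compute each carry directly by applying the formulas recursively
– But: Logic overhead grows quickly
• Two bit carry lookahead example:

$$c_1 = (b_0 \cdot c_0) + (a_0 \cdot c_0) + (a_0 \cdot b_0)$$

$$c_2 = (b_1 \cdot c_1) + (a_1 \cdot c_1) + (a_1 \cdot b_1)$$

$$c_2 = b_1 \cdot [(b_0 \cdot c_0) + (a_0 \cdot c_0) + (a_0 \cdot b_0)] + a_1 \cdot [(b_0 \cdot c_0) + (a_0 \cdot c_0) + (a_0 \cdot b_0)] + (a_1 \cdot b_1)$$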
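As a sanity check, the two bit example can be verified exhaustively in Python. This sketch assumes the usual majority form of the carry equation. The function names are illustrative, and the lookahead version multiplies out the substituted expression so that c2 depends only on the inputs and c0.

```python
from itertools import product


def carry(a, b, c):
    """Carry out of one full adder stage."""
    return (b & c) | (a & c) | (a & b)


def c2_ripple(a0, b0, a1, b1, c0):
    """c2 computed stage by stage: c1 must be known before c2."""
    c1 = carry(a0, b0, c0)
    return carry(a1, b1, c1)


def c2_lookahead(a0, b0, a1, b1, c0):
    """c2 as a two-level expression of the inputs only (no intermediate c1)."""
    return ((b1 & b0 & c0) | (b1 & a0 & c0) | (b1 & a0 & b0)
            | (a1 & b0 & c0) | (a1 & a0 & c0) | (a1 & a0 & b0)
            | (a1 & b1))


# Compare both versions for all 32 input combinations
mismatches = [bits for bits in product((0, 1), repeat=5)
              if c2_ripple(*bits) != c2_lookahead(*bits)]
print(len(mismatches))
```

Going from three product terms for c1 to seven for c2 shows how quickly the logic overhead grows.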

17

Sequential Systems

• Clocking methodologies
– Edge triggered: State elements are updated on clock transitions
– Level triggered: State elements are updated continuously while the clock is either 1 or 0
– Choose one or the other
– Different methodologies may be appropriate for different production technologies


18

Register

• Collection of flip-flops or latches that store multi-bit values
• Register files contain multiple registers and access logic

    reg: process(clk)
    begin
      -- Update the stored value on each rising clock edge
      if rising_edge(clk) then
        data_out <= data_in_1;
      end if;
    end process reg;

VHDL code is identical to latch/flip-flop except that the signals are vectors and not scalars


19

Register File Example

[Figure: register file example, with panels "2 Port Read logic" and "1 Port Write logic"]

20

Finite State Machines

• Commonly synchronous
– Changes state on clock tick
• Two types
– Moore: Output depends only on current state
– Mealy: Output depends on current state and inputs

[Figure: example state machine. Moore or Mealy?]

• Almost all electronic systems contain a number of state machines
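In the usual textbook definition, the two machine types differ in what the output may depend on: state only (Moore) or state and current input (Mealy). The sketch below models a rising-edge detector both ways. The detector, the state names and the function names are illustrative choices, not taken from the slides.

```python
def mealy_edge_detector(inputs):
    """Mealy: output is a function of the current state AND the current input."""
    state = 0  # state = previous input value
    outputs = []
    for x in inputs:
        outputs.append(1 if (state == 0 and x == 1) else 0)
        state = x  # next state
    return outputs


def moore_edge_detector(inputs):
    """Moore: output is a function of the current state only."""
    state = "LOW"  # states: LOW, EDGE, HIGH
    outputs = []
    for x in inputs:
        outputs.append(1 if state == "EDGE" else 0)
        # Next state depends on the current state and the input in both machine types
        if x == 0:
            state = "LOW"
        elif state == "LOW":
            state = "EDGE"
        else:
            state = "HIGH"
    return outputs


samples = [0, 1, 1, 0, 1]
print(mealy_edge_detector(samples))
print(moore_edge_detector(samples))
```

The Moore version needs an extra state and reports the edge one clock tick later, because the input must first be captured in the state before it can affect the output.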

21

Chapter 2 Review



22

Instruction Set Design
• DP1: Simplicity favors regularity
– Regularity makes implementation simpler
– Simplicity enables higher performance at lower cost
• DP2: Smaller is faster
• DP3: Make the common case fast
– Small constants are common
– Immediate operand avoids a load instruction
• DP4: Good design demands good compromises
– Different formats complicate decoding, but allow 32-bit instructions uniformly
– Keep formats as similar as possible

23

MIPS R-format Instructions

    op     | rs     | rt     | rd     | shamt  | funct
    6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits

• Instruction fields
– op: operation code (opcode)
– rs: first source register number
– rt: second source register number
– rd: destination register number
– shamt: shift amount (00000 for now)
– funct: function code (extends opcode)
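The field layout can be exercised by packing the six fields into a 32-bit word. This is a minimal sketch with a hypothetical helper name (`encode_r_format`). The example encodes `add $t0, $s1, $s2` using the standard MIPS register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and funct = 32 for `add`.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields (6/5/5/5/5/6 bits) into one 32-bit instruction word."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Shift previous fields left and append this one on the right
        word = (word << width) | value
    return word


# add $t0, $s1, $s2  ->  op=0, rs=17, rt=18, rd=8, shamt=0, funct=32
word = encode_r_format(0, 17, 18, 8, 0, 32)
print(f"{word:#010x}")
```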


24

MIPS I-format Instructions

    op     | rs     | rt     | constant or address
    6 bits | 5 bits | 5 bits | 16 bits

• Immediate arithmetic and load/store instructions
– rt: destination or source register number
– Constant: –2^15 to +2^15 – 1
– Address: offset added to base address in rs
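The 16-bit constant is a two's complement value, which is where the range –2^15 to +2^15 – 1 comes from. The sketch below uses hypothetical helper names to pack an I-format word and to sign-extend the immediate field. The example instruction is `lw $t0, 32($s3)` with the standard MIPS opcode 35 for `lw`, $s3 = 19 and $t0 = 8.

```python
IMM_MIN = -(1 << 15)      # -32768
IMM_MAX = (1 << 15) - 1   # +32767


def encode_i_format(op, rs, rt, imm):
    """Pack I-format fields (6/5/5/16 bits); imm is a signed 16-bit constant."""
    if not IMM_MIN <= imm <= IMM_MAX:
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def sign_extend_16(field):
    """Interpret a 16-bit field as a two's complement number."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


# lw $t0, 32($s3)  ->  op=35, rs=19 (base), rt=8 (destination), offset=32
lw_word = encode_i_format(35, 19, 8, 32)
print(f"{lw_word:#010x}", sign_extend_16(0xFFFC))
```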


25

Branch Addressing
• Branch instructions specify
– Opcode, two registers, target address
• Most branch targets are near branch
– Forward or backward

    op     | rs     | rt     | constant or address
    6 bits | 5 bits | 5 bits | 16 bits

• PC-relative addressing
– Target address = PC + offset × 4
– PC already incremented by 4 by this time
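The target calculation can be written out as follows. This is a sketch with an illustrative function name. It takes the address of the branch instruction itself and adds the 4 explicitly, and it treats the 16-bit offset field as a signed word offset.

```python
def branch_target(branch_pc, offset_field):
    """PC-relative target: (PC + 4) + sign_extend(offset) * 4."""
    offset = offset_field & 0xFFFF
    if offset & 0x8000:          # negative offset: branch backward
        offset -= 0x10000
    return (branch_pc + 4 + offset * 4) & 0xFFFFFFFF


# Forward branch: instruction at address 80012 with offset 2
print(branch_target(80012, 2))
# Backward branch: offset -1 (0xFFFF) points back at the branch itself
print(hex(branch_target(0x00400010, 0xFFFF)))
```

Because the offset counts words, a 16-bit field reaches roughly ±2^15 instructions around the branch.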

26

Jump Addressing
• Jump (j and jal) targets could be anywhere in text segment
– Encode full address in instruction

    op     | address
    6 bits | 26 bits

• (Pseudo)Direct jump addressing
– Target address = PC[31…28] : (address × 4)
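The concatenation in the target formula is a bit operation: keep the top four PC bits and fill the remaining 28 bits with the address field shifted left by two. In this sketch the function name is illustrative, and it assumes the upper bits are taken from the already incremented PC (PC + 4), in line with the branch case.

```python
def jump_target(jump_pc, address_field):
    """Pseudodirect target: upper 4 bits of PC + 4, then the 26-bit field times 4."""
    pc_plus_4 = (jump_pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000               # PC[31..28]
    lower = (address_field & 0x03FFFFFF) << 2    # address * 4, 28 bits
    return upper | lower


# Jump at address 80020 with address field 20000 goes to 80000
print(jump_target(80020, 20000))
```

A jump can therefore only reach targets within the same 256 MB region as the jump instruction itself.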


27

Local Data on the Stack

• Local data allocated by callee
– e.g., C automatic variables
• Procedure frame (activation record)
– Used by some compilers to manage stack storage

28

Memory Layout
• Text: program code
• Static data: global variables
– e.g., static variables in C, constant arrays and strings
– $gp initialized to address allowing ±offsets into this segment
• Dynamic data: heap
– E.g., malloc in C, new in Java
• Stack: automatic storage


29

Translation and Startup

[Figure: translation and startup flow, with annotations "Many compilers produce object modules directly" and "Static linking"]


30

Chapter 3 Review



31

Integer Addition
• Example: 7 + 6
• Overflow if result out of range
– Adding +ve and –ve operands, no overflow
– Adding two +ve operands: Overflow if result sign is 1
– Adding two –ve operands: Overflow if result sign is 0
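The sign rules translate directly into a check on the sign bits of the operands and the result. The sketch below uses an illustrative function name and a configurable word width. With a 4-bit word the example 7 + 6 overflows, while with 8 bits it does not.

```python
def add_with_overflow(a, b, width=32):
    """Two's complement addition; returns (signed result, overflow flag)."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    ua, ub = a & mask, b & mask          # two's complement bit patterns
    raw = (ua + ub) & mask               # result truncated to the word width
    a_neg = bool(ua & sign_bit)
    b_neg = bool(ub & sign_bit)
    r_neg = bool(raw & sign_bit)
    # Overflow only when both operands share a sign and the result sign differs
    overflow = (a_neg == b_neg) and (r_neg != a_neg)
    signed = raw - (1 << width) if r_neg else raw
    return signed, overflow


print(add_with_overflow(7, 6, width=4))   # both positive, result sign is 1
print(add_with_overflow(7, 6, width=8))   # fits in 8 bits
```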

32

Multiplication
• Start with long-multiplication approach

       1000   multiplicand
     × 1001   multiplier
       ----
       1000
      0000
     0000
    1000
    -------
    1001000   product

• Length of product is the sum of operand lengths
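The long-multiplication approach corresponds to a shift-and-add loop: for every 1 bit in the multiplier, add a shifted copy of the multiplicand. The function name and the `width` parameter in this sketch are illustrative.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned multiplication by adding shifted partial products."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            # Partial product: multiplicand shifted to the position of bit i
            product += multiplicand << i
        # A 0 bit in the multiplier contributes an all-zero partial product
    return product


result = shift_add_multiply(0b1000, 0b1001)
print(bin(result))  # 0b1001000
```

Two 4-bit operands give a product of up to 8 bits, matching the rule that the product length is the sum of the operand lengths.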


33

Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
– That's ok, if frequency of multiplications is low

34

Division
Dividend/Divisor = Quotient

• Check for 0 divisor
• Long division approach
– If divisor ≤ dividend bits
• 1 bit in quotient, subtract
– Otherwise
• 0 bit in quotient, bring down next dividend bit
• Restoring division
– Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
– Divide using absolute values
– Adjust sign of quotient and remainder as required

            1001      quotient
  1000 ) 1001010      divisor ) dividend
        -1000
            10
            101
            1010
           -1000
              10      remainder

n-bit operands yield n-bit quotient and remainder
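Restoring division can be sketched as a loop over the dividend bits. The function names are illustrative. The signed wrapper assumes the common convention that the remainder takes the sign of the dividend, which the slide leaves open ("as required").

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Bring down the next dividend bit
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor                 # trial subtraction
        if remainder < 0:
            remainder += divisor             # restore, quotient bit is 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1   # quotient bit is 1
    return quotient, remainder


def signed_divide(dividend, divisor, width=32):
    """Divide absolute values, then adjust signs (remainder follows the dividend: assumption)."""
    q, r = restoring_divide(abs(dividend), abs(divisor), width)
    if (dividend < 0) != (divisor < 0):
        q = -q
    if dividend < 0:
        r = -r
    return q, r


print(restoring_divide(0b1001010, 0b1000, width=8))  # 74 / 8
```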

35

Representable Floating Point Numbers


36

IEEE Floating-Point Format

    S     | Exponent        | Fraction
    1 bit | single: 8 bits  | single: 23 bits
          | double: 11 bits | double: 52 bits

$$x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$$

• S: sign bit (0 non-negative, 1 negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
– Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
– Significand is Fraction with the "1." restored
• Exponent: excess representation: actual exponent + Bias
– Ensures exponent is unsigned
– Single: Bias = 127; Double: Bias = 1023
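The single-precision layout can be inspected with Python's standard `struct` module. This sketch uses illustrative function names and handles normalized numbers only. It splits a value into its three fields and rebuilds it using a bias of 127.

```python
import struct


def decode_single(value):
    """Split a float into the (S, Exponent, Fraction) fields of IEEE single precision."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with bias 127
    fraction = bits & 0x7FFFFF         # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction


def rebuild_single(sign, exponent, fraction):
    """x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias), normalized numbers only."""
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)


fields = decode_single(-0.75)          # -0.75 = -1.1 (binary) * 2^-1
print(fields, rebuild_single(*fields))
```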

37

Chapter 4 Review



38

Single Cycle Datapath


39

R-Type Instruction


40

Load Instruction


41

Branch-on-Equal Instruction


42

Datapath With Jumps Added


43

Multi-cycle Datapath (1/2)
• Idea: Add registers at strategic points in the datapath
• Activate only needed functional units with control signals


44

Multi-cycle Datapath (2/2)
• Area savings possible (but not necessary)
– Only one memory
– Only one ALU

