
Practice Exam Computer Architecture


Prof. Connors Practice Exam

ECEN/CSCI 4593 Computer Organization and Design Exam-2

Name:

• Write your initials at the top of each page.

• You are allowed one 8.5X11 page of notes.

• No interaction is allowed between students.

• Do not open this booklet until you are told to do so.

• Show all of your work for possible partial credit.

Problem 1   30 pts
Problem 2   20 pts
Problem 3   30 pts
Problem 4   20 pts
Problem 5   15 pts
Problem 6   15 pts
Problem 7   30 pts


Question 1. (30 points)

Give a concise answer to each of the following questions. Limit your answers to 20 words.

(a) What is the functionality of a BTB?

(b) If a compiler could fill all the delay slots of a machine, would there be any need for branch prediction?

(c) What are the main functionalities of a page table?

(d) Why does a two-level address translation algorithm require a smaller amount of main memory than a one-level algorithm?

(e) What is set associativity?

(f) What is the purpose of the dirty (modified) bit in a cache tag store entry?


(g) Describe the page faults that can potentially occur during a two-level address translation.

(h) Which cache model has the fastest HIT LATENCY (fully-associative, set-associative, direct-mapped)?

(i) Which cache design (write-back or write-through) uses a dirty bit?

(j) When an item of data is written repeatedly, which has better memory system utilization: a write-back or a write-through cache?

(k) What are the two localities (characteristics) that caches exploit?

(l) What are the three types of cache misses?

(m) True/False - Capacity misses are generally eliminated by increasing the cache associativity?

(n) True/False - A 4-stage pipeline that uses a combined EXECUTE and MEMORY stage will not have a LOAD-stall.


Question 2. (20 points)

This question covers cache and pipeline performance analysis.

(Part A) Write the formula for the ideal number of cycles in a pipelined execution (use N for instructions and P for pipeline stages within one instruction):

(Part B) Write the formula for the average memory access time assuming one level of cache memory:

(Part C) For a data cache with an 80% hit rate and a 1-cycle hit latency, calculate the average memory access latency. Assume that the latency to memory and the cache miss penalty together total 100 cycles. Note: The cache must be accessed after memory returns the data.
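As a way to check the arithmetic in Part C, the following is a minimal sketch of the usual single-level formula (hit time plus miss rate times miss penalty). The function name `amat` is hypothetical, and the sketch assumes the stated 100 cycles already includes the extra cache access mentioned in the note; if it does not, the penalty passed in would need to be larger.

```python
def amat(hit_time: float, hit_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in cycles, for one level of cache.

    Assumption: miss_penalty is the full extra cost of a miss,
    including any re-access of the cache after memory returns data.
    """
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Values taken from Part C: 80% hit rate, 1-cycle hit, 100-cycle miss cost.
part_c = amat(hit_time=1, hit_rate=0.80, miss_penalty=100)
print(f"AMAT = {part_c:.1f} cycles")
```
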

(Part D) Calculate the performance of a standard 5-stage pipeline with full register bypassing. The data cache (for loads and stores) is the same as described in Part C and 30% of instructions are loads and stores. The instruction cache has a hit rate of 90% with a miss penalty of 50 cycles. Calculate the CPI of the pipeline, assuming everything else is working perfectly. Assume the load never stalls a dependent instruction and assume the processor must wait for stores to finish when they miss the cache. Finally, assume that instruction cache misses and data cache misses never occur at the same time.
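One possible way to organize the Part D calculation is sketched below. It assumes a base CPI of 1, one instruction cache access per instruction, a 20% data cache miss rate with a 100-cycle miss cost carried over from Part C, and that stall cycles simply add because the two kinds of miss never overlap. The function name `pipeline_cpi` and its parameter names are hypothetical.

```python
def pipeline_cpi(base_cpi, icache_miss_rate, icache_penalty,
                 mem_fraction, dcache_miss_rate, dcache_penalty):
    """CPI = base CPI + instruction-fetch stalls + data-access stalls."""
    # Every instruction is fetched, so every instruction can miss the I-cache.
    fetch_stalls = icache_miss_rate * icache_penalty
    # Only loads and stores access the data cache.
    data_stalls = mem_fraction * dcache_miss_rate * dcache_penalty
    return base_cpi + fetch_stalls + data_stalls


cpi = pipeline_cpi(base_cpi=1.0,
                   icache_miss_rate=0.10, icache_penalty=50,
                   mem_fraction=0.30,
                   dcache_miss_rate=0.20, dcache_penalty=100)
print(f"CPI = {cpi:.2f}")
```
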


Question 3. (30 points)

(Part A) Dependence detection

This question covers your understanding of dependences between instructions. Using the code below, list all of the dependence types (FLOW, ANTI, OUTPUT). You should list them in the table (example: INST-X to INST-Y FLOW) instead of drawing a graph.

I0: ADD R3 = R1 + R0;
I1: SUB R0 = R3 - R4;
I2: ADD R4 = R5 + R6;
I3: MUL R4 = R3 + R1;
I4: LDW R2 = MEM[R2 + 0];
I5: AND R2 = R2 & R1;

From Instruction To Instruction Type of Dependence
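For checking a hand-built table, a small register-dependence scanner can be sketched as follows. This is only an illustration under stated assumptions: it compares every earlier/later instruction pair and does not filter out pairs separated by an intervening redefinition of the register, and it ignores memory dependences (the listing has no stores). The names `PROGRAM` and `find_dependences` are hypothetical.

```python
from itertools import combinations

# (label, destination register, source registers), transcribed from the listing.
PROGRAM = [
    ("I0", "R3", {"R1", "R0"}),
    ("I1", "R0", {"R3", "R4"}),
    ("I2", "R4", {"R5", "R6"}),
    ("I3", "R4", {"R3", "R1"}),
    ("I4", "R2", {"R2"}),
    ("I5", "R2", {"R2", "R1"}),
]


def find_dependences(program):
    """Return (from, to, type) for every earlier/later instruction pair."""
    deps = []
    for (a, a_dst, a_src), (b, b_dst, b_src) in combinations(program, 2):
        if a_dst in b_src:      # later instruction reads what the earlier wrote
            deps.append((a, b, "FLOW"))
        if b_dst in a_src:      # later instruction overwrites what the earlier read
            deps.append((a, b, "ANTI"))
        if a_dst == b_dst:      # both instructions write the same register
            deps.append((a, b, "OUTPUT"))
    return deps


for src, dst, kind in find_dependences(PROGRAM):
    print(f"{src} to {dst} {kind}")
```

Whether a pair such as an ANTI dependence across an intervening write belongs in the answer depends on the convention used in the course, so the output is a superset to compare against rather than a definitive key.
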


(Part B) Forwarding logic design

For this problem you are to design a forwarding unit for a 5-stage pipeline processor. The forwarding unit returns the value to be forwarded to the current instruction. There are three places that the values for register RS and register RT can come from: decode stage (register file), memory stage, and write-back stage.

[Figure: FORWARDING UNIT block diagram. Inputs: DECODE STAGE INFORMATION (RS INDEX, 5 bits; RS REG VALUE, 32 bits; RT INDEX, 5 bits; RT REG VALUE, 32 bits), MEMORY STAGE INFORMATION (REGISTER INDEX, 5 bits; VALUE, 32 bits; WRITE_ENABLE, 1 bit), and WRITE-BACK STAGE INFORMATION (REGISTER INDEX, 5 bits; VALUE, 32 bits; WRITE_ENABLE, 1 bit). Outputs: VALUE FOR RS and VALUE FOR RT.]

The write-back and memory stage information consists of:

• INDEX- explaining which inflight register index is to be written

• VALUE- the value that is to be written

• ENABLE- whether or not the instruction in the stage is writing.

The decode stage simply states the register index (for RS and RT) and the corresponding register value from the register file.


Generally three values could exist, one of which the forwarding unit should choose for each of the RS and RT register value requests. The memory stage has value MEM, the write-back stage has value WB, and the register file has value RS-REG or RT-REG.

Using the table below, which contains information about all of the instruction stages, indicate which value should be forwarded to the current instruction: MEM, WB, RS-REG, or RT-REG. Each line represents one forwarding unit evaluation; there is no connection between evaluation lines in the table. You do not need to worry about hazard detection, only value bypassing.

Evaluation   Mem Stage       Write-Back Stage   Register Stage         RS Value   RT Value
             Index   Write   Index   Write      RS-Index   RT-Index
0            5       1       23      0          6          7
1            7       0       16      1          16         8
2            10      1       10      1          11         10
3            17      0       12      1          12         12
4            19      0       19      0          19         25
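The selection logic described for the forwarding unit can be sketched as a simple priority check. The sketch assumes the memory stage holds the younger (more recent) result and therefore takes priority over the write-back stage when both match, and it does not special-case a hardwired zero register because the problem does not mention one. The names `forward`, `ROWS`, and `evaluate` are hypothetical.

```python
def forward(src_index, reg_label, mem_index, mem_write, wb_index, wb_write):
    """Pick the source of a register value: MEM, WB, or the register file."""
    if mem_write and mem_index == src_index:
        return "MEM"        # newest in-flight value wins
    if wb_write and wb_index == src_index:
        return "WB"
    return reg_label        # no in-flight writer: use the register file value


# (mem index, mem write, wb index, wb write, RS index, RT index) per evaluation.
ROWS = [
    (5, 1, 23, 0, 6, 7),
    (7, 0, 16, 1, 16, 8),
    (10, 1, 10, 1, 11, 10),
    (17, 0, 12, 1, 12, 12),
    (19, 0, 19, 0, 19, 25),
]


def evaluate(row):
    mem_i, mem_w, wb_i, wb_w, rs, rt = row
    return (forward(rs, "RS-REG", mem_i, mem_w, wb_i, wb_w),
            forward(rt, "RT-REG", mem_i, mem_w, wb_i, wb_w))


results = [evaluate(row) for row in ROWS]
for number, (rs_value, rt_value) in enumerate(results):
    print(number, rs_value, rt_value)
```
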


Question 4. (20 points)


This problem covers your knowledge of branch prediction. The following figure illustrates three possible state machines.

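[Figure: state diagrams for three predictors. LAST-TAKEN has states 0 and 1; UP-DOWN has states 00, 01, 10, and 11; AUTOMATON-A3 has states 00, 01, 10, and 11. Transitions are labeled T (taken) and N (not taken).]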

NOTES:

• Last taken predicts taken on 1

• Up-Down predicts taken on 11 and 10

• Automata A3 predicts taken on 11 and 10

Fill out the tables below for each branch predictor. The execution pattern for the branch is TNNTTN.
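A table of this kind can be generated mechanically once the transition function is known. The sketch below covers only two of the three machines: last-taken (the state is simply the previous outcome) and up-down, which is assumed here to be a standard 2-bit saturating counter because the arcs of the original figure are not reproduced in this text. Automaton A3 is omitted for the same reason. Initial states are taken from the tables (0 and 01). All function names are hypothetical.

```python
def run_predictor(outcomes, state, predicts_taken, next_state):
    """Return rows of (time, outcome, state before, state after, correct)."""
    rows = []
    for time, outcome in enumerate(outcomes):
        taken = outcome == "T"
        after = next_state(state, taken)
        rows.append((time, outcome, state, after, predicts_taken(state) == taken))
        state = after
    return rows


def last_taken_next(state, taken):
    # The new state just records the most recent outcome.
    return "1" if taken else "0"


def up_down_next(state, taken):
    # Assumption: saturating counter, up on taken, down on not taken.
    value = int(state, 2)
    value = min(value + 1, 3) if taken else max(value - 1, 0)
    return format(value, "02b")


def accuracy(rows):
    return sum(row[4] for row in rows) / len(rows)


PATTERN = "TNNTTN"
last_taken = run_predictor(PATTERN, "0", lambda s: s == "1", last_taken_next)
up_down = run_predictor(PATTERN, "01", lambda s: s in ("11", "10"), up_down_next)

for row in last_taken:
    print(row)
```
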


Execution   Branch    State    State   Correct or
Time        Outcome   Before   After   Incorrect
0           T         0
1           N
2           N
3           T
4           T
5           N

Table 1: Table for last-taken branch predictor.

Execution   Branch    State    State   Correct or
Time        Outcome   Before   After   Incorrect
0           T         01
1           N
2           N
3           T
4           T
5           N

Table 2: Table for up-down branch predictor.

Execution   Branch    State    State   Correct or
Time        Outcome   Before   After   Incorrect
0           T         01
1           N
2           N
3           T
4           T
5           N

Table 3: Table for Automata-A3 branch predictor.

Calculate the prediction rates of the three branch predictors:


Predictor      Prediction accuracy
Last-taken
Up-Down
Automata-A3

Question 5. (15 points)

This problem covers physical cache design and cache access.

(Part A) Design a 32KB direct-mapped data cache that uses a 16-bit address and 4 bytes per block. Calculate the following:

(a) How many bits are used for the byte offset?

(b) How many bits are used for the set (index) field?

(c) How many bits are used for the tag?
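The three field widths follow from the cache geometry, and a short sketch can make the relationship explicit. It assumes all sizes are powers of two and uses the parameters exactly as stated in Part A; the function name `cache_fields` is hypothetical.

```python
def cache_fields(cache_bytes, block_bytes, address_bits, ways=1):
    """Return (offset bits, index bits, tag bits) for a power-of-two cache."""
    offset_bits = block_bytes.bit_length() - 1       # log2(block size)
    num_sets = cache_bytes // (block_bytes * ways)   # direct-mapped: ways = 1
    index_bits = num_sets.bit_length() - 1           # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits


# Part A parameters: 32KB, direct-mapped, 4-byte blocks, 16-bit address.
offset_bits, index_bits, tag_bits = cache_fields(32 * 1024, 4, 16)
print(offset_bits, index_bits, tag_bits)
```
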

(Part B) Cache access:

Assume the following 6-bit physical address sequence generated by the microprocessor:

Time     0        1        2        3        4        5        6        7
Access   001101   110010   111111   001100   011100   101001   111110   101001

The cache uses 2 bytes per block. Assume a 2-way set-associative cache design that uses the LRU algorithm. Assume that the cache is initially empty. Hint: first determine the TAG, SET, and INDEX fields.

[Answer grid: cache contents by set (SET 0, SET 1) and way (BLOCK 0, BLOCK 1) for times 0 through 7.]

(Part C) Derive the hit ratio for the access sequence in Part B.
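To cross-check a hand trace of Parts B and C, a small LRU simulation can be sketched as below. It assumes two sets (as suggested by the SET 0 and SET 1 labels in the answer grid), two ways per set, and 2-byte blocks, so each 6-bit address splits into a 4-bit tag, a 1-bit set index, and a 1-bit byte offset. The names `simulate` and `ACCESSES` are hypothetical.

```python
ACCESSES = ["001101", "110010", "111111", "001100",
            "011100", "101001", "111110", "101001"]


def simulate(accesses, num_sets=2, ways=2, block_bytes=2):
    """Return a list of booleans, True where the access hits."""
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    # Each set is a list of tags ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    hits = []
    for bits in accesses:
        block = int(bits, 2) >> offset_bits   # drop the byte offset
        index = block % num_sets              # low bits select the set
        tag = block >> index_bits             # remaining bits form the tag
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)                 # will be re-appended as MRU
            hits.append(True)
        else:
            if len(lines) == ways:
                lines.pop(0)                  # evict the LRU tag
            hits.append(False)
        lines.append(tag)
    return hits


hits = simulate(ACCESSES)
hit_ratio = sum(hits) / len(hits)
print(hits, hit_ratio)
```
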


Question 6. (30 points)

The memory architecture of a machine X is summarized in the following table.

Virtual space   8GB
Page size       16K bytes
PTE size        4 bytes

(Part A) Assume that there are 10 bits reserved for the operating system functions (protection, replacement, valid, modified, and Hit/Miss; all overhead bits) other than those required by the hardware translation algorithm. Derive the largest physical memory size (in bytes) allowed by this PTE format. Make sure you consider all the fields required by the translation algorithm.

(Part B) How large (in bytes) is the page table?
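The quantities in Parts A and B can be derived from the table with a few lines of arithmetic. The sketch assumes that every PTE bit not reserved for the operating system holds the physical frame number, and that the page table is a flat one-level table with one PTE per virtual page. The variable names are hypothetical.

```python
VIRTUAL_BYTES = 8 * 2**30     # 8GB virtual space
PAGE_BYTES = 16 * 2**10       # 16K-byte pages
PTE_BYTES = 4                 # 4-byte page table entries
OS_BITS = 10                  # overhead bits reserved in each PTE

# Part A: the remaining PTE bits are assumed to hold the frame number.
frame_bits = PTE_BYTES * 8 - OS_BITS
max_physical_bytes = (2**frame_bits) * PAGE_BYTES

# Part B: one PTE per virtual page in a one-level table.
virtual_pages = VIRTUAL_BYTES // PAGE_BYTES
page_table_bytes = virtual_pages * PTE_BYTES

print(frame_bits, max_physical_bytes, virtual_pages, page_table_bytes)
```
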


(Part C) In the picture below (the algorithm for a 1-level translation scheme), place the values for the known fields of the virtual memory and physical memory in the diagram. If the value to be used in a box is known, fill in the value. Otherwise, indicate the number of bits associated with each rectangular box. Assume that your answer from Part A defines the actual physical memory for the processor. (Hint: Determine how many page frames are in your memory and how to index those page frames.)

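[Figure: one-level translation flow. The virtual address of the data (VA_D) and the PTBR are combined (multiply and add) to form the physical address of the PTE (PA_PTE); physical memory is accessed to read the PTE into the MBR; a "Data in PM?" check either raises a page fault (no), in which case the OS brings in the data page, or yields the physical address of the data (PA_D), which is used to access physical memory.]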


Question 7. (20 points)

This question covers virtual memory access. Assume a 5-bit virtual address and a memory system that uses 4 bytes per page. The physical memory has 16 bytes (four page frames). The page table used is a one-level scheme that can be found in memory at the PTBR location. Initially the table indicates that no virtual pages have been mapped. Implementing an LRU page replacement algorithm, show the contents of physical memory after the following four virtual accesses: 11100, 01000, 00000, 01000. Show the contents of memory and the page table information after each access successfully completes in Figures A, B, C, and D. Each page table entry (PTE) is 1 byte.

[Figure: physical (main) memory with page frames 00 through 11 and the PTBR location, shown beside the virtual space (pages 000 through 111, byte addresses 0 through 31).]

Figure 1: The initial contents of memory.


[Answer figure: physical (main) memory, page frames 11, 10, 01, and 00.]

Figure 2: Figure A (after access 11100).

[Answer figure: physical (main) memory, page frames 11, 10, 01, and 00.]

Figure 3: Figure B (after access 01000).


[Answer figure: physical (main) memory, page frames 11, 10, 01, and 00.]

Figure 4: Figure C (after access 00000).

[Answer figure: physical (main) memory, page frames 11, 10, 01, and 00.]

Figure 5: Figure D (after access 01000).
