Sample Exam Questions


1) Explain the following terms:
• Carry select
• Reduction of summands (multiplication)
• Multiplicative division
• Non-restoring division
• Size gap
• CISC

2)
a) Name major characteristics of the RISC philosophy:
b) Name major characteristics of the CISC philosophy:

3) A program contains the following instruction mix:
• 60% load/store instructions with an execution time of 1.2 microseconds each
• 10% ALU instructions with an execution time of 0.8 microseconds each
• 30% branch instructions with an execution time of 1.0 microsecond each
a) If the clock period is 0.2 microseconds, calculate the average clock cycles per instruction (CPI) for the program.
b) What is the average million-instructions-per-second (MIPS) rate of the program?
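
One way to set up question 3 is sketched below in Python. It assumes the average instruction time is the mix-weighted sum of the three execution times; the names `mix` and `clock_period_us` are illustrative and not part of the exam.

```python
# Instruction mix from question 3: (fraction of instructions, execution time in microseconds).
mix = {
    "load/store": (0.60, 1.2),
    "alu": (0.10, 0.8),
    "branch": (0.30, 1.0),
}
clock_period_us = 0.2

# Average time per instruction is the mix-weighted sum of the execution times.
avg_time_us = sum(frac * t for frac, t in mix.values())

# CPI is the average instruction time expressed in clock periods.
cpi = avg_time_us / clock_period_us

# With times in microseconds, 1 / avg_time_us is already in millions of instructions per second.
mips = 1.0 / avg_time_us

print(f"average time = {avg_time_us:.2f} us, CPI = {cpi:.2f}, MIPS = {mips:.3f}")
```

Under these assumptions the sketch gives a CPI of 5.5 and a rate of roughly 0.91 MIPS.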

4) Assume floating-point square root (FPSQR) is responsible for 25% of the execution time of a benchmark on a machine. One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is to make all floating-point (FP) instructions run two times faster. FP instructions are responsible for a total of 40% of the execution time. Compare the performance of these two design alternatives.
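
The comparison in question 4 can be sketched with Amdahl's law. The helper name `amdahl_speedup` is hypothetical, and the sketch assumes the stated percentages are fractions of the original execution time.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Alternative 1: FPSQR (25% of the time) becomes 10 times faster.
speedup_fpsqr = amdahl_speedup(0.25, 10.0)

# Alternative 2: all FP instructions (40% of the time) become 2 times faster.
speedup_fp = amdahl_speedup(0.40, 2.0)

print(f"FPSQR hardware: {speedup_fpsqr:.3f}, faster FP: {speedup_fp:.3f}")
```

With these numbers the dedicated FPSQR hardware comes out slightly ahead (about 1.29 versus 1.25).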

5) Assume we have a machine where the CPI is 2.0 when all memory accesses (instruction fetches and data fetches) hit in the cache. The only data accesses are loads and stores (note that these are one-address instructions), and these total 40% of the instructions (the remaining instructions operate on registers). If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the machine be if all accesses were cache hits?
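
A possible setup for question 5 is sketched below. It assumes every instruction makes one instruction fetch plus, for 40% of instructions, one data access, and that the 2% miss rate applies to both kinds of access; the variable names are illustrative.

```python
base_cpi = 2.0              # CPI when every access hits in the cache
load_store_fraction = 0.40  # instructions that make one data access
miss_rate = 0.02            # assumed identical for instruction and data accesses
miss_penalty = 25           # clock cycles

# One instruction fetch per instruction, plus the data accesses.
accesses_per_instruction = 1.0 + load_store_fraction

# Extra stall cycles per instruction caused by cache misses.
stall_cpi = accesses_per_instruction * miss_rate * miss_penalty

real_cpi = base_cpi + stall_cpi

# With equal instruction count and clock rate, the time ratio equals the CPI ratio.
ratio = real_cpi / base_cpi

print(f"CPI with misses = {real_cpi:.2f}, all-hit machine is {ratio:.2f}x faster")
```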

6) A RISC-type workstation uses a 15-MHz processor with a claimed 10-MIPS rating to execute a given program mix. Assume a one-cycle delay for each memory access:
a) What is the effective CPI of this computer?
b) Suppose the processor is being upgraded with a 30-MHz clock. However, the speed of the memory subsystem remains unchanged, and consequently two clock cycles are needed per memory access. If 30% of the instructions require one memory access and another 5% require two memory accesses per instruction, what is the performance of the upgraded processor with a compatible instruction set and equal instruction counts in the given program mix?
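
One reading of question 6 is sketched below. It assumes the original CPI already includes the one-cycle memory delay and that the upgrade adds exactly one extra cycle to each of the memory accesses listed in part (b); other readings (for example, counting instruction fetches) would change the result.

```python
# (a) Effective CPI = clock rate (MHz) / MIPS rating.
old_clock_mhz = 15.0
old_mips = 10.0
old_cpi = old_clock_mhz / old_mips

# (b) Each memory access now costs 2 cycles instead of 1, i.e. one extra cycle.
extra_cycles_per_access = 1.0
one_access_fraction = 0.30
two_access_fraction = 0.05

new_cpi = (old_cpi
           + one_access_fraction * 1 * extra_cycles_per_access
           + two_access_fraction * 2 * extra_cycles_per_access)

new_clock_mhz = 30.0
new_mips = new_clock_mhz / new_cpi

print(f"old CPI = {old_cpi:.2f}, new CPI = {new_cpi:.2f}, new MIPS = {new_mips:.2f}")
```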

7) Three enhancements with the following speedups are proposed:

Speedup1 = 30
Speedup2 = 20
Speedup3 = 10

Only one enhancement is usable at a time:

a) If enhancements 1 and 2 are each usable for 30% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?

b) Assume the distribution of enhancement usage is 30%, 30%, and 20% for enhancements 1, 2, and 3, respectively. Assuming all three enhancements are in use, for what fraction of the reduced execution time is no enhancement in use?

c) Assume for some benchmark, the fraction of use is 15% for each of the enhancements 1 and 2 and 70% for enhancement 3. We want to maximize performance. If only one enhancement can be implemented, which should be chosen?
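
The three parts of question 7 can be sketched with Amdahl's law extended to several mutually exclusive enhancements. The helper `overall_speedup` and the variable names are hypothetical, and the usage fractions are assumed to be fractions of the original execution time.

```python
def overall_speedup(usage):
    """Amdahl's law for several mutually exclusive enhancements.

    usage is a list of (fraction_of_original_time, speedup) pairs.
    """
    unenhanced = 1.0 - sum(f for f, _ in usage)
    new_time = unenhanced + sum(f / s for f, s in usage)
    return 1.0 / new_time

S1, S2, S3 = 30.0, 20.0, 10.0

# (a) Solve 1/10 = (1 - 0.3 - 0.3 - x) + 0.3/S1 + 0.3/S2 + x/S3 for x.
target = 10.0
x = ((1.0 - 0.3 - 0.3) + 0.3 / S1 + 0.3 / S2 - 1.0 / target) / (1.0 - 1.0 / S3)

# (b) Share of the reduced execution time in which no enhancement is active.
usage_b = [(0.30, S1), (0.30, S2), (0.20, S3)]
unenhanced_b = 1.0 - sum(f for f, _ in usage_b)
reduced_time_b = unenhanced_b + sum(f / s for f, s in usage_b)
no_enhancement_share = unenhanced_b / reduced_time_b

# (c) Speedup of each enhancement when it is the only one implemented.
single = {
    "enhancement 1": overall_speedup([(0.15, S1)]),
    "enhancement 2": overall_speedup([(0.15, S2)]),
    "enhancement 3": overall_speedup([(0.70, S3)]),
}
best = max(single, key=single.get)

print(f"(a) x = {x:.3f}")
print(f"(b) share with no enhancement = {no_enhancement_share:.3f}")
print(f"(c) best single choice: {best} ({single[best]:.2f}x)")
```

In part (c) the larger usage fraction of enhancement 3 outweighs its smaller speedup, which is the point the question appears to be making.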

8)
a) Define MIPS as a performance metric.
b) It has been discussed that MIPS is not an accurate measure for comparing performance among computers. Justify this. Note: I need a discussion that is straightforward and clear.

9)
a) Cray Y-MP/8 (a vector processor) has a cycle time of 6 ns. During a cycle, the results of both an addition and a multiplication can be completed. Furthermore, there are eight processors operating simultaneously without interference in the best case. Calculate the peak performance of the Cray Y-MP/8 (in MIPS).
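
A short sketch of the peak-rate arithmetic for question 9 follows. It assumes each completed addition or multiplication counts as one operation toward the rate; the variable names are illustrative.

```python
cycle_time_s = 6e-9      # 6 ns cycle
results_per_cycle = 2    # one addition and one multiplication complete per cycle
processors = 8           # operating without interference in the best case

# Peak rate in operations per second, then scaled to millions per second.
peak_ops_per_s = processors * results_per_cycle / cycle_time_s
peak_millions_per_s = peak_ops_per_s / 1e6

print(f"peak rate = {peak_millions_per_s:.0f} million operations per second")
```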

10) Consider the time needed to transfer a block of data from the main memory to the cache when a read miss occurs. The cache block size is 8 words. It takes one clock cycle to send an address to the main memory, 8 clock cycles to read the first word, and 4 clock cycles per word to read each subsequent word. Finally, one clock cycle is needed to send one word to the cache.

a) If a single memory module is used, what is the time needed to load a block from main memory into the cache?

b) Now assume that the memory is 4-way interleaved. What is the time needed to load a block from main memory into the cache?

c) Now assume the computer has L1 and L2 caches, each with a block size of 8 words. The hit rate is the same for both caches: 0.95 for instructions and 0.90 for data. Finally, the times needed to access an 8-word block in these caches are C1 = 1 and C2 = 10 cycles:
a. What is the average access time experienced by the processor during an instruction cycle if the main memory uses interleaving (30% of instructions are load/store; use the parameters defined earlier)?
b. What is the average access time during an instruction cycle if the main memory is not interleaved?
c. What is the improvement obtained with interleaving?
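
Parts (a) and (b) of question 10 can be sketched under one common textbook timing model, which is an assumption here: with interleaving, all modules are accessed in parallel, and forwarding one round of words to the cache overlaps with the next round of accesses. The function `block_load_cycles` is hypothetical and requires the block size to be a multiple of the module count.

```python
def block_load_cycles(words=8, modules=1, addr=1, first=8, subsequent=4, transfer=1):
    """Cycles to load one cache block under the assumed timing model."""
    rounds = words // modules   # accesses each module performs
    send = modules * transfer   # cycles to forward one round of words to the cache
    # Later access rounds overlap with forwarding the previous round's words.
    overlapped = (rounds - 1) * max(subsequent, send)
    return addr + first + overlapped + send

single_module = block_load_cycles(modules=1)   # 1 + 8 + 7*4 + 1
interleaved_4 = block_load_cycles(modules=4)   # 1 + 8 + 4 + 4

print(f"single module: {single_module} cycles, 4-way interleaved: {interleaved_4} cycles")
```

These miss penalties would then feed into whatever average-access-time formula the course uses for part (c).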

11) Calculate the execution time of a parallel adder augmented by the carry look-ahead scheme1 where operands are 64 bits long. Note: the basic building blocks are collections of eight full adders, and within each basic building block it takes 1d delay to generate ps and gs, 2d extra delay to generate carries, and 1d extra delay to generate sums (show your work in detail).

12) Calculate the execution time of a 30-bit parallel adder augmented by the carry-select scheme. Note: each basic building block is a cascade of six full adders (show your work step-by-step).

13) Calculate the execution time of a 64-bit parallel adder augmented by the carry-select scheme. Note: each basic building block is a cascade of eight full adders (show your work step-by-step).

14) Calculate the execution time of a parallel adder augmented by the carry look-ahead scheme1 where operands are 32 bits long. Note: the basic building blocks are collections of six full adders (show your work in detail).

15) Figure 1 shows the ith stage logic of a parallel ALU, where (Ai, Bi and Ci) are the operands and the carry-in, respectively, and (S2, S1, S0 and M) are the control signals. Determine under what values of S2, S1, S0, M, and C1 (carry-in to the rightmost stage) the ALU performs the following operations:

a) F ← A - B (Why?)
b) F ← B (Why?)

Figure 1

16) The Booth algorithm is a technique that allows multiplication of two 2s complement numbers:

a) True or false: on average, the Booth algorithm is faster than the add-and-shift algorithm (justify your answer).

b) The Booth algorithm can be extended by checking three bits of multiplicand in one loop iteration. Compare and contrast the traditional Booth algorithm with the extended Booth algorithm.

c) The extended Booth algorithm can be further modified by checking groups of more than three bits in each iteration (say 4, 5, …). However, in practice a Booth algorithm based on grouping of more than three bits has rarely been implemented. Why (clear explanation)?

d) Apply the extended Booth algorithm (group of 3 bits) to perform the following multiplication:
a. Multiplier 1000111
b. Multiplicand 1111000
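
A sketch for checking part (d) is given below. It interprets the "extended Booth algorithm (group of 3 bits)" as radix-4 Booth recoding of the multiplier, which is an assumption about the course's terminology; the helper names `to_signed` and `booth_radix4_digits` are hypothetical.

```python
def to_signed(bits):
    """Interpret a bit string as a two's-complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def booth_radix4_digits(bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits (least significant first)."""
    if len(bits) % 2:          # sign-extend to an even number of bits
        bits = bits[0] + bits
    padded = bits + "0"        # implicit 0 to the right of the least significant bit
    digits = []
    # Scan overlapping 3-bit groups from the right, moving two bits per step.
    for i in range(len(padded) - 3, -1, -2):
        b2, b1, b0 = (int(c) for c in padded[i:i + 3])
        digits.append(-2 * b2 + b1 + b0)   # each digit is in {-2, -1, 0, 1, 2}
    return digits

multiplier_bits = "1000111"
multiplicand = to_signed("1111000")

digits = booth_radix4_digits(multiplier_bits)

# Each digit selects 0, ±1 or ±2 times the multiplicand, weighted by 4**k.
product = sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))

print("Booth digits (LSB first):", digits)
print("product:", product)
```

The recoded digits reproduce the multiplier value, and the accumulated product agrees with ordinary signed multiplication of the two operands.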

17)
a) The "add and shift" algorithm can be used to multiply two signed numbers (say A and B) in 1s complement format. Calculate the correction term, where A is negative and B is positive.

b) Apply your conclusion from part (a) to perform the following operation using "add and shift" algorithm (show step-by-step operation).

101001 * 010011

Note: numbers are in 1s complement format.

18)
a) The "add and shift" algorithm can be used to multiply two negative numbers (say A and B) in 2s complement format. Calculate the correction term.

b) Apply your conclusion from part (a) to perform the following operation using "add and shift" algorithm (show step-by-step operation).

101001 * 110011

Note: numbers are in 2s complement format.

19) Apply the Column Compression technique to perform the following operation:

111001 * 111101

Note: numbers are in 2s complement format.

20)
a) Apply the Reduction of Summands technique to perform the following operation (show your work step-by-step):

101011 * 110101

b) Calculate the execution time of a Full Adder Tree when performing a 16*16 multiplication (show your work).

21) Apply the Column Compression technique to perform the following operation:

101011 * 110101

Note: numbers are in 2s complement format (show your work in detail).

22) Apply the Column Compression technique to perform the following operation:

111011 * 110111

Note: numbers are in 2s complement format (show your work in detail).

23) Apply Hurson's scheme to perform the following operation in which numbers are in 2s complement format:

1101101 * 0101101

Show your work and explain each action step-by-step.

24) Apply the Column Compression technique to perform the following operation (show your work):

101001 * 111101

25)
a) Draw the block diagram of a "full adder tree" for multiplication of two n-bit numbers.
b) Discuss the sequence of the operations in a "full adder tree".
c) Calculate the execution time of an 8*8 "full adder tree" (show your work and explain why).

26) Apply the reduction of summands scheme (using half and full adders) to perform the following operation:

1 1 0 0 0 1 1 * 0 1 1 0 0 1 0

Note: operands are unsigned numbers.

27) Apply the reduction of summands scheme (using half and full adders) to perform the following operation:

1 1 0 1 1 1 1 * 0 1 1 0 0 1 1

Calculate the execution time of the operation (show the work). Note: operands are in 2s complement format.

28) Apply the column compression scheme to perform the following operation:

1 1 0 0 0 1 1 * 0 1 1 0 0 1 0

Calculate the execution time of the operation (show the work).

29) Apply the Column Compression technique to perform the following operation:

1110111 * 1101011

Note: numbers are in 2s complement format.

30) Use the SRT division method to perform the following operation:

AQ/B where AQ = .00100011 and B = .0111

Show step-by-step operation.

31) Use the SRT division method to perform the following operation:

AQ/B where AQ = .00100000 and B = .0110

Show step-by-step operation.

32) Use the SRT method to perform

AQ/B where AQ = .0010100100 and B = .01111

(Show step-by-step operations.)

33) Use the SRT division method to perform the following operation:

AQ/B where AQ = .11001100 and B = .0111

Show step-by-step operation.

34) As a computer architect designing an ALU, what initial issues does one have to keep in mind? Name them and discuss their importance.

35)
a) Explain access gap as clearly as possible.
b) Discuss, in detail, three distinct directions that reduce the access gap.

36)
a) Compare and contrast low-order interleaving against high-order interleaving (I need a clear discussion).
b) A 16-way interleaved memory is used for program storage. It is found that the branching probability of the memory-request queue is 0.25. What is the average number of instruction words accessed per memory cycle?
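
Part (b) of question 36 can be sketched under a commonly used model, assumed here: each instruction word in the request queue is independently a taken branch with the given probability, and words fetched after a taken branch are wasted. The function `avg_words_per_cycle` is hypothetical.

```python
def avg_words_per_cycle(m, branch_prob):
    """Expected number of useful words per memory cycle for an m-way interleaved memory."""
    q = 1.0 - branch_prob
    # Exactly k words are useful when the first k-1 are not taken branches and the k-th is;
    # all m words are useful when none of the first m-1 is a taken branch.
    by_sum = sum(k * q ** (k - 1) * branch_prob for k in range(1, m)) + m * q ** (m - 1)
    # The same expectation in closed form.
    closed_form = (1.0 - q ** m) / branch_prob
    assert abs(by_sum - closed_form) < 1e-9
    return closed_form

b = avg_words_per_cycle(16, 0.25)
print(f"average words per memory cycle = {b:.2f}")
```

Under this model the 16 modules deliver only about four useful words per cycle, which illustrates how strongly branching limits interleaving.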

37) Explain what factors degrade the performance of an interleaved memory. Why?

38) Calculate the average duration of an instruction cycle for a Harvard-machine organization, where instructions are in the form of

R ← R <op> <operand>

and there are two types of instructions:
Type1: te ≤ ts
Type2: te > ts
te and ts are the instruction execution time and the main memory regeneration time, respectively.

39) Within the scope of interleaved memory:
a) What factors affect efficiency the most?
b) Prove part (a).

40)
a) Define the term “interleaved memory”.
b) Define high-order interleaving.
c) Define low-order interleaving.
d) Compare and contrast high-order interleaving with low-order interleaving.
e) Two issues affect the performance of an interleaved memory:
a. What are they?
b. Show (prove) how they affect the effectiveness of the interleaved memory.
f) With respect to part (e), discuss solutions (one for each case).

41)
a) A memory is n-way interleaved if:
1)
2)
3)
b) Define high-order interleaving.
c) Define low-order interleaving.
d) Address-accessible memory can be classified as:
1)
2)
3)
e) Explain access gap as clearly as possible.

42) Assume we are utilizing a parallel disk (RAID) composed of 6 and 8 disks (# of data disks available). Calculate space utilization of each configuration for various redundancy schemes (show your work):

Redundancy        Space utilization (%)
configuration     6 disks     8 disks
Level 0
Level 1
Level 0+1
Level 3
Level 4
Level 5
Level 6
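
One way to fill in the table is sketched below. It assumes the 6 and 8 count data disks only, with mirror or check disks added on top, and defines space utilization as data disks divided by total disks; a course that counts 6 and 8 as total disks would get different figures. The names `redundant_disks` and `space_utilization` are hypothetical.

```python
# Extra (redundant) disks needed for n data disks under each scheme.
# Assumption: n counts data disks only; mirror/check disks are added on top.
redundant_disks = {
    "Level 0": lambda n: 0,     # striping only, no redundancy
    "Level 1": lambda n: n,     # every data disk is mirrored
    "Level 0+1": lambda n: n,   # striped set plus a full mirror
    "Level 3": lambda n: 1,     # one dedicated parity disk
    "Level 4": lambda n: 1,     # one dedicated parity disk
    "Level 5": lambda n: 1,     # distributed parity, one disk's worth
    "Level 6": lambda n: 2,     # two disks' worth of check information
}

def space_utilization(level, data_disks):
    """Percentage of total disk space that holds user data."""
    extra = redundant_disks[level](data_disks)
    return 100.0 * data_disks / (data_disks + extra)

for level in redundant_disks:
    row = [f"{space_utilization(level, n):6.1f}" for n in (6, 8)]
    print(f"{level:10s}", *row)
```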

43) Assume we are utilizing a parallel disk (RAID) composed of 6 and 8 disks (# of data disks available). Calculate space utilization of each configuration for various redundancy schemes (show your work):

Redundancy        Space utilization (%)
configuration     6 disks     8 disks
Level 0
Level 1
Level 0+1
Level 3
Level 4
Level 5
Level 6

44) “A programmer should avoid the application of branch instructions in a program.” Clearly name and discuss three architectural concepts that support this statement.

45) All homework problems and quizzes