March 22, 2018 10:13 am EE457 MT - Spring 2018 1 / 10 © Copyright 2018 Gandhi Puvvada
EE457 Midterm Exam (~24%)
Closed-book Closed-notes Exam; No cheat sheets;
Ordinary calculators may be used but not the smart phone with calculators. Verilog Guides are not needed and are not allowed.
Smart phones, tablets (and any kind of computing/Internet devices) are not allowed.
This is a Crowdmark exam. Please do not write on margins or on backside.
Spring 2018
Instructor: Gandhi Puvvada
Thursday, 3/22/2018 (A 3-hour exam) 05:00 PM - 08:00 PM (180 min) in THH201
Please do not write your student ID
Ques#  Topic                             Page#  Time     Points  Score
1      Lab 7 Part 3 Subpart 2            2-5    90 min   130
2      Lab 7 Part 1 3-element adder      6-7    30 min   40
3      Flushing by a successful branch   8      15 min   25
4      Virtual Memory                    8      5 min    15
5      Cache and MM Organization         9      15 min   32
Total  (Cover + 8 + Blank = 10)                 155 min  240
Perfect Score 230
1 ( 15 + 115 = 130 points) 10+80 = 90 min. Lab 7 Part 3 Subpart 2 modification:
1.1 Reproduced below is the solution to the Spring 2016 problem you are asked to go through, showing the generation of the STALL_12 logic and how that is used to control the four EN (enables).
1.1.1 Mr. _________________ (Bruin/Trojan) says, "When you stall the EX12 stage, it is not necessary to stall the WB stage. The senior #1 in the WB stage helps his junior ADD1 in the first clock and he (the junior) performs SUB3 on it in the first clock. In the next clock, ADD4 is performed on the result of the SUB3 and hence the forwarding help from the senior #1 is not needed in the 2nd clock. Hence stalling WB is unnecessary." Explanation of your answer: ____________________________________________________________________________________________________________________________________________________________________________________________________________________________________
1.1.2 Miss _________________ (Bruin/Trojan) says that she can replace the above stall generating logic with a toggle Flip-Flop shown on the side. But won’t it be toggling multiple times? What if there is a series of ADD1 instructions? If possible, generate either STALL_12 or its complement, STALL_12 bar. Explain how it is or it isn’t possible. ________________________________________________________________________________________________________________________________________________________
[Figure: "LAB 7 Part 3 with EX1 and EX2 merged Block Diagram". Pipeline stages IF, ID, EX12, WB. Shown: PC, I-MEM, Reg. File (XA, RA, RD, R-Write), the Comp Station in the ID stage (ID_XMEX12 = ID_XA matched with EX12_RA), the A-3 and A+4 units with R1_Mux (SKIP1) and R2_Mux (SKIP2), the forwarding unit FU (FORW) with WB_RDX_Mux, the qualifying signals EX12_MOV, EX12_SUB3, EX12_ADD4, EX12_ADD1, EX12_RA, WB_RA, WB_Write, and the STALL_12 flip-flop (D, Q, CLK, CLR from RESET_B) with the EN enables.]
[Figure: the toggle flip-flop proposed in 1.1.2 (D, Q, CLK, CLR; RESET_B and ADD1).]
7pts
8pts
1.2 Now to the above design (of Lab 7 P3 SP2 at the top of the previous page), we added an EX3 stage with a MULT2 unit (which multiplies by 2) and an R3_mux (select line labeled SKIP3) to skip this doubling operation. Now we have a total of 8 operations: the 4 previous operations (MOV, SUB3, ADD4, ADD1) (abbreviated as MV, S3, A4, A1) and 4 new operations that produce double the result of those previous four (2MOV, 2SUB3, 2ADD4, 2ADD1) (abbreviated as 2MV, 2S3, 2A4, 2A1). The opcode is no longer one-hot coded: a 3-bit opcode is decoded in the ID stage to produce the 8 one-hot control signals. There is no opcode for a NOP, but the decoder can be disabled from producing any active outputs in order to inject a bubble (during power-on reset) using a Wrist-Band FF. Here, we have both RAW stalls in the ID stage (initiated by the STALL_ID signal) and a stall that allows ADD1 or 2ADD1 (double of ADD1) to perform both the SUB3 and ADD4 operations in EX12 (initiated by the STALL_12 signal). Complete the design on page 5/10 as well as a few parts below.
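As a rough behavioral illustration of the decoder described in 1.2, the Python sketch below models a 3-to-8 decoder with an enable input. The mapping of opcode values to operations (0 = MV through 7 = 2A1) and the function name `decode` are assumptions made for illustration; the exam does not state the encoding.

```python
# Hypothetical behavioral model of the ID-stage opcode decoder.
# Assumption: opcode 0..7 selects MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1 in that
# order; the exam text does not specify the actual encoding.
OPS = ("MV", "S3", "A4", "A1", "2MV", "2S3", "2A4", "2A1")


def decode(opcode: int, enable: bool) -> dict:
    """3-to-8 decoder with enable: returns the 8 one-hot control signals."""
    if not 0 <= opcode <= 7:
        raise ValueError("opcode must fit in 3 bits")
    # With enable low, every output is 0, which acts as a bubble (NOP).
    return {name: int(enable and i == opcode) for i, name in enumerate(OPS)}


valid = decode(0b011, enable=True)    # exactly one control signal is active
bubble = decode(0b011, enable=False)  # decoder disabled: no active outputs
```

Because no opcode encodes a NOP, disabling the decoder is the only way this model produces an all-zero control word.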
1.2.1 Stalls: STALL_ID and STALL_12
(i) they always go active together. T / F
(ii) they can go active together or independently. T / F
(iii) when they go active together, the STALL_ID gets extended beyond the STALL_12 by just 1 clock. T / F
(iv) when STALL_ID occurs without STALL_12, STALL_ID lasts for only one clock. T / F
(v) The entire pipeline (including EX3 and WB) is stalled (a) by both (b) by STALL_ID only (c) by STALL_12 only (d) by neither
1.2.2 Bubbles are injected into the next stage (a) by both (b) by STALL_ID only (c) by STALL_12 only (d) by neither
1.2.3 Circle instructions which cannot help from the EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1
1.2.4 Circle instructions which do not mind postponing receiving help until they reach the EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1
1.2.5 Circle instructions which do not want to receive help for the second time in the EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1
1.2.6 You want to check to make sure if the senior on whom you (the junior) are dependent (and from whom you are receiving forwarding help) is not a NOP. True / False
1.2.7 It is not necessary to check to see that you yourself are not a NOP, if you are the junior, who is dependent on a senior and receiving forwarding help from him as long as he (the senior) is not a NOP. True / False
1.2.8 It is not necessary to check to see that you yourself are not a NOP, if you are the junior, who is dependent on a senior and are stalling because the senior can not otherwise provide you needed forwarding help in time. True / False
1.2.9 Register File: Your assistant Mr. Bruin forgot to provide Internal forwarding circuitry inside the Register File, so please complete the forwarding mux just outside the register file in the ID stage.
Points: 1.2.1 10pts; 1.2.2 through 1.2.8 3pts each.
Generate here the 6 items marked on the next page. Use signal names like EX12_ID_XMEX3. (Page total: 36 points)
[Answer areas labeled: STALL_ID, SID, HDU, FORW_12A, FORW_12B, FU_12, FORW_3, FU_3, SKIP1, SKIP2, SKIP3. Point markers: 4pts, 3pts, 3pts, 3pts, 6pts, 7pts, 6pts, 4pts.]
ID_Write is produced on the side in two ways. Comment on them. Use words like correct/incorrect, logically equivalent/different, cost wise ..., timing wise ...
________________________________________________________________________________________________________________________________
Show again here what you did to the Wrist-band FF and explain your work.
________________________________________________________________________________________________________________________________________________________________________________________________
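[Figure: CU decoder in the ID stage: a 3-to-8 decoder with inputs op2, op1, op0 (A0-A2), enable EN, and outputs Y0-Y7 producing MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1; a D flip-flop; RA; and the two ID_Write circuits labeled ID_Write #1 and ID_Write #2.]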
[Figure: "Modified LAB 7 Part 3 Block Diagram" for Q#1.2. Pipeline stages IF, ID, EX12, EX3, WB. Shown: PC, I-MEM, Reg. File (XA, RA, RD, R-Write) with No Internal Forwarding and a forwarding mux just outside it, the CU decoder (op2, op1, op0; A0-A2; Y0-Y7; EN) producing MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1, the Wrist-Band FF (D, Q, CLK, CLR), the HDU (STALL_ID, SID), X12A_Mux and X12B_Mux (FORW_12A, FORW_12B) with FU_12, the A-3 and A+4 units with R1_Mux (SKIP1) and R2_Mux (SKIP2), STALL_12, X3_Mux (FORW_3) with FU_3, MULT2 (A to 2A) with R3_Mux (SKIP3), and EX12_RA, EX3_RA, EX3_Write, WB_RA, WB_RD, WB_Write.]

Comp Station in ID Stage (three comparators, each with inputs P and Q and output P=Q):
ID_XMEX12 = ID_XA Matched with EX12_RA
ID_XMEX3 = ID_XA Matched with EX3_RA
ID_XMWB = ID_XA Matched with WB_RA

Inferences (XMEX12, XMEX3, XMWB) as carried through the stage registers:
ID stage: ID_XMEX12, ID_XMEX3, ID_XMWB
EX12 stage: EX12_ID_XMEX12, EX12_ID_XMEX3, EX12_ID_XMWB
EX3 stage: EX3_ID_XMEX12, EX3_ID_XMEX3, EX3_ID_XMWB
WB stage: WB_ID_XMEX12, WB_ID_XMEX3, WB_ID_XMWB

Other labels: EX12_RA, Write, ID_Write, EX12_Write. Assume 7 more lines like this.

Notes:
1. Add low-active R_B (Reset_Bar) control wherever needed. 7.5 pts
2. Complete the 18 items marked here on this page. 13*1.5 + 2 WB_FF*2 + 3 Forwards*2.5 = 31
3. The 3 inferences of the 3 comparators in the ID station are carried through the three pipeline stage registers. Cross off unneeded items (comparators, registers, wires). 9.5 pts
4. Produce the 6 items marked on the previous page.
48pts
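To make the signal naming concrete (for example EX12_ID_XMEX3), here is a small Python sketch of how the three ID-stage comparator inferences could be carried through the stage registers. It is only a behavioral illustration under stated assumptions: there are no stalls, the matches are raw register-ID comparisons without qualification by the write signals, and the function names `id_comparators`, `advance`, and `named` are hypothetical.

```python
# Hypothetical model: ID-stage comparator inferences carried down the pipeline.
# Assumptions: no stalls, and no qualification by EX12_Write / EX3_Write /
# WB_Write; only the naming convention follows the exam.
ZERO = {"XMEX12": 0, "XMEX3": 0, "XMWB": 0}


def id_comparators(id_xa, ex12_ra, ex3_ra, wb_ra):
    """Three comparators in the ID-stage Comp Station (P = Q)."""
    return {
        "XMEX12": int(id_xa == ex12_ra),
        "XMEX3": int(id_xa == ex3_ra),
        "XMWB": int(id_xa == wb_ra),
    }


def advance(stage_regs, id_inferences):
    """One clock edge: EX3 -> WB, EX12 -> EX3, ID -> EX12."""
    return {
        "EX12": id_inferences,
        "EX3": stage_regs["EX12"],
        "WB": stage_regs["EX3"],
    }


def named(stage_regs):
    """Flatten to exam-style names such as EX12_ID_XMEX3."""
    return {
        f"{stage}_ID_{key}": value
        for stage, inferences in stage_regs.items()
        for key, value in inferences.items()
    }


regs = {"EX12": ZERO, "EX3": ZERO, "WB": ZERO}
# Junior in ID reads register 5; the senior in EX12 writes register 5.
regs = advance(regs, id_comparators(id_xa=5, ex12_ra=5, ex3_ra=2, wb_ra=7))
after_one_clock = named(regs)
regs = advance(regs, ZERO)
after_two_clocks = named(regs)
```

In this model, `after_one_clock["EX12_ID_XMEX12"]` records that the junior, while it was in ID, matched the instruction that was then in EX12; by the time the junior is in EX12, that senior has moved on to EX3.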
2 ( 8 + 12 + 9 + 9 + 12 = 40 points) 30 min. Shifting stalling from ID to EX stage
Lab 7 Part 1 (3-element adder)
You went through ee457_MT_Sp2012_Q1.3_revised_in_Sp2018_sol.pdf. This question (Q#2.1) is to test your understanding of the same. The revised problem statement and the solution figure are reproduced below for your reference.
Problem statement: Stalling is currently in the ID stage. Your boss wanted you to move the stall to the EX1 stage and she told you something like, "... we can have a stage after the WB called WB_after ...". You do not know if your boss is a Bruin or a Trojan. Discuss the feasibility. If it is feasible discuss the details of the new design including what goes into WB_after, how many comparators are involved in stalling/forwarding, their locations, any changes to forwarding besides stalling, any changes to internally forwarding nature of the RF file to avoid duplication of hardware, overall whether it is desirable or undesirable to do this move. If it is not feasible, state reasons. For this question, let us keep all comparators in the Comp Station in the ID Stage only.
Solution: 3+2 = 5 comparators in ID stage (which include the 3 comparators in IFRF) and 7 comparators in the EX1 stage (total 12 comparators) plus forwarding muxes as shown below.
2.1 Register re-balancing: In this design stalling occurs only in _______ (EX1/EX2) stage. Your VLSI engineer said that the EX1 stage has timing problems for whatever reason whereas EX2 has slack. So she asks if she can remove the redundant mux in EX1 and retain the mux marked to be removed in EX2. ______ (OK/Not OK). If OK, do you need to add/change one of the 7 comparators in EX1? If it is not OK, what is the reason?
____________________________________________________________________________________________________________________________________________________________________________________________________________________________________
[Figure: solution block diagram with stages IF, ID, EX1, EX2, WB, WB_after. Shown: PC, IM, RF, and two adders (inputs A, B; output S).]
8pts
2.2 List the two comparators stationed in ID stage whose inference is carried to the EX1 stage.
Comparator #1 compares source register ________ (use notation such as ID_ZA) with the destination register _________ (use notation such as WB_RA) and carries the inference _________ (use notation such as ID_ZMEX2).
Comparator #2 compares source register ________ (use notation such as ID_ZA) with the destination register _________ (use notation such as WB_RA) and carries the inference _________ (use notation such as ID_ZMEX2).
The inferences are carried into EX1 stage and are used in __________ (A/B/C/D) where A = stalling logic, B = forwarding logic, C = both stalling and forwarding logic, D = neither
2.3 Let us account for the 7 comparisons in the EX1 stage: The three sources EX1_XA, EX1_YA, and EX1_ZA are compared with _______________________________________ accounting for ____ of the 7 comparators. The rest are: __________________________________________________________________________________________________________________________________________________________________
These 7 are used for (complete this sentence using words like forwarding or stalling or both) ___________________________________________________________________________________________________________________________________________________________________
2.4 Let us explore to see how the design changes, if we perform R <= X + 3 + Z, instead of performing R <= X + Y + Z in the above design (i.e. constant 3 in place of variable Y )
2.4.1 Design #1: The original design with stalling in the ID stage and all comparators in the ID stage: You will have ___ in place of original ___ comparators in the IFRF. The new RF has ____ RO (Read Only) ports and ____ WO (Write Only) port(s). In addition, you will have ___ in place of original ___ comparators in the comparator station in the ID stage. Cross off forwarding muxes not needed in the block diagram below drawn for R <= X + Y + Z.
2.4.2 Design #2: Now with stalling moved to EX1 stage, how do these previous numbers change? Previous numbers are 3+2 = 5 comparators in ID stage (which include the 3 comparators in IFRF) and 7 comparators in the EX1 stage (total 12 comparators) plus forwarding muxes as shown below.
Now we need ___ + ___ = ___ comparators in ID stage (which include the ___ comparators in IFRF) and ___ comparators in the EX1 stage (total ___ comparators) plus forwarding muxes as shown above (please cross off unneeded muxes to perform X+3+Z).
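Points: 2.2 12pts; 2.3 9pts; 2.4.1 9pts; 2.4.2 12pts.
[Figure: block diagram drawn for R <= X + Y + Z with stages IF, ID, EX1, EX2, WB. Shown: PC, IM, RF, two adders (inputs A, B; output S), and the forwarding muxes X_MUX, Y_MUX, Z1_MUX, Z2_MUX.]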
3 ( 6*3 = 18 + 7 bonus = 25 points) 15 min. Flushing by a successful branch:
3.1 Our Lab 6 Verilog code may have chosen to set or clear the "wrist-band" Flip-Flop (flush bit) in the stage register IF/ID on system reset using the RESET signal. Accordingly we discussed in class a solution for the Lab 6 Part 4 question on flushing two stages of the 7-stage pipeline.
For this question, let us assume that each designer is allowed to choose to set or clear each "wrist-band" Flip-Flop. He can choose to set one FF and clear another. Also some designers below assumed one delay slot whereas some assumed zero delay slots. All 6 designs are correct based on their assumptions. Fill in the table telling us what assumptions make the designs correct.
All control units are identical and are as per the textbook design (a 1-input means it is an instruction destined to be flushed). Fill in the table above.
4 ( 14 *1 + 1 bonus = 15 points) 5 min. Virtual Memory
MMU stands for _______________________; TLB stands for ________________________
PTBR stands for ____________________________
PT (Page Table) (essentially a LUT (Look-Up Table)) _______ has both the LHS (Left-Hand Side) and RHS (Right-Hand Side) of the LUT.
A Fully Associative TLB has both sides of LUT. T / F
Given _______ (VPN/PPFN) the TLB provides __________ (VPN/PPFN).
Given _______ (VPN/PPFN) the page table provides __________ (VPN/PPFN).
We ___________ (use / don’t use) parallel search to search the page table.
We ___________ (use / don’t use) binary search (also called dictionary search) to search the page table.
We ___________ (use / don’t use) indexing to ________ to locate the page table entry in ____________ (one/multiple) accesses to a full-length single-level page table.
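For readers reviewing the terms used in this question, the Python sketch below illustrates one common textbook arrangement of a full-length single-level page table. The page size (4 KB), the entry size (4 bytes), the list-based table, and the function names are assumptions chosen for illustration and are not taken from the exam.

```python
# Illustrative sketch of a single-level page table lookup.
# Assumptions (not from the exam): 4 KB pages, 4-byte page table entries,
# and a page table modeled as a Python list holding one PPFN per VPN.
PAGE_OFFSET_BITS = 12
PTE_BYTES = 4


def pte_address(ptbr: int, virtual_addr: int) -> int:
    """Address of the page table entry: base register plus scaled VPN."""
    vpn = virtual_addr >> PAGE_OFFSET_BITS
    return ptbr + vpn * PTE_BYTES


def translate(page_table: list, virtual_addr: int) -> int:
    """Form the physical address from the PPFN stored at index VPN."""
    vpn = virtual_addr >> PAGE_OFFSET_BITS
    offset = virtual_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    ppfn = page_table[vpn]
    return (ppfn << PAGE_OFFSET_BITS) | offset


entry_addr = pte_address(ptbr=0x1000, virtual_addr=0x3ABC)
physical = translate([7, 8, 9, 10], 0x3ABC)
```

In this sketch the table stores only PPFNs, and the VPN never appears as stored data because it serves as the position of the entry.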
25pts
[Figure: six designs, #1 through #6, each showing the IF1, IF2, ID front end of the pipeline with PC, Instr. TLB, Instr. cache, BR1, control, two RESET connections, and two wrist-band flip-flops FF1 and FF2. Table to fill in with columns #1 to #6, each split into FF1 and FF2.]
15 pts
5 ( 4 + 18 + 10 = 32 points) 15 min. Cache and MM Organization:
A 16-bit data (D15-D0) 32-bit (logical) address byte-addressable processor (address pins: A31-A1, /BE1-/BE0) has its cache and MM organized as shown below. Fill in the 12 boxes. Also divide the address below into appropriate fields and name the fields.
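As background on the pin description above (A31-A1 plus /BE1-/BE0 in place of A0), the following Python sketch shows one conventional way byte enables can be derived from a byte address and an access size. The lane assignment (even byte on D7-D0, odd byte on D15-D8), the restriction to aligned 16-bit accesses, and the function name `bus_signals` are assumptions for illustration only.

```python
# Illustrative sketch: deriving A31-A1 and active-low byte enables.
# Assumptions (not stated in the exam): the even-addressed byte travels on
# D7-D0, the odd-addressed byte on D15-D8, and 16-bit accesses are aligned.
def bus_signals(byte_addr: int, size: int):
    """Return (A31-A1 value, /BE1, /BE0) for a 1-byte or 2-byte access."""
    a0 = byte_addr & 1
    if size == 2:
        if a0:
            raise ValueError("misaligned 16-bit access is not modeled")
        be1_n, be0_n = 0, 0          # both byte lanes active (low)
    elif size == 1:
        # Odd address enables the upper lane, even address the lower lane.
        be1_n, be0_n = (0, 1) if a0 else (1, 0)
    else:
        raise ValueError("size must be 1 or 2 bytes")
    return byte_addr >> 1, be1_n, be0_n


odd_byte = bus_signals(0x1001, 1)
even_byte = bus_signals(0x1000, 1)
full_word = bus_signals(0x1000, 2)
```

Both byte accesses in this example share the same A31-A1 value and differ only in which byte enable is asserted.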
5.1 What are the drawbacks of the left-side design, which made us select the right-side design? _____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
18pts
4pts
Address to divide into fields (/BE1-/BE0):
A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20 A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0
[Figure: left-side and right-side designs of the processor (CPU), cache, and 2-way lower-order interleaved MM on a 16-bit bus with XCVRs (D15-8, D7-0). Shown: the Cache Data RAM (Addr., Data; Block 0’s through Block 4’s; address label A11-A1) and one of the TAG RAMs (5 such TAG RAMs) with Addr, Data-in, Data-out, Valid, and a comparator (Comp) producing Hit. Boxes to fill in: Address, Size of one TAG RAM, Size of one Byte-wide bank (in two places), Size of one Comparator, Total address space, Degree of Set-Associativity, Total cache size in KB ______ KB.]
10pts
Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.
Student’s Last Name:____________________ email: __________________
It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The next four topics, Multi-cycle CPU, pipelined CPU, cache and virtual memory are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best! Gandhi, TAs: Fangzhou, Chao, Mentors: Rui, Pravin, HW Graders: Navtej, Rupam, Lab graders: Ujwala, Aashish