© Mark Redekopp, All rights reserved EE 357 Unit 21 Final Review

Page 1: EE 357 Unit 21

EE 357 Unit 21

Final Review

Page 2: EE 357 Unit 21

A LOOK BACK

EE 357 in review…

Page 3: EE 357 Unit 21

Where EE 357 Fits

• CS 101, 102, 105, 201 – Programming with high-level languages (HLLs) like C / C++ / Java

• EE 101, 201 – Digital hardware (registers, adders, muxes)

[Figure: abstraction layers spanning software (SW) and hardware (HW): Applications; C / C++ / Java; Functional Units (Registers, Adders, Muxes); Logic Gates; Transistors; Voltage / Currents]

Page 4: EE 357 Unit 21

Where EE 357 Fits

• CS 101, 102, 105, 201 – Programming with high-level languages (HLLs) like C / C++ / Java

• EE 101, 201 – Digital hardware (registers, adders, muxes)

• EE 357 – Computer organization and architecture

  • HW/SW System Perspective

  – Topics

    • HW/SW interface

    • System Software

    • Assembly Language

    • Computer Architecture

[Figure: abstraction layers spanning software (SW) and hardware (HW): Applications; Libraries / OS; C / C++ / Java; Assembly / Machine Code; Processor / Memory / I/O; Functional Units (Registers, Adders, Muxes); Logic Gates; Transistors; Voltage / Currents]

Page 5: EE 357 Unit 21

Where Computer Architecture Fits

[Figure: Computer Architecture and the areas it connects to]

• Software Development (Parallel Programming, Memory Hierarchy effects)

• Operating Systems + Compilers

• Embedded Devices + Applications

• IC/VLSI/Digital Design

• New Technology

Page 6: EE 357 Unit 21

EE 357 in Context

• Computer Architecture: EE 357, EE 457, EE 557, EE 653

• Software Development (CS 303, EE 451?)

• Operating Systems + Compilers (CS 402 / CS 410)

• IC/Digital Design (EE 477L, EE 438L)

• New Technology (EE 337L)

• Embedded Devices + Applications (EE 459L, EE 454L, EE 579, EE 483, EE 434L, BME 302, BME 405L)

Page 7: EE 357 Unit 21

Architecture

• Multicore

– Power, multithreaded/multicore, parallel programming

• Reliability

– Smaller transistors lead to reliability issues (bits can be flipped accidentally, transistors can “break”, etc.)

– How do we add reliability into the architecture?

• Mobile and network-centric

– Low power, software/hardware decomposition

• Architectural Concepts

– Make the common case fast (Amdahl’s law)

– The concept of caching (save your work and reuse it the next time you need it) applies to almost anything, not just “cache memory” (SW programs can “cache” their results, too)

– It is often easier to improve throughput than latency
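As a small illustration of the caching idea applied to software, the sketch below memoizes a function so repeated requests reuse the saved result. The function name `slow_square` and the call counter are hypothetical, chosen only for this example; `functools.lru_cache` is the standard-library memoization decorator.

```python
from functools import lru_cache

call_count = 0  # counts how often the real work is actually performed

@lru_cache(maxsize=None)
def slow_square(n):
    """Stand-in for an expensive computation (hypothetical example)."""
    global call_count
    call_count += 1
    return n * n

# Three requests for the same input: only the first one does the work,
# the other two are served from the software "cache".
results = [slow_square(7) for _ in range(3)]
print(results, call_count)
```

The same trade-off as in hardware applies: the cache costs memory, and it only pays off when results are actually reused.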

Page 8: EE 357 Unit 21

Digital Design & VLSI

• Build the structures and implement the algorithms that architects, signal processing, and communications engineers develop given power, area, and performance constraints

• Focus on SoC (System-on-chip)

– Combine processor core + IP cores => Embedded System

– http://en.wikipedia.org/wiki/List_of_semiconductor_IP_core_vendors

– http://www.design-reuse.com

• Learn Verilog and/or VHDL

• Understand advantages of FPGA vs. custom chips

• Learn to program (likely in C/C++ or maybe Python)

• Take EE 477L (VLSI design) as a technical elective

Page 9: EE 357 Unit 21

Programming

• Understand the hardware architecture you are working on and how to take advantage of it

– Effects of Cache

• Sequential access is best

• Static allocation (fewer pointer-based data structures) is often better

• Cluster accesses to the same data: if you can avoid it, don’t use data, do a lot of other work, and then come back to reuse it

– Thread Level Parallelism (Create parallel tasks)

• OpenMP, MPI, Native Threads

– Data Level Parallelism (SIMD)

• Compiler options, intrinsics

• Understand the cost of parallelization

– Amdahl’s law

– Cost of communications/synchronization
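To make the "sequential access is best" point concrete, here is a minimal sketch that counts misses in a toy direct-mapped cache for two traversal orders of the same 2-D array. All parameters (64 blocks of 32 bytes, a 64x64 array of 4-byte elements) and the name `count_misses` are assumptions for illustration, not figures from the course.

```python
def count_misses(addresses, num_blocks=64, block_bytes=32):
    """Count misses in a toy direct-mapped cache for a list of byte addresses."""
    tags = [None] * num_blocks        # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // block_bytes   # memory block number
        index = block % num_blocks    # cache line it maps to
        if tags[index] != block:
            tags[index] = block       # fetch the block on a miss
            misses += 1
    return misses

ROWS, COLS, ELEM = 64, 64, 4          # assumed array shape and element size
# Row-major storage: element (r, c) lives at byte offset (r * COLS + c) * ELEM.
row_order = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_order = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

print(count_misses(row_order))  # one miss per 8-element block
print(count_misses(col_order))  # every access misses in this configuration
```

With these assumed sizes the row-by-row walk misses once per block, while the column-by-column walk keeps evicting blocks before their neighbors are used.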

Page 10: EE 357 Unit 21

Embedded Systems

• Complex systems which integrate and interface to many I/O devices can be cheaply and efficiently made with programmable microcontrollers

• Identify your I/O and computational needs and select an appropriate microcontroller/microprocessor

Page 11: EE 357 Unit 21

Microcontrollers

• Freescale (www.freescale.com)

• Atmel – AVR (www.atmel.com)

• PIC (www.microchip.com)

• ARM (www.arm.com)

Page 12: EE 357 Unit 21

Embedded Systems Devices

• Parallel I/O

– Character LCD’s

– LED’s, Switches, Pushbuttons

• Serial I/O

– LCD’s (IIC)

– GPS (RS-232)

– USB

– Bluetooth (direct or RS-232)

– Wireless (direct or RS-232)

– Real Time Clocks (IIC)

– Servo Motors

• Analog Inputs

– Pressure, temperature, biometric sources

– Touch screens/sensors

• Web sites

– Digikey.com

– Jameco.com

– Sparkfun.com

Page 13: EE 357 Unit 21

REVIEW FOR FINAL

Page 14: EE 357 Unit 21

Final Jeopardy

Binary Brainteasers | Performance Puzzles | Memory Madness | Processor Predicaments | Programming Pickles
100 | 100 | 100 | 100 | 100
200 | 200 | 200 | 200 | 200
300 | 300 | 300 | 300 | 300
400 | 400 | 400 | 400 | 400
500 | 500 | 500 | 500 | 500

Page 15: EE 357 Unit 21

Binary Brainteaser 100

• Given the binary string “10001101”, what would its decimal equivalent be assuming a 2’s complement representation?

• ANSWER: -128+8+4+1 = -115
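The weighting used in this answer (the MSB counts as -128 for an 8-bit value) can be checked with a short sketch. The function name `twos_complement_value` is hypothetical.

```python
def twos_complement_value(bits):
    """Interpret a string of '0'/'1' characters as a 2's complement integer."""
    width = len(bits)
    unsigned = int(bits, 2)
    # The MSB has weight -2^(width-1) instead of +2^(width-1),
    # which is the same as subtracting 2^width when the MSB is set.
    if bits[0] == "1":
        return unsigned - (1 << width)
    return unsigned

print(twos_complement_value("10001101"))
```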

Page 16: EE 357 Unit 21

Binary Brainteaser 200

• Assuming the 12-bit IEEE shortened FP format, what is the decimal equivalent of the following number?

1 10010 100010

• ANSWER: -1.100010 * 2^3 = -1100.010 = -12.25 (the exponent field 10010 = 18, and 18 minus the bias of 15 gives 3)
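A minimal decoding sketch for this pattern follows. It assumes the layout implied by the answer (1 sign bit, 5 exponent bits with a bias of 15, 6 fraction bits) and handles normalized numbers only; the name `decode_fp12` is hypothetical.

```python
def decode_fp12(bits):
    """Decode a normalized 12-bit FP pattern: 1 sign, 5 exponent (bias 15), 6 fraction bits."""
    bits = bits.replace(" ", "")
    assert len(bits) == 12
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:6], 2) - 15      # remove the assumed bias of 15
    fraction = int(bits[6:], 2) / 64       # 6 fraction bits, so divide by 2^6
    return sign * (1 + fraction) * 2 ** exponent  # implicit leading 1

print(decode_fp12("1 10010 100010"))
```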

Page 17: EE 357 Unit 21

Binary Brainteaser 300

• Under what conditions does overflow occur in signed arithmetic (addition/subtraction)?

• ANSWER: when p+p=n or n+n=p (adding two positives gives a negative, or adding two negatives gives a positive)
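The sign rule can be sketched for a fixed-width adder as follows. The 8-bit default width and the name `add_signed` are assumptions for illustration.

```python
def add_signed(a, b, width=8):
    """Add two signed integers in a fixed width; return (wrapped result, overflow flag)."""
    mask = (1 << width) - 1
    raw = (a + b) & mask                       # keep only `width` bits
    result = raw - (1 << width) if raw >> (width - 1) else raw
    # Overflow: operands share a sign but the result's sign differs (p+p=n or n+n=p).
    overflow = ((a >= 0) == (b >= 0)) and ((result >= 0) != (a >= 0))
    return result, overflow

print(add_signed(100, 50))    # p + p wraps to a negative value
print(add_signed(-100, -50))  # n + n wraps to a positive value
print(add_signed(100, -50))   # mixed signs can never overflow
```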

Page 18: EE 357 Unit 21

Binary Brainteaser 400

• Under what conditions does overflow occur in unsigned arithmetic (addition/subtraction)?

• ANSWER: If adding, when Cout=1; if subtracting, when Cout=0
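A short sketch of both cases, assuming subtraction is carried out as A + (NOT B) + 1 on the same adder (the usual hardware approach). The 8-bit width and the function names are hypothetical.

```python
def add_unsigned(a, b, width=8):
    """Return (result, cout) for unsigned a + b in `width` bits."""
    total = a + b
    return total & ((1 << width) - 1), total >> width

def sub_unsigned(a, b, width=8):
    """Return (result, cout) for unsigned a - b computed as a + ~b + 1."""
    mask = (1 << width) - 1
    total = a + ((~b) & mask) + 1
    return total & mask, total >> width

print(add_unsigned(200, 100))  # Cout = 1, so the sum overflowed
print(sub_unsigned(10, 5))     # Cout = 1, no overflow
print(sub_unsigned(5, 10))     # Cout = 0, so the difference overflowed (borrow)
```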

Page 19: EE 357 Unit 21

Binary Brainteaser 500

• Given the following normalized FP number, what would the result be after using the round-to-nearest method?

+1.011011 100 * 2^5

• ANSWER: The extra bits (100) are exactly halfway, so round to the value with a 0 in the LSB, which means rounding up to +1.011100 * 2^5
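The tie-breaking rule can be sketched on the fraction bits alone. The name `round_nearest_even` is hypothetical, and the sketch deliberately ignores the case where rounding carries out of the fraction (which would require renormalizing).

```python
def round_nearest_even(fraction_bits, extra_bits):
    """Round a fraction (bit string) using the extra bits beyond the kept precision."""
    kept = int(fraction_bits, 2)
    extra = int(extra_bits, 2)
    half = 1 << (len(extra_bits) - 1)      # pattern 100...0 means exactly halfway
    # Round up when above halfway, or exactly halfway with an odd (1) LSB.
    if extra > half or (extra == half and kept & 1):
        kept += 1                          # carry-out of the fraction is not handled here
    return format(kept, "0{}b".format(len(fraction_bits)))

print(round_nearest_even("011011", "100"))  # tie with LSB = 1: rounds up
print(round_nearest_even("011010", "100"))  # tie with LSB = 0: stays put
```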

Page 20: EE 357 Unit 21

Performance Puzzle 100

• What is the best metric for performance measurement (i.e. not subject to manipulation or misleading results)?

• ANSWER: Time (not rates like MIPS, CPI, etc.)

Page 21: EE 357 Unit 21

Performance Puzzle 200

• What are the three basic components of the performance equation learned in class?

• ANSWER: Instruction Count, Average CPI, Clock cycle time/period
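Multiplying the three components gives execution time. The sketch below uses made-up numbers purely for illustration; the name `execution_time` is hypothetical.

```python
def execution_time(instruction_count, avg_cpi, clock_period_s):
    """Execution time = instruction count x average CPI x clock cycle time."""
    return instruction_count * avg_cpi * clock_period_s

# Hypothetical program: 1 billion instructions, average CPI of 2,
# 2 GHz clock (0.5 ns period).
t = execution_time(1_000_000_000, 2.0, 0.5e-9)
print(t)  # seconds
```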

Page 22: EE 357 Unit 21

Performance Puzzle 300

• Which of the three components of the performance equation would be affected by the choice of compiler?

• ANSWER: Instruction Count, and probably CPI, since the instruction mix the compiler selects will affect the CPI

Page 23: EE 357 Unit 21

Performance Puzzle 400

• State Amdahl’s law for speedup.

• ANSWER: Speedup = 1 / [frac.unenhanced + frac.enhanced/improvement factor]

Page 24: EE 357 Unit 21

Performance Puzzle 500

• Using Amdahl’s law, if 50% of the instructions of a program can be sped up by a factor of 2, will the speedup of the program be 1 / 0.75 = 4/3?

• ANSWER: Not necessarily, because the 50% represents instruction count and not the time those instructions require. Those 50% of the instructions may take 2 CPI while the other 50% of the instructions average 10 CPI. Thus improving them by a factor of two will not yield the 4/3 speedup.
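Working the numbers from this answer: the sketch below converts the instruction-count split into a time split using the 2 CPI and 10 CPI figures, then applies Amdahl's law to each. The function and variable names are hypothetical; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def amdahl_speedup(frac_enhanced, improvement):
    """Speedup = 1 / (frac_unenhanced + frac_enhanced / improvement)."""
    return 1 / ((1 - frac_enhanced) + frac_enhanced / improvement)

# Instruction mix from the answer: half the instructions at 2 CPI, half at 10 CPI.
time_fast = Fraction(1, 2) * 2     # relative cycles spent in the sped-up half
time_slow = Fraction(1, 2) * 10    # relative cycles spent in the other half
frac_time = time_fast / (time_fast + time_slow)  # fraction of *time*, not of count

naive = amdahl_speedup(Fraction(1, 2), 2)   # wrongly treats 50% as a time fraction
actual = amdahl_speedup(frac_time, 2)       # uses the real time fraction

print(frac_time, naive, actual)
```

Under these figures the enhanced instructions account for only 1/6 of the run time, so the overall speedup is far below 4/3.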

Page 25: EE 357 Unit 21

Memory Madness 100

• SDRAM will allow consecutive (columns/rows/banks) to be read/written in bursts?

• ANSWER: Columns

Page 26: EE 357 Unit 21

Memory Madness 200

• In a 4-way set associative cache with 512 total blocks, how many bits will be used to index the set (i.e. the set field of the address breakdown)?

• ANSWER: 512/4 = 128 sets => 7 bits
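The same calculation as a small sketch, assuming the number of sets is a power of two (the name `set_index_bits` is hypothetical):

```python
def set_index_bits(total_blocks, ways):
    """Number of address bits needed to select a set in a set-associative cache."""
    sets = total_blocks // ways
    assert sets > 0 and sets & (sets - 1) == 0, "set count must be a power of two"
    return sets.bit_length() - 1   # log2(sets) for a power of two

print(set_index_bits(512, 4))
```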

Page 27: EE 357 Unit 21

Memory Madness 300

• DRAM (may / will not) lose its content even though power is continuously provided and in general is (faster / slower) to access than SRAM

• ANSWER: may (DRAM cells leak charge and must be refreshed periodically); slower

Page 28: EE 357 Unit 21

Memory Madness 400

• In general, caches closer to the processor core are (smaller / larger) so that they can be faster. In addition, they usually have a (lower / higher) degree of associativity.

• ANSWER: smaller; lower

Page 29: EE 357 Unit 21

Memory Madness 500

• In a 4-way set-associative cache with 128 sets, the worst cache performance will occur when all accesses map to different blocks in (the same / different) set(s) and the earliest an eviction can occur is on the (1st / 4th / 5th / 128th / 129th) block access.

• ANSWER: the same; 5th (the first four blocks fill the four ways of the set, so the fifth distinct block must evict one)
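A toy model of this cache shows when the first eviction happens for two access patterns. The name `first_eviction` and the LRU bookkeeping are assumptions for illustration; the replacement policy does not affect when the first eviction occurs.

```python
from collections import OrderedDict

def first_eviction(block_numbers, num_sets=128, ways=4):
    """Return the 1-based access number of the first eviction, or None if there is none."""
    sets = [OrderedDict() for _ in range(num_sets)]
    for n, block in enumerate(block_numbers, start=1):
        lines = sets[block % num_sets]      # set selected by the low bits of the block number
        if block in lines:
            lines.move_to_end(block)        # hit: mark as most recently used
            continue
        if len(lines) == ways:
            return n                        # set is full, so this access must evict
        lines[block] = True                 # miss with a free way: just fill it
    return None

same_set = [i * 128 for i in range(10)]     # distinct blocks that all map to set 0
spread = list(range(600))                   # consecutive blocks spread over all sets

print(first_eviction(same_set))
print(first_eviction(spread))
```

When accesses are spread evenly across the sets, the whole 512-block capacity is used before anything is evicted.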

Page 30: EE 357 Unit 21

Processor Predicaments 100

• What is the ideal throughput (IPC) of a pipelined CPU?

• ANSWER: 1 (1 instruction completing every cycle)

Page 31: EE 357 Unit 21

Processor Predicaments 200

• Name the three kinds of hazards that prevent the pipeline from being kept full.

• ANSWER: Structural Hazards, Data Hazards, Control Hazards

Page 32: EE 357 Unit 21

Processor Predicaments 300

• What method(s) can be used to solve the following Read-After-Write data hazards/dependencies or at least reduce the associated stall penalty?

• ANSWER:

– Forwarding (bypassing) in HW

– Rearranging instructions (by the compiler)

Page 33: EE 357 Unit 21

Processor Predicaments 400

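• Temporary registers are needed in the (single- / multi-) cycle CPU. An example of a temporary register is the (PC / IR).

• ANSWER: multi-; IR (it holds the fetched instruction across the later cycles, whereas the PC exists in both designs)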

Page 34: EE 357 Unit 21

Processor Predicaments 500

• The (single- / multi-) cycle CPU architecture implies variable CPIs for different instruction classes, and the clock cycle time is set by the longest (state / instruction) delay.

• ANSWER: multi-; state

Page 35: EE 357 Unit 21

Programming Pickles 100

• When checking the status of an I/O device one can rely on interrupts or __________?

• ANSWER: Polling/Busy looping

Page 36: EE 357 Unit 21

Programming Pickles 200

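• Calling a subroutine requires using the (bsr / bra) instruction and will result in the return address being stored (on the stack / in A7)?

• ANSWER: bsr; on the stack (A7 is the stack pointer, not the place where the return address itself is kept)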

Page 37: EE 357 Unit 21

Programming Pickles 300

• The stack frame of a subroutine includes space for three sections of data. What are they?

• ANSWER:

– Local variables

– Saved registers

– Arguments for subroutines

Page 38: EE 357 Unit 21

Programming Pickles 400

• System calls/TRAPS, interrupts, and error conditions cause breaks in normal program execution. What is the name we give to these events?

• ANSWER: Exceptions

Page 39: EE 357 Unit 21

Programming Pickles 500

• What is the name we use for software routines associated with an interrupt or other error event?

• ANSWER: handler routines

Page 40: EE 357 Unit 21

Cache Operation Example

• Perform address breakdown and apply address trace

• 2-Way Set-Assoc, N=8, B=8 words

• Address Trace

– R: 0x3c0

– W: 0x048

– R: 0x3d4

– W: 0xb50

• Operations

– Hit

– Fetch block XX

– Evict block XX (w/ or w/o WB)

– Final WB of block XX

Address | Tag | Set | Word | Unused
0x3c0 | 00111 | 10 | 000 | 00
0x048 | 00000 | 10 | 010 | 00
0x3d4 | 00111 | 10 | 101 | 00
0xb50 | 10110 | 10 | 100 | 00

Processor Access | Cache Operation
R: 0x3c0 | Fetch block 3c0-3df
W: 0x048 | Fetch block 040-05f
R: 0x3d4 | Hit
W: 0xb50 | Evict 040-05f w/ WB, fetch b40-b5f
Done! | Final WB of b40-b5f
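The trace can be replayed with a small simulator. This is a sketch under assumptions that match the breakdown (byte addresses, 4-byte words so a block is 32 bytes, N=8 blocks total giving 4 sets) plus policies the example appears to use: write-back, write-allocate, and LRU replacement. The name `simulate` is hypothetical.

```python
def simulate(trace, num_blocks=8, ways=2, block_bytes=32):
    """Replay (op, address) pairs on a write-back, write-allocate, LRU cache."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # per set: [block_number, dirty], LRU first
    log = []

    def span(block):
        base = block * block_bytes
        return "{:03x}-{:03x}".format(base, base + block_bytes - 1)

    for op, addr in trace:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        line = next((l for l in lines if l[0] == block), None)
        if line is not None:
            lines.remove(line)             # re-appended below as most recently used
            log.append("Hit")
        else:
            actions = []
            if len(lines) == ways:         # set full: evict the least recently used line
                victim = lines.pop(0)
                actions.append("Evict {} {}".format(
                    span(victim[0]), "w/ WB" if victim[1] else "w/o WB"))
            actions.append("Fetch " + span(block))
            line = [block, False]
            log.append(", ".join(actions))
        if op == "W":
            line[1] = True                 # a write makes the block dirty
        lines.append(line)
    final_wb = [span(l[0]) for s in sets for l in s if l[1]]
    return log, final_wb

trace = [("R", 0x3C0), ("W", 0x048), ("R", 0x3D4), ("W", 0xB50)]
log, final_wb = simulate(trace)
for entry in log:
    print(entry)
print("Final WB of", final_wb)
```

Block 040-05f is the victim because the hit on 0x3d4 made block 3c0-3df the more recently used line in that set.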

Page 41: EE 357 Unit 21

DBNZ on Multicycle CPU

• Many looping operations require decrementing a counter and branching if the new value is not zero

• Many instruction sets include an instruction that we will term DBNZ (Decrement and Branch if Not Zero)

• Format: DBNZ $rs, disp

• Operation:

– $rs = $rs – 1

– if $rs /= 0, branch to PC+4+disp

• Instruction format:

Opcode | Rs | Rt (copy of Rs) | Displacement
6-bits | 5-bits | 5-bits | 16-bits
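The operation can be modeled in a few lines. This sketch follows the slide's wording literally (target = PC+4+disp, with disp treated as a byte offset), which is an assumption about how the displacement is scaled; the register name `$t0`, the address 0x100, and the function name `dbnz` are hypothetical.

```python
def dbnz(regs, rs, pc, disp):
    """Decrement regs[rs]; return the next PC (branch taken if the new value is not zero)."""
    regs[rs] = (regs[rs] - 1) & 0xFFFFFFFF   # 32-bit register wraparound
    if regs[rs] != 0:
        return pc + 4 + disp                 # branch target as written on the slide
    return pc + 4                            # fall through to the next instruction

# A one-instruction loop: DBNZ at address 0x100 branching back to itself (disp = -4).
regs = {"$t0": 3}
pc, executed = 0x100, 0
while pc == 0x100:
    pc = dbnz(regs, "$t0", pc, -4)
    executed += 1

print(executed, hex(pc), regs["$t0"])
```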

Page 42: EE 357 Unit 21

Modified Datapath for DBNZ

[Figure: multi-cycle datapath modified for DBNZ. Components: PC, Memory (Addr., Read Data, Write Data), Instruc. Reg., Register File, Sign Extend, Sh. Left 2 units, ALU (Res., Zero), ALU control, Target Reg. Control signals: PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSelA, ALUSelB, TargetWrite, PCSource]

Page 43: EE 357 Unit 21

Multi-cycle CPU FSM w/ DBNZ

[Figure: state diagram; the states and control signals recoverable from it are listed below]

• State 0, Instruc. Fetch (entered on Reset): MemRead, ALUSelA=0, IorD=0, IRWrite, ALUSelB=01, ALUOp=00, PCSource=00, PCWrite

• State 1, Instruc. Decode + Reg. Fetch: ALUSelA=0, ALUSelB=11, ALUOp=00, TargetWrite

• State 8, Branch Completion (Op=‘BEQ’): ALUSelA=1, ALUSelB=00, ALUOp=01, PCWriteCond, PCSource=01

• State 9, Jump Completion (Op=‘JMP’): PCWrite, PCSource=10

• Added DBNZ state: ALUSelA=1, ALUSelB=100, ALUOp=01, PCWriteCond, PCSource=01, RegDst=0, MemtoReg=0, RegWrite

Page 44: EE 357 Unit 21

Addressing Modes Review

Address    Code                           Result
                  .data
20003004   PTR:   .long 0x2000300c
                  .long 0x20003010
           DAT:   .long -1,1
           RES:   .space 4                RES =
                  .text
20000500   MAIN:  MOVEA.L #DAT,A1         A1=
20000506          MOVEA.L -4(A1),A0       A0=
2000050a          MOVE.L (A0),D5          D5=
2000050c          MOVE.L -(A0),D6         D6=          A0=
2000050e          ADD.L D5,D6             D6=          N,Z,V,C=
20000510          OR.L D5,D6              D6=
20000512          LSL.L #1,D6             D6=
20000514          MOVE.L D6,RES

Page 45: EE 357 Unit 21

Addressing Modes Review

Address    Code                           Result
                  .data
20003004   PTR:   .long 0x2000300c
                  .long 0x20003010
           DAT:   .long -1,1
           RES:   .space 4                RES = 0x20003014
                  .text
20000500   MAIN:  MOVEA.L #DAT,A1         A1= 0x2000300c
20000506          MOVEA.L -4(A1),A0       A0= 0x20003010
2000050a          MOVE.L (A0),D5          D5= 0x00000001
2000050c          MOVE.L -(A0),D6         D6= 0xffffffff   A0= 0x2000300c
2000050e          ADD.L D5,D6             D6= 0x00000000   N,Z,V,C= 0,1,0,1
20000510          OR.L D5,D6              D6= 0x00000001
20000512          LSL.L #1,D6             D6= 0x00000002
20000514          MOVE.L D6,RES
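The register values above can be double-checked by stepping through the program by hand in Python. This is a sketch, not an emulator: each instruction is translated manually, memory is modeled as a dictionary of 32-bit longs keyed by address, and the variable names are hypothetical.

```python
MASK = 0xFFFFFFFF

# .data section as 32-bit longs keyed by byte address
mem = {
    0x20003004: 0x2000300C,  # PTR
    0x20003008: 0x20003010,
    0x2000300C: 0xFFFFFFFF,  # DAT: -1
    0x20003010: 0x00000001,  #       1
    0x20003014: 0x00000000,  # RES (.space 4)
}
DAT, RES = 0x2000300C, 0x20003014

a1 = DAT                     # MOVEA.L #DAT,A1     immediate: address of DAT
a0 = mem[a1 - 4]             # MOVEA.L -4(A1),A0   displacement: long at A1-4
d5 = mem[a0]                 # MOVE.L (A0),D5      register indirect
a0 -= 4                      # MOVE.L -(A0),D6     pre-decrement A0 by 4 ...
d6 = mem[a0]                 #                     ... then load through it

total = d5 + d6              # ADD.L D5,D6
carry = total >> 32
result = total & MASK
same_sign = (d5 >> 31) == (d6 >> 31)
overflow = int(same_sign and (result >> 31) != (d5 >> 31))
flags = (result >> 31, int(result == 0), overflow, carry)   # N, Z, V, C
d6 = result

d6 |= d5                     # OR.L D5,D6
d6 = (d6 << 1) & MASK        # LSL.L #1,D6
mem[RES] = d6                # MOVE.L D6,RES

print(hex(a1), hex(a0), hex(d5), flags, hex(d6), hex(mem[RES]))
```

Note that the slide's "RES = 0x20003014" is the address of the label RES; after the final MOVE.L, the long stored at that address holds 2.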