Lecture 1: Introduction Microcomputer... · ISA vs. Computer Architecture Old definition of computer architecture = instruction set design Other aspects of computer design called

Lecture 1:

Introduction

Dr. Eng. Amr T. Abdel-Hamid

Elect 707

Winter 2014

Co

mp

ute

r Arc

hite

ctu

re

Text book slides: Computer Architecture: A Quantitative Approach 5th Edition, John L. Hennessy & David A. Patterso with modifications.

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

CPU History in a Flash

Intel 4004 (1971): 4-bit processor,

2312 transistors, 0.4 MHz,

10 micron PMOS, 11 mm2 chip

• Processor is the new transistor?

• RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm2 chip

• 125 mm2 chip, 0.065 micron CMOS = 2312 RISC II+FPU+Icache+Dcache

– RISC II shrinks to ~ 0.02 mm2 at 65 nm

– Caches via DRAM or 1 transistor SRAM

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Processor Performance In

troductio

n

RISC

Move to multi-processor

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Snapdragon 805 processor specs

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Classes of Computers

Personal Mobile Device (PMD) e.g. start phones, tablet computers

Emphasis on energy efficiency and real-time

Desktop Computing Emphasis on price-performance

Servers Emphasis on availability, scalability, throughput

Clusters / Warehouse Scale Computers Used for “Software as a Service (SaaS)”

Emphasis on availability and price-performance

Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks

Embedded Computers Emphasis: price

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Instruction Set Architecture: Critical Interface

instruction set

software

hardware

Properties of a good abstraction Lasts through many generations (portability)

Used in many different ways (generality)

Provides convenient functionality to higher levels

Permits an efficient implementation at lower levels

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

ISA vs. Computer Architecture

Old definition of computer architecture

= instruction set design

Other aspects of computer design called implementation

Insinuates implementation is uninteresting or less challenging

Our view is: computer architecture >> ISA

Architect’s job much more than instruction set design; techni

cal hurdles today more challenging than those in instruction

set design

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Computer Architecture is Design and Analysis

Design

Analysis

Architecture is an iterative process: • Searching the space of possible designs • At all levels of computer systems

Creativity

Good Ideas

Mediocre Ideas Bad Ideas

Cost / Performance Analysis

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Administrivia

Instructor: Dr. Amr T. Abdel-Hamid

Office: C3-320

Email: [email protected]

Office Hours: Monday, 3rd & 4th Lectures.

T. A: ???

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Administrivia

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Course Grading

Exams

Quizzes 3 Quizzes: best 2 10 %

Mid Term 30 %

Final exam 40 %

Project 20%

Assignments Not Graded

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Project

Phase 0: Select your partner (17/9/2014)

Submit list of your group members (2-4 per group)

Submit a Comparision between Risc and Cisc Proc.

Phase 1:

.

.

.

.

Phase N: Project Implementation + Report (2 weeks before fin

als)

FINAL Non-Negotiable deadline

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

In time & It is too LATE Policy

In phases 0, & 1:

5% of project grade penalty per day for being late

In phase 2, to n:

No late presentation is possible.

Honor code

100% penalty for both copier and copy-giver of

Any Report/CODE.

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

1. Take Advantage of Parallelism

2. Principle of Locality

3. Focus on the Common Case

4. Amdahl’s Law

5. The Processor Performance Equation

Quantitative Principles of Design

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

1) Taking Advantage of Parallelism

Increasing throughput of server computer via multiple processors or multi

ple disks

Detailed HW design (DSD course shortly)

Carry lookahead adders uses parallelism to speed up computing sum

s from linear to logarithmic in number of bits per operand

Multiple memory banks searched in parallel in set-associative caches

Pipelining: overlap instruction execution to reduce the total time to compl

ete an instruction sequence.

Not every instruction depends on immediate predecessor executin

g instructions completely/partially in parallel possible

Classic 5-stage pipeline:

1) Instruction Fetch (Ifetch),

2) Register Read (Reg),

3) Execute (ALU),

4) Data Memory Access (Dmem),

5) Register Write (Reg)

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

2) The Principle of Locality The Principle of Locality:

Program access a relatively small portion of the address space at any instant of time.

Two Different Types of Locality: Temporal Locality (Locality in Time): If an item is referenced, it will tend to be r

eferenced again soon (e.g., loops, reuse)

Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)

Last 30 years, HW relied on locality for memory perf.

P MEM $

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Levels of the Memory Hierarchy

CPU Registers 100s Bytes 300 – 500 ps (0.3-0.5 ns)

L1 and L2 Cache 10s-100s K Bytes ~1 ns - ~10 ns $1000s/ GByte

Main Memory G Bytes 80ns- 200ns ~ $100/ GByte

Disk 10s T Bytes, 10 ms (10,000,000 ns) ~ $1 / GByte

Capacity Access Time Cost

Tape infinite sec-min ~$1 / GByte

Registers

L1 Cache

Memory

Disk

Tape

Instr. Operands

Blocks

Pages

Files

Staging Xfer Unit

prog./compiler 1-8 bytes

cache cntl 32-64 bytes

OS 4K-8K bytes

user/operator Mbytes

Upper Level

Lower Level

faster

Larger

L2 Cache cache cntl 64-128 bytes Blocks

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

3) Focus on the Common Case

Common sense guides computer design Since its engineering, common sense is valuable

In making a design trade-off, favor the frequent case over the infrequent case E.g., Instruction fetch and decode unit used more frequently th

an multiplier, so optimize it 1st

E.g., If database server has 50 disks / processor, storage dependability dominates system dependability, so optimize it 1st

Frequent case is often simpler and can be done faster than the infrequent case E.g., overflow is rare when adding 2 numbers, so improve perf

ormance by optimizing more common case of no overflow

May slow down overflow, but overall performance improved by optimizing for the normal case

What is frequent case and how much performance improved by making case faster => Amdahl’s Law

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

4) Amdahl’s Law

enhanced

enhancedenhanced

new

oldoverall

Speedup

Fraction Fraction

1

ExTime

ExTime Speedup

1

Best you could ever hope to do:

enhancedmaximum Fraction - 1

1 Speedup

enhanced

enhancedenhancedoldnew Speedup

FractionFraction ExTime ExTime 1

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Amdahl’s Law example

New CPU 10X faster

I/O bound server, so 60% time waiting for I/O

56.1

64.0

1

10

0.4 0.4 1

1

Speedup

Fraction Fraction 1

1 Speedup

enhanced

enhancedenhanced

overall

• Apparently, its human nature to be attracted by 10X faster, vs. keeping in perspective its just 1.6X faster

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

5) Processor performance equation

CPU time = Seconds = Instructions x Cycles x Seconds

Program Program Instruction Cycle

Inst Count CPI Clock Rate

Program X Compiler X (X) Inst. Set. X X Organization X X Technology X

inst count CPI Cycle time

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

5 Steps of MIPS Datapath Figure A.2, Page A-8

Memory Access

Write Back

Instruction Fetch

Instr. Decode Reg. Fetch

Execute Addr. Calc

L M D

ALU

MU

X

Mem

ory

Reg F

ile

MU

X MU

X

Data

Mem

ory

MU

X

Sign Extend

4

Adder Zero? Next SEQ PC

Address

Next PC

WB Data

Inst

RD

RS1

RS2

Imm

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns


Memory Access

Write Back

Instruction Fetch


Execute Addr. Calc

ALU

Mem

ory

Reg F

ile

MU

X MU

X

Data

Mem

ory

MU

X

Sign Extend

Zero?

IF/ID

ID/E

X

ME

M/W

B

EX

/ME

M

4

Adder

Next SEQ PC Next SEQ PC

RD RD RD WB

Dat

a

Next PC

Address

RS1

RS2

Imm

MU

X

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns


Memory Access

Write Back

Instruction Fetch


Execute Addr. Calc

ALU

Mem

ory

Reg F

ile

MU

X MU

X

Data

Mem

ory

MU

X

Sign Extend

Zero?

IF/ID

ID/E

X

ME

M/W

B

EX

/ME

M

4

Adder

Next SEQ PC Next SEQ PC

RD RD RD WB

Dat

a

• Data stationary control – local decode for each instruction phase / pipeline stage

Next PC

Address

RS1

RS2

Imm

MU

X

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Visualizing Pipelining Figure A.2, Page A-8

I n s t r.

O r d e r

Time (clock cycles)

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 6 Cycle 7 Cycle 5

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Pipelining is not quite that easy!

Limits to pipelining: Hazards prevent next instruction from exec

uting during its designated clock cycle

Structural hazards: HW cannot support this combination of instruc

tions (single person to fold and put clothes away)

Data hazards: Instruction depends on result of prior instruction stil

l in the pipeline (missing sock)

Control hazards: Caused by delay between the fetching of instruct

ions and decisions about changes in control flow (branches and ju

mps).

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

One Memory Port/Structural Hazards Figure A.4, Page A-14

I n s t r.

O r d e r

Time (clock cycles)

Load

Instr 1

Instr 2

Instr 3

Instr 4

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg


Reg

ALU

DMem Ifetch Reg

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

One Memory Port/Structural Hazards (Similar to Figure A.5, Page A-15)

I n s t r.

O r d e r

Time (clock cycles)

Load

Instr 1

Instr 2

Stall

Instr 3

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg


Reg

ALU

DMem Ifetch Reg

Bubble Bubble Bubble Bubble Bubble

How do you “bubble” the pipe?

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Speed Up Equation for Pipelining

pipelined

dunpipeline

TimeCycle

TimeCycle

CPI stall Pipeline CPI Ideal

depth Pipeline CPI Ideal Speedup

pipelined

dunpipeline

TimeCycle

TimeCycle

CPI stall Pipeline 1

depth Pipeline Speedup

Instper cycles Stall Average CPI Ideal CPIpipelined

For simple RISC pipeline, CPI = 1:

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Example: Dual-port vs. Single-port

Machine A: Dual ported memory (“Harvard Architecture”)

Machine B: Single ported memory, but its pipelined implementatio

n has a 1.05 times faster clock rate

Ideal CPI = 1 for both

Loads are 40% of instructions executed

SpeedUpA = Pipeline Depth/(1 + 0) x (clockunpipe/clockpipe)

= Pipeline Depth

SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clockunpipe/(clockunpipe / 1.05)

= (Pipeline Depth/1.4) x 1.05

= 0.75 x Pipeline Depth

SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33

Machine A is 1.33 times faster

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

I n s t r.

O r d e r

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Data Hazard on R1 Figure A.6, Page A-17

Time (clock cycles)

IF ID/RF EX MEM WB

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Read After Write (RAW)

InstrJ tries to read operand before InstrI writes it

Caused by a “Dependence” (in compiler nomenclature). This ha

zard results from an actual need for communication.

Three Generic Data Hazards

I: add r1,r2,r3

J: sub r4,r1,r3

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Write After Read (WAR) InstrJ writes operand before InstrI reads it

Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

Can’t happen in MIPS 5 stage pipeline because:

All instructions take 5 stages, and

Reads are always in stage 2, and

Writes are always in stage 5

I: sub r4,r1,r3

J: add r1,r2,r3

K: mul r6,r1,r7


Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns


Write After Write (WAW)

InstrJ writes operand before InstrI writes it.

Called an “output dependence” by compiler writers

This also results from the reuse of name “r1”.

Can’t happen in MIPS 5 stage pipeline because:

All instructions take 5 stages, and

Writes are always in stage 5

Will see WAR and WAW in more complicated pipes

I: sub r1,r4,r3

J: add r1,r2,r3

K: mul r6,r1,r7

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Time (clock cycles)

Forwarding to Avoid Data Hazard Figure A.7, Page A-19

I n s t r.

O r d e r

add r1,r2,r3

sub r4,r1,r3

and r6,r1,r7

or r8,r1,r9

xor r10,r1,r11

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

HW Change for Forwarding Figure A.23, Page A-37

ME

M/W

R

ID/E

X

EX

/ME

M

Data Memory

ALU

mux

m

ux

Registers

NextPC

Immediate

mux

What circuit detects and resolves this hazard?

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

38

Time (clock cycles)

Forwarding to Avoid LW-SW Data Hazard Figure A.8, Page A-20

I n s t r.

O r d e r

add r1,r2,r3

lw r4, 0(r1)

sw r4,12(r1)

or r8,r6,r9

xor r10,r9,r11

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Time (clock cycles)

I n s t r.

O r d e r

lw r1, 0(r2)

sub r4,r1,r6

and r6,r1,r7

or r8,r1,r9

Data Hazard Even with Forwarding Figure A.9, Page A-21

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Data Hazard Even with Forwarding (Similar to Figure A.10, Page A-21)

Time (clock cycles)

or r8,r1,r9

I n s t r.

O r d e r

lw r1, 0(r2)

sub r4,r1,r6

and r6,r1,r7

Reg

ALU

DMem Ifetch Reg

Reg Ifetch

ALU

DMem Reg Bubble

Ifetch

ALU

DMem Reg Bubble Reg

Ifetch

ALU

DMem Bubble Reg

How is this detected?

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Control Hazard on Branches

Three Stage Stall

10: beq r1,r3,36

14: and r2,r3,r5

18: or r6,r1,r7

22: add r8,r1,r9

36: xor r10,r1,r11

Reg ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

Reg

ALU

DMem Ifetch Reg

What do you do with the 3 instructions in between? How do you do it?

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Branch Stall Impact

If CPI = 1, 30% branch,

Stall 3 cycles => new CPI = 1.9!

Two part solution:

Determine branch taken or not sooner, AND

Compute taken branch address earlier

MIPS branch tests if register = 0 or 0

MIPS Solution:

Move Zero test to ID/RF stage

Adder to calculate new PC in ID/RF stage

1 clock cycle penalty for branch versus 3

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Adder

IF/ID

Pipelined MIPS Datapath Figure A.24, page A-38

Memory Access

Write Back

Instruction Fetch


Execute Addr. Calc

ALU

Mem

ory

Reg F

ile

MU

X

Data

Mem

ory

MU

X

Sign Extend

Zero?

ME

M/W

B

EX

/ME

M

4

Adder

Next SEQ PC

RD RD RD WB

Dat

a

• Interplay of instruction set design and cycle time.

Next PC

Address

RS1

RS2

Imm M

UX

ID/E

X

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Current Trends in Architecture

Cannot continue to leverage Instruction-Level parallelism (ILP) Single processor performance improvement ended in

2003

New models for performance: Data-level parallelism (DLP)

Thread-level parallelism (TLP)

Request-level parallelism (RLP)

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Parallelism

Classes of parallelism in applications: Data-Level Parallelism (DLP)

Task-Level Parallelism (TLP)

Classes of architectural parallelism: Instruction-Level Parallelism (ILP)

Vector architectures/Graphic Processor Units (GPUs)

Thread-Level Parallelism

Request-Level Parallelism

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Trends in Technology

Integrated circuit technology Transistor density: 35%/year

Die size: 10-20%/year

Integration overall: 40-55%/year

DRAM capacity: 25-40%/year (slowing)

Flash capacity: 50-60%/year 15-20X cheaper/bit than DRAM

Magnetic disk technology: 40%/year 15-25X cheaper/bit then Flash

300-500X cheaper/bit than DRAM

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Bandwidth and Latency

Bandwidth or throughput Total work done in a given time

10,000-25,000X improvement for processors

300-1200X improvement for memory and disks

Latency or response time Time between start and completion of an event

30-80X improvement for processors

6-8X improvement for memory and disks

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Power and Energy

Problem: Get power in, get power out

Thermal Design Power (TDP) Characterizes sustained power consumption

Used as target for power supply and cooling system

Lower than peak power, higher than average power consumption

Clock rate can be reduced dynamically to limit power consumption

Energy per task is often a better measurement

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Dynamic Energy and Power

Dynamic energy Transistor switch from 0 -> 1 or 1 -> 0

½ x Capacitive load x Voltage2

Dynamic power ½ x Capacitive load x Voltage2 x Frequency switched

Reducing clock rate reduces power, not energy

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Power

Intel 80386 consumed ~ 2 W

3.3 GHz Intel Core i7 consumes 130 W

Heat must be dissipated from 1.5 x 1.5 cm chip

This is the limit of what can be cooled by air

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Reducing Power

Techniques for reducing power:

Do nothing well

Dynamic Voltage-Frequency Scaling

Low power state for DRAM, disks

turning off cores

Dr. A

mr T

ala

at

Elect 707

Co

mp

ute

r Arc

hite

ctu

re a

nd

Ap

plic

atio

ns

Reading Assignment:

Chapter 1, Appendix B

Documents

Lecture 1: Introduction Microcomputer... · ISA vs. Computer Architecture Old definition of computer architecture = instruction set design Other aspects of computer design called