CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/fa10
Fall 2010 -- Lecture #8, 9/15/10
Agenda
• Technology Trends Revisited
• Administrivia
• Technology Break
• Components of a Computer
Levels of Representation/Interpretation
High Level Language Program (e.g., C):
    temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
  ↓ Compiler
Assembly Language Program (e.g., MIPS):
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  ↓ Assembler
Machine Language Program (MIPS):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions
Technology Life Cycle: Where Are We?
Technology Cost over Time
[Figure: cost ($) vs. time for technologies A, B, C, D]
Technology Cost over Time
[Figure: cost ($) vs. time, highlighting technology C]
Technology Cost over Time: Successive Generations
[Figure: cost ($) vs. time curves for Technology Generations 1, 2, and 3]
Moore’s Law
Predicts: 2X transistors/chip every 2 years
Gordon Moore, Intel Cofounder, B.S. Cal 1950
[Figure: # of transistors on an integrated circuit (IC) vs. year]
Memory Chip Size
4x in 3 years → 2x in 3 years
Growth in memory capacity is slowing
Technology Trends: Uniprocessor Performance (SPECint)
Improvements in processor performance have slowed. Why?
Limits to Performance: Faster Means More Power
P = C V² f
P = C V² f
• What is the effect on power consumption of:
  – "Simpler" implementation (fewer transistors)?
  – Smaller implementation (shrunk-down design)?
  – Reduced voltage?
  – Increased clock frequency?
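The answers can be read off the equation: voltage enters quadratically, while capacitance and frequency enter only linearly. A minimal sketch (the component values below are invented for illustration, not taken from the slides):

```python
def dynamic_power(c, v, f):
    """Dynamic power of a CMOS circuit: P = C * V^2 * f
    (switched capacitance C, supply voltage V, clock frequency f)."""
    return c * v ** 2 * f

# Invented baseline values, purely for illustration:
base = dynamic_power(c=1e-9, v=1.0, f=3e9)

# Reduced voltage cuts power quadratically (0.8^2):
print(dynamic_power(c=1e-9, v=0.8, f=3e9) / base)    # ≈ 0.64

# Increased clock frequency raises power only linearly:
print(dynamic_power(c=1e-9, v=1.0, f=4.5e9) / base)  # ≈ 1.5

# A "simpler" or shrunk design lowers C, so power falls linearly with C.
```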
Doing Nothing Well—NOT!
• Traditional processors consume about two-thirds as much power at idle (doing nothing) as they do at peak
• Higher-performance (server-class) processors approaching 300 W at peak
• Implications for battery life?
Limits to Performance: Processor-Storage Performance Gap
09/2010: WD 2 TByte SATA drive, $109.99 @ Amazon.com
Disk Drive Capacity/Performance Over Time
Internet Connection Bandwidth Over Time
≈50% annualized growth rate
Wireless Bandwidth
Computer Technology: Growing, But More Slowly
• Processor
  – Speed 2x / 1.5 years (since '85) [slowing!]
  – 100X performance last decade
  – When you graduate: 4 GHz, 32 cores
• Memory (DRAM)
  – Capacity: 2x / 2 years (since '96) [slowing!]
  – 64x size last decade
  – When you graduate: 128 GBytes
• Disk
  – Capacity: 2x / 1 year (since '97)
  – 250X size last decade
  – When you graduate: 8 TBytes
• Network
  – Core: 2x every 2 years
  – Access: 100-1000 Mbps from home, 1-10 Mbps cellular
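The processor figures above are self-consistent: doubling every 1.5 years compounds to roughly 100x in a decade. A quick sketch to check (the compound-growth formula is the only assumption; the memory and disk rates changed mid-decade, so they are not checked here):

```python
# Compound growth: doubling every d years for y years gives 2**(y/d).
def growth(years, doubling_period_years):
    return 2 ** (years / doubling_period_years)

# Processor: 2x / 1.5 years sustained over a decade
print(round(growth(10, 1.5)))   # 102 -- roughly the "100X last decade" above
```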
Agenda
• Components of a Computer
• Administrivia
• Technology Break
Announcements
• Notes: they are online; use them!
• First HW solutions posted; see the class web site
• Experiment: move HW/project due dates from Friday to Saturday for the next three assignments (Project 1, Parts 1 & 2; HW #3)
• Exam: 6 October, 6-9 PM, 1 Pimentel
• Free book on Warehouse-Scale Computers on the course web site
Announcements
• Grading
  – Class Participation (5%)
  – Homework (5%)
  – Labs (20%)
  – Projects (40%)
    • Computer Instruction Set Simulator
    • Data Parallelism (Map-Reduce on EC2)
    • Computer Processor Design (Logisim)
    • Performance Tuning of a Parallel Application (partnered)
  – Midterm (10%)
  – Final (20%)
Effort, Participation, Altruism
Agenda
• Technology Trends Revisited
• Administrivia
• Technology Break
• Components of a Computer
Five Components of a Computer
• Control
• Datapath
• Memory
• Input
• Output
John von Neumann invented this architecture
Reality Check: Typical MIPS Chip Die Photograph
[Die photo regions:]
• Integer Control and Datapath
• Performance-Enhancing On-Chip Memory (iCache + dCache)
• Floating Pt Control and Datapath
• Protection-oriented Virtual Memory Support
Example MIPS Block Diagram
A MIPS Family (Toshiba)
The CPU
• Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making)
• Datapath: portion of the processor which contains the hardware necessary to perform the operations required by the processor (the brawn)
• Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain)
Stages of the Datapath: Overview
• Problem: a single, atomic block which "executes an instruction" (performs all necessary operations, beginning with fetching the instruction) would be too bulky and inefficient
• Solution: break up the process of "executing an instruction" into stages or phases, and then connect the phases to create the whole datapath
  – Smaller phases are easier to design
  – Easy to optimize (change) one phase without touching the others
Instruction-Level Parallelism
          P1  P2  P3  P4  P5  P6  P7  P8  P9  P10 P11 P12
Instr 1   IF  ID  ALU MEM WR
Instr 2       IF  ID  ALU MEM WR
Instr 3           IF  ID  ALU MEM WR
Instr 4               IF  ID  ALU MEM WR
Instr 5                   IF  ID  ALU MEM WR
Instr 6                       IF  ID  ALU MEM WR
Instr 7                           IF  ID  ALU MEM WR
Instr 8                               IF  ID  ALU MEM WR
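The staggered schedule above, with a new instruction entering the pipeline each period, can be sketched as a small simulation (this function is illustrative, not part of the course materials):

```python
# Five pipeline stages, as in the diagram above.
STAGES = ["IF", "ID", "ALU", "MEM", "WR"]

def pipeline_schedule(n_instrs):
    """Return {instr: {period: stage}}, with instruction i entering at period i."""
    return {i: {i + s: STAGES[s] for s in range(len(STAGES))}
            for i in range(1, n_instrs + 1)}

sched = pipeline_schedule(8)
total_periods = max(max(periods) for periods in sched.values())
print(total_periods)   # 12 periods for 8 instructions, vs 8 * 5 = 40 unpipelined
```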
Stages of the Datapath (1/5)
• There is a wide variety of MIPS instructions: so what general steps do they have in common?
• Stage 1: Instruction Fetch
  – No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  – Also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction: byte addressing, so +4)
Stages of the Datapath (2/5)
• Stage 2: Instruc2on Decode – Upon fetching the instrucWon, we next gather data from the fields (decode all necessary instrucWon data)
– First, read the opcode to determine instrucWon type and field lengths
– Second, read in data from all necessary registers • For add, read two registers • For addi, read one register • For jal, no reads necessary
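The field extraction in Stage 2 can be sketched with shifts and masks. The field positions below are the standard MIPS R-format (not taken from these slides), and the example word is the encoding of add $t2, $t0, $t1:

```python
# Decode the fields of a 32-bit MIPS R-type instruction word.
def decode_r_type(word):
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26: instruction type
        "rs":     (word >> 21) & 0x1F,  # bits 25-21: first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20-16: second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15-11: destination register
        "shamt":  (word >> 6)  & 0x1F,  # bits 10-6:  shift amount
        "funct":  word & 0x3F,          # bits 5-0:   selects the ALU operation
    }

fields = decode_r_type(0x01095020)                # add $t2, $t0, $t1
print(fields["rs"], fields["rt"], fields["rd"])   # 8 9 10 ($t0, $t1, $t2)
```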
Stages of the Datapath (3/5)
• Stage 3: ALU (ArithmeWc-‐Logic Unit) – Real work of most instrucWons is done here: arithmeWc (+, -‐, *, /), shizing, logic (&, |), comparisons (slt)
– What about loads and stores? • lw $t0, 40($t1) • Address we are accessing in memory = the value in $t1 PLUS the value 40
• So we do this addiWon in this stage
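The effective-address computation above can be sketched directly (the value in $t1 is invented for illustration):

```python
# ALU-stage address computation for lw $t0, 40($t1).
regs = {"$t1": 0x10000000}      # hypothetical contents of base register $t1
offset = 40                     # sign-extended immediate from the instruction
effective_address = regs["$t1"] + offset
print(hex(effective_address))   # 0x10000028
```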
Stages of the Datapath (4/5)
• Stage 4: Memory Access – Actually only the load and store instrucWons do anything during this phase; the others remain idle during this phase or skip it all together
– Since these instrucWons have a unique step, we need this extra phase to account for them
– As a result of the cache system, this phase is expected to be fast
Stages of the Datapath (5/5)
• Stage 5: Register Write – Most instrucWons write the result of some computaWon into a register
– E.g.,: arithmeWc, logical, shizs, loads, slt
– What about stores, branches, jumps? • Don’t write anything into a register at the end • These remain idle during this fizh phase or skip it all together
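Putting the five stages together for a single lw, as a toy sketch: the instruction encoding is abstracted into a dictionary, and the memory and register contents are invented for illustration.

```python
# Toy walk-through of the five datapath stages for lw $t0, 40($t1).
def execute_lw(pc, regs, mem):
    # Stage 1: Instruction Fetch -- read the word at PC, then PC = PC + 4
    instr = mem[pc]
    pc += 4
    # Stage 2: Instruction Decode -- gather the base, offset, and dest fields
    base, offset, dest = instr["base"], instr["offset"], instr["dest"]
    # Stage 3: ALU -- compute the effective address (base register + offset)
    addr = regs[base] + offset
    # Stage 4: Memory Access -- loads actually use this stage
    value = mem[addr]
    # Stage 5: Register Write -- write the loaded value into the destination
    regs[dest] = value
    return pc

mem = {0x400000: {"base": "$t1", "offset": 40, "dest": "$t0"},  # the lw itself
       0x10000028: 99}                                          # the loaded data
regs = {"$t1": 0x10000000, "$t0": 0}
pc = execute_lw(0x400000, regs, mem)
print(hex(pc), regs["$t0"])   # 0x400004 99
```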
Laptop Innards
Server Internals
Server Internals
Google Server
The ARM Inside the iPhone
ARM Architecture
• http://en.wikipedia.org/wiki/ARM_architecture
iPhone Innards
1 GHz ARM Cortex A8
Processor Families: Tradeoff between Cost and Performance
[Chart axis: Dhrystone MIPS]
Summary
• Key Technology Trends and Limitations
  – Transistor doubling continues, BUT power constraints and latency considerations limit performance improvement
  – (Single-processor) computers are about as fast as they are likely to get; exploit parallelism to go faster
• Five Components of a Computer
  – Processor/Control + Datapath
  – Memory
  – Input/Output: human interface (KB + mouse, display, storage) ... evolving to speech, audio, video
• Architectural Family: One Instruction Set, Many Implementations