Computer Architecture

Jin-Soo Kim (jinsookim@skku.edu)
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu

Computer Architecture · csl.skku.edu/uploads/ICE2010S11/5-arch.pdf · 2011-04-11 · ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu)

Modern PC Architecture

Computer Systems

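Program?

Jjamppong Ramyeon (spicy seafood noodle soup)

Preparation time: 10 min, Cooking time: 10 min

Ingredients: 1 pack of ramyeon, 1 packet of soup base, 1/4 squid, 1/4 squash, 1/2 onion, 1 cabbage leaf, 1/4 carrot, 3 cups of water (600 cc)

Directions:
1. Peel the squid, wash it clean, and score it with a knife.
2. Julienne the squash, onion, and cabbage.
3. Pour 3 cups of water into a pot and bring it to a boil.
4. When the water boils, add the soup base, then the squid and vegetables, and simmer for about 5 minutes so the flavors fully develop.
5. When it boils again, add the noodles and cook them through.

Data: the ingredients
Instructions: the directions

Source: http://user.chol.com/~yugenie/yo/jjambong.html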

von Neumann Architecture

▪ Separate CPU and memory
  • Memory holds data and instructions
  • CPU fetches instructions from memory
  • CPU manipulates data by performing arithmetic and logical operations

▪ Stored-program concept: the program's instructions are stored in memory alongside its data

CPU & Memory

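[Figure: CPU and memory connected by address and data lines; the CPU's PC (program counter) holds the address 200, and memory returns the instruction stored there, ADD r5,r1,r3]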
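The single step in the figure can be sketched in Python. This is an illustration under stated assumptions, not the slide's actual machine: memory is a dictionary from address to contents, the instruction is kept as assembly text, the register values are made up, and each instruction is assumed to occupy 4 bytes so the PC advances by 4.

```python
# Hypothetical machine state, made up for illustration.
memory = {200: "ADD r5,r1,r3"}    # address -> contents
regs = [0, 7, 0, 5, 0, 0, 0, 0]   # registers r0..r7 (r1 = 7, r3 = 5)
pc = 200                          # program counter


def reg_index(name: str) -> int:
    """Convert a register name such as 'r5' into an index into regs."""
    return int(name.strip().lstrip("r"))


# Fetch: the CPU sends the PC over the address lines; memory returns the contents.
instruction = memory[pc]

# Decode: split "ADD r5,r1,r3" into an opcode and three register numbers.
opcode, operands = instruction.split(maxsplit=1)
rd, rs, rt = (reg_index(op) for op in operands.split(","))

# Execute: ADD writes rs + rt into rd; then the PC moves to the next instruction.
if opcode == "ADD":
    regs[rd] = regs[rs] + regs[rt]
pc += 4  # assumes 4-byte instructions

print(f"r5 = {regs[5]}, pc = {pc}")  # r5 = 12, pc = 204
```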

CPU

CPU

▪ Central Processing Unit

The Machine Cycle

Instruction Set Architecture

▪ Supported operations
▪ Number of operands
▪ Types of operands
▪ Addressing modes
▪ Fixed vs. variable length
▪ Registers visible to the programmer
▪ …
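As one concrete instance of a fixed-length, three-operand format, the sketch below packs a register ADD into a 32-bit word using the MIPS R-type layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6; ADD uses opcode 0 and funct 0x20). Reading the slides' generic ADD r5,r1,r3 as MIPS `add $5, $1, $3` is an assumption for illustration.

```python
FUNCT_ADD = 0x20  # MIPS funct code for add


def encode_r_type(rd: int, rs: int, rt: int, funct: int,
                  shamt: int = 0, opcode: int = 0) -> int:
    """Pack MIPS R-type fields into one 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $5, $1, $3   (rd = rs + rt)
word = encode_r_type(rd=5, rs=1, rt=3, funct=FUNCT_ADD)
print(f"0x{word:08X}")  # 0x00232820

# add $t0, $t1, $t2  ($t0 = 8, $t1 = 9, $t2 = 10)
print(f"0x{encode_r_type(rd=8, rs=9, rt=10, funct=FUNCT_ADD):08X}")  # 0x012A4020
```

Every R-type instruction is exactly 32 bits with its fields at fixed positions, which keeps decoding simple; a variable-length ISA such as IA-32 must first work out where each instruction ends.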

Instruction Types

▪ Arithmetic/Logic
  • Compute new bit patterns

▪ Data transfer
  • Copy data from one location to another (between registers and memory)

▪ Control
  • Direct the execution of the program (e.g., jumps and branches)
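All three instruction types, and the fetch-decode-execute cycle that runs them, fit in a tiny accumulator machine. The instruction set (LOAD, STORE, ADD, SUB, JZ, JMP, HALT), the single accumulator, and the memory layout below are hypothetical choices for illustration; program and data share one memory, as in the von Neumann model. The program adds 5 + 4 + 3 + 2 + 1.

```python
def run(memory: list, max_steps: int = 1000) -> None:
    """Fetch-decode-execute loop for a hypothetical accumulator machine."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        op, addr = memory[pc]  # fetch and decode the instruction the PC points to
        pc += 1                # by default, continue with the next cell
        if op == "LOAD":       # data transfer: memory -> accumulator
            acc = memory[addr]
        elif op == "STORE":    # data transfer: accumulator -> memory
            memory[addr] = acc
        elif op == "ADD":      # arithmetic
            acc += memory[addr]
        elif op == "SUB":      # arithmetic
            acc -= memory[addr]
        elif op == "JZ":       # control: jump if the accumulator is zero
            if acc == 0:
                pc = addr
        elif op == "JMP":      # control: unconditional jump
            pc = addr
        elif op == "HALT":
            return
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")


N, TOTAL, ONE = 20, 21, 22  # data addresses
program = [
    ("LOAD", N),       # 0: acc <- n
    ("JZ", 9),         # 1: if n == 0, go to HALT
    ("LOAD", TOTAL),   # 2: acc <- total
    ("ADD", N),        # 3: acc <- total + n
    ("STORE", TOTAL),  # 4: total <- acc
    ("LOAD", N),       # 5: acc <- n
    ("SUB", ONE),      # 6: acc <- n - 1
    ("STORE", N),      # 7: n <- acc
    ("JMP", 0),        # 8: repeat
    ("HALT", 0),       # 9: stop
]
# Program in cells 0-9, unused cells 10-19, data in cells 20-22 (n=5, total=0, one=1).
memory = program + [None] * (N - len(program)) + [5, 0, 1]

run(memory)
print(memory[TOTAL])  # 15
```

Without the control instructions (JZ, JMP) the machine could only run straight through memory once; they are what make loops possible.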

Instruction Example

Memory

Memory

▪ A collection of cells, each with a unique physical address

▪ Both addresses and contents are in binary
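A minimal Python sketch of memory as numbered cells, assuming 8-bit cells and a tiny 16-cell memory so that a 4-bit address suffices; both the address and the contents are printed in binary.

```python
ADDRESS_BITS = 4                        # 2**4 = 16 cells (assumed size)
memory = bytearray(2 ** ADDRESS_BITS)   # each cell holds one byte, initially 0

memory[0b0011] = 65    # write the value 65 into cell 3
memory[0b1010] = 200   # write 200 into cell 10

for address in (0b0011, 0b1010):
    print(f"{address:0{ADDRESS_BITS}b}: {memory[address]:08b}")
# 0011: 01000001
# 1010: 11001000
```

A `bytearray` rejects values outside 0 to 255, mirroring the fact that an 8-bit cell cannot hold more than 8 bits.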

CPU-Memory Gap

[Figure: time in ns (log scale, 1 to 100,000,000) vs. year (1980 to 2000) for disk seek time, DRAM access time, SRAM access time, and CPU cycle time]

Principles of Locality

▪ Temporal locality
  • Recently referenced items are likely to be referenced again in the near future

▪ Spatial locality
  • Items with nearby addresses tend to be referenced close together in time
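Spatial locality depends on the order in which a program walks its data. The sketch below assumes a 4×4 array of 4-byte elements stored row by row (row-major) from a made-up base address 0x1000, and compares the address sequences of a row-wise and a column-wise traversal.

```python
ROWS, COLS, ELEM_SIZE, BASE = 4, 4, 4, 0x1000  # assumed array layout


def address(i: int, j: int) -> int:
    """Byte address of a[i][j] in a row-major array."""
    return BASE + (i * COLS + j) * ELEM_SIZE


row_wise = [address(i, j) for i in range(ROWS) for j in range(COLS)]
col_wise = [address(i, j) for j in range(COLS) for i in range(ROWS)]


def strides(trace: list[int]) -> set[int]:
    """Distinct distances (in bytes) between consecutive accesses."""
    return {b - a for a, b in zip(trace, trace[1:])}


print([hex(a) for a in row_wise[:5]])  # ['0x1000', '0x1004', '0x1008', '0x100c', '0x1010']
print([hex(a) for a in col_wise[:5]])  # ['0x1000', '0x1010', '0x1020', '0x1030', '0x1004']
print(sorted(strides(row_wise)))       # [4]
print(sorted(strides(col_wise)))       # [-44, 16]
```

The row-wise walk always moves to the neighbouring element (good spatial locality); the column-wise walk jumps a whole row (16 bytes) each time.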

Memory Hierarchy

Cache Memory

▪ Cache
  • A small, fast storage that holds copies of data from slower memory
  • Improves the average access time
  • Exploits both temporal and spatial locality
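A minimal direct-mapped cache simulator shows how locality turns into a better average access time. Everything here is assumed for illustration: 8 lines of 16 bytes, a 16×16 array of 4-byte integers at address 0, a 1-cycle hit time, and a 100-cycle miss penalty. Each cache line records the whole block number instead of just the tag bits, which is a simplification that detects hits the same way. The average memory access time uses the textbook formula AMAT = hit time + miss rate × miss penalty.

```python
LINE_SIZE = 16   # bytes per cache line (assumed)
NUM_LINES = 8    # direct-mapped, 8 lines x 16 B = 128 B (assumed)
N, ELEM = 16, 4  # 16x16 array of 4-byte ints, row-major, base address 0


def hit_rate(trace: list[int]) -> float:
    """Fraction of accesses in `trace` that hit in a direct-mapped cache."""
    lines = [None] * NUM_LINES      # which memory block each cache line holds
    hits = 0
    for addr in trace:
        block = addr // LINE_SIZE   # memory block containing this byte
        index = block % NUM_LINES   # the only line this block may occupy
        if lines[index] == block:
            hits += 1               # hit: block is already cached
        else:
            lines[index] = block    # miss: fetch the block, evicting the old one
    return hits / len(trace)


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same unit as the inputs."""
    return hit_time + miss_rate * miss_penalty


row_wise = [(i * N + j) * ELEM for i in range(N) for j in range(N)]
col_wise = [(i * N + j) * ELEM for j in range(N) for i in range(N)]

for name, trace in (("row-wise", row_wise), ("column-wise", col_wise)):
    h = hit_rate(trace)
    print(name, h, amat(1, 1 - h, 100))
# row-wise 0.75 26.0
# column-wise 0.0 101.0
```

Row-wise, each 16-byte block is missed once and then reused for the next three elements; column-wise, the blocks are evicted before their neighbours are needed, so the cache never helps.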

Microprocessors

Early Days

[Figure: timeline of early microprocessors and the systems built on them]

▪ Intel 4004 (1971): the 1st microprocessor; 108KHz, 2,300 transistors, up to 640B of memory
▪ Intel 8008 (1972): 200KHz, 3,500 transistors, up to 16KB
▪ Intel 8080 (1974): 2MHz, 6,000 transistors, up to 64KB; CP/M; Altair 8800 (1st PC)
▪ Motorola 6800 (1974)
▪ MOS 6502 (1976): Apple I, II, II+, IIe
▪ Intel 8085 (1976)
▪ Zilog Z-80 (1976): Radio Shack TRS-80
▪ Intel 8086 (1978)
▪ Intel 8088 (1979): IBM PC/XT
▪ Zilog Z-8000 (1979)
▪ Motorola 6809/68000 (1979)

CISC

▪ Complex Instruction Set Computer
  • Dominant style through the mid-80s
  • Adds instructions to perform “typical” programming tasks
  • Arithmetic instructions can access memory
  • Multiple complex addressing modes
  • Different instruction formats of varying lengths
  • Easy for compilers, fewer code bytes
  • Intel IA-32

RISC

▪ Reduced Instruction Set Computer
  • Fewer, simpler instructions
  • Register-oriented instruction set
  • Only load and store instructions can access memory
  • Better suited to optimizing compilers
  • A simple chip design can run fast
  • ARM, MIPS, PowerPC, SPARC, …
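The CISC/RISC difference in memory access can be illustrated by computing mem[A] = mem[A] + mem[B] both ways. The instruction names (ADDM, LOAD, ADD, STORE) and the tiny interpreter are hypothetical; the point is that a CISC-style instruction may take memory operands directly, while a load/store (RISC-style) machine must load both operands into registers, add them, and store the result.

```python
def run(program: list[tuple], memory: dict, regs: dict) -> int:
    """Execute (op, *operands) tuples; return the number of instructions executed."""
    for op, *args in program:
        if op == "ADDM":      # CISC-style: memory[dst] += memory[src]
            dst, src = args
            memory[dst] += memory[src]
        elif op == "LOAD":    # RISC-style: reg <- memory[addr]
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":     # RISC-style: register-only arithmetic
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "STORE":   # RISC-style: memory[addr] <- reg
            reg, addr = args
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown op {op!r}")
    return len(program)       # no branches, so every instruction runs once


A, B = 0, 1  # memory addresses of the two operands

cisc = [("ADDM", A, B)]
risc = [("LOAD", 1, A), ("LOAD", 2, B), ("ADD", 3, 1, 2), ("STORE", 3, A)]

mem_cisc, mem_risc = {A: 10, B: 32}, {A: 10, B: 32}
print(run(cisc, mem_cisc, {}), mem_cisc[A])  # 1 42
print(run(risc, mem_risc, {}), mem_risc[A])  # 4 42
```

The CISC version needs fewer instructions (and fewer code bytes), while each RISC instruction does one simple thing, which is easier to implement fast in hardware.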

Pipelining in Real Life

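Pipelining

▪ Sequential execution
▪ Pipelining

[Figure: instructions vs. clock cycles for both cases; every instruction passes through four stages, IF (instruction fetch), ID (instruction decode), EX (execute), and WB (write back). In sequential execution an instruction starts only after the previous one has finished; with pipelining a new instruction starts every cycle, so the stages of consecutive instructions overlap]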
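The cycle counts behind the two diagrams can be computed directly. This sketch assumes the ideal case: every stage takes exactly one cycle and there are no hazards or stalls. For n instructions on a k-stage pipeline, sequential execution takes n × k cycles and pipelined execution takes k + n - 1.

```python
STAGES = ["IF", "ID", "EX", "WB"]


def sequential_cycles(n: int, k: int = len(STAGES)) -> int:
    """Each instruction runs all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int = len(STAGES)) -> int:
    """k cycles to fill the pipeline, then one instruction completes per cycle."""
    return k + n - 1


def pipeline_diagram(n: int) -> str:
    """Text version of the pipelined diagram: instruction i enters stage s in cycle i + s."""
    rows = []
    for i in range(n):
        cells = ["  "] * pipelined_cycles(n)
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((f"I{i + 1}: " + " ".join(cells)).rstrip())
    return "\n".join(rows)


print(sequential_cycles(5), pipelined_cycles(5))  # 20 8
print(pipeline_diagram(3))
```

As n grows, the speedup n × k / (k + n - 1) approaches k, the number of stages.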

Superscalar

▪ Superscalar
  • The execution stage has several different functional units
  • Executes multiple instructions in parallel

[Figure: pipeline diagram (instructions vs. clock cycles) in which several instructions pass through IF, ID, EX, and WB side by side; block diagram of the pipeline: fetch, then decode & dispatch, feeding parallel functional units (int; float-1, float-2, float-3; test and branch; address, mem-1, mem-2) with wb (write-back) stages]
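Extending the ideal pipeline count to a superscalar machine: if up to `width` instructions can enter the pipeline together each cycle, n instructions form ceil(n / width) issue groups. This sketch assumes no dependencies between instructions and enough functional units, so it gives a best-case bound rather than real performance.

```python
import math

STAGES = 4  # IF, ID, EX, WB


def superscalar_cycles(n: int, width: int, k: int = STAGES) -> int:
    """Ideal cycles for n instructions when `width` of them can enter the pipeline per cycle."""
    groups = math.ceil(n / width)  # number of issue groups
    return k + groups - 1          # the last group enters IF in cycle `groups`


def fetch_cycle(i: int, width: int) -> int:
    """1-based cycle in which instruction i (0-based) is fetched."""
    return i // width + 1


print(superscalar_cycles(8, width=1), superscalar_cycles(8, width=2))  # 11 7
print([fetch_cycle(i, width=2) for i in range(6)])                      # [1, 1, 2, 2, 3, 3]
```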

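Superpipelining

▪ Superpipelining
  • Subdivide each pipeline stage into shorter stages
  • Allows a higher clock speed
  • Pipeline depth: 12+ stages in Pentium Pro/II/III, 20+ in Pentium 4, 14 in UltraSPARC-III, 16–25 in PowerPC G5

[Figure: pipeline diagram (instructions vs. clock cycles) with the IF, ID, EX, and WB stages each split into shorter stages]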

Superpipelined Superscalar

▪ Superpipelining + Superscalar
  • 2-way: MIPS R5000
  • 3-way: PowerPC G3/G4, Pentium Pro/II/III/M/4
  • 4-way: MIPS R10000, PowerPC G4e, Core 2 Duo
  • 5-issue: PowerPC G5

[Figure: pipeline diagram (instructions vs. clock cycles) combining multiple issue with subdivided IF, ID, EX, and WB stages]

Intel i486

▪ The first x86 chip with more than 1M transistors
▪ 5-stage instruction pipeline: Fetch, D1, D2, Ex, WB
▪ One clock cycle to execute simple instructions
▪ On-chip FPU (Floating Point Unit)

Intel Pentium

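▪ The first trademarked Intel processor
▪ 2-way superscalar with 5 stages
▪ Speculative execution with dynamic branch prediction
▪ On-chip separate L1 caches (8KB I$ for instructions + 8KB D$ for data)

[Figure: the PF and D1 stages feed two parallel pipes, the u-pipe and the v-pipe, each with its own D2, E, and WB stages]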
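The slides do not describe how the Pentium's predictor works internally. As a generic illustration of dynamic branch prediction (not the Pentium's actual design), here is the classic 2-bit saturating counter: states 0 and 1 predict "not taken", states 2 and 3 predict "taken", and each actual outcome nudges the counter one step. The branch history below, a loop branch taken 9 times and then not taken once, repeated for 3 runs of the loop, is made up.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def accuracy(outcomes: list[bool], predictor: TwoBitPredictor) -> float:
    """Fraction of branches predicted correctly, learning as it goes."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical history: a loop branch taken 9 times, then not taken once; loop runs 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(accuracy(outcomes, TwoBitPredictor()))
```

After warming up, the counter mispredicts only the single loop exit per run: one not-taken outcome moves it from 3 to 2, which still predicts "taken" when the loop is entered again.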

Intel Pentium Pro

▪ 3-way superscalar
▪ Superpipelined
▪ Out-of-order execution
▪ ISA translation (x86 instructions into μops)
▪ L1 cache: 8KB I$ + 8KB D$
▪ L2 cache: 256KB (separate die)

[Figure: an in-order front-end (Fetch, Decode) feeds an out-of-order core (Execute), whose results are reordered before in-order retirement (WB)]

Intel Pentium 4

▪ Hyper-pipelined for clock rates > 1 GHz
  • 20 (Willamette) to 31 (Prescott) stages
▪ Execution trace cache
▪ L1 cache: 12K μops I$ + 8KB D$
▪ L2 cache: 256KB, 8-way (on-chip)
▪ Hyper-Threading
▪ Max clock rate: 3.80GHz (Prescott)
▪ Severe heat problems

Core Microarchitecture

Nehalem Microarchitecture

Multi-core

Architecture Trends

Challenges

▪ Memory wall
  • Performance growth: CPU 55%/year, memory 10%/year (1986–2000)
  • Caches show diminishing returns

▪ ILP (Instruction-Level Parallelism) wall
  • Control dependency
  • Data dependency

▪ Power wall
  • Dynamic power $\propto \text{frequency}^3$
  • Static power $\propto \text{frequency}$
  • Total power $\propto$ number of cores
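The cubic dependence follows from the dynamic-power relation $P_{\text{dyn}} \approx C V^2 f$ when the supply voltage $V$ has to scale roughly in proportion to the clock frequency $f$. A minimal Python sketch under that assumption (it ignores static power) reproduces the power figures on the Performance vs. Power slide. The performance figures there (1.13x, 0.87x) are Intel's and are not modeled here, since performance does not scale linearly with clock rate.

```python
def relative_power(clock_scale: float, cores: int = 1) -> float:
    """Dynamic power relative to one core at the baseline clock.

    Assumes P ~ C * V**2 * f with the supply voltage V scaled in proportion
    to the frequency f, so each core's power grows as f**3; static power is ignored.
    """
    return cores * clock_scale ** 3


print(round(relative_power(1.2), 2))           # 1.73 (clock raised 20%)
print(round(relative_power(0.8), 2))           # 0.51 (clock lowered 20%)
print(round(relative_power(0.8, cores=2), 2))  # 1.02 (two cores at the lowered clock)
```

Under this model, two cores at 80% clock cost about the same power as one core at full clock, which is the trade-off the dual-core comparison exploits.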

Performance vs. Power

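[Figure: relative performance and power, single-core vs. dual-core]

▪ Single-core, baseline: performance 1.00x, power 1.00x
▪ Single-core, clock raised 20%: performance 1.13x, power 1.73x
▪ Single-core, clock lowered 20%: performance 0.87x, power 0.51x
▪ Dual-core, each core at the lowered clock: performance 1.73x, power 1.02x

Source: Intel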

Think Parallel or Perish

The free lunch is over!

[Figure: performance over time, from the GHz era to the multi-core era]