Computer Architecturecsl.skku.edu/uploads/ICE2010S11/5-arch.pdf · 2011-04-11 · ICE2010:...

Preview:

Citation preview

Jin-Soo Kim (jinsookim@skku.edu)Computer Systems Laboratory

Sungkyunkwan Universityhttp://csl.skku.edu

Computer Architecture

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 2

Modern PC Architecture

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 3

Computer Systems

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 4

Program?짬뽕라면

준비시간 :10분, 조리시간 :10분

재료라면 1개, 스프 1봉지, 오징어1/4마리, 호박 1/4개, 양파 1/2개, 양배추 1장, 당근 1/4개, 물 3컵(600cc)

만드는 법1.오징어는 껍질을 벗기고 깨끗하게 씻어 칼집으로 모양을 낸다. 2.호박, 양파, 양배추는 모두 채썬다. 3.냄비에 물 3컵을 붓고 끓인다. 4.물이 끓으면 스프를 넣고 오징어와 야채를 넣어 충분히 맛이 우러나도록 5분 정도 끓여준다. 5.끓으면 면을 넣어 익힌다.

Data

InstructionsSource: http://user.chol.com/~yugenie/yo/jjambong.html

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 5

von Neumann Architecture

▪ Separate CPU and memory• Memory holds data and instructions• CPU fetches instructions from memory• CPU manipulates data by performing

arithmetic and logical operations

▪ Stored-program concept

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 6

CPU & Memory

Memory CPU

address

data

PCADD r5,r1,r3200

CPU

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 8

CPU▪ Central Processing Unit

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 9

The Machine Cycle

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 10

Instruction Set Architecture

▪ Supported operations▪ Number of operands▪ Types of operands▪ Addressing modes▪ Fixed vs. variable length▪ Registers visible to the programmer▪ …

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 11

Instruction Types

▪ Arithmetic/Logic• Compute a new bit patterns

▪ Data transfer• Copy data from one location to another

(between register and memory)

▪ Control • Direct the execution of the program

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 12

Instruction Example

Memory

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 14

Memory

▪ A collection of cells, each with a unique physical address

▪ Both addresses and contents are in binary

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 15

CPU-Memory Gap

1

10

100

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1980 1985 1990 1995 2000

ns

y ear

Disk seek time

DRAM access time

SRAM access time

CPU cycle time

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 16

Principles of Locality

▪ Temporal locality• Recently referenced items are likely to be

referenced in the near future

▪ Spatial locality• Items with nearby addresses tend to be

referenced close together in time

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 17

Memory Hierarchy

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 18

Cache Memory▪ Cache

• A small, faster storage• Improves the average access time• Exploits both temporal and spatial locality

Microprocessors

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 20

Early Days

MOS6502(1976)

Intel 4004(1971)

Intel 8008(1972)

Intel 8080(1974)

Intel 8086(1978)

Intel 8088(1979)

Intel 8085(1976)

ZilogZ‐80(1976)

ZilogZ‐8000(1979)

Motorola6800(1974) ( )

Motorola6809/68000(1979)

180KHz2300 TRsUp to 640B

2MHz6000 TRsUp to 64KBCP/M

200KHz3500 TRsUp to 16KB

Apple I, II, II+, IIe

1st microprocessor IBM PC/XT

Altair 8800(1st PC)

Radio ShackTRS‐80

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 21

CISC▪ Complex Instruction Set Computer

• Dominant style through mid-80’s• Add instructions to perform “typical”

programming tasks• Arithmetic instructions can access memory• Multiple complex addressing modes• Different instruction formats of varying lengths• Easy for compiler, fewer code bytes• Intel IA-32

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 22

RISC

▪ Reduced Instruction Set Computer• Fewer, simple instructions• Register-oriented instruction set• Only load and store instructions can access

memory• Better for optimizing compilers• Can make run fast with simple chip design• ARM, MIPS, PowerPC, SPARC, …

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 23

Pipelining in Real Life

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 24

Pipelining▪ Sequential execution

▪ Pipelining

IF ID EX WBIF ID EX WB

IF ID EX WB

Clock cycles

IF ID EX WBIF ID EX WB

IF ID EX WBIF ID EX WB

IF ID EX WBIF ID EX WB

Inst’s

Clock cycles

Inst’s

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 25

Superscalar▪ Superscalar

• The execution stage has a bunch of different functional units.

• Execute multiple instructions in parallel

IF ID EX WB

Clock cycles

Inst’s

IF ID EX WBIF ID EX WBIF ID EX WB

IF ID EX WBIF ID EX WB

fetch

decode &dispatch

int

float-1

test

address mem-1 mem-2 wb

wb

wb

float-2 float-3

branch

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 26

Superpipelining▪ Superpipelining

• Subdivide each pipeline stage• Higher clock speed• 12+ in Pentium Pro/II/III, 20+ in Pentium 4

14 in UltraSparc-III, 16–25 in PowerPC G5

Clock cycles

Inst’s

IF ID EX WB

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 27

Superpipelined Superscalar▪ Superpipelining + Superscalar

• 2-way: MIPS R5000• 3-way: PowerPC G3/G4, Pentium ro/II/III/M/4• 4-way: MIPS R10000, PowerPC G4e, Core 2 Duo• 5-issue: PowerPC G5

Clock cycles

Inst’s

IF ID EX WB

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 28

Intel i486

▪ The first x86 chip that used more than 1M transistors

▪ 5-stage instruction pipeline▪ One clock cycle to execute simple

instructions▪ On-chip FPU (Floating Point

Unit)

Fetch

D1

D2

Ex

WB

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 29

Intel Pentium

▪ The first trademarked Intel processor

▪ 2-way superscalar with 5 stages

▪ Speculative execution with dynamic branch prediction

▪ On-chip separate L1 cache(8KB I$ + 8KB D$)

PF

D1

D2

E

WB

D2

E

WB

u-pipe v-pipe

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 30

Intel Pentium Pro

▪ 3-way superscalar▪ Superpipelined▪ Out-of-order execution▪ ISA translation (μops)▪ L1 cache: 8KB I$ + 8KB D$▪ L2 cache: 256KB (separate die)

Fetch

Decode

Execute Execute

WB

in-orderfront-end

in-orderretirement

out-of-ordercore

reorder

reorder

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 31

Intel Pentium 4▪ Hyper pipelined for clock rates > 1 GHz

• 20 (Willamette) ~ 31 (Prescott) stages

▪ Execution trace cache▪ L1 cache: 12Kμops I$ + 8KB D$ ▪ L2 cache: 256KB, 8-way (on-chip)▪ Hyper-Threading▪ Max clock rate: 3.80GHz (Prescott)▪ Severe heat problems

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 32

Core Microarchitecture

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 33

Nehalem Microarchitecture

Multi-core

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 35

Architecture Trends

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 36

Challenges▪ Memory wall

• CPU 55%/year, Memory 10%/year (1986~2000)• Caches show diminishing returns

▪ ILP (Instruction-Level Parallelism) wall• Control dependency• Data dependency

▪ Power wall• Dynamic power Frequency3

• Static power Frequency• Total power The number of cores

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 37

Performance vs. Power

Raise Clock (20%)

1.73x

1.13x

PER

FOR

MA

NC

E

POW

ER

Lower Clock (20%)

0.51x

0.87x

PER

FOR

MA

NC

E

POW

ER

Power

Performance

1.00x

PER

FOR

MA

NC

E

Single–Core

POW

ER

1.02x

1.73x

PER

FOR

MA

NC

E

POW

ERDual–Core

Source: Intel

ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim (jinsookim@skku.edu) 38

Think Parallel or Perish

The free lunch is over!

Perf

orm

ance

GHz Era Multi-core Era

Recommended