Jin-Soo Kim ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Computer Architecture
ICE2010: Introduction to Computer Engineering (Spring 2011) – Jin-Soo Kim ([email protected]) 2
Modern PC Architecture
Computer Systems
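Program?
Jjamppong Ramen
Preparation time: 10 minutes, cooking time: 10 minutes
Ingredients (Data): 1 pack of ramen noodles, 1 packet of soup powder, 1/4 squid, 1/4 zucchini, 1/2 onion, 1 cabbage leaf, 1/4 carrot, 3 cups of water (600 cc)
Directions (Instructions):
1. Peel the squid, wash it clean, and score it with a knife to make a pattern.
2. Julienne the zucchini, onion, and cabbage.
3. Pour 3 cups of water into a pot and bring it to a boil.
4. When the water boils, add the soup powder, then add the squid and vegetables and boil for about 5 minutes so that the flavor fully comes out.
5. Once it is boiling, add the noodles and cook them.
Source: http://user.chol.com/~yugenie/yo/jjambong.html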
von Neumann Architecture
▪ Separate CPU and memory
• Memory holds data and instructions
• CPU fetches instructions from memory
• CPU manipulates data by performing arithmetic and logical operations
▪ Stored-program concept
CPU & Memory
[Figure: Memory and CPU connected by address and data lines; labels: PC, 200, ADD r5,r1,r3]
CPU
CPU
▪ Central Processing Unit
The Machine Cycle
Instruction Set Architecture
▪ Supported operations
▪ Number of operands
▪ Types of operands
▪ Addressing modes
▪ Fixed vs. variable length
▪ Registers visible to the programmer
▪ …
Instruction Types
▪ Arithmetic/Logic
• Compute new bit patterns
▪ Data transfer
• Copy data from one location to another (between register and memory)
▪ Control
• Direct the execution of the program
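The three instruction types, together with the stored-program idea and the fetch/decode/execute cycle, can be sketched with a toy machine. Everything here is an assumption made for illustration: the opcode names (LOAD, STORE, ADD, JUMP, HALT), the tuple encoding, and the eight registers are hypothetical and do not correspond to any real ISA.

```python
# Toy stored-program machine (hypothetical ISA, for illustration only).
# One list serves as memory and holds both instructions (tuples) and data (ints).

def run(memory, max_steps=1000):
    """Fetch, decode, and execute until HALT; return the register file."""
    regs = [0] * 8              # r0..r7
    pc = 0                      # program counter: address of the next instruction
    for _ in range(max_steps):
        instr = memory[pc]      # fetch
        pc += 1
        op, *args = instr       # decode
        if op == "LOAD":        # data transfer: memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # data transfer: register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # arithmetic/logic: rd = rs1 + rs2
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "JUMP":      # control: change which instruction runs next
            (pc,) = args
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit reached without HALT")


memory = [
    ("LOAD", 1, 7),     # 0: r1 = memory[7]
    ("LOAD", 3, 8),     # 1: r3 = memory[8]
    ("ADD", 5, 1, 3),   # 2: r5 = r1 + r3
    ("JUMP", 5),        # 3: skip the instruction at address 4
    ("ADD", 5, 5, 5),   # 4: never executed
    ("STORE", 5, 9),    # 5: memory[9] = r5
    ("HALT",),          # 6: stop
    40,                 # 7: data
    2,                  # 8: data
    0,                  # 9: result is stored here
]

regs = run(memory)
print(memory[9])  # 42
```

Because instructions and data share one memory, the program could in principle read or overwrite its own instructions, which is the essence of the stored-program concept.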
Instruction Example
Memory
Memory
▪ A collection of cells, each with a unique physical address
▪ Both addresses and contents are in binary
CPU-Memory Gap
[Figure: time in ns (log scale, 1 to 100,000,000) vs. year (1980 to 2000) for disk seek time, DRAM access time, SRAM access time, and CPU cycle time]
Principles of Locality
▪ Temporal locality
• Recently referenced items are likely to be referenced in the near future
▪ Spatial locality
• Items with nearby addresses tend to be referenced close together in time
Memory Hierarchy
Cache Memory
▪ Cache
• A small, fast storage
• Improves the average access time
• Exploits both temporal and spatial locality
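How a cache turns locality into a lower average access time can be sketched with a small simulation. This is a minimal sketch of a direct-mapped cache, which is only one possible organization; the cache geometry (8 lines of 4 words) and the latencies (1-cycle hit, 100-cycle miss penalty) are assumed values chosen for illustration, not figures from the slides.

```python
def hit_rate(addresses, num_lines=8, line_size=4):
    """Simulate a direct-mapped cache and return the fraction of hits."""
    tags = [None] * num_lines           # tag currently held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size       # memory block containing this address
        index = block % num_lines       # the single line this block can occupy
        tag = block // num_lines        # distinguishes blocks sharing a line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag           # miss: bring the whole block in
    return hits / len(addresses)


def average_access_time(rate, hit_time=1, miss_penalty=100):
    """Average access time = hit time + miss rate * miss penalty (in cycles)."""
    return hit_time + (1 - rate) * miss_penalty


sequential = list(range(64))            # neighbors share a block: spatial locality
strided = list(range(0, 256, 4))        # one reference per block: no locality
repeated = list(range(16)) * 4          # same 16 words reused: temporal locality

for name, trace in [("sequential", sequential),
                    ("strided", strided),
                    ("repeated", repeated)]:
    rate = hit_rate(trace)
    print(f"{name:10s} hit rate {rate:.4f}, "
          f"average access time {average_access_time(rate):.2f} cycles")
```

Under these assumptions the sequential trace hits on 3 of every 4 references, the strided trace never hits, and the repeated trace misses only on its first pass.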
Microprocessors
Early Days
[Figure: timeline of early microprocessors]
▪ Intel 4004 (1971): 1st microprocessor, 180KHz, 2300 TRs, up to 640B
▪ Intel 8008 (1972): 200KHz, 3500 TRs, up to 16KB
▪ Intel 8080 (1974): 2MHz, 6000 TRs, up to 64KB, CP/M, Altair 8800 (1st PC)
▪ Motorola 6800 (1974)
▪ Intel 8085 (1976)
▪ MOS 6502 (1976): Apple I, II, II+, IIe
▪ Zilog Z-80 (1976): Radio Shack TRS-80
▪ Intel 8086 (1978)
▪ Intel 8088 (1979): IBM PC/XT
▪ Zilog Z-8000 (1979)
▪ Motorola 6809/68000 (1979)
CISC
▪ Complex Instruction Set Computer
• Dominant style through mid-80's
• Add instructions to perform "typical" programming tasks
• Arithmetic instructions can access memory
• Multiple complex addressing modes
• Different instruction formats of varying lengths
• Easy for compiler, fewer code bytes
• Intel IA-32
RISC
▪ Reduced Instruction Set Computer
• Fewer, simple instructions
• Register-oriented instruction set
• Only load and store instructions can access memory
• Better for optimizing compilers
• Can run fast with a simple chip design
• ARM, MIPS, PowerPC, SPARC, …
Pipelining in Real Life
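Pipelining
▪ Sequential execution
[Figure: instructions vs. clock cycles; each instruction runs IF, ID, EX, WB to completion before the next one starts]
▪ Pipelining
[Figure: instructions vs. clock cycles; each instruction starts its IF, ID, EX, WB sequence one cycle after the previous one, so the stages overlap]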
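The difference between sequential and pipelined execution can be put into numbers with a short sketch. It assumes an idealized 4-stage pipeline (IF, ID, EX, WB) in which every stage takes exactly one clock cycle and there are no stalls or hazards; real pipelines rarely achieve this. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "WB"]


def sequential_cycles(n, stages=len(STAGES)):
    """Each instruction finishes all stages before the next one starts."""
    return n * stages


def pipelined_cycles(n, stages=len(STAGES)):
    """The first instruction fills the pipeline; then one completes per cycle."""
    return stages + n - 1 if n > 0 else 0


def pipeline_diagram(n):
    """One row per instruction, shifted right by one cycle (3 characters)."""
    return "\n".join("   " * i + " ".join(STAGES) for i in range(n))


n = 3
print(pipeline_diagram(n))
print(f"sequential: {sequential_cycles(n)} cycles")   # 12
print(f"pipelined:  {pipelined_cycles(n)} cycles")    # 6
```

As the number of instructions grows, the ideal speedup approaches the number of stages, because the fill time of the pipeline is paid only once.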
Superscalar
▪ Superscalar
• The execution stage has multiple different functional units
• Execute multiple instructions in parallel
[Figure: instructions vs. clock cycles; several IF ID EX WB sequences start in the same cycle]
[Figure: functional units; labels: fetch, decode & dispatch, int, float-1, float-2, float-3, test, branch, address, mem-1, mem-2, wb]
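A rough upper bound on the benefit of issuing several instructions per cycle can be sketched as follows. The model is an idealization with assumed parameters: a 4-stage pipeline, `width` instructions entering together every cycle, and no dependencies or contention for functional units. The function name is hypothetical.

```python
import math


def superscalar_cycles(n, width, stages=4):
    """Ideal cycle count when `width` instructions enter the pipeline per cycle."""
    if n == 0:
        return 0
    groups = math.ceil(n / width)   # number of issue groups
    return stages + groups - 1      # fill the pipeline once, then one group per cycle


print(superscalar_cycles(6, width=1))   # 9: plain pipelining
print(superscalar_cycles(6, width=2))   # 6: 2-way superscalar
```

In practice, control and data dependencies between instructions keep real processors well below this bound.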
Superpipelining
▪ Superpipelining
• Subdivide each pipeline stage
• Higher clock speed
• Pipeline depth: 12+ stages in Pentium Pro/II/III, 20+ in Pentium 4, 14 in UltraSparc-III, 16–25 in PowerPC G5
[Figure: instructions vs. clock cycles for IF, ID, EX, WB]
Superpipelined Superscalar
▪ Superpipelining + Superscalar
• 2-way: MIPS R5000
• 3-way: PowerPC G3/G4, Pentium Pro/II/III/M/4
• 4-way: MIPS R10000, PowerPC G4e, Core 2 Duo
• 5-issue: PowerPC G5
[Figure: instructions vs. clock cycles for IF, ID, EX, WB]
Intel i486
▪ The first x86 chip that used more than 1M transistors
▪ 5-stage instruction pipeline
▪ One clock cycle to execute simple instructions
▪ On-chip FPU (Floating Point Unit)
[Figure: 5-stage pipeline: Fetch, D1, D2, Ex, WB]
Intel Pentium
▪ The first trademarked Intel processor
▪ 2-way superscalar with 5 stages
▪ Speculative execution with dynamic branch prediction
▪ On-chip separate L1 cache (8KB I$ + 8KB D$)
[Figure: pipeline stages PF and D1, followed by D2, E, WB in each of the u-pipe and the v-pipe]
Intel Pentium Pro
▪ 3-way superscalar
▪ Superpipelined
▪ Out-of-order execution
▪ ISA translation (μops)
▪ L1 cache: 8KB I$ + 8KB D$
▪ L2 cache: 256KB (separate die)
[Figure: in-order front-end (Fetch, Decode), out-of-order core (Execute, Execute), in-order retirement (WB), with two reorder labels]
Intel Pentium 4
▪ Hyper pipelined for clock rates > 1 GHz
• 20 (Willamette) to 31 (Prescott) stages
▪ Execution trace cache
▪ L1 cache: 12Kμops I$ + 8KB D$
▪ L2 cache: 256KB, 8-way (on-chip)
▪ Hyper-Threading
▪ Max clock rate: 3.80GHz (Prescott)
▪ Severe heat problems
Core Microarchitecture
Nehalem Microarchitecture
Multi-core
Architecture Trends
Challenges
▪ Memory wall
• CPU 55%/year, Memory 10%/year (1986~2000)
• Caches show diminishing returns
▪ ILP (Instruction-Level Parallelism) wall
• Control dependency
• Data dependency
▪ Power wall
• Dynamic power ∝ Frequency³
• Static power ∝ Frequency
• Total power ∝ The number of cores
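The size of the memory wall can be estimated with a short calculation. This sketch assumes that the two percentages are annual performance improvement rates that compound every year over 1986 to 2000; the slide does not state the metric explicitly, so treat the result as an order-of-magnitude illustration.

```python
def compound(rate_per_year, years):
    """Total improvement factor after compounding an annual rate."""
    return (1 + rate_per_year) ** years


years = 2000 - 1986                 # 14 years
cpu = compound(0.55, years)         # CPU: 55% per year
memory = compound(0.10, years)      # memory: 10% per year
gap = cpu / memory                  # how far memory has fallen behind the CPU

print(f"CPU improvement:    {cpu:,.0f}x")
print(f"Memory improvement: {memory:.1f}x")
print(f"Gap:                {gap:,.0f}x")
```

Under these assumptions the gap grows by roughly two orders of magnitude, which is why caches alone eventually show diminishing returns.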
Performance vs. Power
[Figure: relative performance and power, normalized to a single core at the base clock]
▪ Single-core: performance 1.00x, power 1.00x
▪ Raise clock (20%): performance 1.13x, power 1.73x
▪ Lower clock (20%): performance 0.87x, power 0.51x
▪ Dual-core: performance 1.73x, power 1.02x
Source: Intel
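The power numbers in this figure are consistent with dynamic power growing with the cube of the clock frequency. The sketch below checks that under two assumptions that are not stated on the slide: static power is ignored, and the dual-core bar corresponds to two cores each running at the lowered clock. The function name is hypothetical.

```python
def relative_dynamic_power(freq_scale, cores=1):
    """Power relative to one core at the base clock, assuming power ~ cores * f^3."""
    return cores * freq_scale ** 3


raised = relative_dynamic_power(1.2)             # clock raised by 20%
lowered = relative_dynamic_power(0.8)            # clock lowered by 20%
dual = relative_dynamic_power(0.8, cores=2)      # two cores at the lowered clock

print(f"raise clock 20%:      {raised:.2f}x power")   # 1.73x
print(f"lower clock 20%:      {lowered:.2f}x power")  # 0.51x
print(f"dual-core, -20% clock: {dual:.2f}x power")    # 1.02x
```

The performance figures cannot be derived from this model; they are taken from the figure as given. The point is that two slower cores cost about the same power as one core at the base clock while delivering far more throughput than a 20% clock increase.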
Think Parallel or Perish
The free lunch is over!
[Figure: performance over time, from the GHz era to the multi-core era]