KILO-INSTRUCTION PROCESSORS Arzucan Özgür Department of Computer Engineering Boğaziçi University 15.12.2005 Cmpe 511

Page 1: KILO-INSTRUCTION PROCESSORS

KILO-INSTRUCTION PROCESSORS

Arzucan Özgür

Department of Computer Engineering, Boğaziçi University

15.12.2005 Cmpe 511

Page 2: KILO-INSTRUCTION PROCESSORS

Introduction

Page 3: KILO-INSTRUCTION PROCESSORS

Memory Wall

The performance improvement of high-frequency microprocessors is seriously limited by main memory access latencies

[Figure: Processor-memory performance gap, 1980-2000. Performance (log scale, 1 to 1000) vs. time. CPU (“Moore’s Law”): 60%/yr. RAM: 7%/yr. The processor-memory performance gap grows 50% / year.]

Page 4: KILO-INSTRUCTION PROCESSORS

Reducing Memory Latency

Page 5: KILO-INSTRUCTION PROCESSORS

Cache memory hierarchies

The first-level (L1) cache is built into the processor core and takes 1-3 processor clock cycles to access

If there is a miss in the L1 cache, the on-chip L2 cache is accessed in the order of 10 processor cycles

Accessing main memory takes at least in the order of 100 processor cycles

Prefetching data from memory to the cache

Prefetch addresses are hard to predict
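The latencies above can be combined into an average memory access time. The following is a minimal sketch, not taken from the slides: the function name and the 95% and 90% hit rates are illustrative assumptions, and only the latencies come from this slide.

```python
def average_access_cycles(l1_cycles, l2_cycles, mem_cycles, l1_hit_rate, l2_hit_rate):
    """Average cycles per memory access for a two-level cache hierarchy.

    Every access pays the L1 latency, L1 misses also pay the L2 latency,
    and accesses that miss in both caches also pay the main memory latency.
    """
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return l1_cycles + l1_miss * (l2_cycles + l2_miss * mem_cycles)


# Latencies from the slide: L1 about 2 cycles (1-3), L2 about 10, memory about 100.
# The hit rates are assumed for illustration only.
amat = average_access_cycles(2, 10, 100, l1_hit_rate=0.95, l2_hit_rate=0.90)
print(f"average access time: {amat:.2f} cycles")
```

With these assumed hit rates the average is about 3 cycles, but every access that does reach main memory still stalls for about 100 cycles, which is the latency the rest of the talk tries to hide.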

[Figure: processor pipeline stages (Next IP, Fetch, Drive, Alloc., Rename, Queue, Schedule, Dispatch, Reg. Read, Execute, Flags, Br. chk, Drive) with the L1 instruction cache, L1 data cache, L2 and memory; branch misprediction path marked]

Page 6: KILO-INSTRUCTION PROCESSORS

Out-of-order superscalar processors

Page 7: KILO-INSTRUCTION PROCESSORS

Sequence of instructions containing data cache misses

Page 8: KILO-INSTRUCTION PROCESSORS

Kilo-Instruction Processors

Page 9: KILO-INSTRUCTION PROCESSORS

Definition

An out-of-order superscalar processor that supports thousands of “in-flight instructions”

Intelligent use of resources
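A rough back-of-the-envelope sketch of why thousands of in-flight instructions are needed. It assumes a hypothetical 4-wide processor and the simple rule that the window must hold about latency × issue width instructions to keep issuing while one memory access is outstanding. The issue width, the 500 and 1000 cycle latencies, and the function name are assumptions, not values from the slides.

```python
def window_to_hide_latency(latency_cycles, issue_width):
    """Instructions the window must hold to keep issuing for `latency_cycles`."""
    return latency_cycles * issue_width


# 100 cycles is the main memory latency quoted earlier; the longer
# latencies are assumed values that illustrate a growing memory gap.
for latency in (100, 500, 1000):
    needed = window_to_hide_latency(latency, issue_width=4)
    print(latency, "cycles ->", needed, "in-flight instructions")
```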

Page 10: KILO-INSTRUCTION PROCESSORS
Page 11: KILO-INSTRUCTION PROCESSORS
Page 12: KILO-INSTRUCTION PROCESSORS

Scalability

Thousands of in-flight instructions and in-order commit make designs impractical:

ROB: needs to maintain a copy of every in-flight instruction

IQs: instructions depending on long-latency instructions remain in these queues for a long time

LSQs: instructions remain in the queue until commit

Registers: a new physical register for each instruction producing a new value

We would like to get the IPC of thousands of in-flight instructions without drastically increasing resource requirements

Page 13: KILO-INSTRUCTION PROCESSORS

Efficient Kilo-Instruction Processor Design

Multi-Checkpointing the ROB

Out-of-Order Commit

Early Release of Resources

Ephemeral Registers

Load Queues

Page 14: KILO-INSTRUCTION PROCESSORS

Checkpointing

Page 15: KILO-INSTRUCTION PROCESSORS

Checkpointing

The ROB allows the restoration of the correct state at any instruction (this precision is not necessary)

Checkpoint: a snapshot of the processor state taken at a specific instruction of the program being executed (the processor state is checkpointed only for a subset of instructions)

With this snapshot, the processor can restore its state to that point in case of an exception or misprediction
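The snapshot-and-restore idea can be sketched as a toy model. This is an illustration under simplifying assumptions, not the hardware mechanism: the processor state is reduced to a register dictionary, and the class and method names are hypothetical.

```python
class CheckpointedCore:
    """Toy model: state is a register dict, checkpoints are full snapshots."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.checkpoints = []  # (instruction_index, snapshot), oldest first

    def take_checkpoint(self, instruction_index):
        """Snapshot the state as it is just before `instruction_index` executes."""
        self.checkpoints.append((instruction_index, dict(self.registers)))

    def execute(self, dest, value):
        self.registers[dest] = value

    def rollback(self, faulting_index):
        """Restore the youngest checkpoint at or before the faulting instruction.

        Returns the instruction index from which execution must resume.
        Assumes at least one such checkpoint exists.
        """
        while self.checkpoints[-1][0] > faulting_index:
            self.checkpoints.pop()  # discard checkpoints younger than the fault
        index, snapshot = self.checkpoints[-1]
        self.registers = dict(snapshot)
        return index


core = CheckpointedCore({"r1": 0, "r2": 0})
core.take_checkpoint(0)
core.execute("r1", 5)      # instruction 0
core.execute("r2", 7)      # instruction 1
core.take_checkpoint(2)
core.execute("r1", 9)      # instruction 2
resume = core.rollback(1)  # suppose instruction 1 raised an exception
print(resume, core.registers)
```

Because no checkpoint exists at instruction 1 itself, the model falls back to the checkpoint at instruction 0 and must re-execute from there. That is the price of checkpointing only a subset of instructions.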

Page 16: KILO-INSTRUCTION PROCESSORS

Design Decisions

How many in-flight checkpoints should be maintained by the processor?

A large number of checkpoints reduces the penalty of the recovery process

A large number of checkpoints increases the implementation cost

What kind of instructions should be checkpointed?

A checkpoint can be taken at any instruction

Some instructions are better candidates (e.g., some current processors take checkpoints at branch instructions in order to minimize the branch misprediction penalty)

How much information should be kept by each checkpoint?

Page 17: KILO-INSTRUCTION PROCESSORS

Multicheckpointing

Page 18: KILO-INSTRUCTION PROCESSORS

Selective Checkpointing

Replace the ROB with a pseudo-ROB

The processor removes instructions that reach the pseudo-ROB’s head at a fixed rate

Processor state is recoverable for any instruction in the pseudo-ROB

A checkpoint is taken when an incomplete instruction leaves the pseudo-ROB
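The pseudo-ROB behaviour described above can be sketched as a FIFO that is drained at a fixed rate. This is a simplified illustration: the function name and the (instruction_id, completed) representation are assumptions, and a real design may apply further criteria before taking a checkpoint.

```python
from collections import deque


def drain_pseudo_rob(pseudo_rob, rate):
    """Remove up to `rate` instructions from the head, regardless of their state.

    `pseudo_rob` is a deque of (instruction_id, completed) pairs, oldest first.
    Returns the ids at which a checkpoint would be taken, that is, the
    instructions that leave the pseudo-ROB before completing.
    """
    checkpoints = []
    for _ in range(min(rate, len(pseudo_rob))):
        instruction_id, completed = pseudo_rob.popleft()
        if not completed:
            checkpoints.append(instruction_id)
    return checkpoints


rob = deque([(0, True), (1, False), (2, True), (3, False), (4, True)])
taken = drain_pseudo_rob(rob, rate=4)
print("checkpoints at:", taken, "remaining:", list(rob))
```

Completed instructions leave without any checkpoint, so checkpoints concentrate on the instructions most likely to need recovery later.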

Page 19: KILO-INSTRUCTION PROCESSORS

Instruction Queue Management

Page 20: KILO-INSTRUCTION PROCESSORS

Bi-level Issue Queue

The processor detects instructions that will occupy the issue queue for a long time

It removes these instructions from the primary issue queue

It offloads them to a slow-lane instruction queue (larger, slower, less complex)

The same principle is applied to the load-store queue
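A minimal sketch of the offloading decision. It assumes that "will occupy the issue queue for a long time" means "depends, directly or transitively, on an outstanding long-latency instruction". The function name and the (name, dest, sources) instruction format are hypothetical.

```python
def dispatch(instructions, pending_long_latency):
    """Split instructions between the primary issue queue and the slow lane.

    `instructions` is a list of (name, dest, sources) in program order.
    `pending_long_latency` is the set of registers written by outstanding
    long-latency instructions (for example, loads that missed in the cache).
    An instruction reading such a register goes to the slow lane, and its
    own destination becomes long-latency too, so its dependents follow it.
    """
    primary, slow_lane = [], []
    waiting = set(pending_long_latency)
    for name, dest, sources in instructions:
        if waiting.intersection(sources):
            slow_lane.append(name)
            waiting.add(dest)
        else:
            primary.append(name)
            waiting.discard(dest)  # dest now holds a short-latency result
    return primary, slow_lane


program = [
    ("add r2, r1, r1", "r2", ["r1"]),        # reads r1, which awaits a load miss
    ("sub r4, r3, r3", "r4", ["r3"]),        # independent of the miss
    ("mul r5, r2, r4", "r5", ["r2", "r4"]),  # depends on the miss through r2
]
primary, slow_lane = dispatch(program, pending_long_latency={"r1"})
print("primary:", primary)
print("slow lane:", slow_lane)
```

Only the independent instruction stays in the small, fast queue. The two dependent instructions wait in the slow lane until the load returns.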

Page 21: KILO-INSTRUCTION PROCESSORS

Physical Register File

Page 22: KILO-INSTRUCTION PROCESSORS

Ephemeral Registers

A conventional superscalar processor assigns physical registers to architected registers when an instruction enters the issue queue

An instruction reserves a physical register for its entire flight time

A physical register is not written a value until much later; until then its primary function is tracking data dependencies

Use virtual registers: late register allocation

Release a register if no other instruction reads the data: early release
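Late allocation and early release can be sketched with a toy register file. This is an illustration under strong assumptions: the class and method names are hypothetical, the number of consumers of each value is assumed to be known at rename time, and running out of free registers (which would stall a real pipeline) is not modelled.

```python
class LateAllocRegisterFile:
    """Toy register file: virtual tags at rename, physical registers at writeback."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        self.next_tag = 0
        self.readers = {}   # virtual tag -> consumers that still have to read
        self.physical = {}  # virtual tag -> physical register, once written

    def rename(self, num_readers):
        """Hand out a virtual tag only; no physical register is reserved yet."""
        tag = self.next_tag
        self.next_tag += 1
        self.readers[tag] = num_readers
        return tag

    def writeback(self, tag):
        """Late allocation: take a physical register when the value is produced."""
        self.physical[tag] = self.free.pop()
        return self.physical[tag]

    def read(self, tag):
        """Early release: free the register after its last consumer reads it."""
        self.readers[tag] -= 1
        if self.readers[tag] == 0:
            self.free.append(self.physical.pop(tag))


rf = LateAllocRegisterFile(num_physical=2)
t0 = rf.rename(num_readers=1)
t1 = rf.rename(num_readers=1)
t2 = rf.rename(num_readers=1)  # three values in flight, two physical registers
rf.writeback(t0)
rf.read(t0)                    # last reader done, register returns to the free list
rf.writeback(t1)
rf.writeback(t2)
print("tags holding a physical register:", sorted(rf.physical))
```

Three renamed values share two physical registers here, because a register is held only between the moment the value is produced and the moment its last consumer reads it.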

Page 23: KILO-INSTRUCTION PROCESSORS

Performance Evaluation

Page 24: KILO-INSTRUCTION PROCESSORS
Page 25: KILO-INSTRUCTION PROCESSORS
Page 26: KILO-INSTRUCTION PROCESSORS

Kilo-Instruction Multiprocessors

Page 27: KILO-INSTRUCTION PROCESSORS

[Figure: IPC (axis 0 to 3.5) for FFT, RADIX, LU, MP3D and WATER on 16 processors with an ideal network, for ROB sizes 64, 128, 512, 1024 and 2048]

Page 28: KILO-INSTRUCTION PROCESSORS

References

Adrian Cristal, Oliverio J. Santana, Francisco Cazorla, Marco Galluzzi, Tanausu Ramirez, Miquel Pericas, Mateo Valero. "Kilo-Instruction Processors: Overcoming the Memory Wall," IEEE Micro, vol. 25, no. 3, pp. 48-57, May/June 2005.

A. Cristal, O. Santana, M. Valero, J.F. Martínez. "Toward Kilo-Instruction Processors," ACM Trans. on Architecture and Code Optimization, vol. 1, no. 4, Dec. 2004.

Marco Galluzzi, Valentin Puente, Adrián Cristal, Ramón Beivide, José-Ángel Gregorio, Mateo Valero. "A First Glance at Kilo-Instruction Based Multiprocessors," Conf. Computing Frontiers, pp. 212-221, 2004.

Page 29: KILO-INSTRUCTION PROCESSORS

Thank you!