EPIC Architecture (Explicitly Parallel Instruction Computing)
Yangyang Wen
CDA5160, Advanced Computer Architecture I, University of Central Florida


Outline

• What is EPIC?
• EPIC Philosophy
• Architectural Features Supporting EPIC
• Intel’s IA-64 Architectural Features
• IA-64’s Key Technologies
• Summary and Reference

Traditional Architectures: Limited Parallelism

[Figure: original source code → compiler (parallelized code) → sequential machine code → hardware with multiple functional units; the available execution units are not used efficiently.]

Today’s processors are often 60% idle.

EPIC Architecture: Explicit Parallelism

[Figure: original source code → EPIC compiler, which views a wider scope → better parallel machine code → hardware with multiple functional units. Explicit parallelism increases parallel execution and makes more efficient use of execution resources.]

What is EPIC?

EPIC stands for Explicitly Parallel Instruction Computing. An EPIC architecture provides features that allow the compiler to take a proactive role in enhancing instruction-level parallelism (ILP) without unacceptable hardware complexity.

EPIC’s Performance

[Figure: EPIC performance chart]

EPIC Design Philosophy

EPIC gives the compiler advanced features for enhancing ILP, such as predication and speculation.

With EPIC, the compiler designs the plan of execution (POE) at compile time and communicates the POE to the hardware.

EPIC must provide massive hardware resources for parallel execution.

Introducing IA-64

IA-64 comes from Intel and is Intel’s first 64-bit architecture.

It is the first commercially available instance of an EPIC ISA.

It is the first architecture to bring these ILP features to general-purpose microprocessors.

IA-64’s Architectural Basics

• Explicit parallelism
• Enhanced ILP
• Compiler-oriented design
• Extremely large physical memory
• A huge virtual address space for applications
• 64-bit computation
• Extremely large register files (128 general-purpose and 128 floating-point registers, plus 64 one-bit predicate registers)

IA-64’s Key Technologies

• Instruction bundling
• Predication
• Control speculation
• Data speculation
• Software pipelining

Instruction Bundling

• Uses a form of VLIW architecture.
• Three instructions are combined into a 128-bit bundle.
• Parallel instructions are executed in groups.
• Template bits tell the hardware how to decode and route the instructions, and mark the end of groups of parallel instructions.

[Figure: layout of a 128-bit bundle, from bit 127 down to bit 0: Instruction 2 | Instruction 1 | Instruction 0 | Template. Each instruction occupies 41 bits, leaving 5 bits for the template (128 - 3 x 41 = 5).]
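To make the bundle format concrete, here is a small C sketch that packs and unpacks one bundle. It assumes the bit positions implied by the layout above (template in bits 0-4, instruction 0 in bits 5-45, instruction 1 in bits 46-86, instruction 2 in bits 87-127) and stores the 128 bits as two 64-bit halves. `bundle_t`, `bundle_pack`, and the other names are hypothetical, and the template is treated as an opaque 5-bit code rather than decoded.

```c
#include <stdint.h>

/* Hypothetical representation of one 128-bit bundle:
   lo holds bits 0-63 and hi holds bits 64-127. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} bundle_t;

#define TEMPLATE_BITS 5                          /* 128 - 3 * 41 = 5 */
#define SLOT_BITS 41
#define SLOT_MASK ((UINT64_C(1) << SLOT_BITS) - 1)

/* Write the low `width` bits of `value` into the bundle starting at bit `pos`. */
static void set_field(bundle_t *b, unsigned pos, unsigned width, uint64_t value) {
    for (unsigned i = 0; i < width; i++) {
        unsigned bit = pos + i;
        uint64_t *word = (bit < 64) ? &b->lo : &b->hi;
        uint64_t mask = UINT64_C(1) << (bit % 64);
        if ((value >> i) & 1)
            *word |= mask;
        else
            *word &= ~mask;
    }
}

/* Read `width` (< 64) bits starting at bit `pos`; a field may straddle lo and hi. */
static uint64_t get_field(bundle_t b, unsigned pos, unsigned width) {
    uint64_t v;
    if (pos >= 64)
        v = b.hi >> (pos - 64);
    else if (pos + width <= 64)
        v = b.lo >> pos;
    else
        v = (b.lo >> pos) | (b.hi << (64 - pos));
    return v & ((UINT64_C(1) << width) - 1);
}

/* Pack a template and three 41-bit instruction slots into one bundle. */
bundle_t bundle_pack(unsigned tmpl, uint64_t slot0, uint64_t slot1, uint64_t slot2) {
    bundle_t b = {0, 0};
    set_field(&b, 0, TEMPLATE_BITS, tmpl);
    set_field(&b, TEMPLATE_BITS, SLOT_BITS, slot0);
    set_field(&b, TEMPLATE_BITS + SLOT_BITS, SLOT_BITS, slot1);
    set_field(&b, TEMPLATE_BITS + 2 * SLOT_BITS, SLOT_BITS, slot2);
    return b;
}

unsigned bundle_template(bundle_t b) {
    return (unsigned)get_field(b, 0, TEMPLATE_BITS);
}

/* Extract instruction slot 0, 1 or 2. */
uint64_t bundle_slot(bundle_t b, unsigned slot) {
    return get_field(b, TEMPLATE_BITS + slot * SLOT_BITS, SLOT_BITS);
}
```

A decoder would read the template first, since on IA-64 the template determines which execution-unit type each slot is routed to and where a group of parallel instructions ends.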

ILP Bottlenecks

• Branches
  – Branches are dealt with through predication.
  – Branch mispredictions cause a 20% to 30% loss in processor performance.
• Memory latency
  – Latency is the time it takes to get data from memory. The longer it takes to fetch code and data from memory, the longer the CPU sits idle.
  – For memory latency, loads are the big problem, not stores: instructions that need a loaded value must wait for it, whereas a store can be buffered while execution continues.

Predication

Branching is a major cause of lost performance. Consider:

    if A > B
        S += A
    else
        S += B
    end if
    *P = S

(a) Traditional branch prediction: the processor predicts the outcome of A > B and speculatively executes, for example, S += A. If the prediction is wrong, S += A is thrown away and S += B is executed instead.

(b) IA-64 predication: S += A and S += B are each guarded by a predicate derived from A > B and are executed in parallel; only the result whose predicate is true is kept, and execution continues with *P = S.
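In C terms, predication corresponds to if-conversion: the compiler computes a pair of complementary predicates from the comparison and guards each arm with one of them, so no branch is needed. The sketch below models this with ordinary integers; `p1` and `p2` stand in for IA-64 predicate registers, the function names are hypothetical, and whether a C compiler actually emits branch-free machine code for it depends on the compiler and target.

```c
#include <stdbool.h>

/* Branching version: the processor has to predict the outcome of a > b. */
long update_branch(long s, long a, long b) {
    if (a > b)
        s += a;
    else
        s += b;
    return s;
}

/* If-converted version modelling predication. One compare produces two
   complementary predicates (roughly like cmp.gt p1, p2 = a, b on IA-64);
   both guarded additions are issued, but each only has an effect when its
   predicate is true. */
long update_predicated(long s, long a, long b) {
    bool p1 = a > b;              /* predicate for the then-arm */
    bool p2 = !p1;                /* complementary predicate for the else-arm */
    long m1 = -(long)p1;          /* all ones if p1 is true, else 0 */
    long m2 = -(long)p2;          /* all ones if p2 is true, else 0 */
    s += a & m1;                  /* (p1) s += a */
    s += b & m2;                  /* (p2) s += b */
    return s;
}
```

Both functions return the same value for every input. The second contains no conditional control flow in the source, so there is nothing to mispredict, at the cost of issuing both additions.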

EPIC Predication Process

• Branch candidate: the compiler finds which instructions to execute in parallel.
• Instructions are marked with a predicate ID.
• Instructions are packed into bundles.
• The processor executes both paths in parallel.
• The processor checks the predicates and stores only the correct results.

Predication Benefits

• Reduces the number of branches
• Reduces misprediction penalties
• Shortens critical paths

Control Speculation

Memory latency is a major performance bottleneck.

Traditional architectures:

    instr 1
    instr 2
    . . .
    br
    ------ barrier ------
    load a[]
    use

Elevating the load above a branch is not possible.

IA-64:

    ld.s r8 = a[]
    instr 1
    instr 2
    br
    chk.s r8
    use

IA-64 allows the load to be elevated, even above a branch.

Introducing the Token Bit

    ld.s r8 = a[]    ; exception detection
    instr 1
    instr 2
    br
    chk.s r8         ; exception delivery
    use

When a load is elevated as ld.s, exceptions are detected at the load, but their delivery is deferred. If the load address is valid, the load completes normally. If the load would raise an exception, the hardware sets the token bit on r8 (called the NaT, "Not a Thing", bit in IA-64) instead of raising the exception, and execution continues; instructions that use r8 propagate the token bit to their results. If execution takes the other path at the branch, the token bit is never checked and the exception is simply discarded. If execution reaches chk.s and chk.s finds the token bit set, it jumps to fix-up code, which re-executes the load non-speculatively so that any exception is delivered at that point.
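The ld.s / chk.s protocol can be modelled in C to show where an exception is detected and where it is delivered. This is a sketch under simplifying assumptions: a NULL pointer stands in for any faulting address, the token bit (the NaT bit in IA-64 documentation) is a plain `bool` stored next to the value, and `spec_reg`, `load_speculative`, `check_speculative`, and `hoisted` are hypothetical names, not a real API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A register value plus its token bit. */
typedef struct {
    int64_t value;
    bool token;
} spec_reg;

/* ld.s model: a faulting load does not trap; it sets the token bit instead.
   Assumption: a NULL address stands in for any address that would fault. */
spec_reg load_speculative(const int64_t *addr) {
    spec_reg r = {0, false};
    if (addr == NULL)
        r.token = true;           /* exception detected, delivery deferred */
    else
        r.value = *addr;
    return r;
}

/* chk.s model: if the token bit is set, run fix-up code that redoes the load
   non-speculatively. Returns false when that reload faults, i.e. when the
   exception is finally delivered. */
bool check_speculative(spec_reg *r, const int64_t *addr) {
    if (!r->token)
        return true;              /* speculation succeeded: no extra work */
    if (addr == NULL)
        return false;             /* non-speculative reload faults: deliver exception */
    r->value = *addr;             /* other deferral causes: reload and clear the bit */
    r->token = false;
    return true;
}

/* Hoisted form of: if (take) *out = *p + 1; else *out = -1;
   The load is elevated above the branch. Returns false if an exception is delivered. */
bool hoisted(const int64_t *p, bool take, int64_t *out) {
    spec_reg r8 = load_speculative(p);    /* ld.s r8 = [p], above the branch */
    /* ...independent instructions could be scheduled here... */
    if (!take) {                          /* br: this path never uses r8 */
        *out = -1;
        return true;                      /* a set token bit is silently ignored */
    }
    if (!check_speculative(&r8, p))       /* chk.s r8 */
        return false;
    *out = r8.value + 1;                  /* use */
    return true;
}
```

With a valid pointer both paths behave as the unhoisted code would; with a NULL pointer, only the path that actually uses r8 reports the exception, which is why the load can be elevated safely.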

Data Speculation

Traditional architectures:

    instr 1
    instr 2
    . . .
    store
    ------ barrier ------
    load
    use

Because the store might write the address the load reads, the load cannot be elevated above the store, which prevents the compiler from reordering these instructions.

IA-64:

    ld.a             ; advanced load, recorded in the ALAT
    instr 1
    instr 2
    store
    ld.c / chk.a     ; check against the ALAT
    use

This allows the compiler to elevate the load even when it is not sure whether the memory references overlap.

Advanced Load Address Table (ALAT)

[Figure: the ALAT as a table of entries, each holding a register number and an address; "ld.a reg# = ..." inserts an entry, a store removes entries, and "chk.a reg#" looks one up.]

• When an ld.a is elevated, it inserts an entry into the ALAT.
• When a store executes, it removes any ALAT entries whose addresses overlap the store.
• When chk.a executes and finds no entry for its register, there was a conflict, and it jumps to fix-up code that re-executes the load and the code that depends on it.
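The three ALAT rules can be sketched as a small table in C. The capacity, the direct-mapped placement by register number, the assumption that every access is 8 bytes wide, and all names (`load_advanced`, `store_checked`, `check_advanced`, `elevated`) are illustrative assumptions; real ALAT hardware has its own size, replacement policy, and overlap logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALAT_SIZE 8       /* hypothetical capacity */
#define ACCESS_SIZE 8     /* assume every load and store touches 8 bytes */

typedef struct {
    bool valid;
    int reg;              /* target register of the advanced load */
    uintptr_t addr;       /* address it loaded from */
} alat_entry;

static alat_entry alat[ALAT_SIZE];

/* ld.a: perform the load early and record (reg, addr) in the ALAT. */
int64_t load_advanced(int reg, const int64_t *p) {
    alat_entry *e = &alat[reg % ALAT_SIZE];   /* direct-mapped by register number */
    e->valid = true;
    e->reg = reg;
    e->addr = (uintptr_t)p;
    return *p;
}

/* store: write memory and remove every ALAT entry whose address range overlaps. */
void store_checked(int64_t *p, int64_t value) {
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < ALAT_SIZE; i++) {
        uintptr_t e = alat[i].addr;
        if (alat[i].valid && a < e + ACCESS_SIZE && e < a + ACCESS_SIZE)
            alat[i].valid = false;
    }
    *p = value;
}

/* chk.a: true if the entry for reg survived; false signals a conflict. */
bool check_advanced(int reg) {
    const alat_entry *e = &alat[reg % ALAT_SIZE];
    return e->valid && e->reg == reg;
}

/* Elevated form of: *q = v; x = *p + 1;  where p and q may or may not alias. */
int64_t elevated(int64_t *p, int64_t *q, int64_t v) {
    int64_t r8 = load_advanced(8, p);   /* ld.a r8 = [p], moved above the store */
    store_checked(q, v);                /* store [q] = v */
    if (!check_advanced(8))             /* chk.a r8 */
        r8 = *p;                        /* fix-up code: re-execute the load */
    return r8 + 1;                      /* use */
}
```

If another advanced load evicts an entry (two registers mapping to the same slot in this sketch), chk.a simply reports a conflict and the load is redone; such false conflicts cost time but never produce a wrong result.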

Speculation Benefits

• Reduces the impact of memory latency.
• One study demonstrates a performance improvement of 80% when speculation is combined with predication.
• The greatest improvement is for code with many cache accesses.
• Scheduling flexibility enables new levels of performance headroom.

Software Pipelining

[Figure: sequential execution of loop iterations vs. software-pipelined execution]

• Overlap the execution of different loop iterations.
• Get more iterations done in the same amount of time.

Software Pipelining Example

    for (i = 0; i < 1000; i++)
        x[i] = x[i] + s;

Original loop (r1 points to x[i], f1 holds s, r2 counts the remaining iterations):

    Loop: LD   f0, 0(r1)
          ADD  f0, f0, f1
          SD   f0, 0(r1)
          ADD  r1, r1, 8
          SUBI r2, r2, 1
          BNEZ r2, Loop

Software-pipelined loop (each trip stores element i-1, adds element i, and loads element i+1; prologue and epilogue not shown):

    Loop: SD   f2, -8(r1)
          ADD  f2, f0, f1
          SUBI r2, r2, 1
          LD   f0, 8(r1)
          ADD  r1, r1, 8
          BNEZ r2, Loop
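The same transformation can be written in C. This sketch assumes double-precision elements (matching the 8-byte pointer increment in the original loop) and spells out the prologue and epilogue that a software-pipelined loop needs around its kernel; `add_scalar` and `add_scalar_pipelined` are hypothetical names.

```c
#include <stddef.h>

/* Original loop: load, add and store each element in turn. */
void add_scalar(double *x, size_t n, double s) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Software-pipelined version: each kernel trip stores element i-1,
   adds element i and loads element i+1. */
void add_scalar_pipelined(double *x, size_t n, double s) {
    if (n == 0)
        return;
    /* Prologue: start the first iterations before the kernel. */
    double f0 = x[0];                 /* load element 0 */
    double f2 = f0 + s;               /* add element 0 */
    if (n > 1)
        f0 = x[1];                    /* load element 1 */
    /* Kernel: three different iterations are in flight in each trip. */
    for (size_t i = 1; i + 1 < n; i++) {
        x[i - 1] = f2;                /* store stage for element i-1 */
        f2 = f0 + s;                  /* add stage for element i */
        f0 = x[i + 1];                /* load stage for element i+1 */
    }
    /* Epilogue: drain the iterations still in flight. */
    if (n > 1) {
        x[n - 2] = f2;
        f2 = f0 + s;
    }
    x[n - 1] = f2;
}
```

Both functions leave `x` in the same final state. In the kernel, the store, add, and load belong to three different iterations, so none of them needs a result produced earlier in the same trip.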

Software Pipelining Advantages

• Overlapping loop iterations was traditionally done through loop unrolling.
• Software pipelining produces less code than loop unrolling, with increased regularity.
• Smaller code means fewer instruction-cache misses.
• Especially useful for integer code with a small number of loop iterations.

Software Pipelining Disadvantages

• Requires many additional instructions to manage the loop.
• Without hardware support, this overhead may greatly increase code size.
• Typically used only in specialized technical computing applications.

IA-64 Features Supporting Software Pipelining

• Full predication
• Circular buffer (rotating registers) of general and FP registers
• Loop branches decrement the RRBs (register rename bases), so a value written to a register in one iteration is found in the next-numbered register in the following iteration
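The effect of decrementing the register rename base can be modelled in C. In this sketch the rotating region has a hypothetical 8 registers, a logical register number maps to physical register (logical + rrb) mod 8, and each loop branch decrements rrb, so a value written to logical register 0 in one iteration is read as logical register 1 in the next. The sizes and names (`rot_file`, `loop_branch`, `add_scalar_rotating`) are illustrative; actual IA-64 rotating regions and register numbering differ.

```c
#include <stddef.h>

#define NROT 8   /* hypothetical number of rotating registers */

typedef struct {
    double phys[NROT];   /* physical registers */
    unsigned rrb;        /* register rename base */
} rot_file;

/* Map a logical rotating register to its current physical register. */
static size_t phys_index(const rot_file *rf, unsigned logical) {
    return (logical + rf->rrb) % NROT;
}

static double read_reg(const rot_file *rf, unsigned logical) {
    return rf->phys[phys_index(rf, logical)];
}

static void write_reg(rot_file *rf, unsigned logical, double v) {
    rf->phys[phys_index(rf, logical)] = v;
}

/* Loop-branch model: decrement rrb (mod NROT), rotating every logical register by one. */
static void loop_branch(rot_file *rf) {
    rf->rrb = (rf->rrb + NROT - 1) % NROT;
}

/* Pipelined x[i] += s in which values move between stages only by rotation:
   the load stage writes logical r0; one trip later the add stage reads it as r1
   and writes the sum to r2; one trip after that the store stage reads it as r3.
   The if-guards play the role of stage predicates in the prologue and epilogue. */
void add_scalar_rotating(double *x, size_t n, double s) {
    rot_file rf = { {0}, 0 };
    for (size_t k = 0; k < n + 2; k++) {
        if (k >= 2)
            x[k - 2] = read_reg(&rf, 3);               /* store stage: element k-2 */
        if (k >= 1 && k - 1 < n)
            write_reg(&rf, 2, read_reg(&rf, 1) + s);   /* add stage: element k-1 */
        if (k < n)
            write_reg(&rf, 0, x[k]);                   /* load stage: element k */
        loop_branch(&rf);
    }
}
```

No instruction in the loop copies a value from one register to another; the rotation alone hands each loaded element from the load stage to the add stage and each sum from the add stage to the store stage.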

Summary

• Predication removes branches.
  – Parallel compares increase parallelism.
  – Benefits complex control flow, such as large databases.
• Speculation reduces the impact of memory latency.
  – IA-64 removes recovery from the critical path.
  – Benefits applications with poor cache locality, such as server applications and operating systems.
• Software pipelining support with minimal overhead enables broad usage.
  – Performance for small integer loops with unknown trip counts as well as monster FP loops.

References

M. S. Schlansker, "EPIC: Explicitly Parallel Instruction Computing," Computer, vol. ?, no. ?, pp. 37-45, 2000.

J. Huck et al., "Introducing the IA-64 Architecture," Sept.-Oct. 2000, pp. 12-23.

C. Dulong, "The IA-64 Architecture at Work," Computing Practices.