68
EPIC Architectures and Compiler Technology Wen-mei Hwu IMPACT IMPACT EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science Laboratory University of Illinois at Urbana- Champaign IMPACT Group http://www.crhc.uiuc.edu/IMPACT/

EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

  • View
    217

  • Download
    0

Embed Size (px)

Citation preview

Page 1: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

EPIC Architectures

Wen-mei Hwu

Department of Electrical and Computer Engineering

Coordinated Science Laboratory

University of Illinois at Urbana-Champaign

IMPACT Grouphttp://www.crhc.uiuc.edu/IMPACT/

Page 2: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 3: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Vision: Bridging the Gap Between Programs and Hardware

if (x>=0)

if (x==1 ||

x==2 ||

x==3)

m=f(x);

else

m=g(x);

f g

>== = =1 32 0

1

m

+

x

0

enable

x>=0

x!=3

x!=1

x!=2

m=g(x) m=f(x)

TF

T

T

T

F

F

F

Page 4: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Can we get the best of both worlds?

• Hardware– highly speculative

– parallel in nature

– efficient logic manipulation

– special purpose

– area effiicient

– enery efficient

• Programming– conservative semantics

– sequential in nature

– awkward logic manipulation

– easily retargeted

– area inefficient

– energy inefficient

Page 5: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

EPIC Design Objectives• To define a programmable architecture

model that allows compiled programs to approach special hardware design in– logic manipulation capability

– speculation and parallelism

– chip area efficiency

– energy efficiency

Page 6: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Significant Milestones

• 1994 Intel/HP forms IA-64 alliance with U. of Illinois contribution

• 1997 Announcement of IA-64

• 1997 Motorola/Lucent forms StarCore alliance with U. of Illinois contribution

• 1998 major computer vendors adopt IA-64

• 1998 Announcement of StarCore

• 1999 Release of user mode architecture

Page 7: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Evolution of VLIW/EPIC

First-Generation

VLIW

Second-Generation

VLIW

Research

MicrocodeCompaction

Product

Floating-PointSystems

NYUTrace Scheduling

(1976-1979)

TRWPolycyclicProcessor

Module Scheduling(1980-1982)

YaleELI

Bulldog Compiler(1980-84)

CydromeCydra 5

(1984-1988)

MultiflowTRACESeries

(1984-1990)

CullerPSC

(1983-1987)

Univ. of IllinoisIMPACT(1987 - )

HP LabsPlayDoh(1989 - )

ArrayProcessors

Numerix,CDC,etc.

Intel/HP/Moto/Lucent

EPIC

Page 8: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

EPIC - the IMPACT Perspective• IMPACT work done since 1987 to lay

foundation for EPIC architectures– Intel/HP IA-64, Motorola/Lucent StarCore

• Key Technologies– control speculation [ISCA-91] [ASPLOS-92] [MICRO-96]

– data (dependence) speculation [ICS-92] [ASPLOS-94]

– predicated execution [MICRO-92][ISCA-95] [MICRO-97]

– integrated architecture and inline recovery [ISCA-98]

– logic minimization approach to predication [ISCA-99]

– implementation neutral predication architecture [TBD]

Page 9: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 10: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Control Speculation

• Executing an instruction before knowing that its execution is required

• Moving an instruction above a branch– Removes control dependences to increase ILP– Win when branch directions predicted correctly

• Instruction sequence seen by hardware is changed!– Must ensure that execution result unaffected by such

movement

Page 11: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Control Speculation Example

• A: r6 = r4+1• B: If (r9==0) goto L1• C: r1 = MEM(r2+0)• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• G: MEM(r2+r4) = r4

• C: r1 = MEM(r2+0)• A: r6 = r4+1• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• B: if (r9==0) goto L1• G: MEM(r2+4) = r4

Page 12: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Scheduling Error

• An ordering of instructions that will – cause early program termination or

– produce results that differ from those of the unscheduled program.

• To avoid scheduling errors– Live value must be properly preserved - register

renaming

– Spurious Exception condition must be supressed

Page 13: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Safe Speculation• Compiler analysis to identify

– instructions that are always safe.

– speculation that will not introduce a new exception.

• Trivial analysis examples:– array references with constant indices

– divide and remainder with non-zero divisor

• Complex analysis examples:– Branches to ensure legal input operands

– Earlier use of the same input operand

– Loop analysis

Page 14: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Silent Instructions

• Architecture provides silent versions of instructions that may potentially cause exceptions.– Multiflow - silent FP instructions– HPPA - silent FP instructions, silent de-referenced null pointer– SPARC V9 - silent load instruction

• To move an instr. above a branch, convert it into its silent version.

• Both Multiflow TRACE and Cydrome Cydra-5 used similar ideas.

Page 15: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Silent Instructions• Memory access instructions

– If a segmentation fault condition occurs, the instruction is canceled before it reaches the memory system. An arbitrary garbage value is returned.

– If a page fault happens without segmentation fault, the OS page fault handler is immediately invoked as usual. Extra page faults may occur from speculation.

• Arithmetic instructions– If a trap condition occurs, an arbitrary garbage value is deposited

into the destination register.

• The exception condition is either immediately handled or simply ignored.

Page 16: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Debugging Implications• If the speculated instruction:

– the garbage value generated by a silent instruction would not be used.

– the exception condition is correctly ignored since the silent instruction should not have been executed.

• If the branch agrees with compile-time prediction:– the exception condition that occurred to a silent instruction is

incorrectly ignored.

– the garbage value generated may be used by a subsequent instruction without warning.

– not acceptable if exceptions must be reported timely and accurately

Page 17: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Performance Issues

• Page faults caused by silent loads are handled right away– no support to defer page fault until execution of

instruction is confirmed.

– Additional page may faults result from speculation.

– The number additional page faults should be small for systems that are designed not to page.

• Similar issues exist if TLB misess are handled through exception mechanism.

Page 18: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Sentinel Scheduling• Design Objective

– Correctly ignore exceptions generated by speculative instructions whose execution turns out to be unnecessary.

– Correctly report exceptions generated by speculative instructions whose execution is confirmed.

– Support recovery from exceptions thus reported.– Provide the option to handle page faults after the need

for executing a speculative instruction is confirmed.– Minimize the extra hardware and instructions needed to

achieve the objectives above.

Page 19: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Accurate Exception Report

• Each instruction has two parts:– Non-excepting part which performs the actual

operation

– Sentinel part that flags an exception if necessary

• Non-excepting part of I can be speculatively executed provided the sentinel part stays in I's home block

Page 20: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Sentinel Speculation Example

• A: r6 = r4+1• B: If (r9==0) goto L1• C: r1 = MEM(r2+0)• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• G: MEM(r2+r4) = r4

• C: r1 = MEM(r2+0)• A: r6 = r4+1• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• B: if (r9==0) goto L1• sentinels B, C, D, E• G: MEM(r2+4) = r4

Page 21: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Sentinel Elimination

• The sentinel of I can be eliminated if – there is another instruction in I's home block which uses the

result of I OR – I is non-excepting and is not the last direct or indirect use of

an excepting instruction's destination

• Unprotected instruction - an instruction whose sentinel cannot be eliminated.

• If an unprotected instruction is speculated, an explicit instruction must be created to serve as the sentinel

Page 22: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Sentinel Speculation Example

• A: r6 = r4+1• B: If (r9==0) goto L1• C: r1 = MEM(r2+0)• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• G: MEM(r2+r4) = r4

• C: r1 = MEM(r2+0)• A: r6 = r4+1• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• B: if (r9==0) goto L1• H: check r5• G: MEM(r2+4) = r4

Page 23: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Architectural Support

• Additional bit in opcode field to specify speculative instruction. – can be partially supported by adding speculative version

of all opcodes that should be considered for speculative scheduling and that can directly or indirectly cause exceptions.

• Exception bit (vector) added to each register to mark exceptions caused by a speculative instruction. – These bits need to be preserved across context switches.

Page 24: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Execution Model

• Speculative instructions – src(I).except = 0

• I does not cause an exception, normal execution

• I causes an exception

– dest(I).except = 1

– dest(I).data = pc of I– src(I).except = 1 (exception propagation)

• dest(I).except = 1,

• dest(I).data = src(I).data

Page 25: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Execution Model

• Non-speculative instructions – src(I).except = 0

• I does not cause an exception - normal execution

• I causes an exception - I reported as source of exception

– src(I).except = 1

• (report exception for speculative instruction)

• signal exception

• src(I).data is PC of exception

Page 26: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Scheduling Algorithm

• Identify unprotected instructions

• Perform conventional scheduling– if an unprotected instruction is moved above a branch,

an explicit sentinel instruction is inserted into list of to-be-scheduled instructions

– Explicit sentinel restricted to remain in I's home block with control dependences

– All instructions moved above a branch are marked as speculative

Page 27: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Recovery from Exception

• Important to allow accurate handling of page faults and TLB misses.

• Issues: – ensure that instructions can be retried after the

exception condition is handled

– minimize the negative performance impact in terms of register pressure and instruction count due to recovery.

Page 28: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Recovery Block• Copy speculative instructions into recovery

blocks– One entrance point per potential exception reported by a sentinel

– Code Expansion vs. Efficiency

– must provide a means to reach recovery block - explicit checks

• Source registers of the instructions not in the recovery blocks are not preserved.

• Instructions re-executed during recovery are reduced.

Page 29: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Recovery Block Example

• A: r6 = r4+1• B: If (r9==0) goto L1• C: r1 = MEM(r2+0)• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• G: MEM(r2+r4) = r4

• C: r1 = MEM(r2+0)• A: r6 = r4+1• D: r3 = MEM(r2+4)• E: r4 = r3+1• F: r5 = r1+1• B: if (r9==0) goto L1• H: check r5, L2• I: check r4, L3• G: MEM(r2+4) = r4

Page 30: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Recovery Block Example

• Recovery Block for C• L2:• C: r1 = MEM (r2+0)• E: r5 = r1 + 1

• Recovery Block for D• L3:• D: r3 = MEM (r2+r4)• F: r4 = r3 + 1• G: MEM (r2+4) = r4

Page 31: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Multiple Exceptions

• Different basic blocks

– first sequential exception always reported since check instruction guaranteed to remain in home block of each potential trap-causing instruction

• Same basic block– An exception will be signaled but no guarantee it will

be the first according to original source code

Page 32: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 33: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Predicated Execution

• Conditional execution of instructions based on a Boolean source operand

• Execution model– Load r1, r2, r3 <p1>

– If p1 is TRUE, instruction executes normally

– If p1 is FALSE, instruction treated as NOP (with some exceptions)

Page 34: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Full Predication Support

• Predicate defining instructions

• Full set of predicated instructions

• Separate predicate register file

• Best performance

• Cydra-5, IA-64, TI-C60, StarCore

Page 35: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Partial Predication Support

• Adds limited set of predicated instructions to existing ISA– no extension to operand format

– CMOV

• Brings some performance increase to existing ISA’s

• SPARC, Alpha, MIPS, P6

Page 36: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

HP-PD Predicate Defines

pred< cmp > dest < type >, src1, src2 (Pin)

• < cmp > - condition: =, >, <, etc.

• < type >– Unconditional (U, U)

– OR-type (O, O)

– AND-type (A, A)

Page 37: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Unconditional Predicate Defines

• For blocks reached on one condition

If (a < 10) c= c+1;

else if (b > 20)

d = d+1;else

e = e+1;

bge a, 10, L1

add c, c, 1jmp L3

ble b, 20, L2

add d, d, 1jmp L3

add e, e, 1

L3

F

TF

T

Page 38: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Unconditional Predicate Define

Pin Condition U U0 0 0 00 1 0 01 0 0 11 1 1 0

Pout

bge a, 10, L1

add c, c, 1jmp L3

ble b, 20, L2

add d, d, 1jmp L3

add e, e, 1

L3

F

TF

T

pred p1(U), p2(U), a 10add c, c, 1 (p2)pred p3(U), p3(U), b 20 (p1)add d, d, 1 (p4)add e, e, 1 (p3)

Page 39: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Or Predicate Defines

• For blocks reached on multiple conditions

If (a && b) c= c+1;

else d = d+1;

beq a, 0, L1

beq b, 0, L1

add d, d, 1jmp L2

L1: add e, e, 1

L2:

F

TF

T

Page 40: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Or-type Predicate Define

Pin Condition O O0 0 - -0 1 - -1 0 - 11 1 1 -

Pout

pred_clr p1pred p1(O), p2(U), a = 0pred p1(O), p3(U), b = 0 (p2)add d, d, 1 (p3)add e, e, 1 (p1)

bge a, 0, L1

ble b, 0, L1

add d, d, 1jmp L2

L1: add e, e, 1

L2:

F

TF

T

Page 41: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

And-type Predicate Define

Pin Condition O O0 0 - -0 1 - -1 0 0 -1 1 - 0

Pout

pred_clr p1pred_set p3pred p1(O), p3(A), a = 0pred p1(O), p3(A), b = 0add d, d, 1 (p3)add e, e, 1 (p1)

bge a, 0, L1

ble b, 0, L1

add d, d, 1jmp L2

L1: add e, e, 1

L2:

F

TF

T

Page 42: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 43: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

IMPACT EPIC Architecture• Predication

– base model is HP-PD [Schlansker,Rau, Kathail]

– added implicit predicate pR to facilitate speculation

– prefix alternative for code size control [EuroPar-99]

– added new conjunctive and disjunctive types to facilitate minimization of program decision logic

– moving towards implementation-neutral predication

• Control Speculation– based on Sentinel model [ASPLOS-92]

– added R-Tags (in addition to E-tags) and pR (implicit recovery predicate) to enable inline recovery

Page 44: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

IMPACT EPIC ArchitectureRegister File

Value/PC E-Tag R-Tag

Memory Conflict Buffer

Register Tag and Attribute

S

Instructions

DS

T/F E-Tag R-Tag

Predicate Register File

LOAD Pred

S

DS CHECK Pred

PredOPERATION

T/F E-Tag R-Tag pR

Page 45: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Control Speculative Execution• Speculative instruction causes an exception

– write current PC into destination register

– set E-Tag in destination register

• Speculative instruction propagates an exception– a source register with set E-Tag– Propagate PC from source to destination register– set E-Tag in destination register

• Non-speculative instruction detects exceptions– a source register with set E-Tag

Page 46: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Microprocessor Microarchitecture Main Memory

100 cycle latency

L2 Cachenon-blocking1M-byte, 4-way64B block

System Bus Backside BusInterface

L1 Cache

non-blocking16K-byte, 2-way32B block

BTB1Kdirect-mapped2-level

I-Fetch Unit

I-Cache32K-bytedirect-mapped64B block (split)

Instruction Decoder

Register Alias Table(64 predicate and 128 regular) Reorder Buffer

ArchitectureRegister File

2 Memory Units

1 Floating PointUnit

1 Branch Unit

2 Integer Units

Page 47: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Result of Applying EPIC Techniques3.10

1.00

1.20

1.40

1.60

1.80

2.00

2.20

2.4000

8.es

pres

so

023.

eqnt

ott

072.

sc

085.

cc1

124.

m88

ksim

129.

com

pres

s

130.

li

132.

ijpe

g

147.

vort

ex

cccp

cmp

eqn

grep lex

wc

yacc

Spe

edup

ove

r ba

se

Predication onlyControl speculation onlyData speculation onlyPredication and speculation combined

Page 48: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Integrated Predication and Control Speculation

• All of the following must be true for a predicated instruction to take effect– input predicate true

– input predicate E-Tag false

– either • pR false, or

• R-Tag of at least one input registers true

Page 49: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Speculation Example

• Speculative (affected by exception)

• speculative (not affected)• Non-speculative• branch• check (non-speculative

use)

Page 50: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Inline Recovery Model• Processor enters recovery mode, set pR

– PC in source register used as recovery PC– The speculative instruction at recovery PC is

executed non-speculatively.– Exception processing is performed.– If exception is non-terminating, the result is

stored into destination register, set R-Tag.– Instructions with R-Tag set in source registers

are executed, set R-Tag in destination register

Page 51: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Inline Recovery Model (cont.)– Non-speculative instructions not repeated.

• Stores, self-incrementing loads and stores, etc. are safe.

• Same effect is achieved by recovery blocks.

• Source registers of non-speculative instructions do not need to be preserved.

– Branches and predicate defines repeated• to reproduce original control flow

• input condition must be preserved

– Recovery model is turned off when reaching check with set source R-Tag.

Page 52: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Recovery Block - Code Size

1

1.05

1.1

1.15

1.2

1.25

1.3

1.35

1.4

1.45

00

8.e

spre

sso

02

3.e

qnto

tt

07

2.s

c

08

5.c

c1

09

9.g

o

12

4.m

88

ksim

12

9.c

om

pres

s

13

0.l

i

13

2.i

jpeg

14

7.v

ort

ex

cccp

cmp

eqn

grep

lex

wc

yacc

Co

de E

xpan

sio

n

Page 53: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Instruction Cache Miss Comparison (32k, direct mapped)

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%0

08

.esp

ress

o

02

3.e

qnto

tt

07

2.s

c

08

5.c

c1

12

4.m

88

ksim

12

9.c

om

pres

s

13

0.l

i

13

2.i

jpeg

14

7.v

ort

ex

cccp

cmp

eqn

grep

lex

wc

yacc

Per

cent

red

uct

ion

(%)

Page 54: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Instruction Cache Miss Comparison (64k, 8way)

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%0

08

.esp

ress

o

02

3.e

qnto

tt

07

2.s

c

12

4.m

88

ksim

13

0.l

i

13

2.i

jpeg

14

7.v

ort

ex

Per

cent

red

uct

ion

(%)

Page 55: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Spurious Cache Misses and Exceptions

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%0

08

.esp

ress

o

02

3.e

qnto

tt

07

2.s

c

08

5.c

c1

12

4.m

88

ksi

m

12

9.c

om

pre

ss

13

0.l

i

13

2.i

jpeg

14

7.v

ort

ex

cccp

cmp

eqn

gre

p

lex

wc

yacc

Per

cent

redu

ctio

n (

%)

Cache missesExceptions

• Spurious cache misses, TLB misses, and page faults are frequent in speculated code. Failing to suppress them can have a detrimental effect on performance.

Page 56: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 57: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

EPIC Compiler Technology Overview

If-Conversion

Classical Optimization

Predicate Optimization

ILP Optimization

Scheduling/PartialReverse If-Conversion

Register Allocation

Code Generation

Deb

uggi

ng o

f O

ptim

ized

Cod

e

Predicated Dataflow

PredicateAnalysis

Source

MemoryDisambiguation

MachineDescription

Page 58: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Technology Vision• Compiler

Technology– analyze programmatic intentions

• pointer alias analysis

• integer range analysis

• predicate analysis

– transformations• program decision logic

minimization [ISCA-99]

• fully resolved predicate optimizatios

• data structure optimization

• algorithm transformations

• Architecture Support– logic manipulation instructions

• efficient condition tests

• instructions to efficiently combine conditions

– highly effective speculative execution

• cache misses, TLB misses

• exceptions and dependence violations [ISCA-98]

Page 59: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Vision: Bridging the Gap Between Programs and Hardware

if (x>=0)

if (x==1 ||

x==2 ||

x==3)

m=f(x);

else

m=g(x);

f g

>== = =1 32 0

1

m

+

x

0

enable

x>=0

x!=3

x!=1

x!=2

m=g(x) m=f(x)

TF

T

T

T

F

F

F

Page 60: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Analysis of Predicated Codes

• Live Variable Analysis Example:– Without Predicate Aware Dataflow (Only

instructions on TRUE predicate can kill.)• R7 is defined and killed by instruction 5; R7 is used

by instruction 6.• R7’s live range is (5,6).• R3 is not defined and killed by instruction 3 in all

cases because it is predicated on P1. R3 is used by instruction 4.

• R3’s live range is (1,2,3,4) and live out the top of the CB.

– With Predicate Aware Dataflow• R7’s live range is also (5,6).• R3’s live range is (3,4) because instruction 3

defines R3 for all uses by instruction 4. This is known by studying the relation of P1 to P2.

1 (p1un) = (r1 < 0)

2 (p2un) = (r2 < 0) (p1)

3 r3 = r4 + r5 (p1)

4 r8 = r3 + 1 (p2)

5 r7 = r4 + r6

6 r4 = r7 - 1 (p1)

7 r9 = r9 / 2 (p2)

• Dataflow without regard to predicates leads to conservative results.

Page 61: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Dataflow Analysis of Predicated Code

• Traditional dataflow requires reverse if-conversion (RIC)

• RIC of some codes is exponential (wc: 5,20,80,240,...)

• Factoring reduces order of complexity (wc: 8,15,22,28,...)

RIC of one iter. (width 5)

Code example (wc)

RIC of code with 2x unroll (width 20)

Page 62: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Code Size Control using Predication

Code example (099.go copyshape):

• Predication reduced code size by instruction merging (in example 35%)

OriginalPredicated

B

S

L

JS

B

S

L

JS

B

S

L

JS

B

S

L

JS

S

L

JS

B

S

L

JS

B

S

L

JS

B

P

X

P P P P PP L

X X

S S S S

X

Code example (MediaBench Experimental Image Compression reflect1):Original (Overhead=8/17 instrs (47%)) Optimized (6/19 (30%)) Predicated (3/13 (23%))

B1

J

B2

B1

J

J

B1

J

B1

J

B2 B2

J

J

P1I0

I1

I3 I4 I5 I6

I0I0I1

I7 I8

I9

I2 I2I2I4I3I3

I5I8

I9

I7 I6

I2

I7 I8I9I8

I7

P3I1P2

I3 I4 I5 I6

Page 63: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Program Decision Logic Optimization

• Express control as a predicate network

• Reformulate decision as a logic network mimicking circuit minimization techniques

T

p1

p2 p4

p3 p5 p6

p8

T

p7

p3

p5

p8

p3 p5 p8

T

p3

p5

p8

T

Page 64: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Working Example - 132.ijpeg in SPEC95

• Contains 477 functions and 25,889 lines of code• Spends 200 seconds and 18MB of memory in analysis• 229 of 266 indirect call-sites are converted into direct ones

f6

f3f7

f3(&s1, &i, &j);f7(s1);

f?

*s->p = 10;*s->q = 20;(*s->fp)(s);

s1s

i

j

v1

v2 s1 qp

j

i

fpf5

t = malloc();t->p = v1;t->q = v2;t->fp = f5;*s = t;

Prior to object elevation

After object elevation

Page 65: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Debugging of Optimized Code (PLDI-99)

• When to take over execution and when to stop forward recovery?– original execution order of instructions has to be tracked

– instructions might be moved up to different paths leading to the breakpoint or down to different paths starting from the breakpoint

I1(S1)I1’(S4)I5 (S4)

I2(S2)I3(S3)I4(S3)

A

B

C D

E

F

I1(S1)I1’(S4)I5 (S4)

I2(S2)I3(S3)I4(S3)

A

B

C D

E

F

I3’(S3)

I4’(S3)

I1’’(S2,S4)

A

B

D

E

F

I3’(S3)

I4’(S3)

breakpointI5 (S4)

Page 66: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outline

• History and Background

• Control Speculation

• Predication

• IMPACT EPIC Architecture

• Compiler Technology

• Outlook

Page 67: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

EPIC Research Challenges

• Implementation neutral architecture

• Profile independence and program transparent profiling

• Code size optimizations

• Analysis of predicated code

• Interprocedural alias analysis

• Debugging of optimized code

Page 68: EPIC Architectures and Compiler Technology Wen-mei Hwu EPIC Architectures Wen-mei Hwu Department of Electrical and Computer Engineering Coordinated Science

EPIC Architectures and Compiler TechnologyWen-mei Hwu IMPACTIMPACT

Outlook• Compilers critical to the performance of EPIC uP’s

– Use of predication and speculation is a serious challenge

– Any misuse will lead to performance loss.

– Brand new algorithms will be deployed in the EPIC compilers.

– Existing software development models must be supported.

• Expect performance robustness issues– Awesome performance leap seen for some applications.

– Less for others due to limitations of analyses and optimizations.

– It can take years for the performance gain to be universal.

– A lot of research activities needed, www.trimaran.org.

• Evolution of EPIC architectures– Revisions of architectures are likely as compilers mature.

– Code size and power consumption are critical for embedded EPICs.