20
C B E D F G Frequently executed path Not frequently executed path Hard to predict path A C E B H Insert select-µop (φ-nodes SSA Diverge Branch CFM point A H Hard to predict p1 !p1 nop

Frequently executed path Not frequently executed path Hard to predict path

  • Upload
    eara

  • View
    28

  • Download
    0

Embed Size (px)

DESCRIPTION

A. A. Diverge Branch. Hard to predict. nop. B. C. B. p1. D. C. !p1. E. E. F. G. Insert select- µ ops ( φ -nodes SSA). H. CFM point. H. Frequently executed path Not frequently executed path Hard to predict path. A. A. A. A. A. A. - PowerPoint PPT Presentation

Citation preview

Page 1: Frequently executed path Not frequently executed path Hard to predict path

C B

E

D

F G

Frequently executed path

Not frequently executed path

Hard to predict path

A

C

E

B

H

Insert select-µops

(φ-nodes SSA)

Diverge Branch

CFM point

A

H

Hard to predict

p1

!p1

nop

Page 2: Frequently executed path Not frequently executed path Hard to predict path

diverge-branch executed block CFM point

Diverge-Merge Processor (DMP) [MICRO 2006, TOP PICKS 2007]

C B

E

D

F G

Frequently executed path

Not frequently executed path

A A A

A A A

A

H

Page 3: Frequently executed path Not frequently executed path Hard to predict path

Profile-Assisted Compiler Supportfor Dynamic Predicationin Diverge-Merge Processors

Hyesoon Kim José A. Joao

Onur Mutlu* Yale N. Patt

HPS Research Group University of Texas at Austin

*

Page 4: Frequently executed path Not frequently executed path Hard to predict path

4

Control-Flow Graphs

A

simple hammock

A

nested hammock

A

frequently-hammock

A

loop

A

. . . . . . . . . . .

non-merging

66% of mispredicted branches can be dynamically predicated by DMP.

Exact CFM points

Approximate CFM points

Page 5: Frequently executed path Not frequently executed path Hard to predict path

5

Diverge-Merge Processor (DMP)

DMP can dynamically predicate complex

branches (in addition to simple hammocks).

The compiler identifies Diverge branches

Control-flow merge (CFM) points

The microarchitecture decides when and

what to predicate dynamically.

Page 6: Frequently executed path Not frequently executed path Hard to predict path

6

Why hardware and compiler?

Compiler-centric solution (static predication):predicated ISA, not adaptive,applicable to limited CFG.

Microarchitecture-only solution:complex, expensive, limited in scope.

Compiler-microarchitecture interaction Each one does what it is good at.

Page 7: Frequently executed path Not frequently executed path Hard to predict path

7

Compiler Support

Analysis: Identify Diverge Branch Candidates and CFM points

Select Diverge Branches and CFM points

Code generation: mark the selected diverge branchesand CFM points (ISA extensions)

Page 8: Frequently executed path Not frequently executed path Hard to predict path

8

Simple/Nested Hammocks: Alg-exact

A

CFM point = IPOSDOM(A)

Page 9: Frequently executed path Not frequently executed path Hard to predict path

9

Frequently-Hammocks: Alg-freq

A

H

T NT

• Use edge-profiling frequencies

0.8 0.2

0.1 0.9 0.5 0.5

pT(X): conditional probability of reaching basic block X if branch A is taken

pNT(X): conditional probability of reaching basic block X if branch A is not taken

• Stop at IPOSDOM or MAX_INSTR

• Compute pMerging(X) = pT(X) * pNT(X)

pMerging(H) = 1 * 0.5 = 0.5 CFM point candidate !

1 1 1 1

pT(H)=1

pNT(H)=0.5

Page 10: Frequently executed path Not frequently executed path Hard to predict path

10

Compiler Support

Analysis: Identify Diverge Branch Candidates and CFM points

Select Diverge Branches and CFM points

Code generation: mark the selected diverge branchesand CFM points (ISA extensions)

• Chains of CFM points• Short hammocks

• Return CFM points• Diverge loop branches

• Simple/nested hammocks • Frequently-hammocks

Page 11: Frequently executed path Not frequently executed path Hard to predict path

11

Compiler Support

Analysis: Identify Diverge Branch Candidates and CFM points

Select Diverge Branches and CFM points

Code generation: mark the selected diverge branchesand CFM points (ISA extensions)

• Heuristics • Cost-benefit model

Page 12: Frequently executed path Not frequently executed path Hard to predict path

12

Heuristics-Based Selection

Do not select: CFM points too far from the diverge-branch Hammocks with too many branches

on each path Approximate CFM points with

a low probability of merging

Motivation:minimize overhead: wrong pathmaximize benefit: control-independence

Page 13: Frequently executed path Not frequently executed path Hard to predict path

13

Confidence Estimation Cost BenefitCompared to baseline

Cost-Benefit Model-Based Selection

High - - - Same

LowInaccurate Overhead None Lose

Accurate Overhead No flush possibly WIN

cost = (1 – A) x overhead + A x (overhead – misprediction_penalty)

Select if cost < 0

A = accuracy of the conf. estimator = mispredicted / low_conf.

Page 14: Frequently executed path Not frequently executed path Hard to predict path

14

Compiler Support

Analysis: Identify Diverge Branch Candidates and CFM points

Select Diverge Branches and CFM points

Code generation: mark the selected diverge branchesand CFM points (ISA extensions)

• Heuristics • Cost-benefit model

Page 15: Frequently executed path Not frequently executed path Hard to predict path

15

Methodology All compiler algorithms implemented on a binary analysis

and annotation toolset.

Cycle-accurate execution-driven simulation of a DMP processor: Alpha ISA Processor configuration

16KB perceptron predictor Minimum 25-cycle branch misprediction penalty 8-wide, 512-entry instruction window 2KB 12-bit history enhanced JRS confidence estimator 32 predicate registers, 3 CFM registers

12 SPEC CPU 2000 INT, 5 SPEC 95 INT

Page 16: Frequently executed path Not frequently executed path Hard to predict path

16

Heuristic-Based Selection

0

10

20

30

40

50

60

70

gzi

p

vp

r

gc

c

mc

f

cra

fty

pa

rse

r

eo

n

pe

rlb

mk

ga

p

vo

rte

x

bzi

p2

two

lf

co

mp

go

ijp

eg li

m8

8k

sim

hm

ea

n

IPC

del

ta (

%)

exactexact+freqexact+freq+shortexact+freq+short+retexact+freq+short+ret+loop

20.4%

Page 17: Frequently executed path Not frequently executed path Hard to predict path

17

Cost-Benefit-Based Selection

0

10

20

30

40

50

60

70

gzi

p

vp

r

gcc

mcf

cra

fty

pars

er

eo

n

perl

bm

k

gap

vo

rtex

bzi

p2

two

lf

co

mp

go

ijp

eg li

m88ksim

hm

ean

IPC

delt

a (

%)

cost-edge

cost-edge+short

cost-edge+short+ret

cost-edge+short+ret+loop

20.2%

The cost-benefit model is simpler and effective

Page 18: Frequently executed path Not frequently executed path Hard to predict path

18

Input Set Effects

0

10

20

30

40

50

60

70

gzi

p

vp

r

gcc

mcf

cra

fty

pars

er

eo

n

perl

bm

k

gap

vo

rtex

bzi

p2

two

lf

co

mp

go

ijp

eg li

m88

ksim

hm

ea

n

IPC

delt

a (

%)

Heuristics-same

Heuristics-diff

Cost-same

Cost-diff

19.8%

Page 19: Frequently executed path Not frequently executed path Hard to predict path

19

Conclusion Compiler-microarchitecture interaction is good!

DMP exploits frequently-hammocks.

We developed new algorithms that select beneficial diverge-branches and CFM points.

We proposed a new cost-benefit model for dynamic predication.

DMP and our algorithms improve performance by 20%.

Page 20: Frequently executed path Not frequently executed path Hard to predict path

Thank You!

Questions?