14
Open64 A Software Pipelining Framework for Simple Processor Cores •Juergen Ributzka •David Stephenson •Timothy Kong •Dee Lee •Fred Chow •Guang R. Gao

Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Embed Size (px)

Citation preview

Page 1: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Open64

A Software Pipelining Framework for Simple Processor Cores

•Juergen Ributzka•David Stephenson•Timothy Kong•Dee Lee•Fred Chow•Guang R. Gao

Page 2: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 2Open64

Overview

• SiCortex Multiprocessor• Software Pipelining Framework• Results• Future Work

3/23/2009

Page 3: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 3Open64

SiCortex Multiprocessor

• RISC• 6 cores (MIPS 5KF)• 500 – 700 MHz• In-Order Execution• Limited-Dual Issue• Low Power (12 – 16 Watt)

3/23/2009

Page 4: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 4Open64

Software Pipelining Framework

3/23/2009

DDG

MII

MS

MVE

RA

CG

Front-End

Middle-End

Back-End

FORTRAN C/C++ Java

Loop NestOptimizer

GlobalOptimizer

CodeGenerator

Page 5: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 5Open64

Data Dependence Graph (DDG)

• Cyclic Data Dependence Graph• No register anti- and output-

dependencies• Only single BBs

3/23/2009

DDG

MII

MS

MVE

RA

CG

S1

S2

S1

S2

Page 6: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 6Open64

Minimum Initiation Interval

• Limited by:– Resource Requirements– Recurrence Requirements

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 7: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 7Open64

Modulo Scheduling (MS)

• Huff Modulo Scheduling– Lifetime sensitive– Uses backtracking

• Hyper Node Reduction Scheduling– Lifetime sensitive– No backtracking

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 8: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 8Open64

Modulo Variable Expansion (MVE)

• Separate TN set for every iteration• # of unrollings depends on longest

lifetime

3/23/2009

DDG

MII

MS

MVE

RA

CG

TN10 op1 TNXX…TN20 op2 TN10 TNYY

TN11 op1 TNXX…TN21 op2 TN11 TNYY

TN10 op1 TNXX…TN20 op2 TN10 TNYY

Iteration

Cycle

Page 9: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 9Open64

Register Allocation (RA)

• Only kernels are fully register allocated

• No regions– SWP kernels are not a black box for GRA

and LRA anymore• Currently no register spilling support

3/23/2009

DDG

MII

MS

MVE

RA

CG

Page 10: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 10Open64

Code Generation (CG)

• Prologues and epilogues are partially register allocated

• Need several epilogues due to different register sets

3/23/2009

DDG

MII

MS

MVE

RA

CG

P

P0

K0

K1 E0

E1

E2

E

Page 11: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 11Open64

Benchmark Results

3/23/2009

BT CG EP FT IS LU MG SP UA0.9

0.95

1

1.05

1.1

1.15

1.2

NAS Parallel Benchmark

ABI n32ABI 64

spee

dup

Page 12: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 12Open64

Benchmark Results

3/23/2009

410.

bwav

es

433.

milc

434.

zeus

mp

435.

grom

acs

436.

cact

usAD

M

437.

lesli

e3d

444.

nam

d

447.

deal

II

450.

sopl

ex

453.

povr

ay

454.

calcu

lix

459.

Gem

sFDT

D

465.

tont

o

470.

lbm

481.

wrf

482.

sphi

nx3

400.

perlb

ench

401.

bzip

2

403.

gcc

429.

mcf

445.

gobm

k

456.

hmm

er

458.

sjeng

462.

libqu

antu

m

464.

h26r

ef

471.

omne

tpp

473.

asta

r

0.95

0.96

0.97

0.98

0.99

1

1.01

1.02

1.03

1.04

1.05

SPEC2006

ABI n32ABI 64

spee

dup

Page 13: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 13Open64

Conclusion and Future Work

• Spilling support needed to enable more loops for SWP

• Screen out loops with small trip count during runtime to reduce SWP overhead

• Hit-under-miss support to overcome current hardware limitations

3/23/2009

Page 14: Juergen Ributzka David Stephenson Timothy Kong Dee Lee Fred Chow Guang R. Gao

Copyright © 2009 - Juergen Ributzka. All rights reserved. 14Open64

Questions ?

3/23/2009