
Optimizing Local Memory Allocation and Assignment Through a Decoupled Approach

Boubacar Diouf ¹, Ozcan Ozturk ², Albert Cohen ¹

¹ INRIA Saclay-Île-de-France & Université Paris-Sud 11

² Bilkent University

October 11, 2009 / LCPC’09

B Diouf, O Ozturk, A Cohen Decoupled Local Memory Allocation


Outline

1 Introduction

2 Decoupled LM Management

3 Experiments

4 Conclusion


• Many processors have local memories
  - Digital signal processors
  - Stream-processing units (GPUs) and network processors
  - Cell Broadband Engine's Synergistic Processor Units (SPUs)

• Why?
  - Fast
  - Predictability
  - Power efficiency

• Array allocation on Local Memory (LM)?
  - Allocation decision fixed for the entire execution (static)
  - Allocation depends on the program points (dynamic)


Motivation

Register Allocation

• Allocation phase
  - Rely on maxlive
  - Choose register residents

• Assignment phase
  - Which register for which variable
  - Polynomial under SSA

Decoupling: isolate the hard problem of allocation (spilling)

For more, please attend the SSA tutorial.


Example code:

    d = ...
    b = load …
    b = b * d
    a = load …
    a = d / a
    c = a / b
    a = b + c
    store c
    store a

[Figure, animated over several slides: the example code shown next to a stack and 2 available registers, with the variables a, b, c and d moving between them.]

Liveness of the example with 2 available registers (live set and number of live variables after each instruction; maxlive = 3):

    code          live     count
    d = ...       d        1
    b = load …    b,d      2
    b = b * d     b,d      2
    a = load …    a,b,d    3
    a = d / a     a,b      2
    c = a / b     b,c      2
    a = b + c     a,c      2
    store c       a        1
    store a
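The live sets and counts on this slide can be reproduced with a backward liveness scan over the straight-line code. The sketch below is only an illustration: the `Instr` encoding, the variable numbering, and the names `maxlive` and `SLIDE_CODE` are assumptions made for this example and do not come from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: variables are small integers, -1 means "none". */
enum { VAR_A, VAR_B, VAR_C, VAR_D, NVARS };

typedef struct {
    int def;     /* variable written by the instruction, or -1 */
    int use[2];  /* variables read by the instruction, or -1   */
} Instr;

/* The nine instructions of the slide's example. */
static const Instr SLIDE_CODE[9] = {
    { VAR_D, { -1,    -1    } },  /* d = ...    */
    { VAR_B, { -1,    -1    } },  /* b = load   */
    { VAR_B, { VAR_B, VAR_D } },  /* b = b * d  */
    { VAR_A, { -1,    -1    } },  /* a = load   */
    { VAR_A, { VAR_D, VAR_A } },  /* a = d / a  */
    { VAR_C, { VAR_A, VAR_B } },  /* c = a / b  */
    { VAR_A, { VAR_B, VAR_C } },  /* a = b + c  */
    { -1,    { VAR_C, -1    } },  /* store c    */
    { -1,    { VAR_A, -1    } },  /* store a    */
};

/* Backward scan over straight-line code. live_after[k] receives the number
 * of variables live right after instruction k; the return value is the
 * largest such count (maxlive). */
int maxlive(const Instr *code, size_t n, int *live_after)
{
    int live[NVARS] = {0};
    int max = 0;

    for (size_t k = n; k-- > 0; ) {
        int count = 0;
        for (int v = 0; v < NVARS; v++)
            count += live[v];
        live_after[k] = count;
        if (count > max)
            max = count;

        /* Step across instruction k: kill its definition, then add its uses. */
        assert(code[k].def < NVARS);
        if (code[k].def >= 0)
            live[code[k].def] = 0;
        for (int u = 0; u < 2; u++)
            if (code[k].use[u] >= 0)
                live[code[k].use[u]] = 1;
    }
    return max;
}
```

Calling `maxlive(SLIDE_CODE, 9, counts)` fills `counts` with one entry per instruction; comparing the returned value with the number of available registers tells whether the allocation phase has to spill.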


After the allocation phase, the live counts shown under "maxlive" become 1 2 2 2 2 2 2 1, so maxlive = 2 and the code fits the 2 available registers.


The same code in SSA form:

    d  = ...
    b1 = load …
    b2 = b1 * d
    a1 = load …
    a2 = d / a1
    c1 = a2 / b2
    a3 = b2 + c1
    store c1
    store a3


[Figure, animated over several slides: the SSA variables b1, b2, a1, a2, c1 and a3, illustrating the assignment phase with 2 available registers.]
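Why assignment is easy once maxlive fits can be seen on straight-line SSA code: scanning the instructions in order and giving each definition the lowest free register never fails when at most `nregs` variables are live at any point. The sketch below is a minimal illustration under stated assumptions: the `SsaInstr` encoding and the names `assign_registers` and `SSA_CODE` are invented here, and `d` is assumed to be a memory operand (spilled everywhere) so that it needs no register. The talk does not spell out how `d` is handled.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_VARS = 16, MAX_REGS = 8 };

/* Hypothetical numbering of the SSA variables of the slide (d is omitted
 * because this sketch assumes it stays in memory). */
enum { B1, B2, A1, A2, C1, A3, NSSA };

typedef struct {
    int def;     /* SSA variable defined, or -1 */
    int use[2];  /* SSA variables read, or -1   */
} SsaInstr;

static const SsaInstr SSA_CODE[8] = {
    { B1, { -1, -1 } },  /* b1 = load     */
    { B2, { B1, -1 } },  /* b2 = b1 * d   */
    { A1, { -1, -1 } },  /* a1 = load     */
    { A2, { A1, -1 } },  /* a2 = d / a1   */
    { C1, { A2, B2 } },  /* c1 = a2 / b2  */
    { A3, { B2, C1 } },  /* a3 = b2 + c1  */
    { -1, { C1, -1 } },  /* store c1      */
    { -1, { A3, -1 } },  /* store a3      */
};

/* Greedy assignment for straight-line SSA code. Returns 0 and fills reg_of
 * on success, or -1 if a definition finds no free register. */
int assign_registers(const SsaInstr *code, size_t n, int nvars, int nregs,
                     int *reg_of)
{
    int last_use[MAX_VARS];
    int busy[MAX_REGS] = {0};

    assert(nvars <= MAX_VARS && nregs <= MAX_REGS);
    for (int v = 0; v < nvars; v++) {
        last_use[v] = -1;
        reg_of[v] = -1;
    }
    /* A forward pass records the last instruction that reads each variable. */
    for (size_t k = 0; k < n; k++)
        for (int u = 0; u < 2; u++)
            if (code[k].use[u] >= 0)
                last_use[code[k].use[u]] = (int)k;

    for (size_t k = 0; k < n; k++) {
        /* Operands that die here release their register first, so the
         * definition of the same instruction may reuse it. */
        for (int u = 0; u < 2; u++) {
            int v = code[k].use[u];
            if (v >= 0 && last_use[v] == (int)k && reg_of[v] >= 0)
                busy[reg_of[v]] = 0;
        }
        int def = code[k].def;
        if (def < 0)
            continue;
        int r = 0;
        while (r < nregs && busy[r])
            r++;
        if (r == nregs)
            return -1;                    /* maxlive exceeds nregs */
        reg_of[def] = r;
        busy[r] = (last_use[def] >= 0);   /* a dead definition frees at once */
    }
    return 0;
}
```

With `nregs = 2` the scan succeeds on `SSA_CODE`; with `nregs = 1` it reports failure at the definition of `a1`, where two values are live at once.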


Motivation

Decoupled Local Memory Allocation

• Allocation phase
  - Rely on maxlive, redefined as the maximum, over program points, of the total size of the live arrays
  - Choose decision points for splitting

• Assignment phase
  - Which offset for which array
  - Colorability?
  - Complexity?
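For arrays, the pressure measure has to weigh each live array by its size rather than count variables. The sketch below shows one way to compute such a size-weighted maxlive. It assumes each array block is live over a single contiguous range of program points; the `ArrayBlock` type and the function name `array_maxlive` are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of an array block: live from program point
 * `first` to program point `last` (inclusive), occupying `size` units. */
typedef struct {
    int first;
    int last;
    size_t size;
} ArrayBlock;

/* Maximum, over program points 0..npoints-1, of the total size of the
 * blocks that are live at that point. */
size_t array_maxlive(const ArrayBlock *blocks, size_t nblocks, int npoints)
{
    size_t max = 0;

    assert(npoints >= 0);
    for (int p = 0; p < npoints; p++) {
        size_t total = 0;
        for (size_t b = 0; b < nblocks; b++)
            if (blocks[b].first <= p && p <= blocks[b].last)
                total += blocks[b].size;
        if (total > max)
            max = total;
    }
    return max;
}
```

If the value returned is no larger than the local memory size, every live block can be resident at every point in terms of capacity; whether contiguous offsets can be found for them is the separate assignment question raised on this slide.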


Motivation Example

Choice of decision points:

• Points where loads and stores are going to be inserted

    // Nested within outer loops
    for (i=0; i<N; i++)
      for (j=0; j<N; j++)
        C[i][j] = /* ... */;

    F[0][0]=1; F[0][1]=2; F[0][2]=1;
    F[1][0]=2; F[1][1]=4; F[1][2]=2;
    F[2][0]=1; F[2][1]=2; F[2][2]=1;

[Figure annotation: two points of this code are marked "decision".]


Allocation schemes

1 At every array instruction
  - Finer decision points
  - Excessive complexity (if ILP is used)

2 Every time an array becomes live
  - Similar to SSA-based register allocation

3 For the whole method
  - Spill-everywhere problem (static)
  - We cannot rely on MAXLIVE

    // Nested within outer loops
    for (i=0; i<N; i++)
      for (j=0; j<N; j++)
        C[i][j] = /* ... */;

    F[0][0]=1; F[0][1]=2; ... F[2][1]=2; F[2][2]=1;


Preliminary transformations

• Tiling

• Loop distribution

• Strip mining


Original loop nest:

    for (i=0; i<N; i++)
      for (j=0; j<N; j++)
        C[i][j] = /* ... */;

    F[0][0]=1; /* ... */ F[2][2]=1;


After strip mining (B is the strip size):

    for (i=0; i<N; i++)
      // Outer strip-mined loop
      for (jj=0; jj<N; jj+=B)
        // Inner strip-mined loop
        for (j=jj; j<N && j<jj+B; j++)
          C[i][j] = /* ... */;

    F[0][0]=1; /* ... */ F[2][2]=1;
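Strip mining only regroups iterations, so the strip-mined loop must touch exactly the same elements as the original one. The sketch below checks this on a single row; the function names, the element formula `2*j + 1` and the returned strip count are assumptions chosen for the example, since the slide elides the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one flat loop over the row. */
void fill_plain(int *row, size_t n)
{
    for (size_t j = 0; j < n; j++)
        row[j] = (int)(2 * j + 1);
}

/* Strip-mined version: an outer loop over strips of `strip` elements and an
 * inner loop inside each strip. Returns the number of strips visited, which
 * is also the number of block-level transfers a later pass could insert. */
size_t fill_strip_mined(int *row, size_t n, size_t strip)
{
    size_t nstrips = 0;

    assert(strip > 0);
    for (size_t jj = 0; jj < n; jj += strip) {
        /* The last strip may be shorter than `strip`. */
        size_t end = (jj + strip < n) ? jj + strip : n;
        for (size_t j = jj; j < end; j++)
            row[j] = (int)(2 * j + 1);
        nstrips++;
    }
    return nstrips;
}
```

Each strip then behaves like one array block that can be moved to or from local memory as a unit.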


The inner loop and the F assignments abstracted as block stores:

    for (i=0; i<N; i++)
      // Outer strip-mined loop
      for (jj=0; jj<N; jj+=B)
        STORE(C[i][jj..min(jj+B-1,N-1)]);

    STORE(F[0..2][0..2]);


Abstracted Model

• Array blocks are like scalar variables in register allocation

• Extension of SSA to operate on array blocks
  - Not array SSA: no dataflow of individual array elements


Pointer reconciliation

[Figure, built up over several slides: two predecessor blocks merge at A3 = Φ(A1,A2). One predecessor reads B and defines A1, with local memory (LM) holding B and A1; the other reads C and defines A2, with LM holding A2 and C. The slides then ask where A3 lives in LM after the merge: at the location of A2, or at that of A1?]

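[Figure, final step: the LM locations of A1 and A2 are tracked through pointers PTR(A1) and PTR(A2), and PTR(A3) is set on each incoming edge of A3 = Φ(A1,A2). The slide prints PTR(A3) = PTR(A1) on both edges.]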
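The idea of reconciling locations with a pointer can be mimicked in plain C: each predecessor places its version of the array at its own offset in a buffer standing for LM and records that location, and the code after the merge reads through the recorded pointer. This is a sketch under explicit assumptions: the buffer size, the offsets, the stored values and the function name `sum_after_merge` are invented, and the second edge is assumed to copy PTR(A2), which the slide does not show explicitly.

```c
#include <assert.h>
#include <stddef.h>

enum { LM_WORDS = 8, BLOCK_WORDS = 2 };

/* Returns the sum of the elements of A3 after the merge point. */
int sum_after_merge(int take_left)
{
    int lm[LM_WORDS] = {0};           /* stands for the local memory        */
    int *ptr_a1 = &lm[BLOCK_WORDS];   /* left edge: B first, A1 after it    */
    int *ptr_a2 = &lm[0];             /* right edge: A2 first, C after it   */
    int *ptr_a3;

    if (take_left) {
        for (int k = 0; k < BLOCK_WORDS; k++)
            ptr_a1[k] = 10 * (k + 1);          /* A1 = ...                  */
        ptr_a3 = ptr_a1;                       /* PTR(A3) = PTR(A1)         */
    } else {
        for (int k = 0; k < BLOCK_WORDS; k++)
            ptr_a2[k] = k + 1;                 /* A2 = ...                  */
        ptr_a3 = ptr_a2;                       /* assumed: PTR(A3) = PTR(A2) */
    }

    /* After the merge, A3 is accessed through the pointer only, so no copy
     * between the two LM locations is needed. */
    int sum = 0;
    for (int k = 0; k < BLOCK_WORDS; k++)
        sum += ptr_a3[k];
    return sum;
}
```

The cost of this scheme is one pointer update per incoming edge instead of moving a whole array block inside LM.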


Allocation

• The allocation problem is solved by Integer Linear Programming (ILP)

• Rely on maxlive to perform allocation


Assignment

Two options:

• Ignore the fragmentation problem in the allocation step and rely on a fragmentation-avoidance heuristic

• Extend the allocation step to guarantee fragmentation-free assignment: open problem

Compare with an integrated approach (an ILP problem that does not scale)
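Fragmentation is what separates the two phases for arrays: the total free space in LM may be large enough for a block while no single contiguous gap is. The sketch below illustrates this with a first-fit search. It is not the heuristic used by the authors; the `Placed` type and the functions `free_words` and `first_fit_offset` are hypothetical, and the placed blocks are assumed to be sorted by offset, non-overlapping and inside LM.

```c
#include <assert.h>
#include <stddef.h>

/* A block already assigned to LM: [offset, offset + size). */
typedef struct {
    size_t offset;
    size_t size;
} Placed;

/* Total number of free words, ignoring where they are. */
size_t free_words(const Placed *placed, size_t n, size_t lm_size)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
        used += placed[i].size;
    assert(used <= lm_size);
    return lm_size - used;
}

/* Lowest offset of a contiguous gap of at least `size` words, or -1 if
 * there is none. `placed` must be sorted by offset and non-overlapping. */
long first_fit_offset(const Placed *placed, size_t n, size_t lm_size,
                      size_t size)
{
    size_t cursor = 0;   /* first word not yet known to be occupied */

    for (size_t i = 0; i < n; i++) {
        assert(placed[i].offset >= cursor);
        if (placed[i].offset - cursor >= size)
            return (long)cursor;
        cursor = placed[i].offset + placed[i].size;
    }
    assert(cursor <= lm_size);
    return (lm_size - cursor >= size) ? (long)cursor : -1;
}
```

For example, with a 10-word LM and blocks at [0,3) and [5,8), four words are free but a 3-word block has no contiguous gap; an allocation step that only checks capacity would accept it, and the assignment step would then have to spill or move a block.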


Benchmarks

Benchmark     Brief description            Suite          Data size   Arrays/blocks
Edge-Detect   Edge detection in an image   UTDSP          196644      4/385
D-FFT         256-point complex FFT        UTDSP          2032        7/7
Bmcm          Water molecular dynamics     Perfect Club   125240      10/310
MxM           Matrix multiplication        n.a.           120000      3/300

Constant              Latency
latency LM            8
latency MM            128
latency move(sv)      8+2sv
latency spill(sv)     128+4sv
latency reload(sv)    128+4sv
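The latency table translates directly into a small cost model. The sketch below encodes the constants and formulas from the slide; the two comparison helpers `cost_in_lm` and `cost_in_mm`, and the reading of `sv` as the size of the moved block, are assumptions added for illustration and are not the objective function used in the experiments.

```c
#include <assert.h>

/* Constants and formulas as printed on the slide. */
enum { LATENCY_LM = 8, LATENCY_MM = 128 };

long latency_move(long sv)   { return 8 + 2 * sv; }
long latency_spill(long sv)  { return 128 + 4 * sv; }
long latency_reload(long sv) { return 128 + 4 * sv; }

/* Hypothetical helper: cost of reloading a block of size sv into LM and
 * then accessing it `accesses` times there. */
long cost_in_lm(long sv, long accesses)
{
    assert(sv >= 0 && accesses >= 0);
    return latency_reload(sv) + accesses * LATENCY_LM;
}

/* Hypothetical helper: cost of leaving the block in main memory (MM) and
 * accessing it `accesses` times there. */
long cost_in_mm(long accesses)
{
    assert(accesses >= 0);
    return accesses * LATENCY_MM;
}
```

Under these assumptions, bringing a block into LM pays off once `cost_in_lm(sv, accesses)` drops below `cost_in_mm(accesses)`, which happens for blocks that are accessed often relative to their size.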


Results

[Figure: four bar charts comparing the decoupled and integrated approaches, one panel per benchmark (BMCM, MXM, FFT, EDGE_DETECT). Horizontal axis categories: minsize, +20%, +40%, maxsize-1, maxsize, fitsize; vertical axis from 0 to 1.2. Percentages printed with the axis labels: BMCM minsize (16%), fitsize (5166%); MXM minsize (33%), fitsize (10000%); FFT minsize (24%), fitsize (100%); EDGE_DETECT minsize (24%), fitsize (9435%).]


Conclusion

• New bridge between LM management and register allocation

• Validation by experiments
  - Optimal allocation relying on maxlive
  - No fragmentation-induced spills
  - No fragmentation-induced displacements
