A Dimension Abstraction Approach to Vectorization in Matlab Neil Birkbeck Jonathan Levesque Jose Nelson Amaral Computing Science University of Alberta

A Dimension Abstraction Approach to Vectorization in Matlab

Neil BirkbeckJonathan LevesqueJose Nelson Amaral

Computing ScienceUniversity of Alberta

Edmonton, Alberta, Canada

Problem

Problem Statement:Generate equivalent, error-free vectorized

source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.

Motivation

Loop-based code is slower than vector code in Matlab.Why?

interpretive overhead (type/shape checking,…)

resizing of arrays in loops

Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted.

n=1000;for i=1:n, A(i)=B(i)+C(i);end

n=1000;for i=1:n, A(i)=B(i)+C(i);end

n=1000;A(1:n)=B(1:n)+C(1:n);

n=1000;A(1:n)=B(1:n)+C(1:n);

5x faster!

Related Work

Data dependence vectorization Allen & Kennedy’s Codegen algorithm

Build data dependence graph Topological visit strongly connected components

Abstract Matrix Form (AMF) [Menon & Pingali] axioms used to transform array code take advantage of matrix multiplication Not clear if it is easily extensible or allows for vectorization

of irregular access (e.g., access to the diagonal)

Incorrect Vectorization

Example 1:for i=1:n,

a(i)=b(i)+c(i);

end

Pull out of loop.Index variable

substitution (i1:n)a(1:n)=b(1:n)+c(1:n)a(1:n)=b(1:n)+c(1:n)

Vectorization correct if a,b, and c are row vectors or column vectors

If this is not true the vectorized code will introduce an error!

Incorrect Vectorization Example 2:

for i=1:n, x(i)=y(i,h)*z(h,i);end

for i=1:n, x(i)=y(i,h)*z(h,i);end

Matlab is untyped Vectorization depends on whether h is

a vector or scalar. If h is a scalar:Otherwise:

x(1:n)=y(1:n,h).*z(h,1:n)’;

x(1:n)=sum(y(1:n,h).*z(h,1:n)’,2);

Overview of Solution

Vectorizable statement

Data dependence-basedvectorizer

Knowledge ofShape of variables

Propagate dimensionalityup parse tree

Dimensions Agree?

Leave statement in loopNo

Yes Perform Transformations

Output Vector statement

More Specifically

Represent dimensionality of expressions as list of symbols 1 or “*” (>1) Assume known for variables.

Type dim

scalar (1)

1xn vector (1,*)

nx1 vector (*,1),(*)

mxn matrix (*,*)

Examples:

Propagate up parse tree according to Matlab rules Compatibility:

dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s)

Vectorized Dimensionality

Vectorized dimensionality: representation of dimensions after vectorization of

a loop denoted dimi for loop with index variable i

Introduce new symbol ri for index variable i

for i=1:n, a(i)=10+i;end

exp dim(exp) vectorized dimi(exp)

10 (1) 10 (1)

i (1) 1:n (1,ri)

a (*) a (*)

a(i) (1) a(1:n) (ri)


Expressions with incompatible vectorized dimensionality should not be vectorized.

When do dimensionalities agree?Assignment expressions: elhs=erhs

dimi(elhs)≈dimi(erhs) || erhs≈(1)

Element-wise binary operators: e=elhsΘerhs

dimi(elhs) ≈(1)||dimi(erhs)≈(1)||dimi(elhs)≈dim(erhs)

Θ in {+,-,.*,…}

dimi,j(B)=(rj,ri)dimi,j(C)=(ri,rj)

Vectorization fails because (ri,rj) is not compatible with (rj,ri)

dimi,j(B)=(rj,ri)dimi,j(C)=(ri,rj)

Vectorization fails because (ri,rj) is not compatible with (rj,ri)


Rules very restrictive: Assume dim(A)=dim(B)=dim(C)=(*,*)

for i=1:100,

for j=1:100

A(i,j)=B(j,i)+C(i,j);

end

end

for i=1:100,

for j=1:100

A(i,j)=B(j,i)+C(i,j);

end

end

Transpose Transformation

Extension to utilize transpose when necessary is straightforward:For assignment:

if dimi(A)≈reverse(dimi(B)) then A=BT is allowable

for i=1:m,

for j=1:n

A(i,j)=B(j,i);

end

end

for i=1:m,

for j=1:n

A(i,j)=B(j,i);

end

end

dimi,j(A)=reverse(dimi,j(B))=(ri,rj)

A(1:m,1:n)=(B(1:n,1:m))’

dimi,j(A)=reverse(dimi,j(B))=(ri,rj)

A(1:m,1:n)=(B(1:n,1:m))’

Transpose Transformation

Extension to utilize transpose when necessary is straightforward:Similar for pointwise operations:

if dimi(A)≈reverse(dimi(B)) then AΘBT is allowable, propagate dimi(AΘBT)=dimi(A)

if dimi(reverse(A))≈dimi(A) then ATΘB is allowable, propagate dimi(ATΘB)=dimi(B)

Pattern Database Dimensionality disagreement at binary operators inhibits

vectorization. Recognizing patterns (consisting of operator type and

operand dimensionalities) can be used to identify a transformation enabling vectorization.

lhs operation rhs output(ri, rj) ΘΘ (ri,1) (ri, rj)

for i=1:m, for j=1:n, A(i,j)=B(i,j)+C(i); endend

for i=1:m, for j=1:n, A(i,j)=B(i,j)+C(i); endend

B(i,j)+C(i);B(i,j)+C(i); B(1:m,1:n)+repmat(C(1:m),1,n);B(1:m,1:n)+repmat(C(1:m),1,n);

Transformed Result

Pattern:

Pattern Database

Diagonal access pattern:

lhs operation rhs output(ri, ri) (index) (index) nil (1, ri)

Pattern:

for i=1:n, a(i)=A(i,i)*b(i);end

for i=1:n, a(i)=A(i,i)*b(i);end

a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n);a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n);

Column major indexing of A

Additive Reduction Statements

Additive-reduction statements use a loop variable to perform an accumulation. Not all loop nest index variables appear in

output dimensionality

for i1=…, for i2=…, … for ik=… A(J)=A(J)+E; … end endend

for i1=…, for i2=…, … for ik=… A(J)=A(J)+E; … end endend

Loop nest variables I={i1,i2,…,ik}J is a subset of Efor i=1:m,

for j=1:n, a(i)=a(i)+A(i,j); endend

for i=1:m, for j=1:n, a(i)=a(i)+A(i,j); endend

I={i,j} J={i}

for i=1:m a=a+b(i);end

I={i},J={}I-J={i}ρ(b(i))={}

ri in dimi(b(i))=(ri,1)Reduce: b(i)sum(b(i),1);Vectorize: a=a+sum(b(1:m));

for i=1:m a=a+10;end

I={i},J={}I-J={i}ρ(10)={}

ri not in dimi(10)Reduce: 10m*10, ρ(m*10)={ri}Vectorize: a=a+m*10;

Additive Reduction (Solution)

Maintain/propagate dimensionality and reduced variables for an expression. ρ(E) denotes the reduced variables for expression E

When checking statement A(J)=A(J)+E ensure dimi1,i2,…,ikA(J)≈dimi1,i2,…,ik(E) and ρ(E)=I-J any variable ri in ρ(E) but not in I-J must be reduced

Additive Reduction via Matrix Multiplication

Matrix multiplication can be used to perform reductions on e=elhs*erhs , provided:

1. dimi1,…,ik(elhs)=(ri,rk)

2. dimi1,…,ik(erhs)=(rk,rj)

3. rk is a reduction variable. Implies:

dimi1,…,ik(e)=(ri,rj) ρ(e)=union(ρ(elhs), ρ(erhs),{rk})

for i=1:m for j=1:n a(i)=a(i)+B(i,j)*x(j); endend

• j is used for reduction• dimi,j(B(i,j))=(ri,rj)• dimi,j (x(j))=(rj)

a(1:m)=a(1:m)+… b(1:m,1:n)*x(1:n);

ρ(a(i,j)*b(j)+sum(c(i,j),2))={rj}, dimi,j(a(i,j)*b(j)+sum(c(i,j),2)=(ri,rj)

ρ(a(i,j))={}, dimi,j(a(i,j))=(ri,rj)ρ(b(j))={}, dimi,j(b(j))=(rj)rj is reduction variable

Additive Reduction Example

Additive reduction example:for i=1:m,for i=1:m, for j=1:n,for j=1:n, d(i)=d(i)+a(i,j)*b(j)+c(i,j)d(i)=d(i)+a(i,j)*b(j)+c(i,j) endendendend ρ(c(i,j))={},

dimi,j(c(i,j))={ri,rj}

Need to reduce rj: c(i,j)sum(c(i,j),2);

Dimensionality and reduced variables agree, now replace index

variables:

ρ(a(i,j)*b(j))={rj},dimi,j(a(i,j)*b(j))=(ri)

Use matrix multiplication to reduce rj

d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2);

Implementation Prototype

Pattern database and corresponding transformations are specified in modular end-user extensible manner.

Original Loop

Octave ParserEmbedded

ControlStatements

Create DDG

DimensionCheck

SuccessVectorize

Statement

Code Generator

VectorizerVectorized

Loop

no

yes

no

yes

Results

Source-to-source transformation Timing results averaged over 100 runs: Platform:

Matlab 7.2.0.2833.0 GHz Pentium D Processor

Results Histogram Equalization:

h=hist(im(:),[0:255]);%histogramheq=255*cumsum(h(:))/sum(h(:));for i=1:size(im,1), for j=1:size(im,2), im2(i,j)=heq(im(i,j)+1); endend

h=hist(im(:),[(0:255)]);heq=255*cumsum(h(:))/sum(h(:));im2(1:size(im,1),1:size(im,2))=... heq(im(1:size(im,1),1:size(im,2))+1);

Input source Vectorized Result

For monochrome 8-bit 800x600 image: original/vectorized:

Entire routine: 0.178s/0.114s (speedup: 1.56) Loop Portion only: 0.0814s/0.0176s (speedup: 4.6)

Results (Menon & Pingali Examples)X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p);for k=1:p, for j=1:(i-1),

X(i,k)=X(i,k)-L(i,j)*X(j,k);end end

for i=1:N,for j=1:N phi(k)=phi(k)+a(i,j)*x_se(i)*f(j);end end

phi(k)=phi(k)+sum(a(1:N,1:N)’* x_se(1:N).*f(1:N),1);

for i=1:n,for j=1:n, for k=1:n,for l=1:n y(i)=y(i)+x(j)*A(i,k)* B(l,k)*C(l,j); end end end end

y(1:n)=y(1:n)+x(1:n)’*... (A(1:n,1:n)*B(1:n,1:n)’*C(1:n,1:n))’;

Settings Input time (s) Output time(s) speedup

i=500,p=5000 0.536s 0.030s 17

N=1000 0.174s 0.012s 14

n=40 0.622s 0.0001s 5000

Remaining Issues/Future Work

Each pattern transformation is local; no optimization over entire statement. e.g., we do not optimize and distribute transposes

Control flow within loop Function calls

functions are treated as pointwise operators (correct for many predefined arithmetic functions)

Incorporate our analysis directly with shape analysis

Summary

Contributions:A simple method to prevent incorrect

vectorization in MatlabA user extensible operator/dimensionality

pattern database can be used to improve vectorization

These patterns can make use of higher level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.

Acknowledgements

Funding provided by NSERC Grateful for reviewers comments and

suggestions

Thank You

Questions?

Documents

A Dimension Abstraction Approach to Vectorization in Matlab Neil Birkbeck Jonathan Levesque Jose Nelson Amaral Computing Science University of Alberta