
Matrix Multiply Methods

Some general facts about matmul

• High computation-to-communication ratio hides a multitude of sins

• Many “symmetries”, meaning the basic O(n³) algorithm can be reordered in many ways

• As always, one must imagine data tied to a processor, or laid out in some other way

Think 64x64 processor grid

>> A = ones(64,1)*(0:63); B = A';
>> subplot(121), imagesc(A), axis('square')
>> subplot(122), imagesc(A'), axis('square')

[Figure: imagesc(A) on the left, imagesc(A') = B on the right, on the 64x64 grid]

Colors show cols of A and rows of B

Colors only match on the diagonals

(The same experiment with 1-based indexing, A = ones(64,1)*(1:64), gives the same picture: colors show cols of A and rows of B, and they only match on the diagonals.)


In symbols

• Processor (i,j) holds Aij and Bij

• The column of the first matches the row of the second only if j = i, i.e. on the diagonal

• Using 0-based indexing, let plus and minus denote addition and subtraction modulo n

• Cij = Ai,i-j and Dij = Bi-j,j

• A shifts by columns, B shifts by rows

• Now colors match as required for matmul

• Diag of C (D) now has the 0th column of A (0th row of B); see the sketch below
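A minimal MATLAB sketch of this alignment (not from the slides; n = 8 and the explicit loops are illustrative), checking that the skewed copies match column-to-row everywhere and that the diagonals pick up the 0th column of A and the 0th row of B:

% Sketch only: 0-based formulas Cij = A(i, i-j mod n), Dij = B(i-j mod n, j)
n = 8;
A = rand(n); B = rand(n);
C = zeros(n); D = zeros(n);
for i = 0:n-1
  for j = 0:n-1
    C(i+1, j+1) = A(i+1, mod(i-j, n)+1);   % column index of the A entry held at (i,j) is i-j
    D(i+1, j+1) = B(mod(i-j, n)+1, j+1);   % row index of the B entry held at (i,j) is i-j
  end
end
disp(norm(diag(C) - A(:,1)))    % 0: the diagonal of C is the 0th column of A
disp(norm(diag(D) - B(1,:)'))   % 0: the diagonal of D is the 0th row of B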

In pictures


• At the kth timestep, k = 0, 1, …, n-1, let Cij = Ai,i-j-k and Dij = Bi-j-k,j (see the sketch below)

• Colors match again

• What else would work?
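A minimal serial simulation of the whole cycle (a sketch, not the slides' code: one scalar per "processor", n = 8 illustrative), multiplying the matching entries at each step and then shifting C left by one and D up by one:

n = 8;
A = rand(n); B = rand(n); P = zeros(n);
C = zeros(n); D = zeros(n);
for i = 0:n-1                              % initial skew from the previous slide
  for j = 0:n-1
    C(i+1, j+1) = A(i+1, mod(i-j, n)+1);   % Cij = A(i, i-j)
    D(i+1, j+1) = B(mod(i-j, n)+1, j+1);   % Dij = B(i-j, j)
  end
end
for k = 0:n-1
  P = P + C.*D;                  % column/row indices match at every (i,j)
  C = circshift(C, [0 -1]);      % move C left by 1: Cij becomes A(i, i-j-(k+1))
  D = circshift(D, [-1 0]);      % move D up by 1:  Dij becomes B(i-j-(k+1), j)
end
disp(norm(P - A*B))              % ~ 0 up to roundoff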

Cannon with bitxor (Edelman, Johnson, …)

• At the kth timestep, k = 0, 1, …, n-1, let Cij = Ai,i-j-k and Dij = Bi-j-k,j

• Implement by moving C left by 1 and D up by 1 at every cycle

• Colors match again

• What else would work? This is the kind of thinking that goes into good parallel algorithm design

• Cij = Ai,i*j*k and Dij = Bi*j*k,j where “*” denotes bitxor

• What do we need? A column/row index match

• Cycle through all indices (see the sketch after the bitxor table below)

Bitxor

>> v = []; for i = 1:8, v(i,:) = bitxor((0:7), i); end
>> v

v =
     1     0     3     2     5     4     7     6
     2     3     0     1     6     7     4     5
     3     2     1     0     7     6     5     4
     4     5     6     7     0     1     2     3
     5     4     7     6     1     0     3     2
     6     7     4     5     2     3     0     1
     7     6     5     4     3     2     1     0
     8     9    10    11    12    13    14    15
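A serial sketch of the bitxor schedule (assuming n is a power of two; illustrative only): for fixed (i,j), m = bitxor(bitxor(i,j),k) runs through every index exactly once as k does, so all the required products and adds still happen:

n = 8;
A = rand(n); B = rand(n); P = zeros(n);
for k = 0:n-1
  for i = 0:n-1
    for j = 0:n-1
      m = bitxor(bitxor(i, j), k);       % column of A / row of B used at (i,j) in step k
      P(i+1, j+1) = P(i+1, j+1) + A(i+1, m+1)*B(m+1, j+1);
    end
  end
end
disp(norm(P - A*B))    % ~ 0 up to roundoff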

In Cannon, can go left/up or right/down (-k or +k equally good)

• Can even go left/up and right/down

• Think of this as evens go left/up and odds go right/down

• Can thereby use north, south, east, west on a processor grid

[Figure: A shifting left and right, B shifting up and down on the processor grid]

On a hypercube

• Hypercubes are connected by bitxor with powers of 2

• Gray Codes connect into cycles

• The point is not so much hypercubes and Gray codes as the underlying structure of a matmul (a small check below)
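A small check of the cycle claim (the binary-reflected Gray code g(i) = bitxor(i, floor(i/2)) is my assumption of the code meant here, not something stated on the slide): consecutive codes around the cycle differ by bitxor with a power of 2, i.e. they are hypercube neighbors:

n = 8;
g = bitxor(0:n-1, floor((0:n-1)/2));          % binary-reflected Gray code: 0 1 3 2 6 7 5 4
steps = bitxor(g, g([2:n 1]));                % xor between cyclic neighbors
disp(all(ismember(steps, 2.^(0:log2(n)-1))))  % 1: every step flips exactly one bit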


Pros and Cons of Cannon, according to CS267 (Berkeley)

• Local computation is one call to an (optimized) matrix multiply

• Hard to generalize for (AE: Don’t think any of these are right. Don’t believe everything you read)

  – p not a perfect square
  – A and B not square
  – Dimensions of A, B not perfectly divisible by s = sqrt(p)
  – A and B not “aligned” in the way they are stored on processors
  – block-cyclic layouts

• Memory hog (extra copies of local matrices)

In any event

• This algorithm is communicate, compute, communicate, compute

• If hardware allows, of course, one can overlap

• In the end these are all owner-computes algorithms where the blocks Aix and Bxy find their way to the owner (lots of people just broadcast them around, at the theoretical cost of more memory and more bandwidth, but probably not in practice); a serial sketch of the owner-computes sum follows
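A serial MATLAB sketch of that owner-computes sum (the block counts, block sizes, and the blk helper are illustrative assumptions, not anything from the slides): however the blocks travel, the owner of C block (i,j) eventually sees A(i,x) and B(x,j) for every x and does one local matmul per x:

n = 64; s = 4; b = n/s;                       % s-by-s grid of b-by-b blocks (illustrative)
A = rand(n); B = rand(n); C = zeros(n);
blk = @(i) (i-1)*b + (1:b);                   % rows/columns owned by block index i
for i = 1:s
  for j = 1:s                                 % owner of block C(i,j)
    for x = 1:s                               % blocks A(i,x) and B(x,j) arrive here somehow
      C(blk(i), blk(j)) = C(blk(i), blk(j)) + A(blk(i), blk(x)) * B(blk(x), blk(j));
    end
  end
end
disp(norm(C - A*B))                           % ~ 0 up to roundoff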

What does an n³ matmul require?

• One way or another the products happen somewhere and the adds happen somewhere

• That's it.

• Today the bottlenecks may come from main memory, and nothing else matters.

What if

• We are allowed to break up the matrices arbitrarily, as in matmul, but with no cost charged to the matmul itself: as if there were just a “foo” that needed to bring together A(i,j) and B(j,k) and combine them with a commutative and associative operator. What is the best way?
