Efficient Block Cyclic Data Redistribution
Loïc Prylli, Bernard Tourancheau

To cite this version: Loïc Prylli, Bernard Tourancheau. Efficient Block Cyclic Data Redistribution. [Research Report] RR-2766, INRIA. 1996. inria-00073925

HAL Id: inria-00073925
https://hal.inria.fr/inria-00073925
Submitted on 24 May 2006

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


ISSN 0249-6399

Rapport de recherche (research report)

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Efficient Block Cyclic Data Redistribution

Loïc Prylli and Bernard Tourancheau

No. 2766, January 1996

PROGRAMME 1


Programme 1: Parallel architectures, databases, networks and distributed systems
Project ReMaP

Abstract: Implementing linear algebra kernels on distributed memory parallel computers raises the problem of data distribution of matrices and vectors among the processors. Block-cyclic distribution seems to suit most algorithms well. But one has to choose a good compromise for the size of the blocks (to achieve good computation and communication efficiency and good load balancing). This choice heavily depends on each operation, so it is essential to be able to go from one distribution to another very quickly. We present here the algorithms we implemented in the SCALAPACK library. A complexity study is made that proves the efficiency of our solution. Timing results on the Intel Paragon and the Cray T3D corroborate the results. We show the gain that can be obtained using a good data distribution with 3 numerical kernels and our redistribution routines.

Key-words: parallel computing, parallel linear algebra, personalized all-to-all communication, data redistribution, HPF, block-cyclic data distribution

* The ReMaP project is a joint project of CNRS, ENS Lyon and INRIA. This work is funded by the CNRS (PICS contract), […], and the EEC (EUREKA programme, EUROTOPS contract).
** Laboratoire LIP, URA CNRS […], École Normale Supérieure de Lyon, […] Lyon Cedex […].

Unité de recherche INRIA Rhône-Alpes, 46 avenue Félix Viallet, 38031 GRENOBLE Cedex 1 (France). Phone: (33) 76 57 47 77. Fax: (33) 76 57 47 54.


1 Introduction

This paper describes the solution of the data redistribution problem that arises when implementing linear algebra on a distributed system. Although a bit specialized, the problem and its solution contain points of general interest regarding data communication patterns in data-parallel languages.

We point out that this paper does not address the problem of how to determine a relevant data distribution (even if experimental results are given for 3 numerical kernels), but how to implement a given redistribution.

The problem of data redistribution occurs as soon as one deals with arrays on a parallel distributed-memory computer, from vectors to multi-dimensional arrays. It applies both to data-parallel languages such as HPF and to SPMD programs with message passing. In the first case the redistribution is implicit in array statements like A = B, where A and B are two matrices with different distributions. In the second case a library function has to be called to do the same operation, or it can be hidden at the beginning and end of an optimized routine call.

We present here the algorithm and implementation of the redistribution routine that is used in SCALAPACK [4, …]. Our solution is a dynamic approach that constructs the communication sets and then communicates them efficiently. Our algorithm uses several strategies, depending on the amount of data to be communicated and on the capabilities of the target architecture, in order to be very fast. It runs for any number of processors, which makes it possible to load and down-load data from/to one processor to/from many others.

Section 2 gives a brief description of related work. Section 3 introduces the SCALAPACK data distribution models and the notation used in this paper. Section 4 presents the algorithms that were used for the redistribution of data, and section […] presents timing results obtained on different machines (namely the Cray T3D and the Intel Paragon).

2 Related work

For a long time, redistribution was considered very difficult in the general case, and most implementations restrict the possible distributions to block or cyclic distributions […, 11, 1, 5]; in some implementations all block sizes had to be multiples of each other to ease some memory access operations.

Some recent works show that it can be done at compile time in the general case [8, …] or describe the access of array elements with different strides […]. But generally these works address the redistribution of arrays with a fixed number of processors at compile time, or the compilation of data element accesses for block-cyclic distributions. We propose a run-time approach to data redistribution with no compile-time information and a variable number of processors.

3 SCALAPACK data distribution and redistribution

The SCALAPACK library uses the block-cyclic data distribution on a virtual grid of processors in order to reach a good load balance, good computation efficiency on arrays, and equal memory usage between processors. Arrays are wrapped by blocks in all dimensions corresponding to the processor grid. Figure 1 illustrates the organization of the block-cyclic distribution of a 2D array on a 2D grid of processors.

The distribution of a matrix is defined by four main parameters: the block width, the block height, the number of processors in a row and the number of processors in a column. A few others determine, when a sub-matrix is used, which element of the global matrix is the starting point and which processor it belongs to.

[Figure 1. Left: a block matrix of 4 × 6 blocks, labelled (0,0) to (3,5). Right: a grid of processors [2,3], each processor holding the blocks it owns; processor [0,0] owns blocks (0,0), (0,3), (2,0) and (2,3).]

Figure 1: The block-cyclic data distribution of a 2D array on a 2 × 3 grid of processors.
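To make the ownership rule of Figure 1 concrete, here is a minimal Python sketch that maps a block, or a matrix element, to the grid coordinates of its owner. The function names and the zero-based indexing are illustrative assumptions; this is not the SCALAPACK interface.

```python
def block_owner(block_row, block_col, grid_rows, grid_cols):
    """Grid coordinates of the processor that owns a given block."""
    # Blocks are dealt out cyclically along each dimension of the grid.
    return (block_row % grid_rows, block_col % grid_cols)


def element_owner(i, j, block_height, block_width, grid_rows, grid_cols):
    """Grid coordinates of the processor that owns matrix element (i, j)."""
    return block_owner(i // block_height, j // block_width, grid_rows, grid_cols)


def blocks_owned(proc, n_block_rows, n_block_cols, grid_rows, grid_cols):
    """Blocks of an n_block_rows x n_block_cols block matrix owned by proc."""
    return [
        (bi, bj)
        for bi in range(n_block_rows)
        for bj in range(n_block_cols)
        if block_owner(bi, bj, grid_rows, grid_cols) == proc
    ]


# The 4 x 6 block matrix of Figure 1 on a 2 x 3 grid of processors.
print(blocks_owned((0, 0), 4, 6, 2, 3))
```

For the configuration of Figure 1, processor [0,0] receives blocks (0,0), (0,3), (2,0) and (2,3), as drawn in the figure.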

In SCALAPACK, the efficiency of redistribution is crucial, as in any data-parallel approach, because its cost should be negligible, or at least small, compared to the computation it was done for. This is especially difficult since the redistribution operation has to be done dynamically, with no compile-time or static information. This dynamic approach implies that we deal from the beginning with the most general case of redistribution allowed by our constraints, namely from a block-cyclic distribution with blocks of one size on one virtual grid to a block-cyclic distribution with blocks of another size on another virtual grid. Our routines are also usable when dealing with sub-matrices (but do not take strides into account).

Moreover, no latency-hiding or overlapping technique can be used between the redistribution and the previous computation because these routines are independent (note that this does not prevent the use of these techniques inside the redistribution routine itself, as explained in section 4.3).

4 Redistribution algorithms

The whole problem of data redistribution is, for each processor, to find which locally stored data has to be sent to the others and, respectively, how much data it will receive from the others and where it will store it locally. Once the communication buffers are built (respectively reserved), they have to be transferred between the processors.

4.1 Algorithm in one dimension

Computation of the data sets: If we assume that the data are stored contiguously in a block-cyclic fashion on the processors, the problem is then to find which data items stored on processor $P_i$ will be sent to processor $P_j$. These data items have to be packed in one message before being sent to $P_j$ in order to avoid start-up delays.

Our algorithm scans at the same time the matrix indices of the data blocks stored on $P_i$ and those that will be stored on $P_j$. More precisely, we keep two counters, one corresponding to the location of $P_i$'s data in the global matrix and the other to that of $P_j$'s. We increment them progressively, block by block, as in a merge sort, in order to determine the overlap areas as shown in Figure 2 (the complexity is linear in the number of blocks). Then we pack the data items corresponding to the overlap areas in one message to be sent to $P_j$.

Footnote 1: Notice that this general case includes the loading and down-loading of data from a processor to a multicomputer, and also calls to parallel routines from a sequential code.
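The merge-like scan of the two block sequences can be sketched in Python as follows. This is a minimal one-dimensional illustration under our own assumptions (zero-based global indices, half-open intervals, hypothetical function names), not the SCALAPACK routine itself.

```python
def owned_blocks(proc, block, nprocs, n):
    """Yield the [start, stop) index intervals owned by proc in a
    block-cyclic distribution of n items with the given block size."""
    start = proc * block
    while start < n:
        yield start, min(start + block, n)
        start += block * nprocs


def overlap_intervals(n, r, P, i, r2, P2, j):
    """Intervals owned by processor i in the source distribution (r, P)
    and by processor j in the target distribution (r2, P2)."""
    src = owned_blocks(i, r, P, n)
    dst = owned_blocks(j, r2, P2, n)
    a, b = next(src, None), next(dst, None)
    out = []
    while a is not None and b is not None:
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        if lo < hi:
            out.append((lo, hi))
        # Advance the counter whose current block ends first, as in a merge.
        if a[1] <= b[1]:
            a = next(src, None)
        else:
            b = next(dst, None)
    return out


# Source: blocks of 3 on 2 processors; target: blocks of 4 on 3 processors.
print(overlap_intervals(40, 3, 2, 1, 4, 3, 0))
```

Each block of either distribution is visited once, so the cost is linear in the number of blocks, in line with the remark above.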

Communication of the data sets: The communications that occur between the processors correspond to a personalized "all-to-all", i.e. a total exchange with message sizes that depend on the processors. The emission buffer is filled, and the sizes of the reception buffers are determined, thanks to the computation of the data sets.

[Figure 2. Left panel: distribution 1, distribution 2, and the buffer of their intersecting indices. Right panel: intersection in two dimensions.]

Figure 2: Graphical representation of the search for the intersection of two block-cyclic distributions for a pair of processors: the one-dimensional case on the left, the two-dimensional case on the right.

4.2 General algorithm

The block scanning is done dimension by dimension (see Figure 2), and the overlapping indices are the Cartesian product of the intervals computed in each dimension (there is no limitation on the number of dimensions scanned, and the complexity is linear in the sum of the dimension sizes, while the packing is obviously linear in the size of the data).

This work is done on each processor in order to send data and, respectively, to receive data and store them at the right place in local memory. The whole communication is a personalized "all-to-all", as in section 4.1.
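The Cartesian-product step can be illustrated with a short Python sketch. It assumes that the overlap intervals have already been computed independently in each dimension and are given as lists of half-open (start, stop) pairs; the function names are hypothetical.

```python
from itertools import product


def overlap_rectangles(intervals_per_dim):
    """Cartesian product of the per-dimension overlap intervals.

    Each result is a tuple holding one interval per dimension, i.e. one
    rectangular region of the array to pack into the message."""
    return list(product(*intervals_per_dim))


def count_items(intervals_per_dim):
    """Number of items to pack: product over the dimensions of the
    total interval length in that dimension."""
    total = 1
    for intervals in intervals_per_dim:
        total *= sum(hi - lo for lo, hi in intervals)
    return total


rows = [(0, 2), (6, 8)]    # overlap intervals along dimension 1
cols = [(3, 4), (9, 12)]   # overlap intervals along dimension 2
print(overlap_rectangles([rows, cols]))
print(count_items([rows, cols]))
```

The scan touches each dimension once, while the number of packed items grows with the product of the interval lengths, which matches the distinction made above between scanning and packing costs.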

4.3 Optimizations

We present in this section several optimizations of the general algorithm that we implemented. The first one reduces the amount of time and memory necessary for the computation of the data sets, and the others concern the communication of the resulting buffers, depending on the capabilities of the target hardware/OS.


Scanning: The obvious scanning strategy tests, on each processor, the whole range of indices of the two data distributions to find the items that belong to processor $P_i$ and should be communicated to $P_j$. But, as we will see in proposition 4, the intersection of the intervals (which represent the indices of the data items) in a block-cyclic distribution is in fact periodic, of period $\mathrm{lcm}(rP, r'P')$, where $r$ and $P$ are the block size and the number of processors of the source distribution and $r'$ and $P'$ those of the target distribution. So, instead of a full block scanning, the scanning algorithm can stop as soon as it reaches the cyclic bound. Moreover, this also reduces the bound on the storage necessary for the intersection patterns.
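The periodicity argument can be checked with a small Python sketch. It is only an illustration under our own assumptions (one dimension, zero-based indices, hypothetical function names): the overlap is computed naively inside a single period and then replicated, instead of scanning the whole index range.

```python
from math import gcd


def overlap_pattern(r, P, i, r2, P2, j):
    """Period lcm(r*P, r2*P2) and the overlap indices inside one period."""
    period = (r * P) * (r2 * P2) // gcd(r * P, r2 * P2)
    pattern = [
        x for x in range(period)
        if (x // r) % P == i and (x // r2) % P2 == j
    ]
    return period, pattern


def overlap_indices(n, r, P, i, r2, P2, j):
    """All overlap indices below n, obtained by replicating the pattern."""
    period, pattern = overlap_pattern(r, P, i, r2, P2, j)
    return [
        base + x
        for base in range(0, n, period)
        for x in pattern
        if base + x < n
    ]


# Blocks of 3 on 2 processors versus blocks of 4 on 3 processors.
print(overlap_pattern(3, 2, 1, 4, 3, 0))
```

Only the pattern of one period has to be stored, whatever the size of the array, which is the storage reduction mentioned above.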

Asynchronous receive communication: In that case, the communication algorithm is simple. The sizes of the messages to be received are computed first. Then the asynchronous receives are posted, followed by the sends. There is no assumption on the ability of the target computer to receive messages […].

Synchronous communication: In order to avoid OS deadlocks due, for instance, to the fixed number of communication buffers, we designed an algorithm that uses a blocking receive protocol. Moreover, in order to minimize the processor idle time, exchanges are built, i.e. all receive function calls have the corresponding send function calls posted before or at the same time (this removes the […] assumption on the target computer).

This strategy, illustrated in Figure 3, can be compared to a rolling caterpillar of processors where, at a given step $s$, each processor $P_i$ ($0 \le i \le P-1$) exchanges its data with processor $P_{(P - i - s) \bmod P}$.

[Figure 3. Steps 1 to 3 of the caterpillar for 8 processors and for 7 processors. At step 1 with 8 processors the vertical pairs are 0-7, 1-6, 2-5 and 3-4; at step 1 with 7 processors they are 0-6, 1-5 and 2-4, with processor 3 alone.]

Figure 3: The caterpillar communication method illustrated with an even (8) and an odd (7) number of processors. The communication occurs between the vertical pairs of processors (a processor alone communicates with itself).
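The pairing of Figure 3 can be reproduced with a few lines of Python. The partner rule used here, (P - i - s) mod P at step s, is read off the figure and is an assumption of this sketch, as are the function names.

```python
def caterpillar_partner(i, step, nprocs):
    """Partner of processor i at the given step (steps start at 1)."""
    return (nprocs - step - i) % nprocs


def caterpillar_schedule(nprocs):
    """For each step 1..nprocs, the list of partners of processors 0..nprocs-1."""
    return [
        [caterpillar_partner(i, step, nprocs) for i in range(nprocs)]
        for step in range(1, nprocs + 1)
    ]


for step, partners in enumerate(caterpillar_schedule(8), start=1):
    print(step, partners)
```

With this rule the pairing is symmetric at every step (the partner of the partner of i is i), and over the P steps each processor meets every processor, itself included, exactly once, so every exchange is a matched send/receive pair.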


Communication pipelining: The pipeline method takes advantage of the possibility of dividing the work into small units. Instead of waiting for all the information from another processor, each processor $P_i$ receives a small packet of elements and, at the same time, packs a small packet of elements to be sent and unpacks the elements it has just received. This is an overlapping strategy close to the work described in […]. This optimization is implemented within the caterpillar method, where each step consists of several send/receive exchanges. This overlap between communication and computation improved the timings on machines with rather slow communications (see footnote 2).

5 Complexity study

We consider the redistribution of a multidimensional array. The data distributions are defined, in each dimension, by a data block size and by the size of the processor grid, as in section 3. A prime indicates the parameters of the target distribution.

For the sake of simplicity, we decompose the study into parts corresponding to the contributions of the index computation and of the data moves and transfers:

Proposition 1 […]

5.1 Scanning complexity

The block scanning is done on each processor in order to send the data and, respectively, to receive the data and store them at the right place in local memory (see footnote 3).

The obvious scanning strategy tests all the indices of the global matrix. An elementary operation is then the computation of the initial and final owners of a given element. This requires a modulo operation, as the data distribution is cyclic. But as we repeat it for adjacent items, this computation can be decomposed and transformed into just a few additions and comparisons for all but the first element. The complexity of this strategy is:

Proposition 2 […]

Footnote 2: i.e. LANs of workstations and "old" parallel computers.
Footnote 3: At any time, the computation is done with global indices of the matrix, but only local indices (corresponding to the local part of the matrix) are necessary to access the data stored in each processor.

In order to improve the complexity, we take into account the block pattern of the distribution for the index progression (described in 4.1). Then the complexity is:

Proposition 3 […]

We describe in the following the reduction of the scanning complexity obtained with the optimizations implemented in our algorithm. The aim is to reduce the scanned interval to its periodic pattern, as proposed in section 4.3. Let us have a look at the problem in one dimension, where block cyclic($r$) means a cyclic data distribution by blocks of size $r$. We decompose the cyclic patterns into their canonical form (proposition 4) in order to be able to compute their intersection (proposition 5) and its size (proposition 6). Then the results are given in propositions 7 and 8.

Notation: $B(a, b, l) = \{x \mid \exists k,\ x - a - kb \in [0, l-1]\}$ and $C(a, b) = B(a, b, 1)$. Then, in the one-dimensional case, the set of items owned by processor $P_i$ is the pattern $B(ir, rP, r)$, and $C(i, P)$ in the more particular case of a cyclic distribution.

Proposition 4 […]

For integers $a$, $a'$, $b$, $b'$: if $(a - a')$ is a multiple of $\gcd(b, b')$, then there exists $l$ such that $C(a, b) \cap C(a', b') = C(l, \mathrm{lcm}(b, b'))$; otherwise $C(a, b) \cap C(a', b') = \emptyset$. […]
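The statement on the intersection of two cyclic patterns can be checked with a short Python sketch. It assumes the reading $C(a, b) = \{a + kb : k \in \mathbb{Z}\}$; the function name is illustrative, and the linear search over $k$ is chosen for clarity rather than speed.

```python
from math import gcd


def intersect_cyclic(a, b, a2, b2):
    """Intersection of C(a, b) and C(a2, b2).

    Returns (l, m) such that the intersection is C(l, m) with
    m = lcm(b, b2), or None when the intersection is empty."""
    g = gcd(b, b2)
    if (a - a2) % g != 0:
        return None  # the two progressions never meet
    m = b // g * b2  # lcm(b, b2)
    # Look for the first element a + k*b that also lies in C(a2, b2);
    # one exists with k below b2 // g because b // g is invertible
    # modulo b2 // g.
    for k in range(b2 // g):
        x = a + k * b
        if (x - a2) % b2 == 0:
            return x % m, m
    raise AssertionError("unreachable when gcd(b, b2) divides a - a2")


print(intersect_cyclic(1, 4, 3, 6))
print(intersect_cyclic(0, 4, 1, 6))
```

In the first call the two progressions share every twelfth index; in the second they are disjoint because the offsets differ by an odd number while both periods are even.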

Proposition 5. [Statement illegible in the source.]

Hence we can use that property to speed up the computation and decrease the memory needs.

Proposition 6. [Statement illegible in the source.]

In the following, l denotes a bound defined as a maximum involving an lcm of the distribution parameters [exact expression illegible in the source]. Thanks to the previous propositions, as described earlier in section 4, we can stop the scanning as soon as we reach the global index l, because the construction of the intersection patterns is then complete. Moreover, we have a bound [expression illegible in the source] on the number of descriptors of the intersection pattern (this will be important for the required storage).

In practice this optimization is only interesting when we have very small block sizes or very large matrices. Although there are cases where it is more interesting

to analytically determine the pattern following proposition 4 (for instance for cyclic distributions as in [1?]), generally, for blocks of reasonable length, our strategy is better. For instance, see Figure 4, where the corresponding complexities are plotted as a function of the average block size (the square root of the initial distribution block size times the final distribution block size).

Figure 4: Comparison of the two scanning strategies in one dimension. The time (in milliseconds) is plotted against different random block sizes (the abscissa represents the average block size). Each value is obtained from the mean of one thousand iterations of the same scanning on a SUN Sparc 5 workstation.
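The periodicity that the scanning optimization exploits can be checked with a short sketch. It assumes a one-dimensional array redistributed from blocks of size r on p processors to blocks of size r2 on p2 processors; these names and the helper functions are illustrative and are not taken from the report. Under that assumption, the pair (source owner, destination owner) repeats with period lcm(r*p, r2*p2), so scanning one period is enough to know the whole pattern.

```python
from math import lcm  # available in Python 3.9 and later


def owner(i, r, p):
    """Processor owning global index i when blocks of size r are dealt
    cyclically over p processors (hypothetical helper)."""
    return (i // r) % p


def transfer_pattern(length, r, p, r2, p2):
    """(source owner, destination owner) for every index in [0, length)."""
    return [(owner(i, r, p), owner(i, r2, p2)) for i in range(length)]


def period(r, p, r2, p2):
    """Length after which the (source, destination) pattern repeats."""
    return lcm(r * p, r2 * p2)


# Illustrative parameters only: blocks of 2 on 3 processors to blocks of 3 on 4.
one_period = period(2, 3, 3, 4)
full_scan = transfer_pattern(5 * one_period, 2, 3, 3, 4)
short_scan = transfer_pattern(one_period, 2, 3, 3, 4)

# Scanning a single period and repeating it gives the same result as a full scan.
assert full_scan == short_scan * 5
```

The sketch only verifies the repetition; how the report stores the pattern as descriptors is not reproduced here.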

Notice that with the preceding optimizations, the complexity is independent of the matrix size (except for small matrices, whose size is smaller than an lcm of the distribution parameters; the exact arguments are illegible in the source).

Proposition 7. [Statement illegible in the source; it gives the complexity of the scanning in one dimension.]

Footnote: i.e. [expression illegible in the source] for classical parallel computers.

[Text illegible in the source.]

Finally, the result can be straightforwardly extended to multidimensional arrays.

Proposition 8. [Statement illegible in the source; it extends the previous result to multidimensional arrays.]

Notice that the scanning duration is greatly reduced by our optimization, but the packing duration remains the same.

[Heading illegible in the source; this part deals with the packing and communication costs.]

The packing consists of "move" operations to build a buffer; the buffer is then sent. Following the classical linear transfer time model, we describe the communication cost complexity for multi-dimensional arrays in terms of move and transfer time only (there is always the same number of startup delays in our algorithm; its expression is illegible in the source).

Proposition 9. [Statement illegible in the source.]
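As a rough illustration of the linear transfer time model mentioned above, the following sketch charges one move per packed item and a cost of startup + n * per_item for a message of n items. The function names, the parameters and the unit costs are assumptions made for the example; this is not the expression of proposition 9.

```python
def transfer_time(n_items, startup, per_item):
    """Classical linear model: a message of n_items costs
    startup + n_items * per_item (hypothetical unit costs)."""
    return startup + n_items * per_item


def redistribution_cost(message_sizes, move, startup, per_item):
    """Cost seen by one sender: every item is moved once into a buffer,
    then one message is sent per non-empty destination."""
    packing = move * sum(message_sizes)
    sending = sum(
        transfer_time(n, startup, per_item) for n in message_sizes if n > 0
    )
    return packing + sending


# Made-up numbers: three destinations, one of which receives nothing.
example_cost = redistribution_cost([10, 0, 5], move=1.0, startup=2.0, per_item=0.5)
```

Because the number of startup delays does not depend on the data volume, setting startup to 0 leaves only the move and transfer terms, which is the simplification described in the text.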

[Heading illegible in the source; this part compares the scanning cost with the packing and communication costs.]

In order to study the relative importance of each part of our algorithm (scanning and packing/communication), we present in the following the behavior of their ratio when the matrix size is increasing. The ratio of the complexities of the scanning and of the moves and transfers is:

Proposition 10. [Statement illegible in the source.]

When looking at the limit:

Proposition 11. [Statement illegible in the source.]

Asymptotic proposition 11 means in practice that as soon as the matrix size is large enough (see the footnote), if our optimizations are applied (cf. section 4), the redistribution cost is roughly equal to two memory copies (one when the data is sent and one when it is received) plus the transfer time of the items owned locally, i.e. the scanning is negligible.
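The trend described here can be illustrated with a toy model. It is only a sketch under stated assumptions, not the report's formulas: the scanning is taken to stop after at most `period` steps, and each of `n_procs` processors is taken to move about m*m / n_procs items of an m x m matrix. All names and numbers below are hypothetical.

```python
def scanning_cost(m, period, unit=1.0):
    """The scan stops at the pattern period, so it never exceeds `period` steps."""
    return unit * min(m, period)


def packing_cost(m, n_procs, move=1.0):
    """Each processor moves its share of an m x m matrix."""
    return move * (m * m) / n_procs


def ratio(m, period, n_procs):
    """Scanning cost relative to packing cost in this toy model."""
    return scanning_cost(m, period) / packing_cost(m, n_procs)


# Made-up period and processor count; only the trend matters.
ratios = [ratio(m, period=192, n_procs=16) for m in (256, 512, 1024, 2048)]

# The ratio shrinks as the matrix grows: the scanning becomes negligible.
assert ratios == sorted(ratios, reverse=True)
```

In this model the scanning cost is bounded while the packing cost grows with the matrix, which is the qualitative behavior the text attributes to the limit.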

Footnote: This is true even for matrices of moderate size, for instance when [expression illegible in the source] is ten times bigger than the number of processors.

[Heading illegible in the source; this section presents the experiments.]

In this section, we want to show the efficiency of our algorithms and the correctness of our complexity results, but also to demonstrate the real gain that is obtained in practice for linear algebra kernel computations using our redistribution routines. For that purpose we chose 3 routines of the SCALAPACK library: the matrix multiply (MM), the LU factorization (LU) and the triangular solve (TS). In the SCALAPACK library, these routines run for any virtual grid of processors and any size of square blocks.

Our experiments were run on the Cray T3D and on the Intel Paragon parallel computers. We used a compiled version of the SCALAPACK and BLACS libraries on top of the vendor-optimized BLAS on the Paragon, and a compiled version of the redistribution routine with a vendor implementation of the LU, MM, TS, BLACS and BLAS routines on the T3D.

For a given number of processors (16) and a given matrix size (1024 × 1024), we first determine the best block size and grid shape for each computation kernel. Our tests show the timings as a function of the block size for virtual processor grids of all shapes: 1 × 16, 2 × 8, 4 × 4, 8 × 2 and 16 × 1. For clarity, the block sizes are chosen as powers of 2, but our redistribution routines run for any sizes.

We show in Figure 5 the redistribution elapsed time for random block sizes and random grid shapes on the Paragon (see the footnote).
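The set of configurations explored in these tests can be enumerated with a few lines. This is a minimal sketch: the function names are made up for the example, and the smallest block size (1) is an assumption, since the text only states that block sizes are powers of 2 and that 64 is the biggest one for this problem.

```python
def grid_shapes(n_procs):
    """All virtual grid shapes (rows, columns) using exactly n_procs processors."""
    return [(p, n_procs // p) for p in range(1, n_procs + 1) if n_procs % p == 0]


def power_of_two_blocks(max_block):
    """Block sizes 1, 2, 4, ... up to max_block (lower bound assumed)."""
    sizes = []
    b = 1
    while b <= max_block:
        sizes.append(b)
        b *= 2
    return sizes


# Every (grid shape, block size) pair for 16 processors and blocks up to 64.
configurations = [
    (shape, block)
    for shape in grid_shapes(16)
    for block in power_of_two_blocks(64)
]
```

For 16 processors this yields exactly the five shapes listed in the text, from 1 × 16 to 16 × 1.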

Figure 5: Redistribution elapsed time (in seconds) for different matrix sizes on random grids of 14 to 16 processors with random block sizes.

Footnote: We were not able to run tests with non-power-of-2 grids on the T3D because of a vendor BLACS limitation on this machine.

We then present the efficiency of our solutions, with a comparison of the timings with and without redistribution of the data (see the corresponding algorithms in Figure 6) for the 3 linear algebra computation kernels (LU, MM, TS) on the multicomputer.

Program:
    Computation kernel(A-datadist)

Program with redistribution:
    Data redistribution(A-datadist, A-bestforkernel)
    Computation kernel(A-bestforkernel)
    Data redistribution(A-bestforkernel, A-datadist)

Figure 6: Algorithms for one computation kernel without and with data redistribution.
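The comparison behind Figure 6 amounts to a simple break-even test, sketched below. The function names and all timings are hypothetical; the sketch only assumes that the elapsed time with redistribution is the sum of the two redistributions and of the kernel run in its best distribution.

```python
def time_without(kernel_time_current):
    """Elapsed time when the kernel runs in the distribution it was given."""
    return kernel_time_current


def time_with(redist_in, kernel_time_best, redist_back):
    """Elapsed time when the data is moved to the best distribution,
    the kernel is run there, and the data is moved back."""
    return redist_in + kernel_time_best + redist_back


def redistribution_pays_off(kernel_time_current, kernel_time_best,
                            redist_in, redist_back):
    """True when redistributing around the kernel is faster overall."""
    return (time_with(redist_in, kernel_time_best, redist_back)
            < time_without(kernel_time_current))


# Made-up timings in seconds: a poor initial distribution doubles the kernel time.
worth_it = redistribution_pays_off(6.0, 3.0, 0.2, 0.2)
# Made-up timings: the initial distribution is already close to the best one.
not_worth_it = redistribution_pays_off(3.1, 3.0, 0.2, 0.2)
```

The second case shows why the redistribution has to be very cheap: when the initial distribution is nearly optimal, the two extra steps are pure overhead.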

With these results, we then study an example with the 3 linear algebra operations executed sequentially in the same parallel program (see the corresponding algorithms in Figure 7). The number of systems solved is set to one fourth of the matrix size, so that this routine has an execution time comparable to the 2 others.

Remark that if several arrays are involved in the computation kernel, then several data redistributions must be called (in the matrix multiply for instance). Hence, the redistribution solution includes 6 calls to the redistribution routine (i.e. 2 for the MM kernel, 1 for the LU kernel, 2 for the TS kernel, plus one to go back to the initial data distribution).

[Heading illegible in the source; this part reports the tests on the Intel Paragon.]

On the Intel Paragon, Figure 8 shows that the best performance of the matrix multiply is obtained with a 4 × 4 virtual grid with the biggest block size (64 for our problem).

For the LU decomposition, the best result is obtained with blocks of size 8 on a 1 × 16 virtual grid of processors.

For the solve of 1024 right-hand sides, the best result is obtained with the biggest block size (64) on the 16 × 1 virtual grid of processors.

The grid shape and block sizes are very different for these 3 routines (see Table 1) and the timing differences are not negligible. This tends to demonstrate that the redistribution improvement can lead to an important gain in elapsed time.

Knowing these best configurations, we ran the tests corresponding to the algorithm of Figure 6 for the 3 numerical kernels with 16 processors. We plotted with

Program:
    Matrix multiplication(A-datadist, B-datadist)
    LU factorization(A-datadist)
    Triangular solve(A-datadist, C-datadist)

Program with redistribution:
    Data redistribution(A-datadist, A-bestforMM)
    Data redistribution(B-datadist, B-bestforMM)
    Matrix multiplication(A-bestforMM, B-bestforMM)
    Data redistribution(A-bestforMM, A-bestforLU)
    LU factorization(A-bestforLU)
    Data redistribution(A-bestforLU, A-bestforTS)
    Data redistribution(C-datadist, C-bestforTS)
    Triangular solve(A-bestforTS, C-bestforTS)
    Data redistribution(C-bestforTS, C-datadist)

[The comment lines of the figure are illegible in the source.]

Figure 7: Algorithms using several numerical kernels without and with data redistributions.
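The program with redistribution of Figure 7 can be written down as a list of steps, which makes the number of redistribution calls easy to count. The tuple layout and the helper function are illustrative choices for this sketch; the array and distribution names are those of the figure.

```python
# Each step is either a redistribution (array, from, to) or a kernel call.
PROGRAM_WITH_REDISTRIBUTION = [
    ("redistribute", "A", "datadist", "bestforMM"),
    ("redistribute", "B", "datadist", "bestforMM"),
    ("kernel", "MM"),
    ("redistribute", "A", "bestforMM", "bestforLU"),
    ("kernel", "LU"),
    ("redistribute", "A", "bestforLU", "bestforTS"),
    ("redistribute", "C", "datadist", "bestforTS"),
    ("kernel", "TS"),
    ("redistribute", "C", "bestforTS", "datadist"),
]


def redistributions_per_stage(program):
    """Count the redistribution calls issued before each kernel,
    and those left after the last kernel (stored under 'final')."""
    counts = {}
    pending = 0
    for step in program:
        if step[0] == "redistribute":
            pending += 1
        else:
            counts[step[1]] = pending
            pending = 0
    counts["final"] = pending
    return counts


stage_counts = redistributions_per_stage(PROGRAM_WITH_REDISTRIBUTION)
total_calls = sum(stage_counts.values())
```

The per-stage counts show where the overhead of the chained program comes from: the kernels that read two arrays need two redistributions each.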

Figure 8: Matrix multiply elapsed time (in seconds) as a function of the block size on the Intel Paragon with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

dots on Figures 11, 12 and 13 all the timings without redistribution obtained with all the different combinations of virtual grids and block sizes, for several matrix sizes. The best and worst cases of the same computations with the redistributions are represented as curves on the same graph.

Figure 9: LU decomposition elapsed time (in seconds) as a function of the block size on the Intel Paragon with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

Figure 10: Triangular solve elapsed time (in seconds) as a function of the block size on the Intel Paragon with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

Figure 11: Matrix multiply elapsed time (in seconds) as a function of the matrix size on the Intel Paragon with 16 processors, for all grid shapes and block sizes without data redistribution, compared with redistribution to the best configuration (plotted series: best case with redistribution, worst case with redistribution, without redistribution).

The results are extremely good because there are only slight differences between the best and worst cases with the redistributions. Hence the performance stays in the same range, close to the optimal. On the contrary, a bad distribution choice can double the execution time for LU and multiply by five the time of the solves.

To conclude, we present in Figure 14 the tests corresponding to the algorithm of Figure 7, where the numerical kernels are chained in the same parallel program. The results prove that in all cases it is better to use the redistributions than to stay with the same distribution from the beginning to the end. Moreover, the first two curves of the tests are the worst possible case while using redistribution because, first, we assume that none of the arrays is in the good distribution at the beginning and, second, we redistribute the arrays at the end in order to go back to the initial data distribution. These 2 steps are not mandatory in efficient programming (for instance if the initial distribution is well chosen and if there is no assumption on the data distribution during the rest of the program). Thus the gain obtained for

this computation is bigger for the optimized case, which is represented by the lowest curve of Figure 14.

Figure 12: LU decomposition elapsed time (in seconds) as a function of the matrix size on the Intel Paragon with 16 processors, for all grid shapes and block sizes without data redistribution, compared with redistribution to the best configuration (plotted series: best case with redistribution, worst case with redistribution, without redistribution).

Figure 13: Triangular solve elapsed time (in seconds) as a function of the matrix size on the Intel Paragon with 16 processors, for all grid shapes and block sizes without data redistribution, compared with redistribution to the best configuration (plotted series: best case with redistribution, worst case with redistribution, without redistribution).

Figure 14: Comparison of the succession of the 3 numerical kernels on the Intel Paragon with 16 processors, for all grid shapes and block sizes, with and without data redistributions and in the optimized case (elapsed time in seconds as a function of the matrix size; plotted series: best case with redistribution, worst case with redistribution, without redistribution, optimized case).

[Heading illegible in the source; this part reports the tests on the Cray T3D.]

For the Cray T3D, Figure 15 shows that the best performance of the matrix multiply is obtained with a 4 × 4 virtual grid with the biggest block size (64 for our problem), as on the Paragon.

Figure 15: Matrix multiply elapsed time (in seconds) as a function of the block size on the Cray T3D with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

For the LU decomposition, the best result is obtained with blocks of size 16 on a 1 × 16 virtual grid of processors.

For the solve of 1024 right-hand sides, the best result is obtained with the biggest block size (64) on the 16 × 1 virtual grid of processors.

Again, the grid shape and block sizes are very different for these 3 routines (see Table 1), so the redistribution algorithm can lead to an important gain in elapsed time.

As for the Paragon, we ran the tests corresponding to the algorithms of Figure 6 for the 3 numerical kernels with 16 processors. We plotted on Figures 18, 19 and 20 all the timings without redistribution obtained with all the different virtual grids and

block size combinations for several matrix sizes. The best and worst cases of the same computations with the redistributions are represented as curves on the same graph.

Figure 16: LU decomposition elapsed time (in seconds) as a function of the block size on the Cray T3D with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

Figure 17: Triangular solve elapsed time (in seconds) as a function of the block size on the Cray T3D with 16 processors, for different grid shapes (1x16, 2x8, 4x4, 8x2, 16x1) and different square block sizes.

Figure 18: Matrix multiply elapsed time (in seconds) as a function of the matrix size on the Cray T3D with 16 processors, for all grid shapes and block sizes without data redistribution, compared with redistribution to the best configuration (plotted series: best case with redistribution, worst case with redistribution, without redistribution).

Here again, the use of redistribution yields an elapsed time almost independent of the initial distribution and very close to the optimal. Basically, the redistribution is useless only when the matrix is already in the optimal configuration.

To conclude, we present in Figure 21 the tests corresponding to the algorithm in Figure 7, where the numerical kernels are chained in the same parallel program. Here again, there is no single data distribution that reaches the optimal performance obtained by the redistribution between each operation. The results of the optimized program, without initial and final redistributions, are also plotted.

Figure 19: LU decomposition elapsed time (in seconds) as a function of the matrix size on the Cray T3D with 16 processors, for all grid shapes and block sizes without data redistribution, compared with redistribution to the best configuration (plotted series: best case with redistribution, worst case with redistribution, without redistribution).

            Paragon           T3D
Kernel    BBS    BGS       BBS    BGS
MM        64     4 × 4     64     4 × 4
LU        8      1 × 16    16     1 × 16
TS        64     16 × 1    64     16 × 1

Table 1: Best Block Size (BBS) and Best Grid Shape (BGS) for the different parallel computers of our tests while running the Matrix Multiply (MM), LU decomposition (LU) and Triangular Solves (TS) computation kernels.

î(î�éNM'O 0 )�)

Page 29: Efficient Block Cyclic Data Redistribution · VuU#VFe qd_bea9P]_xndcts VFe O"PRQ2y]VFa{z{VF|}TLO z~T2fdfdQJPRa qdVXP]VF\F^`VuP]\F^`V r, ` 2 2 2 .Y JT2rL J_bVuP,Wu L 2 Y 2 fdThS2VFe

�2� � ���� ���N( ������� ���# ����($���( #�� ��*@( �����@�$��*

0.0 500.0 1000.0 1500.0 2000.0matrix size

0.0

100.0

200.0

300.0

time

(s)

best case with redistributionworst case with redistributionwithout redistribution

Î�_bSJcdP]V#�2� d´�P]_xT2r`SJc`gxT2PD»`Q2gb�2V�VFgxT2f`e�V�a�_xU#V,QJrªa9^`V#¼�P�Tmw¬´��hÁ¿«�_ba9^ÂWu�#fdP]Q`\FVFe�e�QJP]e{£5QJP~TogbgSJPR_iq°e9^dTLfdVFe,T2r`q²n`gbQ`\¯�Åe�_b­FVFe,«�_ba9^`QJc`a�q'Toa-T¬P]VFqd_be�a-P]_xndc`a�_bQJr`e,\FQJU�fdT2PRVFq�«�_ba9^¤P]VFqd_be�a9P]_xndcL¨a�_bQJrÅa�Q#a9^`VXnµVFe�a�\FQJr 0'S`cdP�Toa�_bQJr�¥

ôöíºî�ôxõ

Page 30: Efficient Block Cyclic Data Redistribution · VuU#VFe qd_bea9P]_xndcts VFe O"PRQ2y]VFa{z{VF|}TLO z~T2fdfdQJPRa qdVXP]VF\F^`VuP]\F^`V r, ` 2 2 2 .Y JT2rL J_bVuP,Wu L 2 Y 2 fdThS2VFe

����������� ����������������������������! "�$#��&%'�($��)�*+��,�� �&�

200.0 400.0 600.0 800.0 1000.0 1200.0matrix size

0.0

5.0

10.0

15.0

time

(s)

best case with redistributionworst case with redistributionwithout redistributionoptimized case

Î�_bSJcdP]V°�dW� "¼{QJU�fdT2P]_be�Q`r�Qo£,a9^`V�e-c`\F\FVFe�e�_bQJr�Qo£,a9^`V²�BrJcdU�VuP]_b\uTogZ�hVuP�r`VFgbe¬QJr a9^`V²¼~P�Tmw´��hÁ�«~_ia-^�Wu��fdP]Q`\FVFe�e�QJPReD£5QJP,ThgigtSJP]_bq�e9^dTLfdVFeXTLr`q©n`gbQJ\F�¤e�_b­FVFe�«�_ba9^�TLr`q�«�_ba9^`QJc`a�q'Toa9TP]VFqd_be�a-P]_xndc`a�_bQJr`e�T2r`q�_xrÅa9^`V,QJf`a�_xU#_b­FVFq�\uThe�V2¥


Conclusion

In the general case, redistributing the data is useful to improve the efficiency of parallel linear algebra routines. However, to ensure a gain in elapsed time, the redistribution itself has to be very efficient.
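The trade-off stated above can be written as a simple comparison. The following is a minimal sketch only: the function name, its parameters and the timing values are hypothetical illustrations, not measurements or code from this report.

```python
def redistribution_pays_off(t_kernel_current, t_kernel_best,
                            t_redist_in, t_redist_out=0.0):
    """Return True when moving the data to a better layout reduces elapsed time.

    t_kernel_current: kernel time on the current (possibly bad) distribution
    t_kernel_best:    kernel time on the preferred distribution
    t_redist_in:      time to redistribute the data before the kernel
    t_redist_out:     time to restore the original layout afterwards (optional)
    All arguments are hypothetical timings in seconds.
    """
    total_with_redistribution = t_redist_in + t_kernel_best + t_redist_out
    return total_with_redistribution < t_kernel_current


# Illustrative values only: a large problem where the kernel dominates,
# and a small problem where the redistribution cost is not amortized.
print(redistribution_pays_off(10.0, 4.0, 1.0, 1.0))
print(redistribution_pays_off(1.0, 0.8, 0.3, 0.3))
```

The second call illustrates why the redistribution must be cheap: for small problems its cost can exceed the time saved in the kernel.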

Our complexity analysis shows that the cost of the scanning is negligible for arrays of common size. With our optimizations, it is therefore sufficient to know the distribution parameters at runtime only, and there is no longer any constraint to provide all distribution parameters at compile time.

Tests were run on a Cray T3D and on an Intel Paragon. All the experiments corroborate our expectations very well: the computation of the data sets is negligible compared to the communication and packing, and the execution time of the global redistribution routine is very good.

These results show that the redistribution of data with our algorithms performs efficiently in practice, with gains reaching at least 80% in the best cases, and ensures that on average the computation time stays very close to the optimum even with bad initial data distribution choices (for both block sizes and grid shape).

From our results, a first improvement should be the integration of these redistribution routines into the linear algebra kernels themselves (before and after the computation) whenever they provide a gain, i.e. when the matrix size is larger than a given threshold that depends on the platform.

Moreover, our results are encouraging for the frequent use of these redistribution library routines in explicit parallel programming, in master/slave schemes, or in compiled codes generated by HPF compilers.

Our algorithms are implemented within the ScaLAPACK library and are available online for test and use.
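To make the "computation of the data sets" mentioned above concrete, here is a minimal one-dimensional sketch. It is a brute-force illustration of which elements each process must send to which other process, not the scanning algorithm of this report; the function names are hypothetical, and the standard block-cyclic ownership rule (block index modulo the number of processes) is assumed.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Process that owns global index i in a 1D block-cyclic layout."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def communication_sets(n, src, dst):
    """Map (sender, receiver) to the list of global indices to transfer.

    src and dst are (block_size, nprocs) pairs describing the source and
    target distributions of an array of n elements. Every element is
    visited once, so the cost is O(n); this is for illustration only.
    """
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, *src), owner(i, *dst))].append(i)
    return dict(sets)


# 12 elements, from blocks of 2 on 2 processes to blocks of 3 on 3 processes.
sets = communication_sets(12, src=(2, 2), dst=(3, 3))
for (p, q), indices in sorted(sets.items()):
    print(f"source {p} -> target {q}: {indices}")
```

In a real redistribution, each list would be packed into one message per (sender, receiver) pair, which is the communication and packing step whose cost dominates in the experiments.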


Biographies

Loïc Prylli is a Ph.D. candidate and assistant professor at ENS-Lyon, France. He received his M.S. degree in Computer Science from ENS-Lyon in 1994 and an M.S. degree in Mathematics in 1995. His research interests include parallel and distributed algorithms, parallel operating systems, and environments for parallel programming.

Bernard Tourancheau is a senior research associate of CNRS at the LIP laboratory of ENS-Lyon, France. He received his Habilitation degree from the University of Lyon in 1994 and his Ph.D. degree in Computer Science from the University of Grenoble. He was a research associate at the University of Tennessee, Knoxville until 1994. His major interests are parallel and distributed algorithms and environments for parallel programming. He is a member of IEEE and ACM.


Publisher: INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex (France)

ISSN 0249-6399