parallel.cs.jhu.edu/assets/slides/lec6.2.loopoptimizations.pdf

Department of Computer Science, Johns Hopkins University

Lecture 6.2: Loop Optimizations

EN 600.320/420

Instructor: Randal Burns

14 February 2018

Lecture 8: Concepts in Parallelism

How to Make Loops Faster

Make loops bigger to eliminate startup costs
– Loop unrolling
– Loop fusion

Get more parallelism
– Coalesce inner and outer loops

Improve memory access patterns
– Access by row rather than column
– Tile loops

Use reductions


Loop Optimization (Fusion)

Merge loops to create larger tasks (amortize startup)


Loop Optimization (Coalesce)

Coalesce loops to get more UEs (units of execution) and thus more parallelism


Loop Optimization (Unrolling)

Loops that do little work per iteration have high startup costs. Unroll loops (by hand) to reduce this overhead
– Some compiler support for this

for ( int i=0; i<N; i++ )
{
  a[i] = b[i]+1;
  c[i] = a[i]+a[i-1]+b[i-1];  /* note: at i == 0 this reads a[-1] and b[-1];
                                 in practice the loop would start at i = 1 */
}

Loop Optimization (Unrolling)

for ( int i=0; i<N; i+=2 )
{
  a[i]   = b[i]+1;
  c[i]   = a[i]+a[i-1]+b[i-1];
  a[i+1] = b[i+1]+1;
  c[i+1] = a[i+1]+a[i]+b[i];  /* two iterations per pass; assumes N is even,
                                 otherwise a cleanup loop handles the remainder */
}


Memory Access Patterns

Reason about how loops iterate over memory
– Prefer sequential over random access (7x speedup here)

Row v. column is the classic case

http://www.akira.ruc.dk/~keld/teaching/IPDC_f10/Slides/pdf4x/4_Performance.4x.pdf


Loop Tiling

Tiling localizes memory twice
– In cache lines for reads (sequential)
– Into cache regions for writes (TLB hits)


OpenMP Reductions

Sharing a variable while computing aggregates leads to poor performance: every thread must serialize on the critical section to update it

#pragma omp parallel for shared(max_val)
for ( int i=0; i<10; i++ )
{
  #pragma omp critical
  {
    if ( arr[i] > max_val ) {
      max_val = arr[i];
    }
  }
}


OpenMP Reductions

Reductions use private variables (not shared)
– Allocated by OpenMP
– Updated by the reduction function (max) on exit from each chunk
– Safe to write from different threads

Eliminates interference in the parallel loop

#pragma omp parallel for reduction(max : max_val)
for ( int i=0; i<10; i++ )
{
  if ( arr[i] > max_val ) {
    max_val = arr[i];
  }
}
