
Parallel Programming: Basic Concepts in Parallelism

Expressing Parallelism

● Work partitioning
– Split up the work of a single program into parallel tasks

● Can be done:
– Explicitly / manually (task/thread parallelism): the user explicitly expresses tasks/threads
– Implicitly: done automatically by the system (e.g., in data parallelism); the user expresses an operation and the system does the rest
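As a rough illustration (Python assumed here; the slides do not prescribe a language), the contrast looks like this: in the explicit style the user creates and joins the threads, while in the implicit style the user only states a map over the data and lets the runtime schedule it.

```python
# Illustrative sketch only; names and sizes are made up for this example.
import threading
from concurrent.futures import ThreadPoolExecutor

data = list(range(1_000_000))
mid = len(data) // 2

# Explicit / manual: the user expresses the threads and joins them.
results = [0, 0]
def partial_sum(chunk, idx):
    results[idx] = sum(chunk)          # each thread works on its own slice

threads = [threading.Thread(target=partial_sum, args=(data[:mid], 0)),
           threading.Thread(target=partial_sum, args=(data[mid:], 1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
explicit_total = sum(results)

# Implicit: the user only states the operation (a map over chunks);
# the executor decides how to schedule the work.
with ThreadPoolExecutor() as pool:
    implicit_total = sum(pool.map(sum, (data[:mid], data[mid:])))

assert explicit_total == implicit_total
```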


Work Partitioning & Scheduling

● work partitioning
– split up the work into parallel tasks/threads
– (done by the user)
– a task is a unit of work
– also called: task/thread decomposition

● scheduling
– assign tasks to processors
– (typically done by the system)
– goal: full utilization (no processor is ever idle)
– (see the sketch after the figure below)

[Figure: work is split into tasks by work partitioning; scheduling assigns the tasks to processors]
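A minimal sketch (Python assumed, all names illustrative) of this division of labour: the user partitions the work into tasks, and a process pool performs the scheduling onto the available processors.

```python
# Minimal sketch: user-side work partitioning, system-side scheduling.
from concurrent.futures import ProcessPoolExecutor
import os

def process_chunk(chunk):
    # one task = one unit of work
    return sum(x * x for x in chunk)

def partition(data, num_chunks):
    # split the data into num_chunks roughly equal tasks (work partitioning)
    n = len(data)
    return [data[i * n // num_chunks:(i + 1) * n // num_chunks]
            for i in range(num_chunks)]

if __name__ == "__main__":
    data = list(range(1_000_000))
    # more tasks than processors helps keep every processor busy (full utilization)
    tasks = partition(data, 4 * (os.cpu_count() or 1))
    with ProcessPoolExecutor() as pool:       # the pool performs the scheduling
        total = sum(pool.map(process_chunk, tasks))
```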


# of chunks should be larger than the # of processors

Task/Thread Granularity

[Figure: the same work split into a few large chunks (coarse granularity) vs. many small chunks (fine granularity)]

Coarse vs Fine granularity

● Fine granularity:
– more portable (can be executed on machines with more processors)
– better for scheduling
– but: if the scheduling overhead is comparable to the cost of a single task, the overhead dominates

Task granularity guidelines

● As small as possible
● but significantly bigger than the scheduling overhead
– system designers strive to make overheads small
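A rough sketch (Python assumed, purely illustrative) of this trade-off: the same total work split at two granularities; with very small tasks, the fixed per-task scheduling overhead can dominate the useful work.

```python
# Illustrative sketch: identical work at two task granularities.
from concurrent.futures import ThreadPoolExecutor

N = 1_000_000

def run(chunk_size):
    chunks = [range(i, min(i + chunk_size, N)) for i in range(0, N, chunk_size)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(sum, chunks))    # each chunk is one scheduled task

coarse = run(N // 8)    # a few large tasks: little scheduling overhead, less flexibility
fine   = run(100)       # many tiny tasks: per-task overhead may dominate the useful work
assert coarse == fine   # same result, very different overhead
```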


Scalability

An overloaded concept, e.g., how well a system reacts to increased load (for example, more clients in a server)

In parallel programming:
– speedup when we increase the number of processors
– what happens as the number of processors → ∞
– a program that scales linearly → linear speedup

Parallel Performance

Sequential execution time: T1

Execution time Tp on p CPUs:
– Tp = T1 / p (perfection)
– Tp > T1 / p (performance loss, what normally happens)
– Tp < T1 / p (sorcery!)

(parallel) Speedup

(parallel) speedup Sp on p CPUs:

Sp = T1 / Tp
● Sp = p → linear speedup (perfection)
● Sp < p → sub-linear speedup (performance loss)
● Sp > p → super-linear speedup (sorcery!)

● Efficiency: Sp / p
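As a small sketch (Python assumed, with made-up timings), speedup and efficiency computed from measured execution times:

```python
# Minimal sketch: speedup and efficiency from measured times.
def speedup(t1, tp):
    return t1 / tp

def efficiency(t1, tp, p):
    return speedup(t1, tp) / p

# Hypothetical measurements: T1 = 10 s, T4 = 3 s on p = 4 CPUs.
s4 = speedup(10.0, 3.0)          # ~3.33 -> sub-linear (S4 < 4)
e4 = efficiency(10.0, 3.0, 4)    # ~0.83
```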


Absolute versus Relative Speed-up

Relative speedup (and the efficiency derived from it): the relative improvement from using P execution units. Baseline: the serialization of the parallel algorithm.

Sometimes there is a better serial algorithm that does not parallelize well. In these cases it is fairer to use that algorithm for T1 (absolute speedup). Using an unnecessarily poor baseline artificially inflates speedup and efficiency.
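A small numeric sketch (hypothetical timings) of why the baseline matters:

```python
# Hypothetical timings; the chosen baseline determines which speedup is reported.
t1_parallel_alg = 12.0   # the parallel algorithm serialized on one processor
t1_best_serial  = 8.0    # a better serial algorithm that does not parallelize well
t8              = 2.0    # the parallel algorithm on 8 processors

relative_speedup = t1_parallel_alg / t8   # 6.0
absolute_speedup = t1_best_serial / t8    # 4.0 -> the fairer number to report
```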


(parallel) speedup graph example


why Sp < p?

● Programs may not contain enough parallelism
– e.g., some parts of the program might be sequential
● Overheads introduced by parallelization
– typically associated with synchronization
● Architectural limitations
– e.g., memory contention

Question:

Parallel program:
– sequential part: 20%
– parallel part: 80% (assume it scales linearly)
– T1 = 10

What is T8? What is the speedup S8?

[Figure: execution-time bar split into a sequential part and a parallel part]

Answer:

● T1 = 10
● T8 = 0.2 · T1 + (0.8 · T1) / 8 = 2 + 1 = 3
● S8 = T1 / T8 = 10 / 3 ≈ 3.33

[Figure: execution-time bar, 20% sequential part and 80% parallel part]
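The same arithmetic as a tiny script (Python assumed), so the result can be re-checked for other fractions or processor counts:

```python
# Minimal sketch of the computation above.
f, t1, p = 0.20, 10.0, 8              # sequential fraction, T1, number of CPUs

t_p = f * t1 + (1 - f) * t1 / p       # 2.0 + 1.0 = 3.0
s_p = t1 / t_p                        # 10 / 3 ≈ 3.33
print(t_p, s_p)
```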

Amdahl’s Law


…the effort expended on achieving high parallel processing rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude.

— Gene Amdahl

Amdahl’s Law – Ingredients

Execution time T1 of a program falls into two categories:

• Time spent doing non-parallelizable serial work
• Time spent doing parallelizable work

Call these Wser and Wpar, respectively.

Amdahl’s Law – Ingredients

Given P workers available to do the parallelizable work, the sequential execution time is:

T1 = Wser + Wpar

and the parallel execution time is bounded by:

Tp ≥ Wser + Wpar / P

Amdahl’s Law

Plugging these relations into the definition of speedup yields Amdahl's Law:

Sp ≤ (Wser + Wpar) / (Wser + Wpar / P)

Amdahl’s Law – Corollary

If f is the non-parallelizable (serial) fraction of the total work, then the following equalities hold:

Wser = f · T1,   Wpar = (1 − f) · T1

which gives:

Sp ≤ 1 / (f + (1 − f) / P)

Sp ≤ (Wser + Wpar) / (Wser + Wpar / P)

What happens if we have infinitely many workers? As P → ∞, the term Wpar / P (equivalently (1 − f) / P) vanishes, so:

S∞ ≤ 1 / f
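A minimal sketch (Python assumed) of the corollary: the bound for a finite number of workers P and its asymptote 1/f:

```python
# Minimal sketch: Amdahl's upper bound on speedup.
def amdahl_bound(f, p):
    # f: non-parallelizable (serial) fraction of the work, p: number of workers
    return 1.0 / (f + (1.0 - f) / p)

f = 0.20
print(amdahl_bound(f, 8))       # ~3.33
print(amdahl_bound(f, 1024))    # ~4.98, approaching the asymptote
print(1.0 / f)                  # 5.0 -> with a 20% serial part, speedup never exceeds 5
```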


Amdahl’s Law Illustrated

[Figure: speedup under Amdahl’s law]

[Figure: efficiency under Amdahl’s law]

Remarks about Amdahl’s Law
● It concerns the maximum speedup (Amdahl was a pessimist!)
– architectural constraints will make the actual factors worse
● The law is mostly bad news, as it puts a limit on scalability
● Takeaway: all non-parallel parts of a program, no matter how small, can cause problems
● Amdahl’s law also shows that efforts to further reduce the sequential fraction of the code may pay off in large performance gains
● Hardware that achieves even a small decrease in the fraction of work executed sequentially may be considerably more efficient

Gustafson's Law

● An alternative (optimistic) view, in contrast to Amdahl’s Law

Observations:
● consider the problem size
● run-time, not problem size, is constant
● more processors allow solving larger problems in the same time
● the parallel part of a program scales with the problem size

Gustafson’s Law

[Figure: illustration of Gustafson’s law]

Gustafson's Law

● f: sequential part (no speedup)
● total work done in a fixed wall-clock time Twall on p processors: W = p (1 − f) Twall + f Twall
● scaled speedup: Sp = f + p (1 − f) = p − f (p − 1)

http://link.springer.com/referenceworkentry/10.1007%2F978-0-387-09766-4_78
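A short sketch (Python assumed) contrasting the two laws for the same sequential fraction f:

```python
# Minimal sketch: Gustafson's scaled speedup vs. Amdahl's bound.
def gustafson_speedup(f, p):
    return p - f * (p - 1)            # = f + p * (1 - f)

def amdahl_bound(f, p):
    return 1.0 / (f + (1.0 - f) / p)

f, p = 0.20, 4
print(gustafson_speedup(f, p))   # 3.4  (problem size grows with p, run-time constant)
print(amdahl_bound(f, p))        # 2.5  (problem size fixed)
```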

Amdahl's vs Gustafson's Law

[Figure: work break-down under Amdahl’s Law vs. Gustafson’s Law, both for p = 4]

Summary

● Parallel speedup

● Amdahl’s and Gustafson’s laws

● Parallelism: task/thread granularity

