Seminar on Parallel and Concurrent Programming


Stefan Marr, Daniele Bonetta (2016)



Agenda

1. Modus Operandi

2. Introduction to Concurrent Programming Models

3. Seminar Paper Overview


MODUS OPERANDI


Tasks and Deadlines

• Talk on selected paper (student 1)
  – 30 min with slides (+ 15 min discussion)
  – to be discussed with us 1 week before
  – Summary (max. 500 words)
    • 2 days before seminar, 11:59am

• Questions on assigned paper (student 2)
  – Min. 5 questions
  – 2 days before seminar, 11:59am

Summaries will be online before the talk.


Report

Category 1: Theoretical treatment
• Focus on paper, related work, state of the art of the field
• Detailed discussion

Category 2: Practical treatment of topic, for instance
• Reproduce experiments/results
• Extend experiments
• Experiment with variations

Reports and slides are to be archived online.


Report
• Paper summary (500 words)
• Outline, content, and experiments to be discussed with us
• Cat. 1: ca. 4000 words (excl. references)
  – State of the art, context in the field, and the specific technique from the paper
• Cat. 2: ca. 2000 words (excl. references)
  – Discuss experiments, gained insights, found limitations, etc.

Deadline: Feb. 6th


Consultations

• For alternative paper proposals

• To prepare presentation!

• To agree on focus of report/experiments
  – Mandatory for experiments

Technically optional, but…


Grading

• Required attendance: 80% of all meetings

• 50% slides, presentation, and discussion
• 50% write-up/experiments


Timeline

Oct. 5th    Introduction to Concurrent Programming Models
Oct. 10th   Deadline: List of ranked papers
Oct. 12th   Runtime Techniques for Big Data and Parallelism
Week 3-5    Preparations and Consultations
Week 6-12   Presentations (depends on #students)
Feb. 6th    Deadline for Report


Got Background in Concurrency/Parallelism?

Show of Hands!

Multicore is the Norm

8 cores – 200 Euro phones
24 cores – workstations
>= 72 cores – embedded systems

Problem: Power Wall at ca. 5 GHz

CPUs don’t get Faster But Multiply

[Chart: clock frequency of Intel processors, 1990-2015. Single-core frequency rises from 0.2 GHz to ca. 3.8 GHz by 2005, then plateaus; instead, core counts grow: 4, 6, 12, … cores.]

Power ≈ Voltage² × Frequency

[Diagram: one large core vs. two smaller cores sharing caches]

Two cores at Voltage -15% and Frequency -15%:
Power = 1, Performance ≈ 1.8

Problem: Memory Wall

[Chart: relative performance (log scale 1-10000), 1980-2005. CPU frequency: 2x every 2 years; DRAM speeds: 2x every 6 years; the gap keeps growing.]

Source: Sun World Wide Analyst Conference Feb. 25, 2003

Multicore Transition

Work around physical limitations: the Power Wall and the Memory Wall

05/03/2023

Concurrency & Parallelism

70 Years of Problem Solving

For a brief bit of history: "ENIAC's Recessive Gene", Mitch Marcus and Atsushi Akera, Penn Printout (March 1996), http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf

ENIAC's main control panel, U. S. Army Photo

Decades of Research and Solutions for Everything



But no Silver Bullet

CSP, Actors, Locks & Monitors, Fork/Join, Transactional Memory, Data Flow, …


A Rough Categorization

Threads and Locks
Coordinating Threads
Communicating Isolates


A Rough Categorization

Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel.

Data Parallelism


THREADS AND LOCKS
Powerful but hard


Uniform Shared Memory

A Model for the Machines We Used to Have

C/C++


Threads

• Sequences of instructions

• Unit of scheduling
  – Preemptive and concurrent
  – Or parallel


A Snake Game

• Multiple players

• Compete for ‘apples’

• Shared board



Race Conditions and Data Races

Race Condition
• Result depends on the timing of operations

Data Race
• Race condition on memory
• Synchronization absent or incomplete
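The distinction can be made concrete with a small sketch (class and field names are illustrative, not from the slides): two threads do read-modify-write on a shared counter. The unsynchronized increments are a data race and can lose updates; the lock-protected increments stay correct.

```java
// Sketch of a data race: two threads increment a shared counter.
// Class and field names are illustrative.
public class CounterRace {
    static int unsafeCount = 0;               // incremented without synchronization
    static int safeCount = 0;                 // incremented under a lock
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                // data race: increments can be lost
                synchronized (lock) {
                    safeCount++;              // critical section: always correct
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // safeCount is always 200000; unsafeCount is often smaller.
        System.out.println("safe=" + safeCount + ", unsafe=" + unsafeCount);
    }
}
```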


Locks

synchronized (board) {
  board.moveLeft(snake)
}

Single Lock is Simple

29

Optimized Locking for more Parallelism

synchronized (board[3][3]) {
  synchronized (board[3][2]) {
    board.moveLeft(snake)
  }
}

Strategy: Lock only cells you need to update

What could go wrong?
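One classic answer: deadlock. Two threads that acquire the same two cell locks in opposite orders can block each other forever. A common remedy, sketched below with illustrative names (not from the slides), is to impose a global lock order, e.g. by cell index:

```java
// Hedged sketch: avoiding deadlock on per-cell locks by always acquiring
// them in a fixed global order. cellLocks/moveBetween are illustrative.
public class OrderedLocking {
    static final Object[] cellLocks = new Object[16];   // one lock per board cell
    static {
        for (int i = 0; i < cellLocks.length; i++) cellLocks[i] = new Object();
    }

    // Lock the lower-indexed cell first, so every thread agrees on the order
    // and the circular wait needed for deadlock cannot arise.
    static void moveBetween(int from, int to, Runnable update) {
        int first = Math.min(from, to);
        int second = Math.max(from, to);
        synchronized (cellLocks[first]) {
            synchronized (cellLocks[second]) {
                update.run();                           // both cells are now held
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] moves = {0};
        // Opposite argument orders, yet both threads lock cell 2 before cell 3.
        Thread a = new Thread(() -> moveBetween(3, 2, () -> moves[0]++));
        Thread b = new Thread(() -> moveBetween(2, 3, () -> moves[0]++));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(moves[0] + " moves completed"); // 2 moves completed
    }
}
```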


Common Issues

• Lack of progress
  – Deadlock
  – Livelock
• Race conditions
  – Data races
  – Atomicity violations
• Performance
  – Sequential bottlenecks
  – False sharing


Basic Concepts: Shared Memory with Threads and Locks

• Threads
• Synchronization
• No safety guarantees
  – Data races
  – Deadlocks

P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al.
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al.
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al.

Questions?


COORDINATING THREADS
Making Coordination Explicit

Communicating Threads

Shared Memory with Explicit Coordination

Raising the Abstraction Level

Libraries for most languages


Two Main Variants

Temporal Isolation: Transactional Memory
Explicit Communication: Channel or Message-based


Transactional Memory

atomic {
  board.moveLeft(snake)
}

Coordinated by the Runtime System


Transactional Memory

Simple Programming Model
• No data races (within transactions)
• No deadlocks

Issues
• Performance overhead
• Still experimental
• Livelocks
• Inter-transactional race conditions
• I/O semantics
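Java has no built-in `atomic` blocks, but the optimistic core of many TM implementations can be sketched with a compare-and-set retry loop over an immutable snapshot (all names here are illustrative): read a snapshot, compute a new state, and commit only if no other transaction committed in between; the retry loop is also where livelock can creep in.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of optimistic concurrency in the spirit of TM: the whole
// board is one immutable value; a "transaction" retries until its commit
// (compareAndSet) succeeds. BoardState/moveLeft are illustrative names.
public class OptimisticAtomic {
    record BoardState(int snakeX) {
        BoardState moveLeft() { return new BoardState(snakeX - 1); }
    }

    static final AtomicReference<BoardState> board =
            new AtomicReference<>(new BoardState(5));

    static void atomicMoveLeft() {
        while (true) {
            BoardState snapshot = board.get();          // read
            BoardState updated = snapshot.moveLeft();   // compute
            if (board.compareAndSet(snapshot, updated)) // commit if unchanged
                return;
            // conflict: another transaction committed first; retry on a
            // fresh snapshot
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(OptimisticAtomic::atomicMoveLeft);
        Thread b = new Thread(OptimisticAtomic::atomicMoveLeft);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(board.get().snakeX()); // 3: neither move is lost
    }
}
```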


Some Issues

atomic {
  dataArray = getData();
  fork {
    compute(dataArray[0]);
  }
  compute(dataArray[1]);
}

P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al.
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al.
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al.

What happens with forked thread when transaction aborts?


Channel-based Communication

Player thread (send):

  coordChannel ! (#moveLeft, snake)

Coordinator thread (receive):

  for i in players():
    msg ? coordChannels[i]
    match msg:
      (#moveLeft, snake):
        board[…,…] = …

[Diagram: player threads send to the coordinator thread over channels]

High-level communication, but no safety guarantees
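The pseudocode above maps naturally onto `java.util.concurrent`, where a `BlockingQueue` can play the role of the channel (a sketch; the `Move` record and the thread structure are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of channel-based coordination: a player thread sends a move
// message over a BlockingQueue "channel"; the coordinator receives it.
public class ChannelDemo {
    record Move(String kind, String snake) {}

    static Move exchangeOnce() throws InterruptedException {
        BlockingQueue<Move> coordChannel = new ArrayBlockingQueue<>(16);

        Thread player = new Thread(() -> {
            try {
                coordChannel.put(new Move("moveLeft", "snake1"));  // send
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        player.start();

        Move msg = coordChannel.take();   // receive: blocks until a message arrives
        player.join();
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        Move msg = exchangeOnce();
        System.out.println("coordinator handles " + msg.kind() + " for " + msg.snake());
    }
}
```

Note the trade-off the slide points out: the queue raises the abstraction level, but nothing prevents a thread from also mutating shared memory directly.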


Coordinating Threads

Transactional Memory
• Transactions
• Simple programming model
• Practical issues

Channel/Message Communication
• Explicit coordination
  – Channels or message sending
  – Higher abstraction level
• No safety guarantees

P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al.

P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)

Questions?


COMMUNICATING ISOLATES
Communication is Everything


Explicit Communication Only

Absence of Low-level Data Races


All Interactions Explicit

[Diagram: Actor A and Actor B interact only via messages]

Actor Principle


Many Many Variations

• Channel based
  – Communicating Sequential Processes
• Message based
  – Actor models

P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.


Communicating Event Loops

[Diagram: Actor A and Actor B as event loops containing objects]

• One message at a time
• Actors contain objects
• Interacting via messages


Message-based Communication

Player Actor (async send):

  board <- moveLeft(snake)

Board Actor:

  class Board {
    private array;
    public moveLeft(snake) {
      array[snake.x][snake.y] = ...
    }
  }

Main Program:

  actors.create(Board)
  actors.create(Snake)
  actors.create(Snake)
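A minimal way to approximate a communicating event loop in plain Java is a single-threaded executor acting as the actor's mailbox and event loop (a sketch with illustrative names; real actor systems provide far more):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: an actor as state owned by one event-loop thread. Messages are
// tasks submitted to a single-threaded executor, so the actor processes
// one message at a time and needs no locks on its own state.
public class BoardActorDemo {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    final String[][] board = new String[8][8];   // owned by this actor only

    // Asynchronous send: enqueues the message and returns immediately.
    void moveLeft(String snake, int x, int y) {
        mailbox.execute(() -> {
            board[x][y] = null;
            board[x][y - 1] = snake;
        });
    }

    void shutdown() throws InterruptedException {
        mailbox.shutdown();
        mailbox.awaitTermination(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BoardActorDemo boardActor = new BoardActorDemo();
        boardActor.moveLeft("snake1", 3, 3);     // async send from a "player"
        boardActor.shutdown();                   // drain the mailbox
        System.out.println(boardActor.board[3][2]); // snake1
    }
}
```

Because only the mailbox thread ever touches `board`, low-level data races are ruled out by construction, which is exactly the isolate property; the higher-level hazards listed below remain.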


Communicating Isolates

Message or Channel Based
• Explicit communication
• No shared memory
• Still potential for
  – Behavioral deadlocks
  – Livelocks
  – Bad message interleavings
  – Message protocol violations

P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.

P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)

Questions?


DATA PARALLELISM
Parallelism for Structured Problems

DATA PARALLELISM WITH FORK/JOIN
Just one Example


Fork/Join with Work-Stealing

• Recursive divide-and-conquer

• Automatic and efficient parallel scheduling

• Widely available for C++, Java, and .NET


Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995), 'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.

Typical Applications

• Recursive algorithms¹
  – Mergesort
  – List and tree traversals
• Parallel prefix, pack, and sorting problems²
• Irregular and unbalanced computation
  – On directed acyclic graphs (DAGs)
  – Ideally tree-shaped


1) More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/
2) Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf


Tiny Example: Summing a large Array

• Simple array with numbers
• Recursively divide
  – Every divide step is a parallel fork
• Then do the additions
  – Every merge step is a join


Note: This example is academic, and could be better expressed with a parallel map/reduce library, such as Scala’s Parallel Collections, Java 8 Streams, or Microsoft’s PLINQ.

[Figure: summation tree for the array 3 1 7 2 …: the array is split recursively (each split is a parallel fork) and the partial sums are combined with ‘+’ at each join.]
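The summation example, written against Java's fork/join framework (the framework of paper P1.5); the threshold and input array are illustrative:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// The array-summing example as a RecursiveTask: split the range, fork the
// left half, compute the right half, then join and add the partial sums.
public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 2;   // small ranges are summed directly
    private final long[] array;
    private final int lo, hi;                 // half-open range [lo, hi)

    ForkJoinSum(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {           // base case: sequential sum
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        ForkJoinSum left = new ForkJoinSum(array, lo, mid);
        ForkJoinSum right = new ForkJoinSum(array, mid, hi);
        left.fork();                          // schedule left half in parallel
        long rightSum = right.compute();      // work on the right half here
        return left.join() + rightSum;        // wait for the left half, combine
    }

    public static void main(String[] args) {
        long[] numbers = {3, 1, 7, 2};
        long total = new ForkJoinPool().invoke(
                new ForkJoinSum(numbers, 0, numbers.length));
        System.out.println(total); // 13
    }
}
```

Work-stealing makes this efficient: idle worker threads steal forked subtasks from busy ones, which is what gives the automatic load-balancing mentioned below.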

Data Parallelism with Fork/Join

• Parallel programming technique

• Recursive divide-and-conquer

• Automatic and efficient load-balancing

P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)


CONCLUSION
Concurrency Models


Four Rough Categories

Threads and Locks
Coordinating Threads
Communicating Isolates
Data Parallelism

Questions?


SEMINAR PAPERS


These are Suggestions

Please feel free to propose papers of your own interest.

(Papers need to be approved by us)


Topics of Interest

• High-level language concurrency models
  – Actors, Communicating Sequential Processes, STM, Stream Processing, ...
• Tooling
  – Debugging
  – Profiling
• Implementation and runtime systems
  – Communication mechanisms
  – Data/object representation
  – System-level aspects
• Big Data frameworks
  – Programming models
  – Runtime-level problems


Papers without Artifacts

P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16)
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16)
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)


Papers without Artifacts

P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'15)

P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14)

P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'15)

P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15)

P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)


Papers with Artifacts

P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16)

P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16)

P2.3 StreamJIT: a commensal compiler for high-performance stream programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14)

P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, EuroPar'12)

P2.5 Parallel parsing made practical, A. Barenghi et al. (runtime, SCP'15)


Papers with Artifacts

P2.6 SparkR : Scaling R Program with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16)

P2.7 SparkSQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, VLDB'14)

P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15)

P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13)

P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16) [Recommended]