Stefan Marr, Daniele Bonetta (2016)
Seminar on Parallel and Concurrent Programming
Agenda
1. Modus Operandi
2. Introduction to Concurrent Programming Models
3. Seminar Paper Overview
MODUS OPERANDI
Tasks and Deadlines
• Talk on selected paper (student 1)
  – 30 min with slides (+15 min discussion)
    • to be discussed with us 1 week before
  – Summary (max. 500 words)
    • 2 days before seminar, 11:59am
• Questions on assigned paper (student 2)
  – Min. 5 questions
  – 2 days before seminar, 11:59am
Summaries will be online before the talk
Report
Category 1: Theoretical treatment
• Focus on paper, related work, state of the art of the field
• Detailed discussion
Category 2: Practical treatment of topic, for instance
• Reproduce experiments/results
• Extend experiments
• Experiment with variations
Reports and slides to be archived online
Report
• Paper summary (500 words)
• Outline, content, and experiments to be discussed with us
• Cat. 1: ca. 4000 words (excl. references)
  – State of the art, context in field, and specific technique from paper
• Cat. 2: ca. 2000 words (excl. references)
  – Discuss experiments, gained insights, found limitations, etc.
Deadline: Feb. 6th
Consultations
• For alternative paper proposals
• To prepare presentation!
• To agree on focus of report/experiments
  – Mandatory for experiments
Technically optional, but…
Grading
• Required attendance: 80% of all meetings
• 50% slides, presentation, and discussion
• 50% write-up/experiments
Timeline
Oct. 5th   Introduction to Concurrent Programming Models
Oct. 10th  Deadline: List of ranked papers
Oct. 12th  Runtime Techniques for Big Data and Parallelism
Week 3-5   Preparations and Consultations
Week 6-12  Presentations (depends on #students)
Feb. 6th   Deadline for Report
Got Background in Concurrency/Parallelism?
Show of Hands!
Multicore is the Norm
• 8 cores: 200-Euro phones
• 24 cores: workstations
• >= 72 cores: embedded systems
Problem: Power Wall at ca. 5 GHz
CPUs Don't Get Faster, But Multiply
[Chart: clock frequency of Intel processors, 1990–2015. Single-core frequency rises from 0.2 GHz to ca. 3.8 GHz, then plateaus; since the mid-2000s, core counts grow instead: 4, 6, 12, … cores.]
Power ≈ Voltage² × Frequency
[Diagram: one core vs. two cores, each with its own cache]
With Voltage −15% and Frequency −15%, two cores give:
Power ≈ 1, Performance ≈ 1.8
Problem: Memory Wall
[Chart: relative performance, 1980–2005, log scale from 1 to 10000. CPU frequency doubles every 2 years; DRAM speed doubles every 6 years; the gap keeps growing. Source: Sun World Wide Analyst Conference, Feb. 25, 2003]
Multicore Transition
Work around physical limitations: Power Wall and Memory Wall
05/03/2023 17
Concurrency & Parallelism
70 Years of Problem Solving
For a brief bit of history: "ENIAC's Recessive Gene", Mitch Marcus and Atsushi Akera, Penn Printout (March 1996), http://www.upenn.edu/computing/printout/archive/v12/4/pdf/gene.pdf
[Photo: ENIAC's main control panel, U.S. Army photo]
Decades of Research and Solutions for Everything
But no Silver Bullet
CSP, Locks, Monitors, Fork/Join, Transactional Memory, Data Flow, Actors, …
A Rough Categorization
• Threads and Locks
• Coordinating Threads
• Communicating Isolates
• Data Parallelism
Marr, S. (2013), 'Supporting Concurrency Abstractions in High-level Language Virtual Machines', PhD thesis, Software Languages Lab, Vrije Universiteit Brussel.
THREADS AND LOCKS
Powerful but hard
Uniform Shared Memory
A Model for the Machines We Used to Have
C/C++
Threads
• Sequences of instructions
• Unit of scheduling
  – Preemptive and concurrent
  – Or parallel
[Diagram: thread interleavings over time]
A Snake Game
• Multiple players
• Compete for ‘apples’
• Shared board
Race Conditions and Data Races
Race Condition
• Result depends on timing of operations
Data Race
• Race condition on memory
• Synchronization absent or incomplete
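The difference can be seen in a few lines of Java (an illustrative sketch, not from the slides; class and field names are made up). Two threads increment a shared counter: the plain field loses updates because `++` is a non-atomic read-modify-write, while the `AtomicInteger` does not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: two threads increment a shared counter 100,000 times
// each. The plain int field has a data race; the AtomicInteger is race-free.
public class RaceDemo {
    static int unsafeCount = 0;                          // data race: lost updates
    static final AtomicInteger safeCount = new AtomicInteger();

    static int run() {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                           // non-atomic read-modify-write
                safeCount.incrementAndGet();             // atomic, always counted
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return safeCount.get();
    }

    public static void main(String[] args) {
        run();
        System.out.println("unsafe: " + unsafeCount + " (often < 200000)");
        System.out.println("safe:   " + safeCount.get()); // always 200000
    }
}
```

Run it a few times: the unsafe count varies from run to run, which is exactly the timing dependence the definitions above describe.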
Locks
synchronized (board) {
  board.moveLeft(snake)
}
Single Lock is Simple
Optimized Locking for more Parallelism
synchronized (board[3][3]) {
  synchronized (board[3][2]) {
    board.moveLeft(snake)
  }
}
Strategy: Lock only cells you need to update
What could go wrong?
Common Issues
• Lack of Progress
  – Deadlock
  – Livelock
• Race Condition
  – Data race
  – Atomicity violation
• Performance
  – Sequential bottlenecks
  – False sharing
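The deadlock risk in the fine-grained locking scheme above arises when two threads take the same pair of cell locks in opposite orders. A standard remedy, sketched below with hypothetical helper and cell names, is to acquire every pair of locks in one global order, which removes the circular wait.

```java
// Sketch of a lock-ordering discipline (cell objects and helper are
// hypothetical, not from the slides). Both threads need the same two locks;
// acquiring them in a globally consistent order prevents deadlock.
public class LockOrdering {
    static void withBoth(Object a, Object b, Runnable critical) {
        // Pick a global order, e.g. by identity hash code (ties are rare;
        // a production implementation needs an explicit tie-breaker).
        Object first  = System.identityHashCode(a) <= System.identityHashCode(b) ? a : b;
        Object second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) {
                critical.run();
            }
        }
    }

    static int run() {
        Object cellA = new Object(), cellB = new Object();
        int[] moves = {0};                               // protected by both locks
        Runnable work = () -> {
            for (int i = 0; i < 10_000; i++) {
                withBoth(cellA, cellB, () -> moves[0]++);
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return moves[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // 20000, and no deadlock
    }
}
```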
Basic Concepts: Shared Memory with Threads and Locks
• Threads
• Synchronization
• No safety guarantees
  – Data Races
  – Deadlocks
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al.
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. Bond et al.
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al.
Questions?
COORDINATING THREADS
Making Coordination Explicit
Communicating Threads: Shared Memory with Explicit Coordination
Raising the Abstraction Level
Libraries for most languages
Two Main Variants
Temporal Isolation: Transactional Memory
Explicit Communication: Channel or Message-based
Transactional Memory
atomic {
  board.moveLeft(snake)
}
Coordinated by Runtime System
Transactional Memory
Simple Programming Model
• No Data Races (within transactions)
• No Deadlocks
Issues
• Performance overhead
• Still experimental
• Livelocks
• Inter-transactional race conditions
• I/O semantics
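Java has no built-in STM, but the optimistic flavor of the `atomic` block can be sketched with a compare-and-set retry loop over a single reference. This is illustrative only: a real STM tracks read and write sets across many variables and handles aborts of arbitrary code.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Illustrative optimistic-retry sketch of an `atomic` block over one shared
// reference (NOT a full STM). Read a snapshot, compute the new state, and
// commit with compare-and-set; on conflict, abort and retry.
public class MiniAtomic {
    static <T> void atomic(AtomicReference<T> ref, UnaryOperator<T> update) {
        while (true) {
            T snapshot = ref.get();                        // consistent snapshot
            T next = update.apply(snapshot);               // run the "transaction"
            if (ref.compareAndSet(snapshot, next)) return; // commit succeeded
            // commit failed: another transaction committed first, so retry
        }
    }

    static int run() {
        AtomicReference<Integer> balance = new AtomicReference<>(0);
        Runnable deposits = () -> {
            for (int i = 0; i < 10_000; i++) atomic(balance, b -> b + 1);
        };
        Thread t1 = new Thread(deposits), t2 = new Thread(deposits);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return balance.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 20000: no increments lost
    }
}
```

The retry loop also makes the livelock issue above concrete: under heavy contention, transactions can keep aborting each other.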
Some Issues
atomic {
  dataArray = getData();
  fork { compute(dataArray[0]); }
  compute(dataArray[1]);
}
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al.P1.1 Transactional Data Structure Libraries, A. Spiegelman et al.P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al.
What happens with forked thread when transaction aborts?
Channel-based Communication

Player Thread (send):
coordChannel ! (#moveLeft, snake)

Coordinator Thread (receive):
for i in players():
  msg ? coordChannels[i]
  match msg:
    (#moveLeft, snake): board[…,…] = …

High-level communication, but no safety guarantees
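In Java, a `BlockingQueue` can play the role of the channel; the sketch below mirrors the player/coordinator exchange above (the `Move` message type and its fields are illustrative, not from the slides).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the player/coordinator exchange with a BlockingQueue as the
// channel (`!` = send, `?` = receive in the slides' notation).
public class ChannelDemo {
    record Move(String command, String snake) {}

    static Move run() {
        BlockingQueue<Move> coordChannel = new ArrayBlockingQueue<>(16);

        // player thread: coordChannel ! (#moveLeft, snake)
        Thread player = new Thread(() -> {
            try {
                coordChannel.put(new Move("moveLeft", "snake1"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        player.start();

        // coordinator thread (here: the caller): receive and dispatch
        try {
            Move msg = coordChannel.take();
            player.join();
            return msg;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Move msg = run();
        System.out.println(msg.command() + " " + msg.snake()); // moveLeft snake1
    }
}
```

Note that the queue only structures the communication; nothing stops a thread from also mutating shared memory directly, which is the missing safety guarantee.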
Coordinating Threads
Transactional Memory
• Transactions
• Simple Programming Model
• Practical Issues
Channel/Message Communication
• Explicit coordination
  – Channels or message sending
  – Higher abstraction level
• No safety guarantees
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al.
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
Questions?
COMMUNICATING ISOLATES
Communication is Everything
Explicit Communication Only
Absence of Low-level Data Races
All Interactions Explicit
[Diagram: Actor A and Actor B, all interactions via explicit messages]
Actor Principle
Many Many Variations
• Channel based
  – Communicating Sequential Processes
• Message based
  – Actor models
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.
Communicating Event Loops
[Diagrams: Actor A and Actor B as event loops]
• One Message at a Time
• Actors Contain Objects
• Interacting via Messages
Message-based Communication
Player Actor (async send):
board <- moveLeft(snake)

Board Actor:
class Board {
  private array;
  public moveLeft(snake) {
    array[snake.x][snake.y] = ...
  }
}

Main Program:
actors.create(Board)
actors.create(Snake)
actors.create(Snake)
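A minimal way to get the one-message-at-a-time behavior in Java is to give each actor a single-threaded executor as its event loop. The sketch below is illustrative and omits everything a real actor library provides (far references, promises, isolation enforcement); method and field names are made up to match the board example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal communicating-event-loop sketch: the actor owns its state and a
// single-threaded executor as its mailbox, so it processes one message at a
// time. `board <- moveLeft(snake)` becomes an asynchronous submit.
public class BoardActor {
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final String[][] array = new String[10][10];  // actor-private state

    // async send: enqueue the message and return immediately
    public void moveLeft(String snake, int x, int y) {
        mailbox.submit(() -> array[x][y] = snake);        // runs on the event loop
    }

    // demo-only accessor: drain the mailbox, then read a cell. A real actor
    // would answer via a reply message or promise instead of exposing state.
    public String cellAfterQuiescence(int x, int y) {
        mailbox.shutdown();
        try { mailbox.awaitTermination(1, TimeUnit.SECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return array[x][y];
    }

    public static void main(String[] args) {
        BoardActor board = new BoardActor();
        board.moveLeft("snake1", 3, 3);                   // board <- moveLeft(snake)
        System.out.println(board.cellAfterQuiescence(3, 3)); // snake1
    }
}
```

Because only the event-loop thread ever touches `array`, there are no low-level data races by construction, which is the point of the actor principle above.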
Communicating Isolates
Message or Channel Based
• Explicit communication
• No shared memory
• Still potential for
  – Behavioral deadlocks
  – Livelocks
  – Bad message interleavings
  – Message protocol violations
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al.
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Questions?
DATA PARALLELISM
Parallelism for Structured Problems
DATA PARALLELISM WITH FORK/JOIN
Just one Example
Fork/Join with Work-Stealing
• Recursive divide-and-conquer
• Automatic and efficient parallel scheduling
• Widely available for C++, Java, and .NET
Blumofe, R. D.; Joerg, C. F.; Kuszmaul, B. C.; Leiserson, C. E.; Randall, K. H. & Zhou, Y. (1995), 'Cilk: An Efficient Multithreaded Runtime System', SIGPLAN Not. 30 (8), 207-216.
Typical Applications
• Recursive algorithms [1]
  – Mergesort
  – List and tree traversals
• Parallel prefix, pack, and sorting problems [2]
• Irregular and unbalanced computation
  – On directed acyclic graphs (DAGs)
  – Ideally tree-shaped
[1] More material can be found at: http://homes.cs.washington.edu/~djg/teachingMaterials/spac/
[2] Prefix Sums and Their Applications: http://www.cs.cmu.edu/~guyb/papers/Ble93.pdf
Tiny Example: Summing a large Array
• Simple array with numbers
• Recursively divide: every split is a parallel fork
• Then do addition: every combine is a join
Note: This example is academic, and could be better expressed with a parallel map/reduce library, such as Scala's Parallel Collections, Java 8 Streams, or Microsoft's PLINQ.
[Diagram: the array is halved recursively until small pieces remain; the pieces are summed in parallel and the partial sums (e.g. 11 + 13 = 24) are joined up the tree into the final total.]
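With Java's fork/join framework (P1.5), the summing example can be sketched as follows; the class names, input array, and cutoff are illustrative, not from the slides.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork/join sketch of the array-summing example. Below the cutoff the task
// sums sequentially; above it, it forks the left half, computes the right
// half itself, and joins the results.
class SumTask extends RecursiveTask<Long> {
    private static final int CUTOFF = 4;      // below this, just loop
    private final long[] data;
    private final int lo, hi;                 // half-open range [lo, hi)

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= CUTOFF) {              // base case: sequential sum
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                           // schedule left half in parallel
        long rightSum = right.compute();       // do right half on this worker
        return left.join() + rightSum;         // wait for left, then combine
    }
}

public class ParallelSum {
    static long sum(long[] data) {
        return new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = {3, 1, 7, 2, 4, 6, 9, 4};  // illustrative input
        System.out.println(sum(data)); // 36
    }
}
```

Forking the left half while computing the right half on the current worker (instead of forking both) is the idiomatic pattern: it halves the scheduling overhead while still exposing the work for stealing.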
Data Parallelism with Fork/Join
• Parallel programming technique
• Recursive divide-and-conquer
• Automatic and efficient load-balancing
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
CONCLUSION: CONCURRENCY MODELS
Four Rough Categories
• Threads and Locks
• Coordinating Threads
• Communicating Isolates
• Data Parallelism
Questions?
SEMINAR PAPERS
These are Suggestions
Please feel free to propose papers of your interest.
(Papers need to be approved by us.)
Topics of Interest
• High-level language concurrency models
  – Actors, Communicating Sequential Processes, STM, Stream Processing, ...
• Tooling
  – Debugging
  – Profiling
• Implementation and runtime systems
  – Communication mechanisms
  – Data/object representation
  – System-level aspects
• Big Data Frameworks
  – Programming models
  – Runtime level problems
Papers without Artifacts
P1.1 Transactional Data Structure Libraries, A. Spiegelman et al. (conc-model, PLDI'16)
P1.2 Type-Aware Transactions for Faster Concurrent Code, N. Herman et al. (conc-model, runtime, EuroSys'16)
P1.3 43 Years of Actors: a Taxonomy of Actor Models and Their Key Properties, J. De Koster et al. (conc-model, Agere'16)
P1.4 Why Do Scala Developers Mix the Actor Model with other Concurrency Models?, S. Tasharofi et al. (conc-model, ECOOP'13)
P1.5 A Java Fork/Join Framework, D. Lea (conc-model, runtime, Java'00)
P1.6 The Asynchronous Partitioned Global Address Space Model, V. Saraswat et al. (conc-model, AMP'10)
Papers without Artifacts
P1.7 Pydron: Semi-Automatic Parallelization for Multi-Core and the Cloud, S. C. Müller et al. (conc-model, runtime, OSDI'14)
P1.8 Fast Splittable Pseudorandom Number Generators, G. L. Steele et al. (runtime, OOPSLA'14)
P1.9 The Linux Scheduler: A Decade of Wasted Cores, J.-P. Lozi et al. (runtime, EuroSys'16)
P1.10 Application-Assisted Live Migration of Virtual Machines with Java Applications, K.-Y. Hou et al. (runtime, EuroSys'15)
P1.11 Distributed Debugging for Mobile Networks, E. Gonzalez Boix et al. (tooling, JSS'14)
Papers with Artifacts
P2.1 Optimistic Concurrency with OPTIK, R. Guerraoui, V. Trigonakis (conc-model, PPoPP'16)
P2.2 Transactional Tasks: Parallelism in Software Transactions, J. Swalens et al. (conc-model, ECOOP'16)
P2.3 StreamJIT: A Commensal Compiler for High-Performance Stream Programming, J. Bosboom et al. (conc-model, runtime, OOPSLA'14)
P2.4 An Efficient Synchronization Mechanism for Multi-core Systems, M. Aldinucci et al. (conc-model, runtime, Euro-Par'12)
P2.5 Parallel Parsing Made Practical, A. Barenghi et al. (runtime, SCP'15)
Papers with Artifacts
P2.6 SparkR: Scaling R Programs with Spark, S. Venkataraman et al. (conc-model, bigdata, SIGMOD'16)
P2.7 Spark SQL: Relational Data Processing in Spark, M. Armbrust et al. (bigdata, runtime, SIGMOD'15)
P2.8 Twitter Heron: Stream Processing at Scale, S. Kulkarni et al. (bigdata, SIGMOD'15)
P2.9 OCTET: Capturing and Controlling Cross-Thread Dependences Efficiently, M. D. Bond et al. (tooling, OOPSLA'13)
P2.10 Efficient and Thread-Safe Objects for Dynamically-Typed Languages, B. Daloze et al. (runtime, OOPSLA'16)