igor-bronovskyy
Parallelism and concurrency are the direction in which programming technology is heading now and, without a doubt, will keep heading in the future. Even entry-level computers ship with multicore processors, which opens up opportunities for building fast, efficient programs with responsive interfaces. So even developers who have never dealt with concurrent code will find themselves programming with parallelism in mind more and more often. This talk covers parallel programming patterns, approaches to asynchronous programming, and the trends of particular modern technologies in the area of parallelism and asynchrony. Listeners will learn the main ways of organizing parallel computation in desktop, web, and server applications, the means of achieving a responsive GUI, and techniques for solving the problems that arise in concurrent programming.
Contents

- Trend
- Key terms
- Managing state
- Parallelism
- Tools
Yesterday

Today

Tomorrow
What is happening?

- CPU clock speed growth has slowed down
- Due to physical limitations
- The free lunch is over
- Software no longer gets faster on its own
Current trends

- Manycore, multicore
- GPGPU, GPU acceleration, heterogeneous computing
- Distributed computing, HPC
Key concepts

- Concurrency
  - many interleaved threads of control
- Parallelism
  - same result, but faster
- Concurrency != Parallelism
  - it is not always necessary to care about concurrency while implementing parallelism
- Multithreading
- Asynchrony
Tasks

- CPU-bound
  - number crunching
- I/O-bound
  - network, disk
State

- Shared
  - accessible by more than one thread
  - sharing is transitive
- Private
  - used by a single thread only
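To make the shared/private distinction concrete, here is a minimal C# sketch (an illustrative example, not from the slides): each task accumulates into its own private local variable and then writes to its own slot of a shared array, so the only shared access is a conflict-free per-slot write; the merge step runs on one thread afterwards.

```csharp
using System;
using System.Threading.Tasks;

// Each of 4 tasks sums its own quarter of the range 0..999.
long[] partials = new long[4];
Parallel.For(0, 4, t =>
{
    long local = 0;                     // private: used by a single task only
    for (int i = t * 250; i < (t + 1) * 250; i++)
        local += i;
    partials[t] = local;                // each task writes only its own slot
});

long total = 0;
foreach (var p in partials) total += p; // merge runs sequentially
Console.WriteLine(total);               // sum of 0..999 = 499500
```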
Task-based program
Application
Tasks (CPU, I/O)
Runtime (queuing, scheduling)
Processors (threads, processes)
Managing state
Isolation
- Avoiding shared state
- Own copy of state
- Examples:
  - process isolation
  - intraprocess isolation
  - by convention
Immutability
- Multiple readers -- not a problem!
- All functions are pure
- Requires immutable collections
- Functional way: Haskell, F#, Lisp
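A minimal sketch of the idea (an assumed example, not from the slides): a pure function never mutates its input; "modification" returns a new value, so any number of concurrent readers of the original can never observe a partial update.

```csharp
using System;

// Pure function: same input always yields the same output,
// and no shared state is touched -- the input is left unchanged.
static (int X, int Y) WithX((int X, int Y) p, int x) => (x, p.Y);

var p1 = (X: 1, Y: 2);
var p2 = WithX(p1, 10);              // produces a new value
Console.WriteLine($"{p1.X},{p2.X}"); // p1 is unchanged: prints 1,10
```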
Synchronization
- The only remaining option for dealing with shared mutable state
- Kinds:
  - data synchronization
  - control synchronization
Data synchronization
- Why? To avoid race conditions and data corruption
- How? Mutual exclusion
  - data remains consistent
  - critical regions
    - locks, monitors, critical sections, spin locks
  - code-centered
    - rather than associated with data
Critical region

Thread 1:
// ...
lock (locker)
{
    // ...
    data.Operation();
    // ...
}
// ...

Thread 2:
// ...
lock (locker)  // blocks until Thread 1 releases the lock
{
    // ...
    data.Operation();
    // ...
}
Control synchronization
- To coordinate control flow
  - exchange data
  - orchestrate threads
- Waiting, notifications
  - spin waiting
  - events
  - alternative: continuations
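Control synchronization with an event can be sketched as follows (an illustrative example, assuming .NET's `ManualResetEventSlim`): one task blocks until another signals that the data it needs is ready.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var dataReady = new ManualResetEventSlim(false);
int[] data = null;

var producer = Task.Run(() =>
{
    data = new[] { 1, 2, 3 };  // prepare the data
    dataReady.Set();           // notification: data is ready
});

var consumer = Task.Run(() =>
{
    dataReady.Wait();          // wait for the notification
    Console.WriteLine(data.Length);
});

Task.WaitAll(producer, consumer);
```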
Three ways to manage state
- Isolation: simple, loosely coupled, highly scalable, right data structures, locality
- Immutability: avoids synchronization
- Synchronization: complex, runtime overheads, contention
- Prefer them in that order
Parallelism

Approaches to partitioning work

- Data parallelism
- Task parallelism
- Message-based parallelism
Data parallelism
How?
- Data is divided up among hardware processors
- The same operation is performed on the elements
- Optionally -- a final aggregation
Data parallelism
When?
- Large amounts of data
- Processing operation is costly
- or both
Data parallelism
Why?
- To achieve speedup
- For example, with GPU acceleration:
  - hours instead of days!
Data parallelism
Embarrassingly parallel problems
- parallelizable loops
- image processing

Non-embarrassingly parallel problems
- parallel QuickSort
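The parallel QuickSort mentioned above can be sketched like this (an illustrative implementation using the TPL; the sequential cutoff of 1000 is an arbitrary choice): the two partitions are independent, so they can be sorted as parallel tasks via recursive fork/join, falling back to plain recursion for small ranges to avoid task overhead.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

var rnd = new Random(42);
var data = Enumerable.Range(0, 10_000).Select(_ => rnd.Next(100_000)).ToArray();

ParallelQuickSort(data, 0, data.Length - 1);
Console.WriteLine(data.SequenceEqual(data.OrderBy(x => x)) ? "sorted" : "unsorted");

void ParallelQuickSort(int[] a, int lo, int hi)
{
    if (lo >= hi) return;
    int p = Partition(a, lo, hi);
    if (hi - lo < 1000)
    {
        // small range: sequential recursion avoids task overhead
        ParallelQuickSort(a, lo, p - 1);
        ParallelQuickSort(a, p + 1, hi);
    }
    else
    {
        // fork/join: the partitions are independent, sort them in parallel
        Parallel.Invoke(
            () => ParallelQuickSort(a, lo, p - 1),
            () => ParallelQuickSort(a, p + 1, hi));
    }
}

int Partition(int[] a, int lo, int hi)
{
    // Lomuto partition around the last element as pivot
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { (a[i], a[j]) = (a[j], a[i]); i++; }
    (a[i], a[hi]) = (a[hi], a[i]);
    return i;
}
```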
Data parallelism
(diagram: the data items divided between Thread 1 and Thread 2)
Data parallelism
Structured parallelism
- Well-defined begin and end points
- Examples:
  - CoBegin
  - ForAll
CoBegin
var firstDataset = new DataItem[1000];
var secondDataset = new DataItem[1000];
var thirdDataset = new DataItem[1000];

Parallel.Invoke(
    () => Process(firstDataset),
    () => Process(secondDataset),
    () => Process(thirdDataset));
Parallel For
var items = new DataItem[1000 * 1000];
// ...
Parallel.For(0, items.Length, i =>
{
    Process(items[i]);
});
Parallel ForEach
var tickers = GetNasdaqTickersStream();
Parallel.ForEach(tickers, ticker =>
{
    Process(ticker);
});
Striped Partitioning
(diagram: interleaved index stripes assigned alternately to Thread 1 and Thread 2)
Iterate complex data structures
var tree = new TreeNode();
// ...
Parallel.ForEach(
    TraversePreOrder(tree),
    node =>
    {
        Process(node);
    });
Iterate complex data
(diagram: nodes of the traversed structure distributed between Thread 1 and Thread 2)
Declarative parallelism

var items = new DataItem[1000 * 1000];
// ...
var validItems = from item in items.AsParallel()
                 let processedItem = Process(item)
                 where processedItem.Property > 42
                 select Convert(processedItem);

foreach (var item in validItems)
{
    // ...
}
Data parallelism
Challenges
- Partitioning
- Scheduling
- Ordering
- Merging
- Aggregation
- Concurrency hazards: data races, contention
Task parallelism
How?
- Programs are already functionally partitioned: statements, methods etc.
- Run independent pieces in parallel
- Control synchronization
- State isolation
Task parallelism
Why?
- To achieve speedup
Task parallelism
Kinds:
- Structured
  - clear begin and end points
- Unstructured
  - often demands explicit synchronization
Fork/join
- Fork: launch tasks asynchronously
- Join: wait until they complete
- CoBegin, ForAll
- Recursive decomposition
Fork/join
(diagram: sequential code forks into Task 1, Task 2 and Task 3, then joins back into sequential code)
Fork/join
Parallel.Invoke(
    () => LoadDataFromFile(),
    () => SavePreviousDataToDB(),
    () => RenewOtherDataFromWebService());
Fork/join
Task loadData = Task.Factory.StartNew(() =>
{
    // ...
});
Task saveAnotherDataToDB = Task.Factory.StartNew(() =>
{
    // ...
});
// ...
Task.WaitAll(loadData, saveAnotherDataToDB);
// ...
Fork/join
void Walk(TreeNode node)
{
    if (node == null) return; // guard against missing children
    var tasks = new[]
    {
        Task.Factory.StartNew(() => Process(node.Value)),
        Task.Factory.StartNew(() => Walk(node.Left)),
        Task.Factory.StartNew(() => Walk(node.Right))
    };
    Task.WaitAll(tasks);
}
Fork/join recursive
(diagram: recursive fork/join over the tree -- starting from the Root, each node forks tasks for its Left and Right subtrees, then joins)
Dataflow parallelism: Futures
Task<DataItem[]> loadDataFuture = Task.Factory.StartNew(() =>
{
    // ...
    return LoadDataFromFile();
});

var dataIdentifier = SavePreviousDataToDB();
RenewOtherDataFromWebService(dataIdentifier);
// ...
DisplayDataToUser(loadDataFuture.Result);
Dataflow parallelism: Futures
(diagrams: a future runs in parallel with the sequential flow until its result is needed; several futures can overlap the sequential flow the same way)
Continuations
(diagram: tasks chained so that each continuation runs after the previous task completes, alongside the sequential flow)
Continuations

var loadData = Task.Factory.StartNew(() =>
{
    return LoadDataFromFile();
});

var writeToDB = loadData.ContinueWith(dataItems =>
{
    WriteToDatabase(dataItems.Result);
});

var reportToUser = writeToDB.ContinueWith(t =>
{
    // ...
});
reportToUser.Wait();
Producer/consumer pipeline

(pipeline: reading -> lines -> parsing -> parsed lines -> storing -> DB)
Producer/consumer
var lines = new BlockingCollection<string>();
Task.Factory.StartNew(() =>
{
    foreach (var line in File.ReadLines(...))
        lines.Add(line);
    lines.CompleteAdding();
});
Producer/consumer
var dataItems = new BlockingCollection<DataItem>();
Task.Factory.StartNew(() =>
{
    foreach (var line in lines.GetConsumingEnumerable())
        dataItems.Add(Parse(line));
    dataItems.CompleteAdding();
});
Producer/consumer
var dbTask = Task.Factory.StartNew(() =>
{
    foreach (var item in dataItems.GetConsumingEnumerable())
        WriteToDatabase(item);
});
dbTask.Wait();
Task parallelism
Challenges
- Scheduling
- Cancellation
- Exception handling
- Concurrency hazards: deadlocks, livelocks, priority inversions etc.
Message based parallelism
- Accessing shared state vs. local state
  - no distinction, unfortunately
- Idea: encapsulate shared state changes into messages
- Async events
- Actors, agents
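The actor/agent idea can be sketched with a mailbox built on `BlockingCollection` (an illustrative example, not from the slides): the agent's state is confined to one task and is changed only by processing messages one at a time, so no locks are needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var mailbox = new BlockingCollection<int>();

// The agent: owns its state privately, changes it only via messages.
var counterAgent = Task.Run(() =>
{
    int count = 0;                            // state private to the agent
    foreach (var msg in mailbox.GetConsumingEnumerable())
        count += msg;                         // messages processed one at a time
    return count;
});

for (int i = 1; i <= 10; i++)
    mailbox.Add(i);                           // any thread may post messages
mailbox.CompleteAdding();

Console.WriteLine(counterAgent.Result);       // 1 + 2 + ... + 10 = 55
```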
Tools
Concurrent data structures
- Concurrent queues, stacks, sets, lists
- Blocking collections
- Work stealing queues
- Lock-free data structures
- Immutable data structures
Synchronization primitives
- Critical sections
- Monitors
- Auto- and manual-reset events
- Countdown events
- Mutexes
- Semaphores
- Timers
- RW locks
- Barriers
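One of the primitives above in action (an illustrative sketch, assuming .NET's `CountdownEvent`): a coordinating thread waits until N workers have each signaled completion.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var done = new CountdownEvent(3);  // expect 3 signals
int[] results = new int[3];

for (int w = 0; w < 3; w++)
{
    int id = w;                    // capture the loop variable per task
    Task.Run(() =>
    {
        results[id] = (id + 1) * 10;
        done.Signal();             // this worker has finished
    });
}

done.Wait();                       // blocks until all three have signaled
Console.WriteLine(results[0] + results[1] + results[2]); // 10 + 20 + 30 = 60
```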
Thread local state
- A way to achieve isolation
var parser = new ThreadLocal<Parser>(() => CreateParser());

Parallel.ForEach(items, item => parser.Value.Parse(item));
Thread pools
ThreadPool.QueueUserWorkItem(_ =>
{
    // do some work
});
Async

Task.Factory.StartNew(() =>
    {
        // ...
        return LoadDataFromFile();
    })
    .ContinueWith(dataItems =>
    {
        WriteToDatabase(dataItems.Result);
    })
    .ContinueWith(t =>
    {
        // ...
    });
Async

var dataItems = await LoadDataFromFileAsync();

textBox.Text = dataItems.Count.ToString();

await WriteToDatabaseAsync(dataItems);

// continue work
Technologies

- TPL, PLINQ, C# async, TPL Dataflow
- PPL, Intel TBB, OpenMP
- CUDA, OpenCL, C++ AMP
- Actors, STM
- Many others
Summary

- Programming for many CPUs
- Concurrency != parallelism
- CPU-bound vs. I/O-bound tasks
- Private vs. shared state
Summary

- Managing state:
  - Isolation
  - Immutability
  - Synchronization
    - data: mutual exclusion
    - control: notifications
Summary

- Parallelism:
  - Data parallelism: scalable
  - Task parallelism: less scalable
  - Message-based parallelism
Summary

- Data parallelism
  - CoBegin
  - Parallel ForAll
  - Parallel ForEach
  - Parallel ForEach over complex data structures
  - Declarative data parallelism
- Challenges: partitioning, scheduling, ordering, merging, aggregation, concurrency hazards
Summary

- Task parallelism: structured, unstructured
- Fork/Join
  - CoBegin
  - Recursive decomposition
- Futures
- Continuations
- Producer/consumer (pipelines)
- Challenges: scheduling, cancellation, exceptions, concurrency hazards
Summary

- Tools
  - Compilers, libraries
  - Concurrent data structures
  - Synchronization primitives
  - Thread local state
  - Thread pools
  - Async invocations
  - ...
Q/A