Parallelism in the Standard C++: What to Expect in C++ 17
Artur Laksberg
Microsoft Corp.
May 8th, 2014
Parallelism in C++11/14
Fundamentals: memory model, atomics
Basics: thread, mutex, condition_variable, async, future
Quicksort: Serial
void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        quicksort(v, start, pivot - 1);
        quicksort(v, pivot + 1, end);
    }
}
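The slides call partition() without defining it. A runnable version of the serial quicksort, assuming a Lomuto-style partition (last element as pivot, returning the pivot's final index) that is our own choice rather than the author's, could look like this:

```cpp
#include <utility>

// Hypothetical helper (not shown in the slides): Lomuto partition.
// Uses v[end] as the pivot and returns the pivot's final index.
int partition(int *v, int start, int end) {
    int pivot_value = v[end];
    int store = start;
    for (int i = start; i < end; ++i)
        if (v[i] < pivot_value) std::swap(v[i], v[store++]);
    std::swap(v[store], v[end]);
    return store;
}

// Serial quicksort over the inclusive index range [start, end].
void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        quicksort(v, start, pivot - 1);
        quicksort(v, pivot + 1, end);
    }
}
```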
Quicksort: Use Threads
void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        std::thread t1([&] { quicksort(v, start, pivot - 1); });
        std::thread t2([&] { quicksort(v, pivot + 1, end); });
        t1.join();
        t2.join();
    }
}
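Problem 1: expensive (two new OS threads for every recursive call)
Problem 2: fork-join structure not enforced (nothing forces the joins)
Problem 3: exceptions? (an exception escaping a std::thread calls std::terminate)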
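A C++11 sketch that softens all three problems, using std::async instead of raw threads. The depth cutoff and its default of 3 are hypothetical tuning choices, and partition() is again an assumed Lomuto helper; this is one possible mitigation, not the approach the slides go on to propose.

```cpp
#include <future>
#include <utility>

// Hypothetical helper (not shown in the slides): Lomuto partition.
int partition(int *v, int start, int end) {
    int pivot_value = v[end];
    int store = start;
    for (int i = start; i < end; ++i)
        if (v[i] < pivot_value) std::swap(v[i], v[store++]);
    std::swap(v[store], v[end]);
    return store;
}

// Forks only while depth > 0, so at most 2^depth - 1 extra threads are
// created (Problem 1). A future returned by std::async blocks in its
// destructor, so the fork is always joined (Problem 2), and get()
// rethrows an exception from the forked half in the parent (Problem 3).
void quicksort(int *v, int start, int end, int depth = 3) {
    if (start >= end) return;
    int pivot = partition(v, start, end);
    if (depth > 0) {
        auto left = std::async(std::launch::async,
                               [=] { quicksort(v, start, pivot - 1, depth - 1); });
        quicksort(v, pivot + 1, end, depth - 1);
        left.get();  // join the forked half
    } else {
        quicksort(v, start, pivot - 1, 0);
        quicksort(v, pivot + 1, end, 0);
    }
}
```

The cutoff is the crude part: it bounds thread creation but cannot balance uneven partitions, which is what a work-stealing scheduler addresses.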
Quicksort: Fork-Join Parallelism
void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        // parallel region: fork two tasks, then join
        quicksort(v, start, pivot - 1);   // task
        quicksort(v, pivot + 1, end);     // task
    }
}
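Quicksort: Using Task Regions (N3832)
void quicksort(int *v, int start, int end) {
    if (start < end) {
        // computed outside the region: tasks may still run after the
        // region's lambda returns, so they must not refer to its locals
        int pivot = partition(v, start, end);
        task_region([&] (auto& r) {                          // parallel region
            r.run([&] { quicksort(v, start, pivot - 1); });  // task
            r.run([&] { quicksort(v, pivot + 1, end); });    // task
        });  // join: task_region returns only after both tasks finish
    }
}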
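To experiment with the task-region style before any library ships it, here is a hypothetical stand-in for N3832's task_region. It is not the proposal's implementation: every task simply gets its own std::async thread instead of a work-stealing scheduler, and only one exception is reported where N3832 collects them all. The sketch computes pivot before entering the region, so the tasks never refer to a local of the region's lambda.

```cpp
#include <exception>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for N3832's task region handle (thread per task).
class task_region_handle {
public:
    template<class F>
    void run(F&& f) {  // tasks are assumed to return void
        tasks_.push_back(std::async(std::launch::async, std::forward<F>(f)));
    }
    // Joins every task; keeps the first exception instead of dropping it.
    void wait_all() {
        std::exception_ptr first;
        for (auto& t : tasks_) {
            try { t.get(); }
            catch (...) { if (!first) first = std::current_exception(); }
        }
        tasks_.clear();
        if (first) std::rethrow_exception(first);
    }
private:
    std::vector<std::future<void>> tasks_;
};

template<class F>
void task_region(F&& body) {
    task_region_handle r;
    std::exception_ptr body_error;
    try { body(r); }
    catch (...) { body_error = std::current_exception(); }
    r.wait_all();  // never return while tasks are still running
    if (body_error) std::rethrow_exception(body_error);
}

// Hypothetical Lomuto partition (the slides leave partition() undefined).
int partition(int *v, int start, int end) {
    int pivot_value = v[end];
    int store = start;
    for (int i = start; i < end; ++i)
        if (v[i] < pivot_value) std::swap(v[i], v[store++]);
    std::swap(v[store], v[end]);
    return store;
}

void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        task_region([&](auto& r) {
            r.run([&] { quicksort(v, start, pivot - 1); });
            r.run([&] { quicksort(v, pivot + 1, end); });
        });
    }
}
```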
Fork-Join Parallelism and Work Stealing
e();
task_region([] (auto& r) {
    r.run(f);
    g();
});
h();
[Figure: e() forks into f() and g(), which join before h()]
Q1: What thread runs f?
Q2: What thread runs g?
Q3: What thread runs h?
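The answers depend on the scheduler. The sketch below records them for one simple model only: a thread per task via std::async(std::launch::async) and a stalling join. It is not a work-stealing scheduler, and the names run_record and run_example are hypothetical. Under this model f runs on another thread, while g and h both run on the thread that ran e.

```cpp
#include <future>
#include <thread>

// Which thread ran each of e, f, g and h.
struct run_record {
    std::thread::id e, f, g, h;
};

run_record run_example() {
    run_record rec;
    rec.e = std::this_thread::get_id();                 // e();
    {
        auto task = std::async(std::launch::async,      // r.run(f);
                               [&] { rec.f = std::this_thread::get_id(); });
        rec.g = std::this_thread::get_id();             // g();
        task.get();                                     // end of region: stall until f is done
    }
    rec.h = std::this_thread::get_id();                 // h();
    return rec;
}
```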
Work Stealing Design Choices
What thread executes after a spawn? Child stealing vs. continuation (parent) stealing
What thread executes after a join? Stalling: the initiating thread waits. Greedy: the last thread to reach the join continues.
task_region([&] (auto& r) {
    for (int i = 0; i < n; ++i)
        r.run(f);
});
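Both stealing policies are usually built on a per-worker deque: the owner works at one end and thieves take from the other. A minimal lock-based sketch of that data structure follows (production schedulers typically use a lock-free Chase-Lev deque instead; the class name is hypothetical). With child stealing, each r.run(f) in the loop above would push f here while the owner keeps looping; idle workers steal the oldest entries.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Owner pushes and pops at the back (LIFO, good cache locality);
// thieves steal from the front (FIFO, the oldest task).
class work_stealing_deque {
public:
    using task = std::function<void()>;

    void push(task t) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(std::move(t));
    }
    std::optional<task> pop() {    // called by the owning worker
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        task t = std::move(q_.back());
        q_.pop_back();
        return t;
    }
    std::optional<task> steal() {  // called by other workers
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        task t = std::move(q_.front());
        q_.pop_front();
        return t;
    }

private:
    std::mutex m_;
    std::deque<task> q_;
};
```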
Inspiration
Performing Parallel Operations On Containers
Intel Threading Building Blocks
Microsoft Parallel Patterns Library, C++ AMP
Nvidia Thrust
Parallel STL
Just like STL, only parallel… Can be faster, if you know what you’re doing
Two execution policies: std::par, std::vec
Parallelization: What’s the Big Deal?
Why aren’t the algorithms already parallel?
std::sort(begin, end, [](int a, int b) { return a < b; });
User-provided closures must be thread safe:
int comparisons = 0;
std::sort(begin, end, [&](int a, int b) { comparisons++; return a < b; });
// fine sequentially, but a data race if the comparator runs on several threads
The same applies to special member functions, std::swap, etc.
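A sketch of how the counting closure can be made safe: the counter becomes a std::atomic, so concurrent increments no longer race. The helper parallel_for_index is hypothetical and stands in for any parallel algorithm that calls user code from several threads.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, n) into chunks and calls body(i) from
// up to `workers` threads (workers must be > 0).
template<class Body>
void parallel_for_index(std::size_t n, unsigned workers, Body body) {
    std::vector<std::thread> threads;
    std::size_t chunk = (n + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t lo = w * chunk;
        std::size_t hi = std::min(n, lo + chunk);
        if (lo >= hi) break;
        threads.emplace_back([=] { for (std::size_t i = lo; i < hi; ++i) body(i); });
    }
    for (auto& t : threads) t.join();
}

// With a plain int this would be a data race (undefined behavior);
// std::atomic makes each increment indivisible.
std::size_t count_calls(std::size_t n) {
    std::atomic<std::size_t> calls{0};
    parallel_for_index(n, 4, [&](std::size_t) {
        calls.fetch_add(1, std::memory_order_relaxed);
    });
    return calls.load();
}
```

Relaxed ordering suffices here because only the final total is read, after join() has synchronized with every worker.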
It’s a Contract
What the user can do vs. what the implementer can do
Asymptotic guarantees: std::sort: O(n log n); std::stable_sort: O(n log² n). What about a parallel sort?
What is a valid implementation? (see next slide)
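Chaos Sort
#include <utility>
#include <vector>
template<typename Iterator, typename Compare>
void chaos_sort(Iterator first, Iterator last, Compare comp) {
    auto n = last - first;
    std::vector<char> c(n);
    for (;;) {
        bool flag = false;
        // compare every adjacent pair (the comparisons are independent)
        for (decltype(n) i = 1; i < n; ++i) {
            c[i] = comp(first[i], first[i - 1]);
            flag |= c[i];
        }
        if (!flag) break;  // no adjacent pair is out of order: sorted
        // swap every pair found out of order in this pass
        for (decltype(n) i = 1; i < n; ++i)
            if (c[i]) std::swap(first[i - 1], first[i]);
    }
}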
Execution Policies
Built-in execution policies:
extern const sequential_execution_policy seq;
extern const parallel_execution_policy par;
extern const vector_execution_policy vec;
Dynamic execution policy:
class execution_policy {
public:
    // ...
    const type_info& target_type() const;
    template<class T> T* target();
    template<class T> const T* target() const;
};
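One way such a type-erased policy could be built is on C++17's std::any, which already provides type() and pointer-returning any_cast. This is a sketch under that assumption, with empty placeholder tag types; it is not the TS's implementation.

```cpp
#include <any>
#include <type_traits>
#include <typeinfo>

// Placeholder policy tag types standing in for the proposal's types.
struct sequential_execution_policy {};
struct parallel_execution_policy {};
struct vector_execution_policy {};

class execution_policy {
public:
    // Accepts any policy object (but not another execution_policy,
    // which uses the implicit copy/move constructors instead).
    template<class T,
             class = std::enable_if_t<!std::is_same_v<std::decay_t<T>, execution_policy>>>
    execution_policy(const T& policy) : policy_(policy) {}

    const std::type_info& target_type() const { return policy_.type(); }

    // Pointer to the stored policy if it has type T, otherwise nullptr.
    template<class T> T* target() { return std::any_cast<T>(&policy_); }
    template<class T> const T* target() const { return std::any_cast<T>(&policy_); }

private:
    std::any policy_;
};
```

An algorithm receiving an execution_policy can then branch on target<parallel_execution_policy>() != nullptr.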
Using Execution Policies to Write Parallel Code
std::vector<int> v = ...
// standard sequential sort
std::sort(v.begin(), v.end());
using namespace std::experimental::parallel;
// explicitly sequential sort
sort(seq, v.begin(), v.end());
// permitting parallel execution
sort(par, v.begin(), v.end());
// permitting vectorization as well
sort(vec, v.begin(), v.end());
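Because the policy is an ordinary first argument, overload resolution can pick an implementation at compile time. The sketch below shows that tag-dispatch idea with hypothetical types in a demo namespace; its "parallel" overload (sort two halves concurrently, then merge) is a deliberately simple stand-in, not how a production parallel sort works.

```cpp
#include <algorithm>
#include <future>

namespace demo {
// Hypothetical tag types and objects mirroring seq and par.
struct sequential_policy {};
struct parallel_policy {};
inline constexpr sequential_policy seq{};
inline constexpr parallel_policy par{};

template<class RandomIt>
void sort(sequential_policy, RandomIt first, RandomIt last) {
    std::sort(first, last);
}

// Simple parallel variant: sort the halves concurrently, then merge them.
template<class RandomIt>
void sort(parallel_policy, RandomIt first, RandomIt last) {
    RandomIt middle = first + (last - first) / 2;
    auto left = std::async(std::launch::async, [=] { std::sort(first, middle); });
    std::sort(middle, last);
    left.get();
    std::inplace_merge(first, middle, last);
}
}  // namespace demo
```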
Picking Execution Policy Dynamically
size_t threshold = ...
execution_policy exec = seq;
if (v.size() > threshold) {
    exec = par;
}
sort(exec, v.begin(), v.end());
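For a closed set of policies, run-time selection can also be expressed with std::variant and std::visit, which forwards to the statically dispatched overload. In this sketch both overloads call std::sort and only report which path ran; the names (demo, sort_by_size) are hypothetical, and a real parallel overload would split the work across threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

namespace demo {
struct sequential_policy {};
struct parallel_policy {};

// Stand-ins that report which overload ran.
template<class It> std::string sort(sequential_policy, It first, It last) {
    std::sort(first, last);
    return "seq";
}
template<class It> std::string sort(parallel_policy, It first, It last) {
    std::sort(first, last);  // placeholder for a parallel implementation
    return "par";
}

using execution_policy = std::variant<sequential_policy, parallel_policy>;

template<class It>
std::string sort(const execution_policy& exec, It first, It last) {
    // std::visit calls the overload for whichever policy is stored.
    return std::visit([&](auto policy) { return demo::sort(policy, first, last); }, exec);
}
}  // namespace demo

// Mirrors the slide: start sequential, switch to parallel above a threshold.
std::string sort_by_size(std::vector<int>& v, std::size_t threshold) {
    demo::execution_policy exec = demo::sequential_policy{};
    if (v.size() > threshold) exec = demo::parallel_policy{};
    return demo::sort(exec, v.begin(), v.end());
}
```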
Exception Handling
In the C++ philosophy, no exception is silently ignored.
Exception list: a container of exception_ptr objects
try {
    r = std::inner_product(std::par, a.begin(), a.end(), b.begin(), 0, func1, func2);
} catch (const exception_list& list) {
    for (auto& exptr : list) {
        // process exception pointer exptr
    }
}
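A sketch of the mechanism behind an exception list: each worker catches whatever its task throws, stores the exception_ptr, and after every thread has joined they are rethrown together. The exception_list class and run_all function here are hypothetical illustrations, not the TS's definitions.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical container of the exceptions raised by one parallel call.
class exception_list : public std::exception {
public:
    explicit exception_list(std::vector<std::exception_ptr> list) : list_(std::move(list)) {}
    std::size_t size() const noexcept { return list_.size(); }
    auto begin() const noexcept { return list_.begin(); }
    auto end() const noexcept { return list_.end(); }
    const char* what() const noexcept override { return "exception_list"; }
private:
    std::vector<std::exception_ptr> list_;
};

// Runs each task on its own thread; no exception is silently dropped.
void run_all(const std::vector<std::function<void()>>& tasks) {
    std::vector<std::exception_ptr> errors;
    std::mutex m;
    std::vector<std::thread> threads;
    for (const auto& task : tasks) {
        threads.emplace_back([&task, &errors, &m] {
            try {
                task();
            } catch (...) {
                std::lock_guard<std::mutex> lock(m);
                errors.push_back(std::current_exception());
            }
        });
    }
    for (auto& t : threads) t.join();
    if (!errors.empty()) throw exception_list(std::move(errors));
}
```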
Vectorization: What’s the Big Deal?
int a[n] = ...;
int b[n] = ...;
for (int i = 0; i < n; ++i) {
    a[i] = b[i] + c;
}
movdqu xmm1, XMMWORD PTR _b$[esp+eax+132]
movdqu xmm0, XMMWORD PTR _a$[esp+eax+132]
paddd  xmm1, xmm2
paddd  xmm1, xmm0
movdqu XMMWORD PTR _a$[esp+eax+132], xmm1
a[i:i+3] = b[i:i+3] + c;
A Vector Lane Is Not a Thread!
Taking locks: the “thread” with thread_id x takes a lock… then another “thread” with the same thread_id tries to enter the lock… Deadlock!
Exceptions: can we unwind 1/4 of the stack?
Vectorization: Not So Easy Any More…
void f(int* a, int* b) {
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + c;
        func();
    }
}
mov  ecx, DWORD PTR _b$[esp+esi+140]
add  ecx, edi
add  DWORD PTR _a$[esp+esi+140], ecx
call func
Aliasing?
Side effects?
Dependence?
Exceptions?
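Why aliasing matters can be shown without intrinsics. The sketch emulates the 4-wide form (the slides' a[i:i+3] array-section notation is not standard C++) by performing all four loads before any store, then compares it with the scalar loop. The function names are hypothetical. When a and b do not overlap the results agree; when b points one element before a, the scalar loop sees its own earlier stores and the emulated vector loop does not.

```cpp
#include <cstddef>

// The loop exactly as written: each iteration sees earlier stores.
void add_scalar(int* a, const int* b, int c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + c;
}

// Emulated 4-wide version: load and add four lanes, then store four lanes,
// as vector instructions do; a scalar loop handles the remainder.
void add_vector4(int* a, const int* b, int c, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int lanes[4];
        for (int j = 0; j < 4; ++j) lanes[j] = b[i + j] + c;  // all loads first
        for (int j = 0; j < 4; ++j) a[i + j] = lanes[j];      // then all stores
    }
    for (; i < n; ++i) a[i] = b[i] + c;
}
```

With int s[5] = {0} and the call add_scalar(s + 1, s, 1, 4), the scalar loop produces 1, 2, 3, 4 in s[1..4], while add_vector4 on the same input produces 1, 1, 1, 1. The compiler cannot tell the cases apart from the signature alone, which is why it needs the programmer's help.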
Vectorization Hazard: Locks
for (int i = 0; i < n; ++i) {
    lock.enter();
    a[i] = b[i] + c;
    lock.release();
}
for (int i = 0; i < n; i += 4) {
    for (int j = 0; j < 4; ++j) lock.enter();
    a[i:i+3] = b[i:i+3] + c;
    for (int j = 0; j < 4; ++j) lock.release();
}
This transformation is not safe!
Consider: f takes a lock and g releases the lock. Can a loop calling them be vectorized?
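The hazard can be reproduced deterministically on one thread. The sketch uses a minimal spinlock (hypothetical, built on std::atomic<bool>) and lets four emulated lanes try to enter it before any lane releases it, as the transformed loop does. Only the first lane succeeds; with lock() instead of try_lock(), lane 1 would spin forever, because lane 0 cannot reach its release until all lanes advance in lock-step.

```cpp
#include <atomic>

// Minimal spinlock: lock() spins until the flag is released.
class spinlock {
public:
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void lock() { while (!try_lock()) { /* spin */ } }
    void unlock() { locked_.store(false, std::memory_order_release); }
private:
    std::atomic<bool> locked_{false};
};

// Four lanes try to enter the lock one after another before any of them
// releases it. Returns how many lanes got in (then releases the lock).
int lanes_that_acquire(spinlock& lock) {
    int acquired = 0;
    for (int lane = 0; lane < 4; ++lane)
        if (lock.try_lock()) ++acquired;
    if (acquired > 0) lock.unlock();
    return acquired;
}
```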
How Do We Get This?
void f(int* a, int* b) {
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + c;
        func();
    }
}
for (int i = 0; i < n; i += 4) {
    a[i:i+3] = b[i:i+3] + c;
    for (int j = 0; j < 4; ++j) func();
}
Need a helping hand from the programmer, because…
Vector Loop with Parallel STL
void f(int* a, int* b) {
    integer_iterator begin {0};
    integer_iterator end {n};
    std::for_each(std::vec, begin, end, [&](int i) {
        a[i] = b[i] + c;
        func();
    });
}
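integer_iterator is not part of the C++11/14 standard library (Boost offers counting_iterator, and C++20 adds std::views::iota). A minimal hypothetical version is sketched below: dereferencing yields the current integer, so an algorithm walking [begin, end) visits the indices themselves. It is only an input iterator, enough for the sequential std::for_each used here; parallel algorithm overloads require stronger iterator categories.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical counting iterator over int values.
class integer_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;  // yields values, not references

    explicit integer_iterator(int value) : value_(value) {}
    int operator*() const { return value_; }
    integer_iterator& operator++() { ++value_; return *this; }
    integer_iterator operator++(int) { integer_iterator old = *this; ++value_; return old; }
    friend bool operator==(integer_iterator x, integer_iterator y) { return x.value_ == y.value_; }
    friend bool operator!=(integer_iterator x, integer_iterator y) { return !(x == y); }

private:
    int value_;
};

// Usage in the style of the slide, sequentially: visit indices 0..n-1.
void add_arrays(int* a, const int* b, int c, int n) {
    std::for_each(integer_iterator{0}, integer_iterator{n},
                  [&](int i) { a[i] = b[i] + c; });
}
```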
Parallelization vs. Vectorization
Parallelization: threads; each has a stack; good for divergent code; relatively heavyweight
Vectorization: vector lanes; no stack; lock-step execution; very lightweight
When To Vectorize
std::par: no race conditions, no aliasing
std::vec: same as std::par, plus no exceptions, no locks, no/little divergence
References
N3832: Task Region
N3872: A Primer on Scheduling Fork-Join Parallelism with Work Stealing
N3724: A Parallel Algorithms Library
N3850: Working Draft, Technical Specification for C++ Extensions for Parallelism
parallelstl.codeplex.com