Overview
Let's get started!
> Outline
1. Quick Introduction
2. PLINQ Hands-On
3. Performance Tips
> Prerequisites
1. .NET and C#
2. LINQ
3. Threading Basics
Multi-Core and .NET 4
In the words of developers
> "Getting an hour-long computation done in 10 minutes changes how we work." - Carl Kadie, Microsoft's eScience Research Group
> ".NET 4 has made it practical and cost-effective to implement parallelism where it may have been hard to justify in the past." - Kieran Mockford, MSBuild
> "I do believe the .NET Framework 4 will change the way developers think about parallel programming." - Gastón C. Hillar, independent IT consultant and freelance author
Visual Studio 2010
Tools, programming models and runtimes

[Architecture diagram: on the managed side, the .NET Framework 4 layers Parallel LINQ and the Task Parallel Library over data structures, a task scheduler and resource manager built on the ThreadPool; on the native side, Visual C++ 10 layers the Parallel Pattern Library and Agents Library over data structures and the Concurrency Runtime's task scheduler and resource manager. Both run on operating system threads (including UMS threads) under Windows. Visual Studio IDE tooling includes the Parallel Debugger Tool Windows and the Concurrency Visualizer.]
Parallel LINQ
From LINQ to Objects to PLINQ
An easy change
> LINQ to Objects query:

int[] output = arr
    .Select(x => Foo(x))
    .ToArray();

> PLINQ query:

int[] output = arr.AsParallel()
    .Select(x => Foo(x))
    .ToArray();
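As a self-contained sketch of the change (Foo here is a hypothetical stand-in for per-element work; note that plain AsParallel() does not guarantee output order, so the results are compared as sets):

```csharp
using System;
using System.Linq;

class PlinqIntro
{
    // Hypothetical stand-in for an expensive per-element computation.
    public static int Foo(int x) => x * x;

    static void Main()
    {
        int[] arr = Enumerable.Range(1, 1000).ToArray();

        // LINQ to Objects: sequential.
        int[] sequential = arr.Select(x => Foo(x)).ToArray();

        // PLINQ: the only change is AsParallel(). Without AsOrdered(),
        // the order of results is not guaranteed.
        int[] parallel = arr.AsParallel().Select(x => Foo(x)).ToArray();

        Console.WriteLine(sequential.OrderBy(x => x)
            .SequenceEqual(parallel.OrderBy(x => x))); // True
    }
}
```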
PLINQ hands-on
coding walkthrough
Array Mapping

int[] input = ...
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram: threads 1 through N each run the Select over a static partition of the input array (6, 3, 8, 2, 7, …) and write their true/false results directly into the corresponding slots of the output array.]

Array to array mapping is simple and efficient.
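The slides leave IsPrime undefined; a self-contained version using simple trial division might look like this (AsOrdered() is added here so the output lines up with the input array):

```csharp
using System;
using System.Linq;

class ArrayMapping
{
    // Trial-division primality test, a stand-in for the slides' IsPrime.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = { 6, 3, 8, 2, 7 };

        // Each worker thread maps its partition of the array straight
        // into the corresponding slots of the output array.
        bool[] output = input.AsParallel().AsOrdered()
            .Select(x => IsPrime(x))
            .ToArray();

        Console.WriteLine(string.Join(" ", output.Select(b => b ? "T" : "F")));
        // F T F T T
    }
}
```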
Sequence Mapping

IEnumerable<int> input = Enumerable.Range(1, 100);
bool[] output = input.AsParallel()
    .Select(x => IsPrime(x))
    .ToArray();

[Diagram: threads 1 through N share the input enumerator, synchronized by a lock; each thread runs the Select over the elements it pulls and appends to its own results buffer.]

Each thread processes a partition of inputs and stores results into a buffer. Buffers are combined into one array.
Asynchronous Mapping

var q = input.AsParallel()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram: worker threads pull from the lock-protected input enumerator, run the Select, and append to per-thread result buffers; the main thread's foreach polls the buffers through the output enumerator's MoveNext.]

In this query, the foreach loop starts consuming results as they are being computed.
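A runnable sketch of this lazy consumption (IsPrime is the hypothetical trial-division helper; the well-known count of primes up to 100,000 is used to check the result):

```csharp
using System;
using System.Linq;

class AsyncMapping
{
    // Trial-division primality test, a stand-in for the slides' IsPrime.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        var input = Enumerable.Range(1, 100_000);

        // No ToArray(): the query is consumed lazily, so the foreach
        // starts receiving results while worker threads are still
        // producing them. Order is unspecified without AsOrdered().
        var q = input.AsParallel().Select(x => IsPrime(x));

        int primes = 0;
        foreach (var isPrime in q)
            if (isPrime) primes++;

        Console.WriteLine(primes); // 9592 primes up to 100,000
    }
}
```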
Async Ordered Mapping or Filter

var q = input.AsParallel().AsOrdered()
    .Select(x => IsPrime(x));
foreach (var x in q) { ... }

[Diagram: the same pipeline as the asynchronous mapping, with a reordering buffer between the per-thread result buffers and the output enumerator.]

When ordering is turned on, PLINQ orders elements in a reordering buffer before yielding them to the foreach loop.
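A small runnable example of the effect of AsOrdered() (again with a hypothetical trial-division IsPrime): the streamed results come back in input order, at the cost of the reordering buffer.

```csharp
using System;
using System.Linq;

class OrderedMapping
{
    // Trial-division primality test, a stand-in for the slides' IsPrime.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        var input = Enumerable.Range(1, 20);

        // AsOrdered() makes PLINQ reorder results back into input
        // order before the foreach sees them.
        var q = input.AsParallel().AsOrdered()
            .Where(x => IsPrime(x));

        foreach (var prime in q)
            Console.Write(prime + " "); // 2 3 5 7 11 13 17 19
        Console.WriteLine();
    }
}
```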
Aggregation

int result = input.AsParallel()
    .Aggregate(
        0,                    // seed for each thread's local result
        (a, e) => a + Foo(e), // fold an element into the local result
        (a1, a2) => a1 + a2,  // combine two local results
        a => a);              // final result selector

[Diagram: each thread folds its share of the input into a local result (res1 … resN); the local results are then combined into the final result.]

Each thread computes a local result. The local results are combined into a final result. (ParallelEnumerable's combiner overload also takes a final result selector; the identity a => a completes the call.)
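A self-contained version (Foo is a hypothetical stand-in, squaring here, so the query computes the sum of squares):

```csharp
using System;
using System.Linq;

class ParallelAggregation
{
    // Hypothetical stand-in for the slides' Foo.
    public static int Foo(int x) => x * x;

    static void Main()
    {
        int[] input = Enumerable.Range(1, 100).ToArray();

        // Seeded overload with a per-thread fold, a combiner that merges
        // the per-thread partial sums, and an identity result selector.
        int result = input.AsParallel()
            .Aggregate(
                0,
                (a, e) => a + Foo(e),
                (a1, a2) => a1 + a2,
                a => a);

        Console.WriteLine(result); // 338350 = 1^2 + 2^2 + ... + 100^2
    }
}
```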
Search

int result = input.AsParallel().AsOrdered()
    .Where(x => IsPrime(x))
    .First();

[Diagram: threads scan partitions of the input in parallel; each polls a shared resultFound flag, and sets it together with the result when a match is found, letting the other threads stop early.]
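A runnable version of the search (IsPrime is the hypothetical trial-division helper; with AsOrdered(), First() returns the earliest match in input order):

```csharp
using System;
using System.Linq;

class ParallelSearch
{
    // Trial-division primality test, a stand-in for the slides' IsPrime.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        var input = Enumerable.Range(1000, 1000); // 1000..1999

        // Threads search partitions in parallel; once an early match is
        // found, threads working on later parts of the input can stop.
        int result = input.AsParallel().AsOrdered()
            .Where(x => IsPrime(x))
            .First();

        Console.WriteLine(result); // 1009, the first prime >= 1000
    }
}
```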
More complex query

int[] output = input.AsParallel()
    .Where(x => IsPrime(x))
    .GroupBy(x => x % 5)
    .Select(g => ProcessGroup(g))
    .ToArray();

[Diagram: in the first phase, threads pull from the lock-protected input enumerator, apply the Where filter and build per-thread GroupBy tables; the groups are then merged, and in the second phase threads run the Select over the merged groups and their results are combined into the output array.]
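A self-contained version of this query (IsPrime and ProcessGroup are hypothetical stand-ins; here ProcessGroup just counts the primes in each residue class mod 5):

```csharp
using System;
using System.Linq;

class ComplexQuery
{
    // Trial-division primality test, a stand-in for the slides' IsPrime.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // Hypothetical stand-in for the slides' ProcessGroup.
    public static int ProcessGroup(IGrouping<int, int> g) => g.Count();

    static void Main()
    {
        int[] input = Enumerable.Range(1, 100).ToArray();

        int[] output = input.AsParallel()
            .Where(x => IsPrime(x))
            .GroupBy(x => x % 5)
            .Select(g => ProcessGroup(g))
            .ToArray();

        // The 25 primes <= 100, split across residue classes mod 5.
        Console.WriteLine(output.Sum()); // 25
    }
}
```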
PLINQ PERFORMANCE TIPS
Performance Tip #1: Avoid memory allocations

> When the delegate allocates memory, GC and memory allocations can become the bottleneck
> Then, your algorithm is only as scalable as the GC
> Mitigations:
   > Reduce memory allocations
   > Turn on server GC
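Server GC is turned on through the application's configuration file; for a .NET Framework app, app.config would contain:

```xml
<configuration>
  <runtime>
    <!-- Use the server garbage collector, which scales better
         for allocation-heavy parallel workloads. -->
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```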
Performance Tip #2: Avoid true and false sharing

> Modern CPUs exploit locality
   > Recently accessed memory locations are stored in a fast cache
> Multiple cores
   > Each core has its own cache
> When a memory location is modified, it is invalidated in all caches
   > In fact, the entire cache line is invalidated
   > A cache line is usually 64 or 128 bytes
[Diagram: four cores, each running a thread, each with its own cache holding a copy of the same cache line (5 7 3 2) from memory; when one core writes to its copy (6 7 3 2), the line is invalidated in the other cores' caches.]

If cores continue stomping on each other's caches, most reads and writes will go to the main memory!
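A minimal sketch of false sharing (the stride values and thread count are illustrative assumptions): four threads increment their own counters, first packed into one cache line, then padded apart. On a multi-core machine the padded run is typically several times faster.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class FalseSharing
{
    const int Iterations = 5_000_000;

    // With stride 1 the four counters sit in one cache line (false
    // sharing); with stride 16 (16 * sizeof(long) = 128 bytes) each
    // counter gets its own line. Returns the total count as a check.
    public static long Run(int stride)
    {
        long[] counters = new long[4 * stride];
        var sw = Stopwatch.StartNew();
        Parallel.For(0, 4, t =>
        {
            for (int i = 0; i < Iterations; i++)
                counters[t * stride]++; // each thread writes its own slot
        });
        Console.WriteLine($"stride {stride}: {sw.ElapsedMilliseconds} ms");

        long total = 0;
        foreach (long c in counters) total += c;
        return total;
    }

    static void Main()
    {
        Run(1);   // counters share a cache line
        Run(16);  // padded: typically much faster on multi-core
    }
}
```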
Performance Tip #3: Use expensive delegates

> A computationally expensive delegate is the best case for PLINQ
> A cheap delegate over a long sequence may also scale, but:
   > Overheads reduce the benefit of scaling
      > MoveNext and Current virtual method calls on the enumerator
      > Virtual method calls to execute delegates
   > Reading a long input sequence may be limited by memory throughput
Performance Tip #4: Write simple PLINQ queries

> PLINQ can execute all LINQ queries
> Simple queries are easier to reason about
> Break up complex queries so that only the expensive data-parallel part is in PLINQ:

src.Select(x => Foo(x))
   .TakeWhile(x => Filter(x))
   .AsParallel()
   .Select(x => Bar(x))
   .ToArray();
Performance Tip #5: Choose appropriate partitioning

> Partitioning algorithms vary in:
   > Overhead
   > Load balancing
   > The required input representation
> By default:
   > Array and IList<> are partitioned statically
   > Other IEnumerable<> types are partitioned on demand in chunks
> Custom partitioning is supported via Partitioner
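A sketch of overriding the default static partitioning of an array with a load-balancing partitioner (IsPrime is again the hypothetical trial-division helper):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class CustomPartitioning
{
    // Trial-division primality test, a stand-in for expensive,
    // unevenly-costed per-element work.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        int[] input = Enumerable.Range(1, 100_000).ToArray();

        // By default an array is partitioned statically. Wrapping it in
        // a load-balancing partitioner hands out elements on demand,
        // which helps when per-element cost varies a lot.
        var partitioner = Partitioner.Create(input, loadBalance: true);

        int primes = partitioner.AsParallel()
            .Count(x => IsPrime(x));

        Console.WriteLine(primes); // 9592 primes up to 100,000
    }
}
```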
Performance Tip #6: Use PLINQ with thought and care

> Measure, measure, measure!
> Find the bottleneck in your code
> If the bottleneck fits a data-parallel pattern, try PLINQ
> Measure again to validate the improvement
> If no improvement, check performance tips 1-5
More Information

> Parallel Computing Dev Center: http://msdn.com/concurrency
> Code samples: http://code.msdn.microsoft.com/ParExtSamples
> Team blogs
   > Managed: http://blogs.msdn.com/pfxteam
   > Tools: http://blogs.msdn.com/visualizeparallel
> Forums: http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
> My blog: http://igoro.com/
YOUR FEEDBACK IS IMPORTANT TO US!
Please fill out session evaluation forms online at MicrosoftPDC.com
Learn More On Channel 9
> Expand your PDC experience through Channel 9
> Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses
channel9.msdn.com/learn
Built by Developers for Developers…
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.