
THE WINDOWS PARALLEL PROGRAMMING …dsp.vscht.cz/konference_matlab/MATLAB08/prispevky... · available for the Windows ... correct usage of the APIs and their arguments at runtime


COMPILERS

There are a number of optimizing compilers available for the Windows® platform. Visual Studio itself includes Visual C++, Visual C# and Visual Basic®. New to the family is F#, a high-performance functional programming language based on OCaml/ML. These are available in 32-bit and 64-bit form for x86, x64 and Itanium platforms.

Partner compilers include Intel C++ and Fortran and PGI C++ and Fortran, both of which support automatic parallelization and produce excellent floating-point code. These compilers are also fully integrated into the Visual Studio IDE, with project, build and syntax highlighting support. New dynamic languages in the Microsoft® .NET family include IronPython and IronRuby, which can take full advantage of the CLR’s built-in libraries. The latest GNU compilers are also available on Windows and can be used effectively in a variety of scenarios, primarily for source and compiler compatibility between Windows and Linux.

PROFILERS

There are numerous profilers available on Windows, aimed at different types of profiling tasks. Some do system-wide profiling; others look for single Dcache misses in a three-level nested inner loop. Here’s a sampling of what’s available:

With Visual Studio, you get a built-in profiler that does both instrumentation and IP sampling profiling along with A/B comparisons, fully integrated into the IDE. For lower-level profiling, you can use Intel’s VTune or AMD’s CodeAnalyst to dig even deeper into instruction-level and chip-level issues.

Intel’s Trace Collector Analyzer

For MPI profiling, Windows HPC Server 2008 has an ETW-based (Event Tracing for Windows) profiler that collects events and messages at the application, MPI library and driver levels, performs clock corrections and displays the results. In Windows Compute Cluster Server 2003 the display is textual, but Windows HPC Server 2008 has a variety of advanced viewers. The output of the Windows HPC Server 2008 tracing tools is the industry-standard OTF (Open Trace Format), and OTF traces can be viewed by a number of third-party viewers. We’re excited to announce that Vampir is currently being ported natively to Windows! Another powerful viewer is Intel’s Trace Collector/Analyzer, which is already in beta on Windows with Intel’s MPI library and is now being ported directly to MSMPI. Both of these excellent trace viewers can be used not only for profiling, but in many “debugging” scenarios as well.

OVERVIEW

What’s in a programmer’s toolbox these days? A compiler, a debugger, an editor and perhaps a profiler? In the era of parallel computing, whether on an eight-way multi-core system or an 800-node cluster, what you choose to place in your toolbox matters a great deal more, and it has a dramatic impact on how productive you will be and how efficient your code will be. Microsoft understands this and has put together a rich ecosystem of tools that address the needs of multi-core and cluster programmers. Let’s take a quick tour of Microsoft Visual Studio® and some of its partner tools.

THE WINDOWS® PARALLEL PROGRAMMING ECOSYSTEM—TOOLS FOR A PARALLEL UNIVERSE

Visual Studio Profiler

PGI Fortran in Visual Studio


CODE ANALYZERS

Various static and dynamic code analysis tools exist on Windows. Some perform correctness checks, others look for anomalies of one sort or another, and others enforce policies. Tools relevant to parallel programming include Intel’s Thread Checker, which looks for deadlocks, stalls, data races and similar problems. The MPI dynamic analysis tool Marmot (HLRS) is currently being ported to Windows and MSMPI as well. This tool monitors MPI calls and automatically checks for correct usage of the APIs and their arguments at runtime, and also detects certain deadlock conditions. It does not replace classic debuggers, but can be used in addition to them.
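To make the idea of runtime deadlock detection concrete, here is a minimal sketch of one way a checker could flag ranks that are all waiting on each other. It is an illustration only and is not Marmot's actual algorithm: the function name `has_wait_cycle` and the `waits_on` model (one blocking receive per rank, each with a single known source) are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// waits_on[r] is the rank that rank r is blocked receiving from,
// or -1 if rank r is not blocked. (Hypothetical model for illustration.)
// Returns true if some set of ranks waits on each other in a cycle.
bool has_wait_cycle(const std::vector<int>& waits_on) {
    const std::size_t n = waits_on.size();
    // 0 = unvisited, 1 = on the path being followed, 2 = fully explored
    std::vector<int> state(n, 0);
    for (std::size_t start = 0; start < n; ++start) {
        if (state[start] != 0) continue;
        std::vector<std::size_t> path;
        std::size_t cur = start;
        bool cycle = false;
        while (true) {
            if (state[cur] == 1) { cycle = true; break; }  // came back to our own path
            if (state[cur] == 2) break;                    // already known to be cycle-free
            state[cur] = 1;
            path.push_back(cur);
            const int next = waits_on[cur];
            if (next < 0 || static_cast<std::size_t>(next) >= n) break;  // not blocked
            cur = static_cast<std::size_t>(next);
        }
        for (std::size_t r : path) state[r] = 2;
        if (cycle) return true;
    }
    return false;
}
```

If, for instance, rank 0 and rank 1 each block in a receive from the other, `has_wait_cycle({1, 0})` reports a cycle, while `has_wait_cycle({1, -1})` does not because rank 1 is still free to send.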

PARALLEL PROGRAMMING MODELS

The parallel programming tools available on Windows can be generally divided into cluster and multi-core categories. On the cluster we provide MSMPI, which is based on MPICH2 with some Windows-specific changes for security and performance. Note that MPICH2 is OSS and Microsoft is contributing its bug fixes and enhancements back to the main trunk. There are other MPI libraries available on Windows, such as Intel’s and HP’s. Beyond MPI, you can use the Parametric Sweep feature of the Windows HPC Server 2008 job scheduler to quickly run task-parallel sweeps with automatic input/output file mapping and naming.

At the multi-core node level, Visual C++ provides support for the OpenMP standard in both native and managed (.NET) modes. Partner compilers such as Intel’s and PGI’s also provide OpenMP support as well as automatic parallelization for certain loops.
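As a small illustration of the OpenMP support mentioned above, the following sketch parallelizes a dot product with a reduction clause. The function name `dot` is a placeholder chosen for this example. It assumes the compiler is invoked with OpenMP enabled (for example /openmp in Visual C++ or -fopenmp in GCC); without that, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Dot product of two vectors (over the shorter length of the two).
// Each thread accumulates a private partial sum; OpenMP combines the
// partial sums into `sum` when the loop ends.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(a.size() < b.size() ? a.size() : b.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        sum += a[idx] * b[idx];
    }
    return sum;
}
```

The reduction clause is what makes the shared accumulator safe: without it, concurrent updates to `sum` would be a data race.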

Beyond OpenMP, for managed languages, Microsoft is working on PFx (Parallel Frameworks), which consists of two major technologies: TPL and PLINQ. TPL, the Task Parallel Library, provides support for easy threading and library-based parallelization. It takes advantage of C# features such as lambdas and generics to “extend” the language with various parallel constructs. TPL also provides basic features such as a work-stealing infrastructure for balanced parallelism. Our partner Intel also provides the TBB (Threading Building Blocks) library, which has some similar features and is available now on Windows for C++. The other member of the upcoming PFx family for parallelizing .NET code is PLINQ (Parallel LINQ).

void ParaMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, delegate(int i)
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}

Easy loop parallelization in C# using PFx
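For native C++, a comparable library-style loop can be sketched with standard threads and a lambda. This is only an illustration of the idea under simple assumptions: `parallel_for` and `para_matrix_mult` are hypothetical names, not part of PFx or TBB, the range is split statically into one contiguous chunk per thread, and no work stealing is attempted.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: run body(i) for every i in [begin, end),
// giving each worker thread one contiguous chunk of the range.
void parallel_for(std::size_t begin, std::size_t end,
                  const std::function<void(std::size_t)>& body,
                  unsigned workers = std::thread::hardware_concurrency()) {
    if (begin >= end) return;
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0
    const std::size_t count = end - begin;
    const std::size_t nthreads = std::min<std::size_t>(workers, count);
    const std::size_t chunk = (count + nthreads - 1) / nthreads;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < nthreads; ++t) {
        const std::size_t lo = begin + t * chunk;
        const std::size_t hi = std::min(end, lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (std::thread& th : pool) th.join();  // wait for every chunk to finish
}

// Row-parallel multiply of two size x size matrices stored row-major.
// Each row of `result` is written by exactly one thread, so no locks are needed.
void para_matrix_mult(std::size_t size, const std::vector<double>& m1,
                      const std::vector<double>& m2, std::vector<double>& result) {
    parallel_for(0, size, [&](std::size_t i) {
        for (std::size_t j = 0; j < size; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < size; ++k)
                acc += m1[i * size + k] * m2[k * size + j];
            result[i * size + j] = acc;
        }
    });
}
```

A static split like this works well when every iteration costs about the same, as in a dense matrix multiply. Work stealing, which TPL provides, matters more when iteration costs are uneven.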

LINQ itself is a new feature in .NET languages that enables writing SQL-like queries directly in your code. PLINQ in turn enables parallelization of those queries over multiple processors/threads by simply annotating the query with “AsParallel()”. Note the absence of any explicit threading, locks, etc. in the example shown. PLINQ takes advantage of many of the core features present in the TPL package, C# 3.0 language features and .NET. PFx will be released as a technology preview in early 2008.

SOA style applications on Windows HPC Server 2008 cluster

On the native C++ front, we are also experimenting with various library and language extensions to help developers parallelize their code. These range, for example, from a “parallel” keyword to Software Transactional Memory. CTPs for parallelizing C++ beyond OpenMP will also be available in 2008.

While Windows Compute Cluster Server 2003 focused primarily on traditional “batch” HPC applications, Windows HPC Server 2008 supports an SOA-like model that enables creating “interactive” applications. These SOA apps are ideal for scenarios where, for example, many short-duration calculations need to be made and the granularity does not justify the full overhead of launching a job and the corresponding program stack setup and teardown. These user-supplied service instances are instantiated and throttled dynamically as needed by Windows HPC Server 2008 and can handle anything from UDFs in a complex Excel spreadsheet to portfolio-analysis calculations. The SOA interactive app model uses the WCF (Windows Communication Foundation)

>mpiexec -np 3 deadlock
1 rank 0 performs MPI_Init
2 rank 1 performs MPI_Init
3 rank 0 performs MPI_Comm_rank
4 rank 1 performs MPI_Comm_rank
5 rank 0 performs MPI_Comm_size
6 rank 1 performs MPI_Comm_size
7 rank 0 performs MPI_Recv
8 rank 1 performs MPI_Recv
8 Rank 0 is pending!
8 Rank 1 is pending!
WARNING: deadlock detected, all clients are pending

Sample output from Marmot

. . .
main () {
    int i, chunk;
    float a[N], b[N], c[N];
    . . .
    #pragma omp parallel for \
        shared(a,b,c,chunk) private(i) \
        schedule(static,chunk)
    for (i=0; i < N; i++)
        c[i] = a[i] + b[i];
}

OpenMP in Visual C++

IEnumerable<T> data = . . . ;
var q = from x in data.AsParallel()
        where p(x)
        orderby k(x)
        select f(x);
foreach (var e in q) a(e);

Auto query parallelization for Linq



framework and is fully supported by the Visual Studio languages and debugging/profiling tools. Speaking of UDFs, Excel 14 and Windows HPC Server 2008 now support automatically offloading long-running spreadsheets onto the cluster!

MATH LIBRARIES

A large portion of today’s HPC, graphics and financial programs require highly tuned math libraries. On the Windows platform there are numerous choices of general and domain-specific math libraries. From Intel and AMD you get platform-specific versions of their highly tuned math libraries for x86, x64 and (from Intel) Itanium platforms. Third-party vendors such as Visual Numerics and NAG also provide advanced higher-level math libraries for a variety of verticals. One unique feature of VNI’s math libraries is that they are available in both native and managed (.NET) versions, the latter written entirely in C#. Of course, there is also an abundance of math libraries from the OSS world available on Windows (see LAPACK, ScaLAPACK, reference BLAS, BLACS, the ATLAS automatic tuners and various netlib packages).
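To show the kind of kernel these libraries tune, here is an untuned sketch of the general matrix multiply that BLAS exposes as dgemm, computing C = alpha * A * B + beta * C. The name `naive_dgemm` and its simplified argument list are assumptions made for this example; the real BLAS routine also takes transpose flags and leading dimensions, which are left out here.

```cpp
#include <cstddef>
#include <vector>

// C = alpha * A * B + beta * C, with A (m x k), B (k x n) and C (m x n),
// all stored column-major as in the reference BLAS.
// Simplified illustration: no transposes, no leading-dimension arguments.
void naive_dgemm(std::size_t m, std::size_t n, std::size_t k,
                 double alpha, const std::vector<double>& A,
                 const std::vector<double>& B,
                 double beta, std::vector<double>& C) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[i + p * m] * B[p + j * k];  // A(i,p) * B(p,j)
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

A tuned library computes the same result but blocks the loops for cache reuse and uses vector instructions, which is why calling the vendor routine is usually preferable to hand-written loops like these.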

VISUALIZATION TOOLS

Visualization in the context of HPC development often refers to the ability to visualize very large arrays, data sets or program events. For each of these, there are several choices available to the Windows programmer. For visualization of large data sets, tools such as PV-WAVE can be used to quickly manipulate and analyze complex data. OSS tools such as ParaView, which is based on the Visualization Toolkit, are available as well. These can be run on multi-core or distributed systems in standalone and client/server modes. Other tools such as TU Dresden’s Vampir and Intel’s Trace Collector/Analyzer can be used to visualize MPI message traces across 1,000 nodes with ease. For end-user applications, GPUs can be used to perform sophisticated visualization as well. Products from Nvidia and ATI provide support for writing GPU-targeted code from inside Visual Studio.

SCRIPTING

While HPC and multi-core (MC) programs can be complex by themselves, they’re usually just one part of a larger pipeline that requires all types of glue code and scripting to connect everything together. These scripts are generally written in standard scripting languages, many of which are available on Windows: awk, sed, Perl and various Unix shells (C, Korn, Bash, …). Beyond traditional scripting tools, other dynamic languages such as Python and Ruby are used effectively for scripting as well. Both are available in the .NET language family (as IronPython and IronRuby), running on top of the CLR. In addition to the core language, they come with full Visual Studio integration and access to the extensive CLR frameworks. Since they all run on the CLR, mixing scripting languages with standard languages such as C# or VB is quite easy. Another language that’s gaining popularity for scripting is F#. This functional programming language, based on OCaml, is now a full member of the Visual Studio languages.

Another exciting arrival in this zip code is Windows PowerShell™, the new command shell and scripting language that replaces the aging cmd.exe processor. What’s special about Windows PowerShell? It’s a modern shell that has all the features of the latest Unix shells, with one addition: instead of passing “text” between stages, it passes “objects”. This enables you to build powerful, intelligent scripts in a minimal number of lines that are still highly readable and maintainable.

TESTING

Needless to say, parallel programs are in general much harder to debug and maintain than serial ones. It’s critical to know where you stand with every line of code that’s checked in by your team of programmers. Visual Studio has built-in testing features that go a long way in helping you maintain and ship high-quality software. You can manage a large number of test cases, group and sort tests using attributes, or categorize them into lists to manage as a single artifact.

UNIX COEXISTENCE AND MIGRATION TOOLS

Interested in porting your Linux/Unix code to Windows but don’t want to go through yet another learning curve? No problem: consider using SUA (Subsystem for UNIX-based Applications). This POSIX-compliant environment allows you to build your applications natively on Windows while keeping all of your Unix API calls in place. With minimal to no modifications you can build Unix-targeted code with Visual Studio, GCC or other compilers, link with the SUA runtime and run at full speed on Windows.

Built-in support for unit testing

$ echo "this is a string" | tr '[a-z]' '[A-Z]'

PS> "string".Insert(1,"ABC")

Text vs objects in PowerShell


Building Unix apps under SUA is easy.


For example, the Apache server can be built under SUA with five lines of changes. You can also choose to port fully to Windows, of course. For recreating your Unix-like development environment on Windows, you have many options, including the GNU tools and the SUA tools (100+ utilities, shells, a POSIX runtime, etc.), with Emacs and vi available to avoid a complete reprogramming of your memory cells.

SCIENTIFIC TOOLBOXES

Beyond the usual general-purpose programming languages, IDEs and libraries, there are several packages aimed at scientists and engineers, such as Mathematica, Matlab and Maple. These packages provide a rich, integrated environment for scripting, data manipulation, visualization, etc. They all provide built-in support for parallelization, such as Matlab’s parfor keyword for easily threading for loops.

Matlab can offload to WHPC

Another key feature is that these systems are integrated with the Windows HPC Server 2008 job scheduler. For problems that require the power of a cluster, Mathematica and Matlab enable you to easily distribute your workload onto a Windows HPC Server 2008 cluster from inside the IDE.

GPGPUs

If you have a workload that lends itself well to today’s powerful yet affordable GPUs, there are several choices available on Windows. Nvidia provides a hardware/software solution called CUDA, which comes with an SDK that includes numerous examples on matrix manipulation, parallel versions of various standard algorithms, FFTs, BLAS, etc. The interface is C-based and is available from inside Visual Studio. ATI has a similar offering called “Close To Metal”, which is also usable from Visual Studio. Another firm, RapidMind, uses advanced language features of C++ to effectively hide some of the complexities of GPU programming. HPC clusters can be enhanced with GPUs to achieve tremendous speedups for problems that fit the GPU model. Fortunately, this usage model is expanding thanks to better programming models, libraries and tools. Windows HPC Server 2008 allows admins to “tag” nodes that have GPUs and to easily select those machines for jobs that require GPUs.

PARALLEL PROGRAMMING TOOLS AT A GLANCE

Compilers and languages

• Visual C++
• Visual C#
• Visual Basic
• Visual F#
• Intel C++ & Fortran
• PGI C++ & Fortran

Debuggers

• WinDbg
• VS Debugger (MC & MPI)
• Allinea Visual Studio plug-in (MPI)
• MPI/Event Tracing for Windows
• PGI MPI Debugger

Profilers

• Visual Studio Profiler
• VTune
• CodeAnalyst
• MPI/Event Tracing for Windows
• PGI MPI Profiler

Analyzers

• Marmot
• MPI/Event Tracing for Windows
• Vampir
• Intel Trace Collector/Analyzer
• Intel Thread Checker
• Utah U MPI model checker

Parallel programming models

• OpenMP
• MPI (MS, Intel, HP MPI libs)
• MPI.NET
• MPI.C++
• PFx: Task Parallel Library
• PFx: Parallel LINQ
• SOA on Cluster
• Intel Threading Building Blocks

Math Libraries

• Intel MKL
• AMD ACML
• Visual Numerics IMSL
• NAG
• Various OSS mathlibs

OSS

• Various mathlibs
• MPI.NET and MPI.C++
• Numerous vertical HPC apps & libs

© 2007 Microsoft Corporation. All rights reserved. This data sheet is for informational purposes only. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Microsoft Corporation. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft®, Windows®, Visual Studio®, Windows Vista® and the Windows logo are trademarks of the Microsoft group of companies. Other product and company names herein may be the trademarks of their respective owners.

Microsoft Corporation • One Microsoft Way • Redmond, WA 98052-6399 • USA

sched = findResource('scheduler', 'type', 'CCS')
j = createJob(sched)
createTask(j, @sum, 1, {[1 1]})
createTask(j, @sum, 1, {[2 2]})
createTask(j, @sum, 1, {[3 3]})
submit(j);
waitForState(j)
results = getAllOutputArguments(j)

results =
    [2]
    [4]
    [6]

destroy(j)

Analyzing GPU behavior