Parallelization with the Matlab® Distributed Computing Server (MDCS) @ CBI cluster
Overview
• Parallelization with Matlab using the Parallel Computing Toolbox (PCT)
• Matlab Distributed Computing Server Introduction
• Benefits of using the MDCS
• Hardware/Software/Utilization @ CBI
• MDCS Usage Scenarios
• Hands-on Training
Parallelization with Matlab PCT
• The Matlab Parallel Computing Toolbox provides access to multi-core, multi-system (MDCS), and GPU parallelism.
• Many built-in Matlab functions (e.g. FFT) support parallelism directly and transparently.
• Parallel constructs are available, such as converting for loops into parfor loops.
• Allows handling of many different types of parallel software development challenges.
• MDCS allows scaling of locally developed, parallel-enabled Matlab applications.
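As a minimal sketch of the for-to-parfor conversion mentioned above, the loop below is written both ways. The workload (a sum of squares) and all variable names are illustrative assumptions, not taken from the slides; the only requirement is that iterations do not depend on each other.

```matlab
% Sketch: the same loop written with for and with parfor.
% Assumption: each iteration is independent of the others.
n = 8;

serialResult = zeros(1, n);
for k = 1:n
    serialResult(k) = sum((1:k).^2);
end

parallelResult = zeros(1, n);
parfor k = 1:n
    % parallelResult is a "sliced" output: each iteration writes one element
    parallelResult(k) = sum((1:k).^2);
end

% Both loops should produce identical results.
assert(isequal(serialResult, parallelResult));
```

Without an open pool of workers, parfor typically falls back to running the iterations on the client, so the same code can be developed locally first.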
Parallelization with Matlab PCT
• Distributed / parallel algorithm characteristics
– Memory usage & CPU usage
• Load a 4 Gigabyte file into memory → calculate averages
– Communication / data I/O patterns
• Read file 1 (10 Gigabytes) → run a function
• Worker B sends data to worker A → worker A runs a function → worker A returns data to worker B
– Dependencies
• Function 1 → Function 2 → Function 3
• Hardware resource contention (e.g. 16 cores each trying to read/write a set of files, bandwidth limitations on RAM)
• Managing large numbers of small files → filesystem contention
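The dependency pattern above can be sketched as follows. The three stage functions are hypothetical stand-ins for "Function 1 → Function 2 → Function 3"; the point is that the chain must run in order for a single input, while separate inputs can still be processed in parallel. A loop whose iterations depend on the previous iteration has to stay serial.

```matlab
% Hypothetical stage functions (assumptions for illustration only).
f1 = @(x) x + 1;
f2 = @(x) 2 * x;
f3 = @(x) x .^ 2;

inputs  = 1:6;
outputs = zeros(size(inputs));

% The chain f1 -> f2 -> f3 is sequential per input,
% but different inputs are independent, so parfor applies.
parfor k = 1:numel(inputs)
    a = f1(inputs(k));
    b = f2(a);
    outputs(k) = f3(b);
end

% A loop-carried dependency: iteration k needs the result of k-1,
% so this loop cannot be converted to parfor as written.
running = zeros(size(outputs));
running(1) = outputs(1);
for k = 2:numel(outputs)
    running(k) = running(k-1) + outputs(k);
end
```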
Parallelization with Matlab PCT
• Applications have layers of parallelism:
– GPU cards / external accelerator cards
– CPUs, multi-cores
– Clusters
• For an optimal solution, look at the application as a whole.
• Scalability: use as many workers as possible in an efficient manner.
• The Matlab PCT + MDCS framework automates much of the complexity of developing parallel & distributed applications.
Parallelization with Matlab PCT & MDCS
• Distributed loops: parfor
• Interactive development mode (matlabpool/pmode)
• Distributed arrays (spmd)
• CPUs, multi-cores → MDCS cluster
• Scale out with the MDCS cluster in batch job submission mode
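A minimal sketch of batch job submission, assuming the 'local' cluster profile as a stand-in for the MDCS profile (the actual profile name at CBI is not given in the slides). The workload, summing 1 to 100, is purely illustrative.

```matlab
% Assumption: 'local' stands in for the site-specific MDCS profile name.
c = parcluster('local');

% Submit a function as a batch job: 1 output argument, 1 input argument.
job = batch(c, @sum, 1, {1:100});

wait(job);                  % block until the job finishes
out = fetchOutputs(job);    % cell array of output arguments
total = out{1};

delete(job);                % remove the job and its data from the cluster
```

Because the job runs on the cluster rather than in the client session, the client remains free for other work until wait is called.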
MDCS Benefits
• MDCS worker processes (a.k.a. “Labs”)
– The workers never request regular Matlab or toolbox licenses.
– The only license an MDCS worker ever uses is an MDCS worker license (of which we have up to 64).
– Toolboxes are unlocked for an MDCS worker based on the licenses owned by the client during the job submission process.
– Wonderful parallel algorithm development environment, with the superior visualization & profiling capabilities of the Matlab environment.
– Many built-in functions are parallel-enabled: fft, lu, svd…
– Distributed arrays allow development of data-parallel algorithms.
– Enables the scaling of codes that cannot be compiled using the Matlab Compiler.
– Allows you to go from development on a laptop directly to running on up to 64 MDCS Labs. (Some simulations can go from years of runtime to days of runtime on 64 MDCS Labs.)
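As a small sketch of the distributed-array point above, the example below partitions a matrix across workers and runs a built-in function on it. It assumes a pool of workers is already open (matlabpool in R2012a-era releases, parpool in later ones); the 8-by-8 magic square is only an illustrative data set.

```matlab
% Assumption: a pool of workers is open before this runs.
M = magic(8);

A = distributed(M);            % partition the matrix across the workers
colSums = gather(sum(A, 1));   % sum runs on the workers; gather returns
                               % the result to the client session

% The distributed computation should match the local one.
assert(isequal(colSums, sum(M, 1)));
```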
MDCS Structure
Hardware/Software/Utilization @ CBI
• MDCS worker processes run on 4 physical servers (Dell PowerEdge M910): four 16-core systems, each with 64 GB RAM and 2 Intel Xeon 2.26 GHz processors with 8 cores per processor.
• Total of 64 cores, with 256 GB total RAM distributed among the systems.
• Maximum of 64 MDCS worker licenses available.
• Subsets of MDCS workers can be created based on project needs.
Usage scenarios
• Local system, interactive use (matlabpool / spmd / pmode / mpiprofile)
– Use a local system (e.g. one of the workstations @ CBI) as part of initial algorithm development.
• MDCS, non-interactive use: job- and task-based
– 2 main types: independent vs. communicating jobs
– Both types can be used with either the local profile (on a non-cluster workstation) or the MDCS profile.
MDCS Workloads
2 main types of workloads can be implemented with the MDCS:
– A job is logically decomposed into a set of tasks. The job may have 1 or more tasks, and each task may or may not have additional parallelism within it.
CASE 1: Independent. Within a job the parallelism is fully independent, so we have the opportunity to use MDCS workers to offload some of the independent work units. The code will not make use of parallel language features such as parfor or spmd. Note: In many cases, parfor can be transformed into a set of tasks.
– createJob() + createTask(), createTask(), … createTask()
CASE 2: Communicating. Within a single job the parallelism is more complex, requiring the workers to communicate, or the code uses Parallel Computing Toolbox language features such as parfor, spmd, or codistributed arrays.
– createCommunicatingJob(), createTask()
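The two cases can be sketched as follows, under stated assumptions: the 'local' profile stands in for the MDCS profile, at least 2 workers are available, and the task functions are illustrative only. The function names labindex and gplus are the R2012a-era names; later releases rename some of these.

```matlab
% Assumption: 'local' stands in for the site-specific MDCS profile.
c = parcluster('local');

% CASE 1: independent job, one task per work unit.
indepJob = createJob(c);
for k = 1:4
    createTask(indepJob, @sum, 1, {1:k});   % task k computes sum(1:k)
end
submit(indepJob);
wait(indepJob);
indepOut = fetchOutputs(indepJob);          % one row per task
delete(indepJob);

% CASE 2: communicating job, a single task run by all workers together.
commJob = createCommunicatingJob(c, 'Type', 'SPMD');
commJob.NumWorkersRange = [2 2];            % assumption: exactly 2 workers
createTask(commJob, @() gplus(labindex), 1, {});  % workers add their indices
submit(commJob);
wait(commJob);
commOut = fetchOutputs(commJob);            % one row per worker
delete(commJob);
```

In the independent job each task returns its own result; in the communicating job every worker returns the same global sum, because the workers exchange data while the task runs.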
MDCS Working Environment
Interactive Mode Sample (parfor)
• For workloads that map well, parfor can yield exceptional performance improvements.
• From years to days, or days to hours, for certain workloads: the ideal case is long-running jobs with little or no inter-job communication.
[Figure panels: parfor enabled on the MDCS; standard for loop]
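One way to make such a comparison yourself is to time the same workload under for and parfor. This is a sketch under assumptions: the workload (a singular value decomposition per iteration) and the iteration count are arbitrary choices, and the measured times depend entirely on the pool size and hardware, so no speedup figure is implied.

```matlab
nIter = 16;

% Standard for loop.
serialOut = zeros(1, nIter);
tic;
for k = 1:nIter
    serialOut(k) = max(svd(magic(120) + k));
end
tSerial = toc;

% Same loop with parfor (uses the open pool, if any).
parallelOut = zeros(1, nIter);
tic;
parfor k = 1:nIter
    parallelOut(k) = max(svd(magic(120) + k));
end
tParallel = toc;

fprintf('for: %.3f s, parfor: %.3f s\n', tSerial, tParallel);
```

For short workloads like this one, the overhead of distributing iterations can outweigh the gain; the benefit grows with the runtime of each iteration.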
MDCS Scaling ( Batch Mode )
Summary
• Applied examples of using the MDCS in batch mode are available in the hands-on section; for more in-depth MDCS usage information, schedule a consulting appointment.
• We can allocate a subset of MDCS workers on a per project basis.
• Wonderful parallel algorithm design & development environment
• Scale out codes to up to 64 Matlab MDCS workers
– Both compute and memory are distributed
• Minimizes usage of standard Matlab and toolbox licenses
• Many options to approach parallelization of computational workloads.
Acknowledgements
• This project received computational, research & development, software design/development support from the Computational System Biology Core/Computational Biology Initiative, funded by the National Institute on Minority Health and Health Disparities (G12MD007591) from the National Institutes of Health. URL: http://www.cbi.utsa.edu
Appendix A
Local Mode: Matlab Worker Process/Thread Structure
• Parallel Computing Toolbox constructs can be tested in local mode: the “lab” abstraction allows the actual process used for a lab to reside either locally or on a distributed server node.
• MPI is used for inter-process communication between “Labs” (Matlab worker processes).
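Message passing between labs can be sketched with labSend and labReceive inside an spmd block. This assumes a pool with at least 2 labs is open, and uses the R2012a-era function names (later releases rename some of them). The payload and the function applied to it are illustrative: lab 2 sends data to lab 1, lab 1 computes on it and returns the result.

```matlab
% Assumption: a pool with at least 2 labs is open.
spmd
    assert(numlabs >= 2, 'This sketch needs at least 2 labs.');
    reply = [];
    if labindex == 2
        labSend(magic(3), 1);          % lab 2 sends data to lab 1
        reply = labReceive(1);         % ...and waits for the result
    elseif labindex == 1
        payload = labReceive(2);       % lab 1 receives the data
        labSend(sum(payload(:)), 2);   % runs a function, returns the result
    end
end

% reply is a Composite on the client; index it by lab number.
answer = reply{2};
```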
Local Mode Scaling Sample (parfor)
Interactive Mode Sample (pmode/spmd)
• Each lab handles a piece of the data.
• Results are gathered on lab 1.
• The client session requests that the complete data set be sent to it using lab2client.
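The same split, gather, and retrieve pattern can be sketched with spmd instead of pmode. This is an assumption-laden illustration: the data set and the per-lab computation are made up, a pool is assumed to be open, and the client reads the result through a Composite rather than through lab2client (which belongs to pmode).

```matlab
% Assumption: a pool of workers is open; function names are R2012a-era.
data = 1:12;

spmd
    % Each lab handles a piece of the data (round-robin split).
    idx = labindex:numlabs:numel(data);
    partial = sum(data(idx) .^ 2);

    % Results are combined on lab 1 only; other labs receive [].
    total = gplus(partial, 1);
end

% On the client, total is a Composite; fetch lab 1's value.
result = total{1};
```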
Local vs. MDCS Mode Comparison (parfor)
Appendix B: MDCS Access
• Access to the MDCS is provided via the Cheetah cluster.
• On Linux:
– ssh -Y [email protected]
– qlogin
– matlab &
• On Windows:
– Connect using PuTTY + Xming with X11 forwarding
– qlogin
– matlab &