

  • Maximizing Matlab Performance

    Ed Hall

    [email protected]

    Research Computing Support Center, Wilson Hall, Room 244, University of Virginia

    Phone: 243-8800, Email: [email protected]

    URL: www.itc.virginia.edu/researchers/services.html

    Maximizing Matlab Performance p.1/32

  • Outline of Talk

    Profiling for Improving Performance

    Techniques for Improving Performance

    Memory Management

    The Matlab Compiler

    Conclusions and Future Directions


  • Profiling for Improved Performance

    What is Profiling?

    Identify which functions in your code consume the most time.

    Determine why you are calling them and look for ways to minimize their use.

    Profiling uncovers performance problems solved by

    Avoiding unnecessary computation, which can arise from oversight.

    Changing your algorithm to avoid time-consuming functions.

    Avoiding recomputation by storing results for future use.

    The goal is for most of the run time to be spent in calls to a few built-in functions.


  • The Matlab Profiler

    The Matlab Profiler is a graphical user interface for viewing where the time is being spent in your M-code. You can use any of the following methods to open the Profiler:

    Select Desktop -> Profiler from the MATLAB desktop.

    Select Tools -> Open Profiler from the menu in the MATLAB Editor/Debugger.

    Select one or more statements in the Command History window, right-click to view the context menu, and choose Profile Code.

    Enter profile viewer in the Command Window.

    Running the Profiler on Matlab code generates the following:

    The Profile Summary report presents statistics about the overall execution of the function and provides summary statistics for each function called.

    The Profile Detail report shows profiling results for a selected function that was called during profiling.
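
    The Profiler can also be driven from the command line. The following is a minimal sketch, assuming a deliberately slow toy loop as the code under test; the variable names are illustrative and not from the talk.

    ```matlab
    % Profile a small, deliberately inefficient loop from the command line.
    profile on                       % start collecting timing data
    x = [];
    for k = 1:5000
        x(end+1) = sqrt(k);          %#ok  x grows on every pass
    end
    profile off                      % stop collecting
    stats = profile('info');         % structure holding the collected results
    fprintf('Functions recorded: %d\n', numel(stats.FunctionTable));
    % profile viewer                 % uncomment to open the graphical report
    ```

    The same data that feeds the summary and detail reports is available in stats.FunctionTable for scripted inspection.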


  • Profiling Process Guidelines

    1. In the summary report produced by the Profiler, look for functions that used a significant amount of time or were called most frequently.

    2. View the detail report produced by the Profiler for those functions and look for the lines that use the most time or are called most often. Keep a copy of your first detail report to use as a reference.

    3. Determine whether there are changes you can make to the most frequently called or most time-consuming lines to improve performance, e.g. a call to load inside a loop.

    4. Click the links to the files and make the changes you identified for potential performance improvement. Save the files and run clear all. Run the Profiler again and compare the results to the original report.

    5. Repeat this process to continue improving the performance.


  • Profiling for Improved Performance

    Premature optimization can increase code complexity unnecessarily without providing a real gain in performance.

    Your first implementation should be as simple as possible. Then, if speed is an issue, use profiling to identify bottlenecks.

    If you just need to get an idea of how long your program (or a portion of it) takes to run, or to compare the speed of different implementations of a program, you can use the stopwatch timer functions, tic and toc, as shown below.

    tic
    -- run the program section to be timed --
    toc
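
    As a concrete sketch of the tic/toc pattern, the snippet below times two implementations of the same sum. The problem size n and the variable names are assumptions chosen for illustration.

    ```matlab
    n = 1e6;

    % Implementation 1: explicit loop
    tic
    sLoop = 0;
    for k = 1:n
        sLoop = sLoop + sqrt(k);
    end
    tLoop = toc;                     % elapsed seconds since the matching tic

    % Implementation 2: vectorized
    tic
    sVec = sum(sqrt(1:n));
    tVec = toc;

    fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
    ```

    Capturing the output of toc in a variable makes it easy to compare runs; timings vary between machines and runs, so repeat the measurement before drawing conclusions.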

    The Matlab Profiler References:

    www.itc.virginia.edu/research/matlab/#links

    Matlab News and Notes, May 2003


  • The M-Lint Code Check Report

    Why is writing good code so important?

    Efficient code executes faster and uses fewer resources.

    Maintainable code enables you to review, learn from, reuse, and adapt the code.

    Matlab M-Lint Tool: Displays potential errors and problems, as well as opportunities for improvement in your code.

    Displays a message for each line of an M-file it determines might be improved.

    Suppress unnecessary messages by adding %#ok to the end of the statement in the M-file.

    Further information: www.itc.virginia.edu/research/matlab/#links
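
    A small sketch of the %#ok suppression, assuming a loop where growing the array is intentional and the M-Lint message about it is therefore noise; the variable name is illustrative.

    ```matlab
    % The array is deliberately grown here, so the M-Lint message on this
    % line is suppressed with %#ok at the end of the statement.
    results = [];
    for k = 1:10
        results(end+1) = k^2; %#ok
    end
    disp(results);
    ```

    Use the suppression sparingly: it hides the message for that statement only, and the underlying inefficiency remains.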


  • Techniques for Improving Performance

    Preallocating Arrays

    for and while loops that incrementally increase the size of a data structure each time through the loop can adversely affect performance and memory use.

    You can often improve code execution time by preallocating the maximum amount of space that would be required for the array ahead of time.

    x = 0;
    for k = 2:1000
        x(k) = x(k-1) + 5;
    end

    Preallocate a 1-by-1000 block of memory for x initialized to zero.

    x = zeros(1, 1000);
    for k = 2:1000
        x(k) = x(k-1) + 5;
    end
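
    To see the effect on your own machine, the two loops above can be timed side by side. This is a rough sketch; the array length n is an assumption, and the measured gap depends on the MATLAB release and hardware.

    ```matlab
    n = 1e5;

    % Version 1: the array grows by one element per iteration
    tic
    xGrow = 0;
    for k = 2:n
        xGrow(k) = xGrow(k-1) + 5; %#ok
    end
    tGrow = toc;

    % Version 2: the array is preallocated once
    tic
    xPre = zeros(1, n);
    for k = 2:n
        xPre(k) = xPre(k-1) + 5;
    end
    tPre = toc;

    fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
    ```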


  • Preallocating Arrays

    Preallocating a Nondouble Matrix

    When you preallocate a block of memory to hold a matrix of some type other than double, avoid using the method

    >> A = int8(zeros(1e3));

    This statement preallocates a 1000-by-1000 matrix of int8 by first creating a full matrix of doubles and then converting each element to int8. This costs time and uses memory unnecessarily. The next statement shows how to do this more efficiently:

    >> A = repmat(int8(0), 1e3);

    Preallocating a Cell Array

    >> B = cell(2, 3);
    >> B{1,3} = 1:3;
    >> B{2,2} = 'string';
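
    In MATLAB releases where zeros accepts a class name, the nondouble matrix can also be allocated directly. The sketch below shows that alternative next to the cell array preallocation; it is an addition to the talk's repmat approach, not a replacement for it.

    ```matlab
    % Allocate a 1000-by-1000 int8 matrix without an intermediate double matrix.
    A = zeros(1e3, 'int8');

    % Preallocate a 2-by-3 cell array, then fill selected cells.
    B = cell(2, 3);
    B{1,3} = 1:3;
    B{2,2} = 'string';

    info = whos('A');
    fprintf('A is %s, %d bytes\n', class(A), info.bytes);
    ```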


  • Techniques for Improving Performance

    Vectorization: Vectorization means converting for and while loops to equivalent vector or matrix operations.

    MATLAB is a matrix language, which means it is designed for vector and matrix operations. Vectorized algorithms take advantage of this design.

    Simple Example of Vectorizing

    i = 0;
    for t = 0:.01:10
        i = i + 1;
        y(i) = sin(t);
    end

    A vectorized version of the same code is

    t = 0:.01:10;
    y = sin(t);


  • Array Operations for Vectorization

    Array Operations vs. Matrix Operations

    Use array operations to replace loops that perform only simple arithmetic on scalar data.

    for n = 1:100
        V(n) = 1/12*pi*(D(n)^2)*H(n);
    end

    % Perform the vectorized calculation
    V = 1/12*pi*(D.^2).*H;

    The only difference is the use of the .^ and .* operators. The leading period differentiates array operators (element-by-element operators) such as .*, ./ and .^ from the matrix operators (linear algebra operators) *, / and ^.
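
    A self-contained check that the loop and the array-operator form agree. The diameters D and heights H below are made-up sample values, used only to make the snippet runnable.

    ```matlab
    D = [0.5 1.0 1.5 2.0];           % sample diameters (illustrative values)
    H = [2.0 2.5 3.0 3.5];           % sample heights (illustrative values)

    % Loop version: scalar arithmetic on one element at a time
    Vloop = zeros(size(D));
    for n = 1:numel(D)
        Vloop(n) = 1/12*pi*(D(n)^2)*H(n);
    end

    % Vectorized version: element-by-element operators on whole arrays
    Vvec = 1/12*pi*(D.^2).*H;

    fprintf('max difference: %g\n', max(abs(Vloop - Vvec)));
    ```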


  • Boolean Array Operations

    With logical array indexing, the index parameter is a logical matrix that is the same size as an array and contains only 0s and 1s. The array elements selected have a 1 in the corresponding position of the logical indexing matrix.

    D = [-0.2 1.0 1.5 3.0 -1.0 4.2 3.14];
    D >= 0
    ans =
         0 1 1 1 0 1 1
    Vgood = V(D>=0);

    This selects the subset of V for which the corresponding elements of D are nonnegative.

    There are two vectorized Boolean operators, any and all, which perform Boolean OR and AND functions, respectively, over a vector. Thus you can perform the following test:

    if all(D < 0)
        warning('All values of diameter are negative.');
        return;
    end
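
    The pieces above can be combined into one runnable sketch. The vector V is not defined on the slide, so a placeholder is assumed here.

    ```matlab
    D = [-0.2 1.0 1.5 3.0 -1.0 4.2 3.14];
    V = 1:7;                         % placeholder values, one per element of D

    mask  = D >= 0;                  % logical index: true where D is nonnegative
    Vgood = V(mask);                 % keeps elements 2, 3, 4, 6 and 7 of V

    hasNegative = any(D < 0);        % OR over the vector: at least one negative
    allNegative = all(D < 0);        % AND over the vector: every element negative

    disp(Vgood);
    ```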


  • Matrix Functions of Two Vectors

    Suppose you want to evaluate a matrix function F of two vectors:

    You need to evaluate the function at every point in vector x, and for each point in x, at every point in vector y. In other words, you need to define a grid of values for F, given vectors x and y.

    You can duplicate x and y to create matrices of the desired size using meshgrid. This allows the use of array processing techniques to compute the function.

    x = (-2:.2:2);
    y = (-1.5:.2:1.5);
    [X,Y] = meshgrid(x, y);
    F = X .* exp(-X.^2 - Y.^2);
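
    For comparison, the sketch below computes the same grid with a double loop and checks it against the meshgrid version. The loop variable names are illustrative.

    ```matlab
    x = -2:.2:2;
    y = -1.5:.2:1.5;

    % Vectorized: X varies along columns, Y along rows
    [X, Y] = meshgrid(x, y);
    F = X .* exp(-X.^2 - Y.^2);

    % Equivalent double loop, for reference only
    Floop = zeros(numel(y), numel(x));
    for r = 1:numel(y)
        for c = 1:numel(x)
            Floop(r, c) = x(c) * exp(-x(c)^2 - y(r)^2);
        end
    end

    fprintf('grid size: %d-by-%d\n', size(F, 1), size(F, 2));
    ```

    Note that F has one row per element of y and one column per element of x.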


  • Ordering, Setting, Counting Operations

    In the examples discussed so far, any calculations done on one element of a vector have been independent of other elements in the same vector. However, in many applications, the calculation that you are trying to do depends heavily on these other values.

    There are many functions available that you can use to vectorize code:

    diff       Acts as difference operator: diff(X), for a vector X, is [X(2) - X(1), X(3) - X(2), ... X(n) - X(n-1)]
    find       Finds indices of the nonzero elements
    intersect  Finds the set intersection
    max        Finds the largest component
    min        Finds the smallest component
    setdiff    Finds the set difference
    setxor     Finds the set exclusive OR
    sort       Sorts in ascending order
    union      Finds the set union
    unique     Finds the unique elements of a set
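
    A few of these functions in action on a small sample vector. The data is made up for illustration.

    ```matlab
    X = [3 1 4 1 5 9 2 6];

    d      = diff(X);                    % differences between neighbours
    rising = find(d > 0);                % positions where the sequence increases
    u      = unique(X);                  % sorted values with duplicates removed
    [s, order] = sort(X);                % sorted values and the permutation used
    common = intersect(X, [1 2 7 9]);    % values present in both sets

    disp(d); disp(rising); disp(u); disp(common);
    ```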


  • Coding Loops in a MEX-File

    Matlab-callable C and Fortran files are referred to as MEX-files.

    Used where code cannot be vectorized and/or computation does not run fast enough in Matlab. The code is recoded in C or Fortran for efficiency.

    Large pre-existing C and Fortran programs can be called from MATLAB without having to be rewritten as M-files.

    Source code for a MEX-file consists of two distinct parts:

    A computational routine that contains the code performing the computations that you want implemented in the MEX-file.

    A gateway routine that interfaces the computational routine with Matlab by the entry point mexFunction and its parameters. The gateway calls the computational routine as a subroutine.

    Further information: www.itc.virginia.edu/research/matlab/#links


  • Techniques for Improving Performance

    Changing a Variable's Data Type or Dimension

    Changing the data type or array shape of an existing variable slows MATLAB down as it must take extra time to process this. When you need to store data of a different type, it is advisable to create a new variable.

    Operating on Real Data

    When operating on real (i.e., noncomplex) numbers, it is more efficient to use MATLAB functions that have been designed specifically for real numbers.

    reallog   Finds the natural logarithm for nonnegative real arrays
    realpow   Finds the array power for real-only output
    realsqrt  Finds the square root for nonnegative real arrays

    Load and Save Are Faster Than File I/O Functions

    If you have a choice of whether to use load and save instead of the low-level MATLAB file I/O routines such as fread and fwrite, choose the former. load and save have been optimized to run faster and reduce memory fragmentation.
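
    A brief sketch of the real-only functions listed above, using small sample inputs chosen for illustration.

    ```matlab
    x = [0 1 4 9 16];

    r = realsqrt(x);                 % square roots of nonnegative reals
    l = reallog([1 exp(1)]);         % natural logarithms of nonnegative reals
    p = realpow([1 2 3], 2);         % element-wise power with real-only output

    disp(r); disp(l); disp(p);
    ```

    These functions raise an error instead of returning complex results, so they also act as a guard against unexpected negative inputs.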


  • Converting Scripts to Function M-Files

    MATLAB provides two ways to package sequences of MATLAB commands:

    Function M-files

    Script M-files

    These two categories of M-files differ in two important respects:

    You can pass arguments to function M-files but not to script M-files.

    Variables used inside function M-files are local to that function; you cannot access these variables from the MATLAB interpreter's workspace unless they are passed back by the function. By contrast, variables used inside script M-files are shared with the caller's workspace.

    Matlab code executes more quickly if it is implemented in a function rather than a script. Functions are compiled into P-code rather than interpreted one line at a time.


  • Converting Script to Function M-Files

    To convert a script to a function, simply add a function line at the top of the M-file. For example, consider the script M-file houdini.m.

    m = magic(4); % Assign 4x4 magic square to m.
    t = m .^ 3;   % Cube each element of m.
    disp(t);      % Display the value of t.

    Running this script M-file from a MATLAB session creates variables m and t in the MATLAB workspace. Convert this script M-file into a function M-file by simply adding a function header line.

    function houdini(sz)
    m = magic(sz); % Assign magic square to m.
    t = m .^ 3;    % Cube each element of m.
    disp(t)        % Display the value of t.

    Running houdini.m creates variables m and t in the function workspace but not in the MATLAB workspace. If it is important to have m and t accessible from the MATLAB workspace, you can change the beginning of the function to

    function [m,t] = houdini(sz)
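
    Put together, the function version with outputs might look like the sketch below, saved as houdini.m. The body is the one from the slide; only the header and the closing end are added.

    ```matlab
    function [m, t] = houdini(sz)
    % HOUDINI  Return a magic square and the cube of each of its elements.
    %   Example call from the command line:  [m, t] = houdini(4);
    m = magic(sz);   % Assign magic square to m.
    t = m .^ 3;      % Cube each element of m.
    disp(t)          % Display the value of t.
    end
    ```

    After the call [m, t] = houdini(4), both variables exist in the caller's workspace, while nothing else from the function leaks out.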


  • Memory Management

    Working with Large Data Sets

    When working with large data sets, you should be aware that MATLAB makes a temporary copy of an input array A if the called function modifies its value. This temporarily doubles the memory required to store the array.

    One way to avoid running out of memory in this situation is to use nested functions. A nested function shares the workspace of all outer functions, giving the nested function access to data outside of its usual scope.

    function myfun
    A = magic(500);

        function setrowval(row, value)
            A(row,:) = value;
        end

    end
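
    The slide defines setrowval but does not call it. A runnable variant is sketched below; the function name nested_demo, the returned output, and the chosen row are assumptions added for illustration.

    ```matlab
    function A = nested_demo
    % NESTED_DEMO  Modify a large array through a nested function.
    %   The nested function writes to A in the shared workspace, so A is
    %   never passed as an argument and no temporary copy is made for a call.
    A = magic(500);
    setrowval(400, 0);               % zero out row 400 in place

        function setrowval(row, value)
            A(row, :) = value;       % A belongs to the outer function
        end

    end
    ```

    Calling A = nested_demo; returns the 500-by-500 matrix with row 400 set to zero.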


  • Memory Management Functions

    The following functions can help manage memory use in MATLAB:

    whos shows how much memory has been allocated for variables in the workspace.

    pack saves existing variables to disk, and then reloads them contiguously. This reduces the chances of running into problems due to memory fragmentation.

    clear removes variables from memory. One way to increase the amount of available memory is to periodically clear variables from memory that you no longer need.

    save selectively stores variables to the disk. This is a useful technique when you are working with large amounts of data. Save data to the disk periodically, and then use the clear function to remove the saved data from memory.

    load reloads a data file saved with the save function.

    quit exits MATLAB and returns all allocated memory to the system. This can be useful on UNIX systems, as UNIX does not free up memory allocated to an application (e.g., MATLAB) until the application exits.
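
    A minimal sketch of the save, clear, load cycle described above. The variable name bigData and the temporary file location are assumptions for illustration.

    ```matlab
    bigData  = rand(1000);                   % stand-in for a large data set
    checksum = sum(bigData(:));              % kept to verify the round trip

    dataFile = [tempname '.mat'];            % temporary MAT-file path
    save(dataFile, 'bigData');               % store only this variable on disk
    clear bigData                            % release the memory it occupied

    % ... memory-hungry work that does not need bigData goes here ...

    load(dataFile, 'bigData');               % bring the variable back when needed
    delete(dataFile);                        % remove the temporary file

    info = whos('bigData');
    fprintf('bigData restored: %d bytes\n', info.bytes);
    ```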


  • Strategies for Efficient Use of Memory

    To conserve memory when creating variables,

    Avoid creating large temporary variables, and clear temporary variables when they are no longer needed.

    When working with arrays of fixed size, preallocate them rather than having MATLAB resize the array each time you enlarge it.

    Allocate your larger matrices first (next slide).

    Set variables equal to the empty matrix [] to free memory, or clear the variables using the clear function.

    Use lower precision data types, if possible.

    Reuse variables as much as possible, provided their data type and shape stay the same.

    Changing the data type or array shape of an existing variable slows MATLAB down as it must take extra time to process this. When you need to store data of a different type, it is advisable to create a new variable.

    On UNIX, turn off the Java Virtual Machine (JVM), which shares address space with Matlab. The JVM implements the GUI-based development environment.


  • Allocating Large Matrices Earlier

    MATLAB uses a heap method of memory management. It reuses memory as long as the size of the memory segment required is available in the MATLAB heap.

    For example, these statements use approximately 15.4 MB of RAM:

    a = rand(1e6,1);
    b = rand(1e6,1);

    This statement uses approximately 16.4 MB of RAM:

    c = rand(2.1e6,1);

    These statements use approximately 32.4 MB of RAM. This is because MATLAB is not able to fit a 2.1-million-element array in the space previously occupied by two 1-million-element arrays:

    a = rand(1e6,1);
    b = rand(1e6,1);
    clear
    c = rand(2.1e6,1);


  • Allocating Large Matrices Earlier

    The simplest way to prevent overallocation of memory is to allocate the largest vectors first. These statements use only about 16.4 MB of RAM:

    c = rand(2.1e6,1);
    clear
    a = rand(1e6,1);
    b = rand(1e6,1);
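
    The raw data size of these arrays can be checked with whos. This sketch reports only the bytes held by the array data, at 8 bytes per double; the RAM figures quoted on the slides also include MATLAB's own overhead, so they will not match exactly.

    ```matlab
    a = rand(1e6, 1);
    c = rand(2.1e6, 1);

    infoA = whos('a');               % struct with name, size, bytes, class
    infoC = whos('c');

    fprintf('a: %d bytes (%.2f MB)\n', infoA.bytes, infoA.bytes / 2^20);
    fprintf('c: %d bytes (%.2f MB)\n', infoC.bytes, infoC.bytes / 2^20);
    ```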


  • Sparse Matrices

    It is best to store matrices with values that are mostly zero in sparse format. Sparse matrices can use less memory and may also be faster to manipulate than full matrices.

    Compare two 1000-by-1000 matrices: X, a matrix of doubles with 2/3 of its elements equal to zero; and Y, a sparse copy of X.

    >> Y = sparse(X);
    >> whos
      Name      Size           Bytes  Class

      X      1000x1000       8000000  double array
      Y      1000x1000       4004000  double array (sparse)
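
    The comparison can be reproduced with the sketch below. How X was built is not shown on the slide, so a random matrix with roughly 2/3 of its entries zeroed is assumed; the byte count for Y depends on the platform's index size and may differ from the figure above.

    ```matlab
    X = rand(1000);
    X(X < 2/3) = 0;                  % zero out roughly two thirds of the entries

    Y = sparse(X);                   % store only the nonzero entries

    infoX = whos('X');
    infoY = whos('Y');
    fprintf('full:   %d bytes\n', infoX.bytes);
    fprintf('sparse: %d bytes (%d nonzeros)\n', infoY.bytes, nnz(Y));
    ```

    Sparse storage pays off only when the matrix is sparse enough; each stored nonzero carries index overhead in addition to its value.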


  • The Matlab Compiler

    Overview

    The Matlab Compiler takes function M-files as input and creates redistributable, stand-alone applications or software components that are platform specific. The Matlab Compiler can generate

    Stand-alone applications: applications that do not require MATLAB at run-time; they can run even if MATLAB is not installed on the end user's system.

    C and C++ shared libraries (dynamically linked libraries, or DLLs, on Microsoft Windows). These can be used without MATLAB on the end user's system.

    Excel add-ins; requires MATLAB Builder for Excel.

    COM objects; requires MATLAB Builder for COM.

    The MATLAB Compiler supports nearly all the functionality of MATLAB, including objects. Some toolboxes will not compile with MATLAB Compiler 4.

    Documentation: www.itc.virginia.edu/research/matlab/compiler.html#Overview


  • The Matlab Compiler

    Advantages of Compiling Matlab M-Files as Stand-alone Executables

    Possible speed-up in program run-time.

    Standalone executable can be run outside Matlab environment.

    No Matlab network licenses required to run program.

    Allows multiple Matlab programs to run concurrently without using multiple network licenses.

    Allows multiple instances of the same program with different inputs to run concurrently, e.g. parameter space studies or Monte Carlo simulations. Effectively provides "embarrassingly parallel" processing.


  • The Matlab Compiler

    Stand-alone Executables

    There is a set of functions that is not supported in stand-alone mode. These functions fall into these categories:

    Functions that print or report MATLAB code from a function. For example, the MATLAB help function or debug functions will not work.

    Simulink functions, in general, will not work.

    Functions that require a command line, for example, the MATLAB lookfor function, will not work.

    clc, home, and savepath will not do anything in deployed mode.

    The source code for stand-alone applications consists either entirely of M-files or some combination of M-files, MEX-files, and C or C++ source code files.

    You can call MEX-files from Compiler-generated stand-alone applications. The MEX-files will then be loaded and called by the stand-alone code.
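
    One way to keep a single M-file usable both inside MATLAB and as a stand-alone executable is to guard unsupported calls with isdeployed. This is a sketch of that pattern, not something shown in the talk; the flag name is illustrative.

    ```matlab
    % isdeployed returns true when the code runs as a compiled application.
    inMatlab = ~isdeployed;

    if inMatlab
        help magic                   % help is only available inside MATLAB
    else
        disp('Running as a stand-alone application; skipping help output.');
    end
    ```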


  • Compiling Matlab M-Files

    Configuring Your Account

    These steps are necessary and specific to accounts on the Aspen Linux Cluster to use the Matlab Compiler and to run the resulting standalone executables.

    From within Matlab, run the command mbuild -setup to install a local compiler options file.

    Edit the options file so that the Matlab compiler calls an updated version of the gcc compiler.

    Edit your startup file (e.g. .variables.ksh) so that your LD_LIBRARY_PATH automatically includes the platform-specific Matlab libraries needed by the standalone executable.

    www.itc.virginia.edu/research/matlab/compiler.html#Config


  • Compiling Matlab M-Files

    Using the Matlab Compiler

    The Matlab Compiler will only compile function M-files, so it will be necessary to convert a script M-file to a function M-file as described previously.

    To test the compiler and account configuration, from within Matlab compile the function magicsquare.m.

    >> mcc -m -R "-nojvm" -v magicsquare

    The -m option tells the Matlab Compiler (mcc) to generate a C stand-alone application.

    The -v option (verbose) displays the compilation steps throughout the process.

    The -R "-nojvm" option tells the Matlab Compiler not to use the Java Virtual Machine.

    Additional M-files called by the main M-file would be appended to the compilation command line.
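
    The contents of magicsquare.m are not shown in the talk. A plausible minimal version is sketched below as an assumption; it converts its argument from text because a stand-alone executable receives its command-line arguments as strings.

    ```matlab
    function m = magicsquare(n)
    % MAGICSQUARE  Display and return an n-by-n magic square.
    %   Hypothetical test function for the compiler; n may be a number
    %   (when called inside MATLAB) or text (when run as an executable).
    if ischar(n)
        n = str2double(n);           % command-line arguments arrive as text
    end
    m = magic(n);
    disp(m);
    end
    ```

    Inside MATLAB this can be called as magicsquare(4); once compiled, the equivalent shell invocation would pass 4 as a command-line argument.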

    www.itc.virginia.edu/research/matlab/compiler.html#Using


  • Compiling Matlab M-Files

    Example M-file for Concurrent Simulations

    The following URL provides an example of a function M-file that can be compiled once and then be used to run multiple concurrent simulations using different input and output data sets.

    The standalone executable reads both character and numeric data from a user-defined file.

    The standalone executable reads in and writes out data to simulation-specific named files.

    The following URL also provides an example of how the standalone executable can be submitted to run on the Aspen Cluster.

    www.itc.virginia.edu/research/matlab/compiler.html#Example


  • Maximizing Matlab Performance

    Conclusions

    Profile your code to identify and correct inefficient execution.

    Be aware of vectorization opportunities; use MEX-files if necessary.

    Preallocate arrays, largest first.

    Clear and reuse variables.

    Future Directions

    Matlab will likely have continued improvements in processing speed.

    Matlab 7.01 provides 64-bit support for Linux on AMD Opteron. The limit on the number of elements for each matrix (2.1475E+9) will remain, but you will be able to use the additional memory to store more variables.

    Message Passing Parallel Matlab, MatlabMPI, to be available on Aspen.

    www.ll.mit.edu/MatlabMPI


  • Upcoming Talks at the RCSC

    March 23, 3:30 PM: "SAS/IML - Interactive Matrix Programming" by Kathy Gerber.

    April 6, 3:30 PM: "Finding Global Optima with the Genetic Algorithm/Direct Search Toolbox" by Ed Hall.

    April 20, 3:30 PM: "Statistical Shape Analysis" by Kathy Gerber.

    April 27, 3:30 PM: "Object-Oriented Programming with Fortran 95 (FTN95)" by Katherine Holcomb.

    Further information on these talks can be found at the URL www.itc.virginia.edu/research/news/newsletterMar05.html#colloquia
