© 2011 The MathWorks, Inc.
MATLAB Under the Hood
Optimizing Performance and Memory with MATLAB
Tim Mathieu – Sr. Account Manager
Gerardo Hernandez – Application Engineer
Abhishek Gupta – Application Engineer
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
Have you ever had an “Out of Memory” error?
How much memory is there on a 32-bit OS?

4 GB of addresses, numbered:
– From 0x00000000
– To 0xFFFFFFFF
The OS reserves some addresses for itself and leaves the rest for processes
Independent of the amount of system RAM
Virtualization duplicates the upper addresses for each process

[Diagram: memory address space split between Operating System and Process]
Virtualization

Reuses addresses across multiple processes
A feature of all modern operating systems
Managed automatically
The disk page file provides additional data storage

Total Data Storage = Physical RAM + Page file on disk
Memory is Often OS-Bound

32-bit systems have 4 GB of addressable process memory
– Part of it is reserved by the OS, leaving the application < 4 GB
  Windows XP (default): 2 GB
  Windows XP with the /3GB switch: 2 GB + 1 GB
  Linux/UNIX/Mac: ~3 GB
64-bit systems allow 8 TB of addressable memory
In MATLAB, each array must occupy contiguous memory
What is the largest array you can create in MATLAB on 32-bit Windows XP (bytes)?
a) 0.5 GB
b) 1.0 GB
c) 1.5 GB
d) 2.0 GB
e) 2.5 GB

[Diagram: memory address space split between Operating System and Process]
What is the largest array you can create in MATLAB on 32-bit Windows XP (bytes)?

Windows 2000/XP/Vista
– Reserves 2 GB of addresses for the OS (configurable with the /3GB switch)
Linux and Mac OS X
– Typically reserve 1 GB of addresses
– Some customization possible
~300 MB for overhead
– Java Virtual Machine (JVM)
– Libraries
~1.7 GB for arrays
– Typically 1.5 GB contiguous, so the answer is c) 1.5 GB
– The remainder is fragmented
Contiguous Memory

Why do we need contiguous memory?
How much contiguous memory is available?
How can contiguous memory be controlled?
MATLAB Performance Technologies

Commercial libraries
– BLAS: Basic Linear Algebra Subprograms (multithreaded)
– LAPACK: Linear Algebra Package
– etc.
JIT/Accelerator
– Improves looping
– Generates multithreaded code on the fly
– Continually improving
Memory Fragmentation

Check available memory
– >> memory (Windows only)
Preallocate arrays
– Preallocate large matrices first
Control contiguous memory with a startup switch
– C:\matlab –shield medium
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
MATLAB Data Storage Model

How does MATLAB store data?
How much overhead is there for arrays, structures, and cell arrays?
When are data copies made?
How does MATLAB store data? Container overhead

d = [1 2] % Double array
dcell = {d} % Cell array containing "d"
dstruct.d = d % Structure containing "d"
whos

>> overhead.m
How does MATLAB store data? Container overhead

d = [1 2]: Header (60 bytes) + Data
dcell = {d}: Header (60 bytes) + Cell Header (60 bytes) + Data
dstruct.d = d: Header (60 bytes) + Element Header (60 bytes) + Fieldname (64 bytes) + Data
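A minimal sketch of measuring this overhead yourself with whos; the exact byte counts vary by MATLAB version and platform, so treat the per-header numbers above as representative rather than exact:

```matlab
% Inspect container overhead with whos (byte counts are
% version- and platform-dependent).
d = [1 2];          % plain double array: 2 * 8 = 16 data bytes
dcell = {d};        % cell array adds a per-cell header
dstruct.d = d;      % structure adds a field name + element header

w = whos('d','dcell','dstruct');
for k = 1:numel(w)
    fprintf('%-8s %6d bytes\n', w(k).name, w(k).bytes);
end
```

The reported sizes grow from the plain array to the cell array to the structure, matching the header-plus-data breakdown above.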
How does MATLAB store data? Structures

s.A = rand(4000,3200);  % .A: 100 MB
s.B = rand(4000,3200);  % .B: 100 MB
sNew = s;               % no copy yet – sNew shares s's data
s.A(1,1) = 17;          % s.A is copied on write (+100 MB)
sNew.B(1,1) = 0;        % sNew.B is copied on write (+100 MB)
sNew = s;

Memory used grows from 200 MB to 400 MB as the shared fields are written.

>> structmem2.m
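A small sketch of observing copy-on-write directly (structmem2.m is the seminar's demo; this stand-in only times the write that triggers the copy, since whos does not reveal sharing):

```matlab
% Observe copy-on-write: assignment of a struct is cheap,
% but the first write to a shared field forces a real copy.
s.A = rand(4000,3200);           % ~100 MB
sNew = s;                        % cheap: data is shared, not copied
tic; sNew.A(1,1) = 0; t = toc;   % this write copies the ~100 MB field
fprintf('copy-on-write took %.3f s\n', t);
```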
When is data copied? Function calls

function y = foo(x,a,b)
a(1) = a(1) + 12;
y = a * x + b;

When does MATLAB copy memory upon calling a function?
y = foo(1:3,2,4)
– i.e., x = 1:3, a = 2, b = 4
Answer: only "a" is copied, because it is modified inside the function
When is data copied? In-place optimizations

When does MATLAB perform calculations "in-place"?

y = 2*x + 3;   % NOT in-place: the output is a new array
x = 2*x + 3;   % in-place: the result overwrites the input

Conditions:
• Output variable name same as input variable name
• Element-wise computation

>> inplaceEx.m
In-place Optimizations

What happens during "in-place" operations?

x = rand(5000);
y = rand(5000);
%% NOT in-place
y = sin(sqrt(2*x.^5+3*x+4));
%% In-place
x = sin(sqrt(2*x.^5+3*x+4));
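A sketch of timing the two forms; inplaceDemo is a hypothetical wrapper name, and the code is placed in a function because MATLAB's in-place optimization applies to code running inside functions:

```matlab
% Compare out-of-place vs in-place: the in-place version avoids
% allocating a second 5000-by-5000 (~200 MB) result array.
function inplaceDemo
x = rand(5000);
tic; y = sin(sqrt(2*x.^5 + 3*x + 4)); tNot = toc; %#ok<NASGU>
tic; x = sin(sqrt(2*x.^5 + 3*x + 4)); tIn  = toc;
fprintf('not in-place: %.2f s, in-place: %.2f s\n', tNot, tIn);
end
```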
Summary of Memory Usage in MATLAB

How does MATLAB store data?
– Every array has overhead
– Structures and cell arrays are containers that can hold multiple arrays
When is data copied?
– "Lazy" copy: only when necessary (copy on write)
– Never, if the operation can be performed "in-place"
– "In-place" is faster because memory is not copied
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
Use Only the Precision You Need

Numerical data types
– Floating point: double and single precision (8 and 4 bytes)
– Integer: signed and unsigned (1–8 bytes)
Use floating point for math (e.g. linear algebra)
Use integers where appropriate (e.g. images)

>> datatypeEx.m
>> datatypeEx2_double.m
>> datatypeEx2_single.m
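A quick sketch of the savings (the datatypeEx scripts are the seminar's own demos; this stand-in just compares variable sizes with whos):

```matlab
% Memory halves going from double to single, and an 8-bit
% image needs 1/8 the memory of its double equivalent.
img = rand(2000);                 % double: 2000*2000*8 bytes = 32 MB
imgSingle = single(img);          % 16 MB
imgUint8  = uint8(255*img);       %  4 MB
whos img imgSingle imgUint8
```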
Sparse Matrices

Why use sparse?
– Less memory
  Store only the nonzero elements of the matrix and their indices
– Faster
  Reduce computation time by eliminating operations on zero elements
When to use sparse?
– < 1/2 dense on 64-bit (double precision)
– < 2/3 dense on 32-bit (double precision)
– Sparse matrices are often much sparser than the cutoff limit
Using Sparse Matrices

Creation
– S = sparse(i,j,s,m,n)
– A = spdiags(B,d,m,n)
Structure and efficiency
– Different storage convention from that of full matrices
– If a matrix is highly rectangular, make it tall and skinny, not short and fat
Functions that support sparse matrices
– >> help sparfun
Blog post: Creating Sparse Finite Element Matrices
– http://blogs.mathworks.com/loren/2007/03/01/creating-sparse-finite-element-matrices-in-matlab/
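A minimal sketch using the spdiags creation path above; the tridiagonal matrix is an illustrative choice, not from the seminar:

```matlab
% Build a sparse tridiagonal matrix and compare its memory
% footprint to the equivalent full matrix.
n = 5000;
e = ones(n,1);
A = spdiags([e -2*e e], -1:1, n, n);  % sparse: ~3n nonzeros stored
F = full(A);                          % full: n^2 doubles = 200 MB
whos A F     % A is well under a megabyte; F is 200 MB
```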
Copy and Create Only What You Need

Share data between functions
– Nested functions
– Global variables (but sparingly)
Understand vectorization tradeoffs
– Limit creation of intermediate matrices
  Use bsxfun
– Reduce the size of the array to scalars or smaller blocks
  Process with for loops (de-vectorize): slower runtime but less memory

>> bsxfunEx.m
>> vectorizeEx.m
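A sketch of the bsxfun point (bsxfunEx.m is the seminar's demo; this stand-in centers a matrix's columns, a common use):

```matlab
% Subtract the column means from a matrix without building an
% intermediate replicated matrix.
A = rand(1e6, 10);
mu = mean(A, 1);
% repmat materializes a full 1e6-by-10 copy of mu first:
B1 = A - repmat(mu, size(A,1), 1);
% bsxfun expands mu virtually, with no intermediate matrix:
B2 = bsxfun(@minus, A, mu);
isequal(B1, B2)   % same result, lower peak memory
```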
Plot Only What You Need

Every plot independently stores its x and y data
>> x = rand(125e4,1); % 10 MB
>> plot(x); % 20 MB for x and y data
Integers are plotted as doubles
Strategies
– Downsample or resample your data prior to plotting
  Built-in functions for resampling your data (e.g. interp1)
  imresize from the Image Processing Toolbox for images
– Divide your data into regular intervals and plot values of interest
  (e.g. open and close for stock prices, or min/max values)
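A minimal sketch of the downsampling strategy (the signal here is synthetic, chosen only for illustration):

```matlab
% Downsample a long signal before plotting so the figure stores
% ~10,000 points instead of 1.25 million.
x = cumsum(randn(125e4,1));              % ~10 MB of data
idx = round(linspace(1, numel(x), 1e4)); % 10,000 evenly spaced indices
plot(idx, x(idx));                       % figure holds only the subset
```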
Load Only the Data You Need

ASCII file
– textscan(…)
– Selectively choose columns to load or ignore
– Selectively choose rows to load (i.e. block processing)
Binary file
– memmapfile(…)
– Read and write directly to/from a file on disk
– Access a file on disk the same way you access dynamic memory, by overlaying the address space directly onto the file
– MATLAB dynamically shifts the address space to handle larger files (e.g. files > 1.5 GB can be accessed on 32-bit Windows)
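Two small sketches of the selective-loading tools above; data.txt and data.bin are hypothetical files (four comma-separated numeric columns, and raw doubles, respectively):

```matlab
% ASCII: load only columns 1 and 3 of a comma-separated file.
fid = fopen('data.txt');
C = textscan(fid, '%f %*f %f %*f', 'Delimiter', ',');
fclose(fid);
% %*f reads and discards a column, so only two columns are stored.

% Binary: map a file of doubles without loading it all.
m = memmapfile('data.bin', 'Format', 'double');
firstThousand = m.Data(1:1000);   % only this slice is read from disk
```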
MATLAB is best at batch processing…
– Load the entire file and process it all at once
  [Diagram: Source → Memory → Batch Processing Algorithm]

… but stream processing is better for some algorithms
– Load a frame and process it before moving on to the next frame
  [Diagram: Stream Source → MATLAB → Stream Processing]
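A sketch of the stream-processing pattern for a file too large to load whole; data.bin is a hypothetical file of raw doubles:

```matlab
% Stream a large binary file in fixed-size blocks, keeping only
% a running reduction in memory.
blockSize = 1e6;            % elements per block
total = 0; n = 0;
fid = fopen('data.bin', 'r');
while ~feof(fid)
    block = fread(fid, blockSize, 'double');
    total = total + sum(block);   % O(blockSize) memory, not O(file)
    n = n + numel(block);
end
fclose(fid);
fprintf('mean = %g over %d samples\n', total/n, n);
```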
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
Example: Block Processing Images

Evaluate a function at grid points
Reevaluate the function over larger blocks
Compare the results
Evaluate code performance

>> blockAvg.m
>> blockAvgRedo1.m
>> blockAvgRedo2.m
Summary of Example

Used built-in timing functions
>> tic
>> toc
Used M-Lint to find suboptimal code
Preallocated arrays
Vectorized code
Effect of Not Preallocating Memory

>> x = 4
>> x(2) = 7
>> x(3) = 12

[Diagram: each assignment that grows x allocates a new, larger block of addresses and copies the existing elements into it]
Benefit of Preallocation

>> x = zeros(3,1)
>> x(1) = 4
>> x(2) = 7
>> x(3) = 12

[Diagram: the zeros call allocates all three elements once; each assignment then writes in place with no reallocation or copying]
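The effect is easy to measure with the tic/toc timing shown earlier; a minimal sketch (the exact speedup depends on the MATLAB version, since newer releases amortize array growth):

```matlab
% Time a growing array against a preallocated one.
n = 1e5;

tic
x = [];
for k = 1:n
    x(k) = k^2;          % grows x: repeated reallocation + copy
end
tGrow = toc;

tic
y = zeros(n,1);          % preallocate once
for k = 1:n
    y(k) = k^2;          % writes in place
end
tPre = toc;

fprintf('growing: %.3f s, preallocated: %.3f s\n', tGrow, tPre);
```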
Data Storage of MATLAB Arrays

>> x = magic(3)
x =
     8     1     6
     3     5     7
     4     9     2

MATLAB stores arrays in column-major order: the elements of x occupy consecutive addresses as 8, 3, 4, 1, 5, 9, 6, 7, 2.

See the June 2007 article in "The MathWorks News and Notes":
http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html
Indexing into MATLAB Arrays

Subscripted
– Access elements by rows and columns
Linear
– Access elements with a single number
Logical
– Access elements with logical operations or a mask

Linear indexing (column-major):    Subscripted indexing:
1 4 7                              1,1 1,2 1,3
2 5 8                              2,1 2,2 2,3
3 6 9                              3,1 3,2 3,3

Convert between the two with ind2sub and sub2ind.

>> logicalIndex.m
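A small sketch of the three styles side by side (logicalIndex.m is the seminar's demo; this stand-in uses the magic(3) matrix from the previous slide):

```matlab
% The three indexing styles; subscripted and linear select the
% same element, and logical indexing filters in one expression.
x = magic(3);          % [8 1 6; 3 5 7; 4 9 2]
a = x(2,3);            % subscripted: row 2, column 3 -> 7
b = x(8);              % linear, column-major: 8th element -> 7
c = x(x > 5);          % logical: all elements greater than 5
isequal(a, b)          % true
```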
MATLAB Performance Technologies

Commercial libraries
– BLAS: Basic Linear Algebra Subprograms (multithreaded)
– LAPACK: Linear Algebra Package
– etc.
JIT/Accelerator
– Improves looping
– Generates multithreaded code on the fly
– Continually improving
Other Best Practices for Performance

Minimize dynamically changing the path
– Use addpath(…) and fullfile(…) instead of cd(…)
Use the functional load syntax
– x = load('myvars.mat') instead of load('myvars.mat')
  x =
      a: 5
      b: 'hello'
Minimize changing a variable's class
– Avoid reusing a name across classes, e.g. x = 1; followed by x = 'hello';
Summary

Techniques for addressing performance
– Vectorization
– Preallocation
Consider readability and maintainability
– Looping vs. matrix operations
– Subscripted vs. linear vs. logical indexing
– etc.
Agenda – MATLAB Under the Hood

Working with Large Data
– Understanding the constraints – RAM, OS, BLAS, LAPACK
– Working within MATLAB – data storage and copying
– Minimizing memory usage – precision, selective loading and plotting, stream processing
Speeding Up MATLAB Programs
– Leveraging vector and matrix operations
– Detecting and addressing bottlenecks
Example: Fitting Data

Load data from multiple files
Extract a specific test
Fit a spline to the data
Write results to Microsoft Excel

>> testFit.m
>> testFitRedo1.m
>> testFitRedo2.m
>> testFitRedo3.m
Summary of Example

Used the Profiler to analyze code
Targeted significant bottlenecks
Reduced file I/O
Reused the figure
Interpreting Profiler Results

Focus on the top bottleneck
– Total number of function calls
– Time per function call
Functions
– All function calls have overhead
– MATLAB functions often take vectors or matrices as inputs
– Find the right function – performance may vary
  Search MATLAB functions (e.g., textscan vs. textread)
  Write a custom function (specific/dedicated functions may be faster)
  Many shipping functions have viewable source code
Classes of Bottlenecks

File I/O
– Disk is slow compared to RAM
– When possible, use the load and save commands
Displaying output
– Creating new figures is expensive
– Writing to the command window is slow
Computationally intensive code
– Use what you've learned today
– Trade off modularization, readability, and performance
– Integrate other languages or additional hardware
  (e.g. emlmex, MEX, GPUs, FPGAs, clusters, etc.)
Steps for Improving Performance

First focus on getting your code working
Then speed up the code within core MATLAB
Consider additional processing power
MathWorks Contact Information

For pricing, licensing, trials, and general questions:
Tim Mathieu
Sr. Account Manager
Education Sales Department
Email: [email protected]
Phone: 508.647.7016

Customer Service: [email protected], 508.647.7000
Technical Support: [email protected], 508.647.7000