Restricted © Siemens AG 2015 All rights reserved. FEMAP SYMPOSIUM 2015
Realize Innovation.
NX Nastran Performance
2015-09-23
Page 2 Siemens PLM Software
Improving NX Nastran Performance
Challenges
• Increased problem size
  - 2004: 1.2 million DOF (large model)
  - 2011: 10 to 20 million DOF (typical models)
  - 2015: 30 to 50 million DOF (expected)
Solutions
• Select the right hardware and OS
• Use hardware efficiently by tuning OS settings
• Define appropriate NX Nastran keywords and parameters for the solve
• Take advantage of NX Nastran parallel processing
• Select appropriate solution methods to reduce elapsed time
Hardware and OS Selection
• Processors
  - Prefer faster processors
  - Choose processors with a large L2 or L3 cache; larger caches improve performance
  - Prefer multi-core processors
• Memory
  - Install as much memory as possible. Unallocated memory will be used by the OS for I/O cache
• Disk
  - Increase disk performance by using SSDs. Faster I/O leads to reduced elapsed time
  - PCIe disks are a newer option and outperform SATA- or SCSI-hosted SSDs
  - Prefer multiple disks (1 + 4): one for the OS and the remaining disks in a RAID0 configuration for Nastran scratch
Hardware and OS Selection
• GPU and Intel MIC
  - GPU processing requires an expensive high-end card (FirePro W9100 with 16 GB)
  - The GPU card needs enough memory to hold Nastran module data in core
  - GPU processing only helps for special problems (frequency response with 5000+ modes)
  - The technology is changing rapidly
Hardware and OS Selection
• Priorities for getting the most performance for the least money
  - Maximum number of fast cores with large cache
  - Add as much RAM as possible
  - Maximize I/O bandwidth and disk speed
  - Add GPU processing for some large dynamics problems
OS Settings: I/O Cache
• Why?
  - Reading from and writing to disk is slow on mechanical drives
  - The same part of the disk is often read several times
  - Data that has just been written is likely to be read back soon
• How?
  - Keeping information in memory instead of on disk reduces disk seek times
  - The OS makes use of unallocated memory for buffer cache
  - When an application needs memory, the cache manager pages memory to disk (application page or I/O cache page?)
[Figure: total physical memory divided among the O/S, other processes, NX Nastran, and I/O cache]
OS Settings: Enabling Disk I/O Cache
• Read cache is enabled by default on Linux and on Windows (SuperFetch feature)
• On Linux, enable write cache using the “hdparm” command or an equivalent
• On Windows, use the “System Properties” advanced settings to enable write cache
I/O Cache and Paging - Windows
• Reasons
  - As file size becomes larger than system memory, the OS runs out of memory
  - The OS cache manager pages out the least recently used memory
  - Pages from Nastran can be paged out to accommodate I/O cache
• Prevention
  - Limit the Windows I/O cache to 25% to 50% of physical memory using “cache_tool” (available on request)
  - Turn off the file cache by adding the command-line option “sysfield=buffio=yes,raw=yes”
[Figure: total physical memory divided among the O/S, other processes, NX Nastran, and I/O cache, with pages being paged out]
NX Nastran Settings: Memory
• Starting with NXN 10, new default settings in the rcf file:
  - buffsize=32769
  - memory=.45*physical
  - smem=20.0X
  - buffpool=20.0X
• These are more robust settings, better suited to large models and machines with more memory
• Inspect the F04 file to see whether the settings are optimal for your model
Note: unless SMEM is large enough to contain all scratch files, it is better to set it to zero. Check the F04 file summary.
NX Nastran Settings: Memory
*** USER INFORMATION MESSAGE 4157 (DFMSYN)
    PARAMETERS FOR SPARSE DECOMPOSITION OF DATA BLOCK KLL ( TYPE=RDP ) FOLLOW
    MATRIX SIZE                   = 70345 ROWS
    NUMBER OF NONZEROES           = 2701957 TERMS
    NUMBER OF ZERO COLUMNS        = 0
    NUMBER OF ZERO DIAGONAL TERMS = 0
    CPU TIME ESTIMATE             = 78216 SEC
    I/O TIME ESTIMATE             = 25 SEC
    MINIMUM MEMORY REQUIREMENT    = 1364 K WORDS
    MEMORY AVAILABLE              = 32615 K WORDS
    MEMORY REQR'D TO AVOID SPILL  = 12305 K WORDS
    MEMORY USED BY BEND           = 3651 K WORDS
    EST. INTEGER WORDS IN FACTOR  = 87006 K WORDS
    EST. NONZERO TERMS            = 174758 K TERMS
  - Word size = 8 bytes (ILP-64, long integers)
  - Word size = 4 bytes (LP-64, short integers)
• Specify enough memory to avoid disk spillover
  - At least 1.2 to 1.3 times the memory required to avoid spill
• Do not give NX Nastran more than 50% of physical memory. This leaves the OS more room for I/O cache
• Insufficient memory can affect the re-ordering method and lead to very slow matrix decomposition. Make sure that either the BEND or the METIS method is selected
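The sizing guidance on this slide can be turned into a small calculation. The sketch below is illustrative only: the function name, the 1.25 margin (a midpoint of the 1.2 to 1.3 range), and the reading of 1 K WORDS as 1024 words are assumptions, not NX Nastran definitions. It takes the MEMORY REQR'D TO AVOID SPILL value from message 4157, applies the margin, and caps the result at 50% of physical memory.

```python
import math

# Bytes per word for the two executable types listed on this slide.
WORD_BYTES = {"ILP-64": 8, "LP-64": 4}


def suggest_memory_words(spill_kwords, physical_bytes, executable="ILP-64",
                         margin=1.25, max_fraction=0.5):
    """Return (words, capped) for a memory request.

    spill_kwords   -- MEMORY REQR'D TO AVOID SPILL from message 4157, in K words
    physical_bytes -- installed RAM in bytes
    margin         -- safety factor; the slide suggests 1.2 to 1.3
    max_fraction   -- upper limit as a fraction of physical memory
    Assumption: 1 K words = 1024 words.
    """
    word_bytes = WORD_BYTES[executable]
    wanted = math.ceil(spill_kwords * 1024 * margin)
    cap = int(physical_bytes * max_fraction) // word_bytes
    return min(wanted, cap), wanted > cap


if __name__ == "__main__":
    # Value from the message above, on a hypothetical 16 GiB machine.
    words, capped = suggest_memory_words(12305, 16 * 2**30)
    print(f"request {words} words ({words * 8 / 2**20:.1f} MiB), capped={capped}")
```

This only covers the sparse decomposition reported in message 4157; other modules may need more memory, so the F04 file should still be checked after the run.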
NX Nastran Settings: Memory …
[Figure: three cases compared: Memory Available less than Memory Required to Avoid Spill; Memory Available greater than Memory Required to Avoid Spill; Memory Available much greater than Memory Required to Avoid Spill]
NX Nastran Settings: Memory
• Even when memory is sufficient for matrix decomposition, other modules such as MPYAD might make multiple passes when memory is insufficient. Multiple passes translate to more I/O. The F04 file reports the number of passes (NBR PASSES= 4 in the excerpt below):
    12:09:45 143:59 5182.9G 0.0 17602.1 0.0 DISPRS 293 SMPYAD BEGN
    METHOD 1 NT, STORAGE 2, NBR PASSES= 4, EST. CPU= 409.3, I/O= 82.3, TOTAL= 491.6
    12:09:45 143:59 5182.9G 4.0 17602.1 0.0 MPYAD BGN P=4
    12:12:13 146:27 5206.2G 23821.0 17817.4 215.3 MPYAD PASS= 1
    12:14:43 148:57 5228.8G 23199.0 18031.6 214.2 MPYAD PASS= 2
    12:17:13 151:27 5251.5G 23190.0 18246.0 214.4 MPYAD PASS= 3
    12:19:43 153:57 5274.1G 93414.0 18460.5 858.4 MPYAD END
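Multi-pass MPYAD operations can be found by scanning the F04 file for the NBR PASSES field. The following is a minimal sketch, assuming the field always appears in the form shown on this slide; the function name is hypothetical, and the line breaks in the embedded excerpt are a reconstruction of the flattened slide text.

```python
import re

# Excerpt copied from the F04 listing on this slide.
F04_EXCERPT = """\
12:09:45 143:59 5182.9G 0.0 17602.1 0.0 DISPRS 293 SMPYAD BEGN
METHOD 1 NT, STORAGE 2, NBR PASSES= 4, EST. CPU= 409.3, I/O= 82.3, TOTAL= 491.6
12:09:45 143:59 5182.9G 4.0 17602.1 0.0 MPYAD BGN P=4
12:12:13 146:27 5206.2G 23821.0 17817.4 215.3 MPYAD PASS= 1
12:14:43 148:57 5228.8G 23199.0 18031.6 214.2 MPYAD PASS= 2
12:17:13 151:27 5251.5G 23190.0 18246.0 214.4 MPYAD PASS= 3
12:19:43 153:57 5274.1G 93414.0 18460.5 858.4 MPYAD END
"""

PASSES = re.compile(r"NBR PASSES=\s*(\d+)")


def multipass_lines(f04_text):
    """Return (line_number, passes) for every estimate with more than one pass."""
    hits = []
    for number, line in enumerate(f04_text.splitlines(), start=1):
        match = PASSES.search(line)
        if match and int(match.group(1)) > 1:
            hits.append((number, int(match.group(1))))
    return hits


if __name__ == "__main__":
    for number, passes in multipass_lines(F04_EXCERPT):
        print(f"line {number}: {passes} passes, consider giving the job more memory")
```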
NX Nastran Settings: scratch directory
It is important to specify the correct location of the scratch file folder using the “sdirectory” or “sdir” keyword.
• The scratch folder should point to a fast disk or to disks configured in a RAID array (RAID0)
• Prefer local disks over network-mounted fast file systems (those using a dedicated GigE or InfiniBand connection)
• A scratch folder on a generic network file system (NFS) carries a significant performance penalty because I/O goes over a slow, general shared network
• Set the “sdir” keyword in the rcf file
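As an illustration of collecting these settings in one place, the sketch below assembles keyword=value lines for an rcf file from the NXN 10 defaults quoted earlier in this deck plus a scratch directory. The helper name and the path "/scratch/nastran" are hypothetical, and the one-keyword-per-line layout is an assumption; check the NX Nastran documentation for the exact rcf syntax.

```python
# Defaults quoted earlier in this deck for NXN 10 and later.
DEFAULTS = {
    "buffsize": "32769",
    "memory": ".45*physical",
    "smem": "20.0X",
    "buffpool": "20.0X",
}


def rcf_lines(sdir, overrides=None):
    """Return keyword=value lines for an rcf file, ending with the scratch directory."""
    settings = dict(DEFAULTS)
    settings.update(overrides or {})
    settings["sdir"] = sdir
    return [f"{key}={value}" for key, value in settings.items()]


if __name__ == "__main__":
    # "/scratch/nastran" stands in for a local RAID0 mount point.
    # smem is set to zero here, following the earlier note about SMEM.
    print("\n".join(rcf_lines("/scratch/nastran", {"smem": "0"})))
```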
NX Nastran Settings: scratch directory
When running from Femap, use File/Preferences to control the Femap scratch and Nastran scratch locations.
NX Nastran: Parallel Processing
Types of Parallelism
• Shared memory (SMP)
• Distributed memory (DMP)

                  SMP                                  DMP
Hardware          Desktop                              Desktop/Cluster
Operation level   Low-level operations are threaded    Higher level; matrix partitioned at a higher level
Software          OpenMP and Intel MKL                 Message Passing Interface (MPI)
Scalability       Tapers off at 8 to 12 processors     Highly scalable
Shared Memory Architecture
Uniform Memory Access (UMA)
• Identical processors
• Symmetric in geometry; also known as symmetric multiprocessor (SMP)
• Equal access to memory
• If one processor updates a location in the shared memory, all other processors know about it
• Only one processor can access memory at a given instant
[Figure: UMA compute node with four processors (P), each with its own cache (C), sharing memory and I/O over a system bus]
NX Nastran SMP
• Easy to use: specify smp=n or parallel=n on the nastran command line (in Femap, under Executive and Solution Options)
• Available on all NX Nastran supported platforms
• Available in all solution types
• Modules parallelized
  - Matrix decomposition (DCMP)
  - Multiply-add (MPYAD)
  - Forward-backward substitution (FBS)
  - Frequency response (FRRD1)
  - Driver module for Sol 401 (NLTRD3)
  - Other modules that indirectly call DCMP, MPYAD, or FBS
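For scripted runs, the smp keyword can be added when the command line is assembled. The sketch below is a hedged example: the executable name "nastran", the input file name, and the helper itself are placeholders, and the default limit of 8 threads is an assumption drawn from the scalability note in this deck rather than a solver rule.

```python
import os


def nastran_command(input_file, smp=None, executable="nastran"):
    """Build an argument list such as ['nastran', 'model.dat', 'smp=4'].

    If smp is None, use the number of logical CPUs reported by the OS,
    limited to 8 (an assumption based on the scalability note in this deck).
    """
    if smp is None:
        smp = min(os.cpu_count() or 1, 8)
    if smp < 1:
        raise ValueError("smp must be at least 1")
    return [executable, input_file, f"smp={smp}"]


if __name__ == "__main__":
    # "model.dat" is a hypothetical input deck.
    print(" ".join(nastran_command("model.dat", smp=4)))
```

The returned list can be passed to a process launcher such as `subprocess.run` once the executable path for the local installation is known.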
NX Nastran: Distributed Memory Processing
• Program is broken into tasks
• Multiple tasks can reside on the same machine and/or across an arbitrary number of machines
• Tasks exchange data through communications by sending and receiving messages (message passing)
• Data transfer requires cooperative operations to be performed by each process
[Figure: tasks 0 to 3 on Node 1 and Node 2, each with its own data, exchanging data through send and receive operations over the network]
NX Nastran DMP
• Available in Sol 101, Sol 103, Sol 105, Sol 108, Sol 111, Sol 112, and Sol 200
• Partitioning of geometry
• Partitioning of frequency
• Partitioning of loads
• Available on Linux x86_64 and on Windows
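To make the idea of frequency partitioning concrete, the sketch below splits a list of excitation frequencies into near-equal contiguous blocks, one per DMP task. This is a generic illustration under assumed names; the contiguous-block scheme is an assumption and is not presented as the partitioning that NX Nastran performs internally.

```python
def partition_frequencies(frequencies, n_tasks):
    """Split frequencies into n_tasks contiguous blocks whose sizes differ by at most one."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    base, extra = divmod(len(frequencies), n_tasks)
    blocks, start = [], 0
    for task in range(n_tasks):
        # The first `extra` tasks take one additional frequency each.
        size = base + (1 if task < extra else 0)
        blocks.append(frequencies[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    # Hypothetical sweep: 10 frequencies shared among 4 tasks.
    sweep = [10.0 * step for step in range(1, 11)]
    for task, block in enumerate(partition_frequencies(sweep, 4)):
        print(f"task {task}: {block}")
```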
NX Nastran Linear Contact Solutions
[Figure: search distances of 2.0 mm, 1.0 mm, and 0.5 mm]
• Select the element iterative solver
  - When 3D elements are more than 90% of the total number of elements
  - When the solution is linear statics
• Specify a proper search distance. Large search distances typically involve more active contacts for the first few iterations
• Adjust the global contact parameters MAXF and/or CTOL to reduce the number of iterations
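The solver selection criteria on this slide can be written as a simple check. The sketch below reads the two conditions as both being required, which is an interpretation of the slide; the function name and the element counts in the example are hypothetical.

```python
def prefer_element_iterative(n_solid, n_total, linear_statics, threshold=0.90):
    """Return True when 3D elements exceed the threshold share and the run is linear statics."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return bool(linear_statics) and (n_solid / n_total) > threshold


if __name__ == "__main__":
    # Hypothetical model: 950,000 solid elements out of 1,000,000 in a linear statics run.
    print(prefer_element_iterative(950_000, 1_000_000, linear_statics=True))
```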
NX Nastran Linear Contact Solutions
[Figure: number of contact status changes (0 to 350,000) versus iterations (1 to 10) for search distances of 2 mm, 1 mm, and 0.5 mm]
NX Nastran Modal Solution
• Use RDMODES (recursive modes), which partitions the model into “nrec” partitions
  - No big triangular solves
  - No orthogonalization
  - Reduced I/O
  - Approximate solution
  - Used when a large number of modes is to be computed
  - Can be used with SMP, DMP, or in hybrid mode
• Use system cell 462=1
  - When a large amount of memory is available
  - Frequency response runs in-core
RDMODES Performance
[Figure: elapsed time (mins, 0 to 600) versus number of processors (1, 2, 4, 8) for SMP, DMP, and DMP_SMP]
Hardware
  Processor          Intel Xeon 5690 (3.47 GHz)
  L1, L2, L3 cache   32 KB, 256 KB, 12 MB
  Cores              6 per socket, 2 sockets
  Memory             96 GB
  Disks              6 x 585 GB disks in RAID0
Engine Block Model
  DOF                21945096
  CTETRA             2233552
Concluding Remarks
• Judicious selection of hardware can improve performance significantly
• Maximize usage of machine resources by making appropriate choices in the OS and the solver:
  - OS settings
  - Memory management
  - Parallel processing
  - Contact settings
  - Solution methods
Types of I/O Cache
Different levels of I/O cache
• Application (NX Nastran) I/O cache
  - Scratch memory (smem)
  - Buffer pool (bpool or buffpool)
• OS I/O cache
• Device driver I/O cache
• Cache performance depends on the hardware and on the operating system
• With efficient disks and OS cache, the benefit of the NX Nastran I/O cache (smem, bpool) is expected to be marginal
[Figure: I/O path from NX Nastran (SMEM, BPOOL) through the OS cache and the device driver cache to disk]
Windows – Excessive I/O Cache
• Symptoms:
  - The machine becomes unresponsive
  - The solution takes a long time
[Figure: executable paging]
Windows – Excessive I/O Cache cont…
[Figure: scratch data cached, followed by more scratch data cached]
NX Nastran Lanczos Performance Options
• Space saver option
  - Set system cell 229=1 (default = 0). Factor matrices are then not preserved for later use (they are used when the lower bound and upper bound frequency ranges are specified on the EIGRL card). This reduces scratch usage
• Sparse solver memory in Lanczos
  - Set system cell 146 (or FBSMEM) to a value greater than 1 (2 or 3). This reserves more memory for the factor but reduces the amount of memory available for eigenvectors
• I/O reduction options
  - Set system cell 193=1 (the result of the mass matrix multiply is not saved)
  - Set system cell 199=k. This sets the memory for the mass matrix multiply (2 x k x BUFFSIZE). The default value is k=1
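The memory set through system cell 199 follows directly from the expression 2 x k x BUFFSIZE. The sketch below evaluates it for a few values of k, assuming the result is in words, that BUFFSIZE is the 32769 default quoted earlier in this deck, and that a word is 8 bytes; the function name is hypothetical.

```python
def mass_multiply_words(k=1, buffsize=32769):
    """Memory for the mass matrix multiply, in words: 2 x k x BUFFSIZE."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 2 * k * buffsize


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        words = mass_multiply_words(k)
        # Assumes 8-byte words (ILP-64 executable).
        print(f"k={k}: {words} words, {words * 8 / 1024:.0f} KiB")
```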
NX Nastran: Memory Management
    ** MASTER DIRECTORIES ARE LOADED IN MEMORY.
       USER OPENCORE (HICORE)        =  804910800 WORDS
       EXECUTIVE SYSTEM WORK AREA    =     316925 WORDS
       MASTER(RAM)                   =      78676 WORDS
       SCRATCH(MEM) AREA             =  268443648 WORDS ( 8192 BUFFERS)
       BUFFER POOL AREA (GINO/EXEC)  =  268427231 WORDS ( 8189 BUFFERS)
       TOTAL NX NASTRAN MEMORY LIMIT = 1342177280 WORDS
[Figure: memory layout as reported in the F04 file. The memory set by the “mem” keyword comprises User Open Core, Executive System Work Area, Master (RAM), Scratch (RAM), and Buffer Pool Area; part of it is labeled “Memory for File and Executive Tables”]
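The memory summary from the F04 file on this slide can be checked programmatically. The sketch below parses the six lines and confirms that the individual areas add up to the total limit; the parsing pattern and function name are assumptions about the layout shown here, not a documented F04 format, and the conversion to GiB assumes 8-byte words.

```python
import re

# Memory summary copied from the F04 excerpt on this slide.
F04_SUMMARY = """\
USER OPENCORE (HICORE) = 804910800 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 78676 WORDS
SCRATCH(MEM) AREA = 268443648 WORDS ( 8192 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 268427231 WORDS ( 8189 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 1342177280 WORDS
"""

LINE = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s+WORDS", re.MULTILINE)


def parse_memory_summary(text):
    """Return {area name: size in words} for each 'NAME = N WORDS' line."""
    return {name: int(words) for name, words in LINE.findall(text)}


if __name__ == "__main__":
    areas = parse_memory_summary(F04_SUMMARY)
    total = areas.pop("TOTAL NX NASTRAN MEMORY LIMIT")
    print("areas add up to total:", sum(areas.values()) == total)
    # Assumes 8-byte words (ILP-64 executable).
    print(f"total limit: {total * 8 / 2**30:.1f} GiB")
    for name, words in areas.items():
        print(f"{name:30s} {100 * words / total:5.1f}%")
```

For the values shown, open core is about 60% of the limit, and the scratch memory and buffer pool areas take about 20% each.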
Shared Memory Processing
• Program is broken into discrete instructions, as in a serial run
• Parts of the program run in serial. Some of the instructions are then spawned into threads (tasks)
• Each thread can then run concurrently on a different processor
• Threads share resources and communicate with each other through global memory (updating address space)
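The threading model described on this slide can be illustrated with a toy example in which several threads write partial results into one shared list. This is a generic sketch with hypothetical names, not NX Nastran code. In CPython the global interpreter lock keeps pure-Python threads from executing bytecode in parallel, so the example shows the programming model rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(values, n_threads=4):
    """Sum v*v over values, with each thread handling every n_threads-th item."""
    # Shared memory: one slot per thread, visible to all threads.
    partial = [0] * n_threads

    def work(thread_id):
        # Each thread writes only to its own slot, so no lock is needed.
        partial[thread_id] = sum(v * v for v in values[thread_id::n_threads])

    # The serial part of the program spawns the threads and waits for them.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(n_threads)))

    # Back in the serial part: combine the partial results.
    return sum(partial)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```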
[Figure: shared memory processing on a single node, with the problem's instructions spread across four processors]
Shared Memory Architecture
Non-Uniform Memory Access (NUMA)
• Memory is logically and sometimes physically distributed
• Processors have access to their own memory and also have access to other memory via a bus interconnect
• Not all processors have equal access time to all memories
• Similar to having multiple UMA systems
[Figure: NUMA compute node made of two groups, each with two processors (P) with caches (C), a system bus, and local memory, joined by a distributed shared memory network]
Hybrid Memory Architecture
• Combines shared memory (NUMA or UMA) and distributed memory architectures
• Communication across nodes uses MPI
• Communication within a node uses shared memory processing
[Figure: Node 1 and Node 2, each a distributed shared memory network of processors (P), caches (C), and memory running SMP, with MPI communication between the two nodes]
Distributed Memory Architecture
• Processors have their own local memory and resources, like a NUMA node group
• Because each processor has its own local memory, it operates independently
• Communication between nodes is through the Message Passing Interface (MPI)
• When a processor needs access to data from another processor, this has to be handled programmatically
[Figure: Node 1 and Node 2, each with processors (P), caches (C), memory, a system bus, and I/O, communicating through MPI over a network interconnect]