Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
A Comparative Study of Conformingand Nonconforming High-Resolution
Finite Element SchemesMatthias Möller
Institute of Applied Mathematics (LS3)TU Dortmund, Germany
European Seminar on ComputingPilsen, Czech Republic
June 25, 2012
Montag, 25. Juni 12
Family of AFC schemes Kuzmin et al.
MLu = [K+D]u+ F(u,u)
artificial diffusion operator
antidiffusive correctionlumped mass matrix
Montag, 25. Juni 12
linearized flux-correction
nonlinear flux-correction
Family of AFC schemes Kuzmin et al.
NL-GP NL-FCT NL-TVD Low-order
MLu = MLuL + F (uL, uL)Lin-FCT
F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0
MLu = [K+D]u+ F(u,u)
Montag, 25. Juni 12
linearized flux-correction
nonlinear flux-correction
Family of AFC schemes Kuzmin et al.
AFC-LPT. . .
NL-GP NL-FCT NL-TVD Low-order
MLu = MLuL + F (uL, uL)Lin-FCT
F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0
MLu = [K+D]u+ F(u,u)
Montag, 25. Juni 12
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
positive not negative
miui =X
j 6=i
�ij(uj � ui)
Montag, 25. Juni 12
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
■ Semi-discrete high-resolution scheme
positive not negative
miui =X
j 6=i
�ij(uj � ui)
miui =X
j 6=i
(kij + dij)(uj � ui) + �iui +X
j 6=i
↵ijfij
mi =X
jmij
not negative by construction of
diffusion coefficient
controlled byflux limiter
Montag, 25. Juni 12
Review of design principles
■ Jameson‘s Local Extremum Diminishing criterion
IF
THEN local solution maxima/minima do not increase/decrease
■ Semi-discrete high-resolution scheme
■ Edge-based assembly of operators/vectors is feasible and efficient!
positive not negative
miui =X
j 6=i
�ij(uj � ui)
miui =X
j 6=i
(kij + dij)(uj � ui) + �iui +X
j 6=i
↵ijfij
mi =X
jmij
not negative by construction of
diffusion coefficient
controlled byflux limiter
Montag, 25. Juni 12
Bilinear Q1 FE
■ nodal values at cell vertices
Rotated bilinear ~Q1 FE
■ nodal values at cell midpoints
■ integral mean-values at edges
Finite Elements
Montag, 25. Juni 12
■ Uniform 128x128 grid of unit square with 5% stochastic disturbance
Finite Element properties
△(A) MC ML NEQ NA
Q1 8 ✓ ✓ 16,641 82,680
~Q1par 6 ✓ ✓ 33,024 140,483
~Q1np 6 ✓ ✓ 33,024 140,478
~Q1par 6 -7.2E-07 (✓) 33,024 138,204
~Q1np 6 -7.2E-07 (✓) 33,024 138,576
meanvalue
pointvalue
Montag, 25. Juni 12
■ Uniform 128x128 grid of unit square with 5% stochastic disturbance
Finite Element properties
△(A) MC ML NEQ NA
Q1 8 ✓ ✓ 16,641 82,680
~Q1par 6 ✓ ✓ 33,024 140,483
~Q1np 6 ✓ ✓ 33,024 140,478
~Q1par 6 -7.2E-07 (✓) 33,024 138,204
~Q1np 6 -7.2E-07 (✓) 33,024 138,576
meanvalue
pointvalue
Montag, 25. Juni 12
Finite Element properties
Montag, 25. Juni 12
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
Montag, 25. Juni 12
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
0 10 20 30 40 50 60 70 80
0
10
20
30
40
50
60
70
80
nz = 530
~Q1 FE
Montag, 25. Juni 12
Finite Element properties
0 10 20 30 40 50
0
5
10
15
20
25
30
35
40
45
50
nz = 369
Q1 FE
0 10 20 30 40 50 60 70 80
0
10
20
30
40
50
60
70
80
nz = 530
~Q1 FE
3456789
101112
0 10 20 30 40 50 60 70 80
Number of non-zero entries per matrix row
Number of matrix row
Q1~Q1
Storage format:■ CRS, CCS
Storage format:■ ELLPACK
Montag, 25. Juni 12
Solid body rotation
Pure convection problem
u+r · (vu) = 0 in ⌦ = (0, 1)2
u = 0 on �
inflow
■ Velocity field
■ Grid size
■ Stochastic grid disturbance
■ Time step in Crank-Nicolson
■ Initial = exact solution at
v(x, y) = (0.5� y, x� 0.5)
�t = 1.28 · h
t = 2⇡k, k 2 N
~Q1 FE NL-FCTsim. time t=2π
� 2 {0%, 1%, 5%}
h = 1/2l, l = 5, 6, . . .
Montag, 25. Juni 12
SBR: L2-error (0% mesh disturbance)
0,01
0,1
1
5 6 7 8
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,01
0,1
1
5 6 7 8
~Q1 FE
Refinement level
Montag, 25. Juni 12
SBR: L2-error (5% mesh disturbance)
0,01
0,1
1
5 6 7 8
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,01
0,1
1
5 6 7 8
~Q1 FE
Refinement level
Montag, 25. Juni 12
Rotation of a Gaussian hill
■ Convection-diffusion equation
u+r · (vu� ✏ru) = 0
in ⌦ = (�1, 1)2
■ Velocity field, diffusion coefficient
■ Analytical solution
■ Position of the peak
■ Other parameters as before
v(x, y) = (�y, x), ✏ = 0.001
x(t) = �0.5 sin t
y(t) = 0.5 cos(t)
~Q1 FE NL-FCTtime t=5/2π
u(x, y, t) =1
4⇡✏te
� r2
4✏t
r
2 = (x� x)2 + (y � y)2
Montag, 25. Juni 12
RGH: L2-error (0% mesh disturbance)
0,001
0,01
0,1
1
10
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,001
0,01
0,1
1
10
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
RGH: dispersion-error (0% mesh disturbance)
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
RGH: L2-error (5% mesh disturbance)
0,001
0,01
0,1
1
10
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
0,001
0,01
0,1
1
10
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
RGH: dispersion-error (5% mesh disturbance)
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
Q1 FE
Refinement level
Low-order Lin-FCT NL-FCT NL-GP NL-TVD
-0,5
0,5
1,5
2,5
3,5
4,5
5,5
5 6 7 8 9
~Q1 FE
Refinement level
Montag, 25. Juni 12
3:3conforming
non
conforming
good accuracy good
small numerical diffusion small
smaller #DOFs, #edges larger
irregular sparsity pattern regular
Taxonomy of finite elements
Montag, 25. Juni 12
Can nonconforming finite elements help to improve performance on many-core hardware ?
NVIDIA Kepler GK110 Die Shot (taken from: www.gpgpu.org)Montag, 25. Juni 12
Example: edge-based flux-assembly with Q1 FE
0
10
20
30
40
50
0K 1K 10K 100K 1.000K
Band
wid
th (
GB/
s)
#edges per color group
1 OMP 2 OMP 4 OMP8 OMP 12 OMP Xeon X5680(2x6) CUDA C2070
best GPU performancefor large problem sizes
best CPU performanceif data fits into cache
Montag, 25. Juni 12
Example: edge-based flux assembly with Q1 FE
0
25
50
75
100
125
150
CUDA OMP 6
Acc
umul
ated
tim
e (m
s)
w/ transf.„55 ms“
132 ms
23 ms
5,7x
0
10
20
30
40
50
0K
125K
250K
375K
500K
Band
wid
th (
GB/
s)
Color groups 1-14
OMP 6 on Core i7 EPCUDA on C2070#edges per color group
6-7 GB/s
Fi =N
colorsX
k=1
X
ij2CGk
�ij(Uj � Ui)
Montag, 25. Juni 12
0
25
50
75
100
125
150
175
200
CUDA OMP 6
Acc
umul
ated
tim
e (m
s)
w/ transf.„95 ms“
191 ms
32 ms
5,9x
Example: edge-based flux assembly with ~Q1 FE
0
10
20
30
40
50
60
0K
375K
750K
1.125K
1.500K
Band
wid
th (
GB/
s)
Color groups 1-9
OMP 6 on Core i7 EPCUDA on C2070#edges per color group
6-7 GB/s
Montag, 25. Juni 12
Summary
Nonconforming ~Q1 finite elements
■ can be used within the algebraic flux correction framework
■ are comparable to conforming FEs (accuracy/numerical diffusion)
■ increase number of DOFs as compared to conforming Q1 FEs
■ lead to system matrices with regular structure favorable for HPC
Future plans
■ Apply nonconforming AFC schemes to systems of equations
■ Exploit benefits of nonconforming FEs for many-core architectures: speed up parallel (edge-based) assembly loops, implement more efficient matrix structures (ELLPACK), reduce communication costs in parallelized code
Montag, 25. Juni 12
Acknowledgements
AFC schemes
■ D. Kuzmin (University of Erlangen-Nuremberg)
CUDA/GPU programming
■ D. Göddeke, D. Ribbrock, M. Geveler (TU Dortmund)
Featflow2
■ M. Köster, P. Zajac (TU Dortmund)
Source code freely available at:
http://www.featflow.de/en/software/featflow2.html
Montag, 25. Juni 12