A Comparative Study of Conforming and Nonconforming High ... · Rotated bilinear ~Q 1 FE nodal...

Preview:

Citation preview

A Comparative Study of Conformingand Nonconforming High-Resolution

Finite Element SchemesMatthias Möller

Institute of Applied Mathematics (LS3)TU Dortmund, Germany

European Seminar on ComputingPilsen, Czech Republic

June 25, 2012

Montag, 25. Juni 12

Family of AFC schemes Kuzmin et al.

MLu = [K+D]u+ F(u,u)

artificial diffusion operator

antidiffusive correctionlumped mass matrix

Montag, 25. Juni 12

linearized flux-correction

nonlinear flux-correction

Family of AFC schemes Kuzmin et al.

NL-GP NL-FCT NL-TVD Low-order

MLu = MLuL + F (uL, uL)Lin-FCT

F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0

MLu = [K+D]u+ F(u,u)

Montag, 25. Juni 12

linearized flux-correction

nonlinear flux-correction

Family of AFC schemes Kuzmin et al.

AFC-LPT. . .

NL-GP NL-FCT NL-TVD Low-order

MLu = MLuL + F (uL, uL)Lin-FCT

F (u) = �DuF (u, u) = [ML �MC ]u�Du F ⌘ 0

MLu = [K+D]u+ F(u,u)

Montag, 25. Juni 12

Review of design principles

■ Jameson‘s Local Extremum Diminishing criterion

IF

THEN local solution maxima/minima do not increase/decrease

positive not negative

miui =X

j 6=i

�ij(uj � ui)

Montag, 25. Juni 12

Review of design principles

■ Jameson‘s Local Extremum Diminishing criterion

IF

THEN local solution maxima/minima do not increase/decrease

■ Semi-discrete high-resolution scheme

positive not negative

miui =X

j 6=i

�ij(uj � ui)

miui =X

j 6=i

(kij + dij)(uj � ui) + �iui +X

j 6=i

↵ijfij

mi =X

jmij

not negative by construction of

diffusion coefficient

controlled byflux limiter

Montag, 25. Juni 12

Review of design principles

■ Jameson‘s Local Extremum Diminishing criterion

IF

THEN local solution maxima/minima do not increase/decrease

■ Semi-discrete high-resolution scheme

■ Edge-based assembly of operators/vectors is feasible and efficient!

positive not negative

miui =X

j 6=i

�ij(uj � ui)

miui =X

j 6=i

(kij + dij)(uj � ui) + �iui +X

j 6=i

↵ijfij

mi =X

jmij

not negative by construction of

diffusion coefficient

controlled byflux limiter

Montag, 25. Juni 12

Bilinear Q1 FE

■ nodal values at cell vertices

Rotated bilinear ~Q1 FE

■ nodal values at cell midpoints

■ integral mean-values at edges

Finite Elements

Montag, 25. Juni 12

■ Uniform 128x128 grid of unit square with 5% stochastic disturbance

Finite Element properties

△(A) MC ML NEQ NA

Q1 8 ✓ ✓ 16,641 82,680

~Q1par 6 ✓ ✓ 33,024 140,483

~Q1np 6 ✓ ✓ 33,024 140,478

~Q1par 6 -7.2E-07 (✓) 33,024 138,204

~Q1np 6 -7.2E-07 (✓) 33,024 138,576

meanvalue

pointvalue

Montag, 25. Juni 12

■ Uniform 128x128 grid of unit square with 5% stochastic disturbance

Finite Element properties

△(A) MC ML NEQ NA

Q1 8 ✓ ✓ 16,641 82,680

~Q1par 6 ✓ ✓ 33,024 140,483

~Q1np 6 ✓ ✓ 33,024 140,478

~Q1par 6 -7.2E-07 (✓) 33,024 138,204

~Q1np 6 -7.2E-07 (✓) 33,024 138,576

meanvalue

pointvalue

Montag, 25. Juni 12

Finite Element properties

Montag, 25. Juni 12

Finite Element properties

0 10 20 30 40 50

0

5

10

15

20

25

30

35

40

45

50

nz = 369

Q1 FE

Montag, 25. Juni 12

Finite Element properties

0 10 20 30 40 50

0

5

10

15

20

25

30

35

40

45

50

nz = 369

Q1 FE

0 10 20 30 40 50 60 70 80

0

10

20

30

40

50

60

70

80

nz = 530

~Q1 FE

Montag, 25. Juni 12

Finite Element properties

0 10 20 30 40 50

0

5

10

15

20

25

30

35

40

45

50

nz = 369

Q1 FE

0 10 20 30 40 50 60 70 80

0

10

20

30

40

50

60

70

80

nz = 530

~Q1 FE

3456789

101112

0 10 20 30 40 50 60 70 80

Number of non-zero entries per matrix row

Number of matrix row

Q1~Q1

Storage format:■ CRS, CCS

Storage format:■ ELLPACK

Montag, 25. Juni 12

Solid body rotation

Pure convection problem

u+r · (vu) = 0 in ⌦ = (0, 1)2

u = 0 on �

inflow

■ Velocity field

■ Grid size

■ Stochastic grid disturbance

■ Time step in Crank-Nicolson

■ Initial = exact solution at

v(x, y) = (0.5� y, x� 0.5)

�t = 1.28 · h

t = 2⇡k, k 2 N

~Q1 FE NL-FCTsim. time t=2π

� 2 {0%, 1%, 5%}

h = 1/2l, l = 5, 6, . . .

Montag, 25. Juni 12

SBR: L2-error (0% mesh disturbance)

0,01

0,1

1

5 6 7 8

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

0,01

0,1

1

5 6 7 8

~Q1 FE

Refinement level

Montag, 25. Juni 12

SBR: L2-error (5% mesh disturbance)

0,01

0,1

1

5 6 7 8

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

0,01

0,1

1

5 6 7 8

~Q1 FE

Refinement level

Montag, 25. Juni 12

Rotation of a Gaussian hill

■ Convection-diffusion equation

u+r · (vu� ✏ru) = 0

in ⌦ = (�1, 1)2

■ Velocity field, diffusion coefficient

■ Analytical solution

■ Position of the peak

■ Other parameters as before

v(x, y) = (�y, x), ✏ = 0.001

x(t) = �0.5 sin t

y(t) = 0.5 cos(t)

~Q1 FE NL-FCTtime t=5/2π

u(x, y, t) =1

4⇡✏te

� r2

4✏t

r

2 = (x� x)2 + (y � y)2

Montag, 25. Juni 12

RGH: L2-error (0% mesh disturbance)

0,001

0,01

0,1

1

10

5 6 7 8 9

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

0,001

0,01

0,1

1

10

5 6 7 8 9

~Q1 FE

Refinement level

Montag, 25. Juni 12

RGH: dispersion-error (0% mesh disturbance)

-0,5

0,5

1,5

2,5

3,5

4,5

5,5

5 6 7 8 9

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

-0,5

0,5

1,5

2,5

3,5

4,5

5,5

5 6 7 8 9

~Q1 FE

Refinement level

Montag, 25. Juni 12

RGH: L2-error (5% mesh disturbance)

0,001

0,01

0,1

1

10

5 6 7 8 9

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

0,001

0,01

0,1

1

10

5 6 7 8 9

~Q1 FE

Refinement level

Montag, 25. Juni 12

RGH: dispersion-error (5% mesh disturbance)

-0,5

0,5

1,5

2,5

3,5

4,5

5,5

5 6 7 8 9

Q1 FE

Refinement level

Low-order Lin-FCT NL-FCT NL-GP NL-TVD

-0,5

0,5

1,5

2,5

3,5

4,5

5,5

5 6 7 8 9

~Q1 FE

Refinement level

Montag, 25. Juni 12

3:3conforming

non

conforming

good accuracy good

small numerical diffusion small

smaller #DOFs, #edges larger

irregular sparsity pattern regular

Taxonomy of finite elements

Montag, 25. Juni 12

Can nonconforming finite elements help to improve performance on many-core hardware ?

NVIDIA Kepler GK110 Die Shot (taken from: www.gpgpu.org)Montag, 25. Juni 12

Example: edge-based flux-assembly with Q1 FE

0

10

20

30

40

50

0K 1K 10K 100K 1.000K

Band

wid

th (

GB/

s)

#edges per color group

1 OMP 2 OMP 4 OMP8 OMP 12 OMP Xeon X5680(2x6) CUDA C2070

best GPU performancefor large problem sizes

best CPU performanceif data fits into cache

Montag, 25. Juni 12

Example: edge-based flux assembly with Q1 FE

0

25

50

75

100

125

150

CUDA OMP 6

Acc

umul

ated

tim

e (m

s)

w/ transf.„55 ms“

132 ms

23 ms

5,7x

0

10

20

30

40

50

0K

125K

250K

375K

500K

Band

wid

th (

GB/

s)

Color groups 1-14

OMP 6 on Core i7 EPCUDA on C2070#edges per color group

6-7 GB/s

Fi =N

colorsX

k=1

X

ij2CGk

�ij(Uj � Ui)

Montag, 25. Juni 12

0

25

50

75

100

125

150

175

200

CUDA OMP 6

Acc

umul

ated

tim

e (m

s)

w/ transf.„95 ms“

191 ms

32 ms

5,9x

Example: edge-based flux assembly with ~Q1 FE

0

10

20

30

40

50

60

0K

375K

750K

1.125K

1.500K

Band

wid

th (

GB/

s)

Color groups 1-9

OMP 6 on Core i7 EPCUDA on C2070#edges per color group

6-7 GB/s

Montag, 25. Juni 12

Summary

Nonconforming ~Q1 finite elements

■ can be used within the algebraic flux correction framework

■ are comparable to conforming FEs (accuracy/numerical diffusion)

■ increase number of DOFs as compared to conforming Q1 FEs

■ lead to system matrices with regular structure favorable for HPC

Future plans

■ Apply nonconforming AFC schemes to systems of equations

■ Exploit benefits of nonconforming FEs for many-core architectures: speed up parallel (edge-based) assembly loops, implement more efficient matrix structures (ELLPACK), reduce communication costs in parallelized code

Montag, 25. Juni 12

Acknowledgements

AFC schemes

■ D. Kuzmin (University of Erlangen-Nuremberg)

CUDA/GPU programming

■ D. Göddeke, D. Ribbrock, M. Geveler (TU Dortmund)

Featflow2

■ M. Köster, P. Zajac (TU Dortmund)

Source code freely available at:

http://www.featflow.de/en/software/featflow2.html

Montag, 25. Juni 12

Recommended