Page 1: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

MLD2P4: a package of parallel algebraic multilevel preconditioners

Pasqua D’Ambra, Institute for High-Performance Computing and Networking (ICAR-CNR), Naples Branch, Italy

Bologna, March 2008

Joint work with Daniela di Serafino, Second University of Naples, and Salvatore Filippone, University of Rome “Tor Vergata”

Page 2: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Pasqua D'Ambra - Bologna March 2008

Overview

• Motivations
  • Background
  • Objectives
• MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS
  • Algorithms and computational kernels
  • Software architecture
• Some Results & Applications

Page 3: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Background

Large-scale applications have to solve

$Ax = b$

The linear system matrix is:

• Real or complex and square
• Large and sparse
• Distributed among parallel processors

Matrix dimensions and entries, conditioning, sparsity pattern and coupling among variables vary along simulations

Page 4: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Background (cont’d)

• What is the best method/preconditioner?
  • No absolute winner, experimentation is needed
  • Reliable preconditioners require access to the complete matrix
  • Parallel implementation is not trivial
• Interfacing with application software is required
  • Custom-made interfaces to parallel legacy codes
  • Different interfaces for different preconditioners/solvers

Page 5: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Objectives

Designing and implementing a suite of algebraic preconditioners based on Linear Algebra kernels for parallel sparse matrix computations

• Flexibility
  • Different preconditioners by single API
• Portability & Efficiency
  • Standard base software for serial kernels and data communications
• Simplicity of usage
  • Modern (OO) Fortran 95 features and auxiliary routines for smooth legacy code integration

Page 6: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

MLD2P4: Multi-Level Domain Decomposition Parallel Preconditioners Package based on PSBLAS

• Diagonal
• Block-Jacobi
• Additive Schwarz with arbitrary overlap
• Algebraic multi-level Schwarz

PSBLAS: Parallel Sparse Basic Linear Algebra Subprograms

mld_prec_build(A,M,…)
  • A, distributed sparse matrix (input)
  • M, distributed sparse preconditioner (output)

mld_prec_apply(M,x,y,…)
  • M, distributed sparse preconditioner (input)
  • x,y, distributed vectors (input/output)
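The build/apply split can be illustrated with a small serial sketch. This is not the MLD2P4 Fortran interface: `prec_build`, `prec_apply`, and `solve_dense` are hypothetical Python stand-ins, the matrix is stored densely, and a block-Jacobi preconditioner is assumed, with each index block playing the role of one process.

```python
def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def prec_build(A, blocks):
    """Build step (hypothetical): store the diagonal block of each index set."""
    return [(idx, [[A[i][j] for j in idx] for i in idx]) for idx in blocks]


def prec_apply(M, x):
    """Apply step (hypothetical): y = M^{-1} x, one local solve per block."""
    y = [0.0] * len(x)
    for idx, A_block in M:
        y_block = solve_dense(A_block, [x[i] for i in idx])
        for i, value in zip(idx, y_block):
            y[i] = value
    return y


# 1D Laplacian on 4 points, split into two blocks of two rows each.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
M = prec_build(A, blocks=[[0, 1], [2, 3]])
y = prec_apply(M, [1.0, 1.0, 1.0, 1.0])
```

The point of the split is that the build step runs once per matrix, while the apply step runs at every iteration of the Krylov solver.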

Page 7: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

PSBLAS (Filippone et al., http://www.ce.uniroma2.it/psblas/)

Basic Linear Algebra Operations with Sparse Matrices on MIMD Architectures

• Appl.: Iterative Sparse Linear Solvers (CG, BiCG, CGS, BiCGSTAB, RGMRES, …)
• Kernels:
  • Parallel Sparse Matrix Operations (matrix-matrix products, matrix-vector products, …)
  • Parallel Sparse Matrix Management (allocate, build, update, …)
• Base sw: MPI; BLACS (Basic Linear Algebra Communication Subprograms); SBLAS (Duff et al.)

Language labels in the diagram: F95, F77

Page 8: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

MLD2P4 Design: Algorithms

Algebraic multi-level Schwarz preconditioners based on smoothed aggregation

• good trade-off between parallelism and convergence
• optimal scalability for symmetric positive-definite matrices
• algebraic framework allows general-purpose application

Page 9: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

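(1-lev) Schwarz: basic ingredients

$A \in \mathbb{R}^{n \times n}$, with symmetric sparsity pattern

Adjacency graph of $A$:

$G = (W, E)$, $W = \{1, 2, 3, \ldots, n\}$, $E = \{(i, j) : a_{ij} \neq 0\}$

0-overlap partition of $W$:

$W_i^0$, $i = 1, \ldots, m$, partition of $W$

δ-overlap partition of $W$:

$W_i^\delta \supset W_i^{\delta-1}$, $W_i^\delta = W_i^{\delta-1} \cup \{ j \in W : \exists\, k \in W_i^{\delta-1},\ (k, j) \in E \}$

[Figure: sparsity pattern of a 9 × 9 matrix (rows and columns 1 to 9) with the subsets $W_1^0$, $W_2^0$, $W_1^1$, $W_2^1$ marked]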
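The growth of a 0-overlap partition into a δ-overlap partition can be sketched as follows. This is an illustrative serial version with assumed names (`adjacency`, `overlap_partition`), dense matrix storage, and 0-based vertex numbering, whereas the slide numbers vertices from 1.

```python
def adjacency(A):
    """Adjacency lists of G = (W, E), with E = {(i, j) : a_ij != 0}."""
    n = len(A)
    return [[j for j in range(n) if A[i][j] != 0] for i in range(n)]


def overlap_partition(adj, parts0, delta):
    """Grow each 0-overlap subset by `delta` layers of graph neighbours."""
    parts = [set(p) for p in parts0]
    for _ in range(delta):
        # W_i^d = W_i^(d-1) united with all vertices adjacent to W_i^(d-1)
        parts = [p | {j for k in p for j in adj[k]} for p in parts]
    return [sorted(p) for p in parts]


# 9 x 9 tridiagonal matrix: its adjacency graph is a path on 9 vertices.
n = 9
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
adj = adjacency(A)
parts0 = [[0, 1, 2, 3], [4, 5, 6, 7, 8]]
parts1 = overlap_partition(adj, parts0, delta=1)
```

With `delta=1` each subset gains the one vertex across the interface, so the two subsets share vertices 3 and 4.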

Page 10: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

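AS: basic ingredients (cont’d)

Restriction/prolongation operators

$R_i^\delta = \left[ e_{j_1}, e_{j_2}, \ldots, e_{j_{n_i^\delta}} \right]^T$, $j_k \in W_i^\delta$

$P_i^\delta = (R_i^\delta)^T$

Restriction of $A$

$A_i^\delta = R_i^\delta A (R_i^\delta)^T$

[Figure: sparsity pattern of the 9 × 9 matrix (rows and columns 1 to 9) with the submatrices $A_1^1$ and $A_2^1$ marked]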
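A small check that the restricted matrix $R_i^\delta A (R_i^\delta)^T$ is the submatrix of $A$ on the rows and columns in $W_i^\delta$. The helper names are assumptions made for this sketch, the operators are formed as explicit dense matrices only for illustration, and indices are 0-based.

```python
def restriction(idx, n):
    """Rows of the n x n identity selected by idx: R = [e_j1, ..., e_jk]^T."""
    return [[1.0 if c == j else 0.0 for c in range(n)] for j in idx]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
W1 = [1, 2, 3]                    # an overlapped index subset (example data)
R1 = restriction(W1, len(A))      # restriction operator
P1 = transpose(R1)                # prolongation operator, P = R^T
A1 = matmul(matmul(R1, A), P1)    # restricted matrix R A R^T
```

In practice the product is never formed this way: extracting the rows and columns listed in `W1` gives the same matrix.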

Page 11: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Coarse level correction: basic ingredients

Algebraic coarsening: uncoupled aggregation

$\tilde{P} : W_C \to W$, where $\tilde{P}_{ij} = 1$ if vertex $i$ belongs to aggregate $j$, and $\tilde{P}_{ij} = 0$ otherwise

Smoothed prol./restr. operators

$P_C = (I - D^{-1} A)\,\tilde{P}$, $R_C = P_C^T$

Coarse-level matrix

$A_C = P_C^T A P_C = R_C A R_C^T$
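The three ingredients above can be traced on a tiny example. In this sketch the aggregates are given by hand (the aggregation algorithm itself is not implemented), $D$ is assumed to be the diagonal of $A$, and the damping parameter `omega` is an addition that is not on the slide; its default of 1.0 reproduces the formula as written. All function names are hypothetical.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def tentative_prolongator(aggregates, n):
    """P~[i][j] = 1 if vertex i belongs to aggregate j, 0 otherwise."""
    P = [[0.0] * len(aggregates) for _ in range(n)]
    for j, aggregate in enumerate(aggregates):
        for i in aggregate:
            P[i][j] = 1.0
    return P


def smoothed_prolongator(A, P_tilde, omega=1.0):
    """P_C = (I - omega D^{-1} A) P~, with D assumed to be diag(A)."""
    n = len(A)
    S = [[(1.0 if i == j else 0.0) - omega * A[i][j] / A[i][i]
          for j in range(n)] for i in range(n)]
    return matmul(S, P_tilde)


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
P_tilde = tentative_prolongator([[0, 1], [2, 3]], len(A))
P_C = smoothed_prolongator(A, P_tilde)
R_C = transpose(P_C)
A_C = matmul(matmul(R_C, A), P_C)   # coarse-level matrix R_C A R_C^T
```

Both building steps are sparse matrix-matrix products, which is why those products appear among the computational kernels.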

Page 12: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Multilevel-Schwarz preconditioners & computational kernels

Example: 2-lev hybrid-post

$M_{2LH}^{-1} = M_{1L}^{-1} + (I - M_{1L}^{-1} A)\, M_C^{-1}$

Build:

• build $A_i^\delta$
• build $A_C$:
  • aggregate: $\tilde{P} : W_C \to W$
  • mat × mat: $R_C^T = P_C = (I - D^{-1} A)\,\tilde{P}$
  • mat × mat: $A_C = R_C A R_C^T$

Apply:

• restrict: $z = R_C v$
• solve: $A_C y = z$
• prol: $w_C = R_C^T y$
• mat × vec: $x = v - A w_C$
• AS prec: $w_{2L} = w_C + M_{1L}^{-1} x$

P. D’Ambra, D. di Serafino, S. Filippone, On the Development of PSBLAS-based Parallel Two-level Schwarz Preconditioners, Applied Numerical Mathematics, 57, 2007.
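The apply phase of a two-level hybrid preconditioner with the coarse correction applied first can be sketched as below. This is a serial illustration under stated assumptions, not the package's implementation: the coarse operator is taken as $M_C^{-1} = R_C^T A_C^{-1} R_C$ (restrict, solve, prolong), point Jacobi stands in for the one-level Schwarz preconditioner, $R_C$ is an unsmoothed aggregation operator, and every function name is hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting (coarse-level solver)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def apply_two_level(A, R_C, A_C, apply_one_level, v):
    z = matvec(R_C, v)                                   # restrict
    y = solve_dense(A_C, z)                              # coarse solve
    w_c = matvec(transpose(R_C), y)                      # prolong
    x = [vi - ti for vi, ti in zip(v, matvec(A, w_c))]   # residual v - A w_C
    s = apply_one_level(x)                               # one-level preconditioner
    return [wi + si for wi, si in zip(w_c, s)]


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
R_C = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]       # unsmoothed, for brevity
A_C = matmul(matmul(R_C, A), transpose(R_C))


def jacobi(x):
    """Stand-in one-level preconditioner: divide by the diagonal of A."""
    return [xi / A[i][i] for i, xi in enumerate(x)]


w = apply_two_level(A, R_C, A_C, jacobi, [1.0, 1.0, 1.0, 1.0])
```

A quick consistency check: if the one-level step were an exact solve with $A$, the result would be $A^{-1} v$ whatever the coarse correction returns, since $w_C + A^{-1}(v - A w_C) = A^{-1} v$.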

Page 13: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

MLD2P4 Design: Software Architecture

• Appl.: Parallel Preconditioners (BJA, ASM, RAS, ASH, ml-additive, ml-hybridpre, ml-hybridpost, ml-symmhybrid)
• Kernels:
  • Preconditioner Build (prolongation, restriction, coarse matrix, local sparse ILU and LU)
  • Preconditioner Application (distributed & serial coarse matrix solvers)
• Base sw: PSBLAS 2.0 (extended version of PSBLAS 1.0)

Page 14: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Performance Results & Comparisons

• Different test matrices from various sources
  • thm matrices: thermal diffusion in solids
  • kivap matrices: automotive engine design
  • shipsec matrices: from UF sparse matrix collection
• Experiments carried out on different Linux clusters
  • 64 Intel Itanium dual-processor nodes connected by Quadrics QSNetII Elan 4
  • 32 AMD Opteron dual-processor nodes connected by Myrinet
  • 8 AMD Opteron dual-processor nodes connected by InfiniBand
  • 8 Intel Itanium dual-processor nodes connected by Myrinet
  • 16 Intel Pentium IV nodes connected by Fast Ethernet
• Comparison with up-to-date related work
  • Trilinos-ML

A. Buttari, P. D’Ambra, D. di Serafino, S. Filippone, 2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications, Applicable Algebra in Engineering, Communication and Computing, Vol. 18, 2007.

Page 15: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Experimental Setting

• MLD2P4: right-preconditioned BiCGSTAB
  • 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
  • 2-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
    Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI) or with UMFPACK (2LDU) on diagonal blocks
  • 3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
    Distributed coarsest matrix: 4 sweeps of block Jacobi with ILU(0) (3LDI) or with UMFPACK (3LDU) on diagonal blocks
• Stopping criterion: $\|r_k\| / \|r_0\| \le 10^{-6}$ or maxit
• Unit right-hand side and null starting guess
• Row-block distribution of matrices: # submatrices = # procs
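A minimal sketch of the last two settings: a row-block distribution with one contiguous block of rows per process, and a relative-residual stopping test. The even split of rows and the use of the Euclidean norm are assumptions (the slide fixes neither), and `row_blocks` and `converged` are hypothetical names.

```python
import math


def row_blocks(n, nprocs):
    """Split rows 0..n-1 into nprocs contiguous blocks of near-equal size."""
    base, extra = divmod(n, nprocs)
    blocks, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def converged(r_k, r_0, tol=1e-6):
    """Relative residual test ||r_k|| / ||r_0|| <= tol (2-norm assumed)."""
    def norm(r):
        return math.sqrt(sum(v * v for v in r))
    return norm(r_k) / norm(r_0) <= tol


# Matrix size taken from the thm1 test case, distributed over 64 processes.
blocks = row_blocks(600000, 64)
```

With a null starting guess the initial residual $r_0$ equals the right-hand side, so the test measures the reduction relative to $\|b\|$.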

Page 16: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

thm matrices: number of iterations

thm1: n = 600000, nnz = 2996800

64 Intel Itanium dual-processor nodes connected by QSNetII

OV=0

np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     705   184   -     72    -
4     761   206   -     74    -
8     688   202   44    67    28
16    748   211   61    70    36
32    766   186   81    69    51
64    809   196   113   86    68

OV=1

np    RAS   2LDI  2LDU  3LDI  3LDU
1     613   190   -     70    -
2     923   183   -     76    -
4     684   178   -     63    -
8     937   191   34    62    27
16    688   172   57    68    33
32    714   181   74    65    45
64    720   180   107   77    62

Page 17: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

thm matrices: execution times and speed-ups (OV=1; best execution times: 3LDU)

64 Intel Itanium dual-processor nodes connected by QSNetII

Page 18: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Application test case

• large eddy simulation of incompressible turbulent flows in a bi-periodical channel
• main computational kernel: nonsymmetric and singular linear systems arising from elliptic PDE with Neumann b.c.

A. Aprovitola, P. D’Ambra, F. M. Denaro, D. di Serafino, S. Filippone, Application of Parallel Algebraic Multilevel Domain Decomposition Preconditioners in Large-Eddy Simulations of Wall-bounded Turbulent Flows: First Experiments, RT-ICAR-NA-2007-02, July 2007.

Page 19: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Experimental Setting

• MLD2P4: right-preconditioned RGMRES(30)
  • 1-lev Restricted Additive Schwarz preconditioner with ILU(0) (RAS)
  • 2-lev/3-lev hybrid Schwarz preconditioner, with RAS/ILU(0) as 1-lev prec.
    Distributed coarse matrix: 4 sweeps of block Jacobi with ILU(0) (2LDI/3LDI) on diagonal blocks
• Stopping criterion: $\|r_k\| / \|r_0\| \le 10^{-7}$ or maxit
• General row-block distribution

Pressure linear system: n = 201600, nnz = 1398600

Reynolds number: 180

Computational grid: 140x32x45, non-uniform in the y direction; time-step $10^{-4}$

Page 20: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

LES of incompressible wall-bounded flow

16 Intel Itanium dual-processor nodes connected by QSNetII

SOR on 1 proc. = 9 sec.

SOR on 1 proc. = 8580 sec.

Page 21: MLD2P4:  a package of parallel algebraic multilevel Preconditioners

Work in progress

• Package available on the web very soon
• More sophisticated aggregation algorithms
• Integration of preconditioners and solvers in large-scale applications