Liszt: A DSL for Mesh-Based PDEs
Z. DeVito, M. Medina, N. Joubert, M. Barrientos, E. Elsen, S. Oakley, J. Alonso, E. Darve, F. Ham, P. Hanrahan
GPGPU RUNTIME
Liszt Code

val Flux = FieldWithConst[Cell,Float](0.f)
while (t < 2.f) {
  for (f <- interior_set) {
    val normal = face_unit_normal(f)
    val vDotN  = dot(globalVelocity, normal)
    val area   = face_area(f)
    var flux   = 0.f
    val cell   = if (vDotN >= 0.f) inside(f) else outside(f)
    flux = area * vDotN * Phi(cell)
    Flux(inside(f) : Cell)  -= flux
    Flux(outside(f) : Cell) += flux
  }
  for (f <- inlet_set) {
    val area  = face_area(f)
    val vDotN = dot(globalVelocity, face_unit_normal(f))
    Flux(outside(f) : Cell) += area * vDotN * phi_sine_function(t)
  }
}
val Flux = FieldWithConst[Cell,Float](0.f)
def determineInclusions() : Unit = {
  for (f <- inlet_set)    { isInlet_face(f) = 1 }
  for (f <- interior_set) { isInterior_face(f) = 1 }
}
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    if (isInterior_face(f) > 0) {
      val normal = face_unit_normal(f)
      val vDotN  = dot(globalVelocity, normal)
      val area   = face_area(f)
      val cell   = if (vDotN >= 0.f) inside(f) else outside(f)
      var flux   = area * vDotN * Phi(cell)
      if (ID(c) == insideID(f))  Flux(c) -= flux
      if (ID(c) == outsideID(f)) Flux(c) += flux
    }
  }
}
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    if (isInlet_face(f) > 0 && ID(c) == outsideID(f)) {
      val area  = face_area(f)
      val vDotN = dot(globalVelocity, face_unit_normal(f))
      Flux(c) += area * vDotN * phi_sine_function(t)
    }
  }
}
Liszt GPU Code
IMPLICIT METHODS
ARCHITECTURE
Liszt is a domain-specific language that exposes a high-level interface for building mesh-based PDE solvers. This frees scientists from architecture-specific implementations and greatly increases programmer productivity. Current PSAAP solvers are tied to a specific platform, while Liszt solvers are portable across architectures. Our compiler achieves this by using domain knowledge in its program-analysis stage to produce high-performance code for a variety of platforms.
Liszt has a stable implementation for finite difference methods with a fully functional MPI-based backend. Liszt now supports implicit methods by providing native sparse matrix operations, as used by our implementation of the Joe RANS solver. Program transformations for our GPU runtime are in development, and our preliminary GPU runtime provides explicit finite difference support. A full stack of debugging, visualization, and compiler tools is now available.
OVERVIEW
State-of-the-art finite element and finite difference methods use implicit solvers for stability and performance. Implicit methods depend on global solves of sparse linear systems. Liszt adds language-level support for sparse matrices and integrates the PETSc solver as a backend.
Sparse matrices are tied to the topology of the mesh, so matrix entries can be referenced directly by mesh elements. Implicit formulations of finite difference methods have a regular matrix structure, which Liszt currently supports. Higher-order finite element methods require multiple distinct submatrices per element in their matrix formulation; support for this is in development.
The implicit version of Joe has been ported to Liszt, reducing its codebase from 3106 lines to 1520 lines (disregarding the 20,000+ lines of MPI boilerplate code in C++ Joe). MPI performance of the Liszt port is comparable to the original for both the explicit and implicit versions of Joe.
The Liszt framework cross-compiles Scala-embedded DSL code to C++. Three implementations of the runtime exist: an MPI-based runtime for clusters, an OpenMP-based runtime for SMPs, and a preliminary GPU backend.
The GPU backend implements gathers and reductions in native NVIDIA CUDA code and manages mesh and field data on the GPU. The JIT phase for the GPU performs transformations that convert standard scatter-based operations into gathers, allowing arbitrary Liszt code to be executed on the GPU.
CURRENT AND FUTURE WORK
We are currently working on:
- DSL advances through polymorphic embedding
- GPGPU-specific loop transformations
- FEM & DG support through canonical elements
Future work:
- Release of a private beta at the upcoming Codeathon
- Uncertainty quantification support
- Transformations between scatters, gathers & reduces
- A hybrid runtime combining MPI and GPGPU
double (*A)[5][5] = new double[ncv][5][5];
double (*phi)[5]  = new double[ncv][5];
double (*rhs)[5]  = new double[ncv][5];
for (int ifa = 0; ifa < nfa; ifa++) {
  int icv0 = cvofa[ifa][0];
  int icv1 = cvofa[ifa][1];
  int noc00, noc01, noc11, noc10;
  getImplDependencyIndex(noc00, noc01, noc11, noc10, icv0, icv1);
  calcEulerFluxMatrices_HLLC(Apl, Ami);
  for (int i = 0; i < 5; i++)
    for (int j = 0; j < 5; j++) {
      A[noc00][i][j] += Apl[i][j];
      A[noc01][i][j] += Ami[i][j];
    }
}
int *nbocv_v_global = new int[ncv_g];
for (int icv = 0; icv < ncv; icv++) {
  nbocv_v_global[icv] = cvora[mpi_rank] + icv;
  updateCvData(nbocv_v_global, REPLACE_DATA);
}
PetscSolver petscSolver(..., cvora, nbocv_i, nbocv_v, 5);
petscSolver.solveGMRES(A, phi, rhs, cvora, nbocv_i, nbocv_v,
                       nbocv_v_global, 5, ...);
val A   = new SparseMatrix[Float]
val phi = new SparseVector[Float]
val rhs = new SparseVector[Float]
for (c <- cells(mesh)) {
  for (f <- faces(c)) {
    val Apl = AplMatrixStorageField(f)
    val Ami = AmiMatrixStorageField(f)
    val cc  = inside(f)
    A(c,c)  += Apl
    A(c,cc) += Ami
  }
}
phi = A/rhs
Joe Implicit Code
Liszt Implicit Code
[Architecture diagram: SC.scala is compiled by the Scala compiler to SC.class; the Liszt JIT (configured by liszt.cfg) runs program analysis and platform-specific transforms, then emits code through MPI, SMP, and GPU codegen paths: the MPI build (MPICXX/GCC, run with mpirun), the SMP build (GCC, Cocoa threads), the Viz build (GCC), and the GPU build (NVCC, PTX).]
!"
#"
$%"
&'"
(&"
!" #" $%" &'" (&"
Sp
eed
up
over
Scala
r
Number of nodes
Joe Explicit Euler
!"
#"
$%"
&'"
(&"
!" #" $%" &'" (&"
Sp
eed
up
over
Scala
r
Number of nodes
Joe Implicit Euler
LisztViz is an extension of our single-core runtime that provides mesh visualization of the simulation system. LisztViz eases debugging by making all symbols visible through watchpoints in the execution stream.
The GPU implementation demands a separation of code into CPU drivers and GPU kernels, manages memory transfers, and transforms types. This happens in two passes: "transform" and "codegen".
PERFORMANCE