Scalable Mesh Generation for HPC Applications

Rajeev Jain, Navamita Ray, Iulian Grindeanu, Danqing Wu, Vijay Mahadevan

Problem Description

• Computational solvers simulating physical phenomena on complex domain geometries need well-resolved, high-quality meshes to tackle the discrete problem efficiently.

• Mesh generation for HPC applications is a complex process that requires access to geometry data, efficient mesh data structures in a parallel setting, and optimization techniques for preserving mesh quality.

• A key component of any such parallel mesh infrastructure is the parallel I/O file system, whose performance is vital to complex application workflows.

• MeshKit, developed as a component of the SIGMA toolchain, supports a variety of meshing algorithms. It leverages SIGMA's scalable interfaces to geometry data (CGM) and its unified parallel data structures (MOAB), and can be used for HPC applications.

Scalable Interfaces for Geometric and Mesh based Applications (SIGMA)

An open-source toolchain to simplify the computational modeling workflow [4]. Its components include:

• Common Geometry Module (CGM): libraries for querying and modifying solid geometry models [1]

• Mesh Generation Toolkit (MeshKit): flexible algorithms to generate high-quality meshes [3]

• Mesh Oriented datABase (MOAB): efficient array-based data structures to handle mesh queries [2]

Mesh Generation Toolkit (MeshKit)

• Provides a collection of meshing algorithms to support mesh generation, and a platform for developing new mesh generation algorithms.

• Coordinates the BREP-based meshing process, mesh smoothing, and optimization.

• Notable meshing algorithms include embedded boundary meshing, watertight models, and a mesh-based geometry generator.

• Exposes mesh manipulation and generation features such as Copy, Move, Rotate, and Extrude Mesh.

• Uses the SIGMA tools CGM and MOAB to access the geometry model and the mesh representation in a parallel setting.
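The Extrude Mesh operation can be pictured with the following minimal sketch, which sweeps a planar quadrilateral mesh along the z-axis into layers of hexahedra. The function `extrude_quads` and its parameters are hypothetical illustrations, not MeshKit's API; the sketch assumes the common hex vertex ordering (bottom face first, then the top face with the same winding).

```python
def extrude_quads(base_verts, base_quads, nlayers, dz):
    """Sweep a planar quad mesh along +z into nlayers layers of hexahedra.

    Hypothetical helper for illustration; not MeshKit's API.
    base_verts: list of (x, y) tuples; base_quads: 4-tuples of vertex indices.
    Returns (verts3d, hexes) with hexes as 8-tuples: bottom face, then top face.
    """
    nv = len(base_verts)
    # One copy of the base vertices per layer boundary (nlayers + 1 copies),
    # so vertex k * nv + i is base vertex i lifted to height k * dz.
    verts3d = [(x, y, k * dz) for k in range(nlayers + 1) for (x, y) in base_verts]
    hexes = []
    for k in range(nlayers):
        lo, hi = k * nv, (k + 1) * nv
        for a, b, c, d in base_quads:
            # Bottom face from layer k, top face from layer k + 1, same winding.
            hexes.append((a + lo, b + lo, c + lo, d + lo,
                          a + hi, b + hi, c + hi, d + hi))
    return verts3d, hexes


# Usage: a 2 x 1 strip of quads extruded into 3 layers of thickness 0.5.
base_verts = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
base_quads = [(0, 1, 4, 3), (1, 2, 5, 4)]
verts, hexes = extrude_quads(base_verts, base_quads, nlayers=3, dz=0.5)
print(len(verts), len(hexes))  # 24 6
```

Because each layer reuses the base connectivity with a fixed index offset, the output size is predictable: `(nlayers + 1) * len(base_verts)` vertices and `nlayers * len(base_quads)` hexes.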

Common Geometry Module (CGM)

• Common API and topological model to access geometry representations from a variety of solid modeling engines, such as ACIS, Open-CASCADE, and facet-based surfaces.

• Provides parallel geometry model querying and handling, virtual geometry, and non-manifold topology.
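The idea of a common API over several modeling engines can be illustrated with an abstract interface that client code programs against. The class and method names below (`GeometryModel`, `bounding_box`, `contains`, `points_inside`) are hypothetical and do not reproduce CGM's or iGeom's actual interface; the two toy backends merely stand in for different engines.

```python
import math
from abc import ABC, abstractmethod


class GeometryModel(ABC):
    """Hypothetical engine-neutral geometry interface (not CGM's or iGeom's API)."""

    @abstractmethod
    def bounding_box(self):
        """Return ((xmin, ymin, zmin), (xmax, ymax, zmax))."""

    @abstractmethod
    def contains(self, point):
        """Return True if point lies inside or on the solid."""


class SphereModel(GeometryModel):
    """Stand-in for one engine: an analytic sphere."""

    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bounding_box(self):
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi

    def contains(self, point):
        return math.dist(point, self.center) <= self.radius


class BoxModel(GeometryModel):
    """Stand-in for a second engine: an axis-aligned box."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def bounding_box(self):
        return self.lo, self.hi

    def contains(self, point):
        return all(l <= p <= h for l, p, h in zip(self.lo, point, self.hi))


def points_inside(model: GeometryModel, points):
    # Client code sees only the common interface, never the engine type.
    return [p for p in points if model.contains(p)]


models = [SphereModel((0, 0, 0), 1.0), BoxModel((0, 0, 0), (2, 2, 2))]
probe = [(0.5, 0.0, 0.0), (1.5, 1.5, 1.5), (-0.5, 0.0, 0.0)]
for m in models:
    print(type(m).__name__, m.bounding_box(), points_inside(m, probe))
```

Adding support for another engine in this design means adding one more subclass; `points_inside` and any other client code stay unchanged.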

Mesh Oriented datABase (MOAB)

• Represents unstructured and structured meshes, and field data on a mesh, efficiently.

• Parallel mesh capabilities:

- I/O based on the parallel HDF5 library
- links to state-of-the-art mesh partitioners (ParMETIS, Zoltan)
- algorithms for resolving entities on shared processor interfaces (geometric proximity based, global ID based)
- ghost layer exchange, mesh migration, field data exchange

• Hierarchical mesh generation for unstructured meshes through uniform refinement, plus mesh quality metrics.

• Multi-mesh intersection and transfer algorithms.
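A minimal sketch of one level of degree-2 uniform refinement for hexahedra follows. It assumes straight-sided hexes in the usual corner ordering and is not MOAB's implementation; the name `refine_hexes` is hypothetical. Each step multiplies the element count by 2^3 = 8, which is why the weak-scaling study reports 7,500 hexes per core growing to 60K and then 480K after two levels.

```python
from itertools import product

# Local (a, b, c) positions of the 8 hex corners: bottom face 0-1-2-3, top face 4-5-6-7.
HEX_ORDER = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
             (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def refine_hexes(coords, hexes):
    """One level of degree-2 uniform refinement: each hex becomes 8 children.

    Hypothetical sketch, not MOAB's refinement code. New vertices are keyed by
    the set of parent vertices they average (edge: 2, face: 4, cell: 8), so
    neighboring hexes reuse the same edge and face vertices.
    """
    new_coords = list(coords)
    cache = {}

    def averaged_vertex(ids):
        key = frozenset(ids)
        if key not in cache:
            pts = [coords[i] for i in ids]
            cache[key] = len(new_coords)
            new_coords.append(tuple(sum(axis) / len(pts) for axis in zip(*pts)))
        return cache[key]

    children = []
    for h in hexes:
        # Parent corners sit at the even positions of a 3 x 3 x 3 local lattice.
        corner = {(2 * a, 2 * b, 2 * c): h[n] for n, (a, b, c) in enumerate(HEX_ORDER)}
        lattice = {}
        for ijk in product(range(3), repeat=3):
            # Parent corners that match ijk on every axis where ijk is 0 or 2.
            ids = [v for pos, v in corner.items()
                   if all(q == 1 or p == q for p, q in zip(pos, ijk))]
            lattice[ijk] = ids[0] if len(ids) == 1 else averaged_vertex(ids)
        # Each of the 8 unit sub-cubes of the lattice becomes a child hex.
        for i, j, k in product(range(2), repeat=3):
            children.append(tuple(lattice[(i + a, j + b, k + c)] for a, b, c in HEX_ORDER))
    return new_coords, children


unit_cube = [(float(a), float(b), float(c)) for a, b, c in HEX_ORDER]
c1, h1 = refine_hexes(unit_cube, [tuple(range(8))])
c2, h2 = refine_hexes(c1, h1)
print(len(h1), len(h2), len(c2))  # 8 64 125
```

Keying new vertices by their parent vertex sets, rather than by floating-point coordinates, keeps shared edge and face vertices unique without any geometric tolerance.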

MeshKit for Nuclear Engineering

• Automatic generation of meshes describing complex nuclear reactor assemblies and core geometries (templates) using MeshKit.

• Reactor Geometry Generator (RGG) GUI, developed in collaboration with Kitware Inc.

MOAB Weak Scaling Studies

• GenLargeMesh, a mini-app for generating large meshes, explores the capability of MOAB's parallel infrastructure and of the parallel I/O file system.

• It creates 3D hexahedral meshes for a rectangular domain in memory and writes the mesh to a file in parallel.

• The partitioned mesh is generated locally on each task, and all tasks write to the same file. Each task holds approximately 21.6K hexes; the largest mesh sizes are 221M (Blues) and 1.8B (Vesta).

• Weak scaling studies on Blues for assembly generation followed by two levels of degree-2 uniform refinement: each core starts with 7,500 hexes, which become 60K after the first level and 480K after the second; the largest mesh has 1.1B hexes.

• Entities on the shared processor interfaces are resolved with a geometric-proximity-based vertex-merge algorithm, both for core assemblies and after refinement; this approach maintains scalability up to 1K cores.
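Geometric-proximity-based vertex merging can be sketched with a uniform-grid spatial hash: vertices closer than a tolerance are mapped to a single representative. This is a serial stand-in under stated assumptions; the function `merge_by_proximity`, the grid-hash search, and the tolerance handling are illustrative choices, not MOAB's algorithm. In a parallel run, each task would apply such a test to its interface vertices together with candidates received from neighboring tasks.

```python
import math
from itertools import product


def merge_by_proximity(points, tol):
    """Map each point index to the index of its representative vertex.

    Points within distance tol of an earlier representative are merged into it.
    Hypothetical serial sketch; not MOAB's vertex-merge implementation.
    """
    buckets = {}  # grid cell -> indices of representatives in that cell
    rep = []
    for idx, p in enumerate(points):
        # With cell size tol, any point within tol lies in this or an adjacent cell.
        cell = tuple(math.floor(x / tol) for x in p)
        match = None
        for offset in product((-1, 0, 1), repeat=len(p)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for r in buckets.get(neighbor, ()):
                if math.dist(points[r], p) <= tol:
                    match = r
                    break
            if match is not None:
                break
        if match is None:
            # No nearby representative: this point becomes a new one.
            buckets.setdefault(cell, []).append(idx)
            match = idx
        rep.append(match)
    return rep


# Two copies of an interface vertex, differing by round-off, plus a distinct vertex.
interface = [(1.0, 0.0, 0.0), (1.0 + 1e-12, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(merge_by_proximity(interface, tol=1e-8))  # [0, 0, 2]
```

The hash keeps each lookup to a constant number of neighboring cells, so the merge cost grows roughly linearly with the number of interface vertices rather than quadratically.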

Machine Details:

• Blues: 310 nodes, 16 cores/node (Intel Sandy Bridge), 64 GB of RAM per node

• Vesta: 2,048 nodes, 16 cores/node (1600 MHz PowerPC A2), 16 GB of RAM per node

• Parallel I/O (based on the HDF5 library) reads from or writes to a single file, which requires indirect referencing to access the entities of each partition. This negatively affects the weak scalability of the I/O.

• Shared entities at the interfaces between partitions are resolved using vertex global IDs and the crystal router, an efficient gather-scatter algorithm for sparse communication. The interface resolution is highly scalable and maintains its efficiency on thousands of processors.
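The global-ID approach can be sketched as a two-phase rendezvous, simulated here in a single process. The sketch assumes GenLargeMesh-style numbering, where a vertex's global ID is derived from its position in the global structured lattice, so neighboring tasks produce matching IDs without communication. The names `block_vertex_ids` and `resolve_shared` are hypothetical, and the message routing that the crystal router would perform is reduced to dictionary updates.

```python
from collections import defaultdict


def block_vertex_ids(task, tasks_per_dim, cells_per_task):
    """Global vertex IDs of one task's block in a structured box mesh.

    Hypothetical sketch: IDs come from the global (i, j, k) lattice position,
    so neighboring tasks independently produce identical IDs for shared vertices.
    """
    (ti, tj, tk), (px, py, pz), (nx, ny, nz) = task, tasks_per_dim, cells_per_task
    assert 0 <= ti < px and 0 <= tj < py and 0 <= tk < pz
    NX, NY = px * nx + 1, py * ny + 1  # global vertex lattice extents in x and y
    return {(ti * nx + i) + NX * ((tj * ny + j) + NY * (tk * nz + k))
            for k in range(nz + 1) for j in range(ny + 1) for i in range(nx + 1)}


def resolve_shared(ids_per_rank):
    """Rendezvous resolution of shared vertices by global ID (serial simulation).

    Phase 1: every rank sends each ID to a 'home' rank (ID mod P).
    Phase 2: each home rank reports, for every ID seen on more than one rank,
    the other sharers back to each holder. In a real code both phases are
    sparse all-to-all exchanges, e.g. routed by a crystal router.
    """
    nranks = len(ids_per_rank)
    inbox = defaultdict(lambda: defaultdict(list))  # home rank -> gid -> holder ranks
    for rank, gids in enumerate(ids_per_rank):
        for g in gids:
            inbox[g % nranks][g].append(rank)
    shared = {r: {} for r in range(nranks)}
    for table in inbox.values():
        for g, holders in table.items():
            if len(holders) > 1:
                for r in holders:
                    shared[r][g] = [h for h in holders if h != r]
    return shared


# Two tasks side by side in x, each owning 2 x 2 x 2 hexes.
ids = [block_vertex_ids((t, 0, 0), (2, 1, 1), (2, 2, 2)) for t in range(2)]
shared = resolve_shared(ids)
print(len(shared[0]), len(shared[1]))  # 9 9
```

Hashing IDs to home ranks means no task needs to know in advance which neighbors it shares vertices with; the 3 x 3 face between the two blocks is discovered purely from matching IDs.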

References

[1] T. J. Tautges, "CGM: A geometry interface for mesh generation, analysis and other applications," Engineering with Computers, 17 (2001), pp. 299–314.

[2] T. J. Tautges, R. Meyers, K. Merkley, C. Stimpson, and C. Ernst, "MOAB: A Mesh-Oriented Database," Sandia National Laboratories, SAND2004-1592, Apr. 2004.

[3] R. Jain and T. J. Tautges, "Generating Unstructured Nuclear Reactor Core Meshes in Parallel," in Proceedings of the 23rd International Meshing Roundtable, Oct. 2014.

[4] http://sigma.mcs.anl.gov/

Conclusion

• SIGMA tools provide the components and interfaces needed to develop advanced mesh generation capabilities.

• The parallel infrastructure in MOAB for resolving shared entities, using either geometric proximity or vertex global IDs, is highly scalable to thousands of processors.

• Parallel I/O performance deteriorates at scale and needs further investigation and optimization.

SIGMA Components: Enabling strong application support through loosely connected software

Vijay Mahadevan, Navamita Ray, Iulian Grindeanu, Danqing Wu, Rajeev Jain, Evan Vanderzee, Paul Wilson, Patrick Shriwise, Andy Davis

SIGMA: Simplifying the traditional computational modeling workflow (an open-source simulation toolchain)

• Computational solvers simulating physical phenomena on complex domain geometries need well-resolved, high-quality meshes to tackle the discrete problem efficiently.

• Mesh generation is a complex problem; SIGMA tools simplify the process.

• SIGMA provides interfaces and components to access geometry data, and unified data structures to load and manipulate parallel unstructured computational meshes for various applications.

• Leverage the demonstrated scalability of SIGMA tools on petascale systems (research for exascale).

SIGMA workflow: Define geometry → Generate discrete mesh → Solve nonlinear PDE systems → Serialize and checkpoint → Visualize and post-process

• CGM: Common Geometry Module
• MOAB: Unstructured Mesh Oriented datABase
• Lasso: Relation between geometry and mesh representations
• MeshKit: Library of advanced mesh-generation algorithms
• CouPE: Coupled multi-Physics Environment
• PySIGMA: Python interfaces to SIGMA tools
• DMMoab: Mesh-solver interfaces in PETSc
• Notable applications: RGG, Nek5000, PROTEUS, Diablo, MBCSLAM, MoFEM, SpaFEDTe

http://sigma.mcs.anl.gov

Geometry and Solver aware tools: Exposure to geometry, mesh and discretization

• API provided for querying faceted geometry models
• Field descriptors and scalable solver hooks (PETSc)
• MOAB handles unstructured mesh natively, while PETSc DM interfaces provide DoF mapping and operator assembly
• Uniform refinement is used to drive geometric multigrid (KSP or PC)
• Efficient discretization kernels for P(1,2)/Q(1,2) elements
• Ongoing research: understanding the portable performance of unstructured meshing and mesh-handling algorithms

Common Geometry Module (CGM): Libraries for querying and modifying solid geometry models

Common API and topological model

• Implements the ITAPS iGeom interface completely
• Provides the geometry infrastructure for the CUBIT mesh generation toolkit
• Parallel geometry model querying and handling
• Access to geometry representations for a variety of solid modeling engines:
- ACIS: geometry backbone for CUBIT (up to v14.0)
- Open-CASCADE (OCC/OCE): supported natively
- Mesh-based representation (facet-based surfaces)
- Other BREP-based CAD models
• Non-manifold topology representation and detection
• Supports ray tracing and surface crossing queries for Monte Carlo (MCNP)

[Figure: Faceted geometry (ITER)]

Unstructured Mesh Oriented Database (MOAB): Efficient array-based data structures to handle mesh queries in memory

Data model is simple yet powerful

• Implements the ITAPS iMesh and iMeshP interfaces
• Represents unstructured and structured mesh, and field data on a mesh, efficiently
- Represents most kinds of metadata often accompanying the mesh (e.g. material data, boundary conditions, processor partitions, geometric topology, solution data)
• Scalable parallel mesh capabilities (verified up to 512K processors)
• Robust point location and interpolation in parallel
• Consistent solution transfer (C0/P1/spectral basis) between computational meshes to support accurate multi-physics simulations
• Uniform unstructured mesh refinement and quality metrics
• Future extensions to support AMR and surface reconstructions
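Point location and P1 interpolation on a tetrahedral mesh can be illustrated with barycentric coordinates: a point lies in a tetrahedron when all four coordinates are non-negative, and the same coordinates weight the vertex values. The sketch below uses a brute-force search and hypothetical function names; it is not MOAB's point-location code, and a production implementation would typically use a spatial search structure plus parallel communication to reach the owning process.

```python
def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _det3(u, v, w):
    # Scalar triple product u . (v x w), i.e. the determinant of [u v w].
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def barycentric_tet(p, a, b, c, d):
    """Barycentric coordinates of p in tetrahedron (a, b, c, d), via Cramer's rule."""
    e1, e2, e3, r = _sub(b, a), _sub(c, a), _sub(d, a), _sub(p, a)
    vol = _det3(e1, e2, e3)
    s = _det3(r, e2, e3) / vol
    t = _det3(e1, r, e3) / vol
    u = _det3(e1, e2, r) / vol
    return (1.0 - s - t - u, s, t, u)


def locate_and_interpolate(p, coords, tets, values, eps=1e-12):
    """Find a tet containing p and linearly (P1) interpolate vertex values there.

    Brute-force search for illustration only.
    """
    for tet in tets:
        lam = barycentric_tet(p, *(coords[i] for i in tet))
        if min(lam) >= -eps:
            return tet, sum(w * values[i] for w, i in zip(lam, tet))
    return None, None


# Two tets sampling the linear field f(x, y, z) = 1 + 2x + 3y + 4z.
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
values = [1 + 2 * x + 3 * y + 4 * z for x, y, z in coords]
print(locate_and_interpolate((0.5, 0.5, 0.5), coords, tets, values))  # ((1, 2, 3, 4), 5.5)
```

Because P1 interpolation reproduces linear fields exactly, the interpolated value matches f(0.5, 0.5, 0.5) = 5.5, which makes a linear field a convenient correctness check for a transfer implementation.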

[Figure (Accurate): Tet → Hex RMS element coupling error vs. number of processes (32 to 131,072), for 1 tet : 1000 hex, 1 tet : 100 hex, 1 tet : 10 hex, and 1 tet : 1 hex]

[Figure (Scalable and efficient): Strong scaling of a 1024^3 grid from 32,768 to 524,288 processes, compared with perfect scaling]

[Application images: Nek5000 (CFD spectral element code), PROTEUS (neutron transport code), Diablo (thermo-mechanics code), MoFEM (h-p adaptive FEA code)]

Mesh Generation Toolkit (MeshKit): Flexible algorithms to generate high-quality meshes

• Provides a collection of meshing algorithms to support mesh generation (coordination of the BREP-based meshing process, mesh smoothing, optimization)
• Exposes mesh manipulation and generation features such as Copy, Move, Rotate and Extrude mesh
• Notable meshing algorithms include embedded boundary meshing, watertight models, and a mesh-based geometry generator

Reactor Geometry Generator (RGG)

• Automatic computational mesh generation describing complex nuclear reactor assembly and core geometries (templates)
• The RGG GUI is being developed in collaboration with Kitware, motivated by improving scientific research productivity (simplifying reactor mesh generation)

• Parallel write deteriorates significantly around 1K cores.

Acknowledgements

This work was supported by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357.
