


Extreme Scale Computing of Fusion Physics in the Center for Edge Physics Simulation

A Representative Challenge: Multiscale time integration

II. Introduction to Tokamak (a 3D Torus)

• Electromagnetic turbulence capability
  – XGC1 currently handles perturbative electromagnetic turbulence in a thermal-equilibrium plasma background: the "delta-f" simulation
  – The edge plasma is in a non-equilibrium state, so the perturbative delta-f capability needs to be upgraded to "full-f"
  – Improving accuracy in the calculation of the turbulent electrical current is the remaining challenge
  – Needs support for parallel unstructured meshing

• Enhancement in the accuracy of Edge Localized Mode simulations in an MHD code by evaluating more appropriate closure terms, through data coupling (see next section).

• Inclusion of a 3D perturbed-field penetration capability in XGC1 to understand the control of Edge Localized Modes
  – We presently have this capability without turbulence

• Verification of a large-scale code
  – There are no other codes with the capability of XGC1 for cross-verification
  – An analytic solution is difficult due to the nonlinear, self-organizing nature of the problem
  – Even a manufactured solution is difficult due to the toroidal mode coupling
  – Verification in a simplified problem must be cleverly designed

• Uncertainty quantification of a large-scale code is a challenge
  – Cannot take the usual statistical approach on the full-scale code
  – Thus, UQ also requires a cleverly designed simplified problem set

• Achieve good strong scaling to the shortest-length electromagnetic turbulence-physics grid in ITER on a heterogeneous platform
  – A longer-term challenge, since we do not foresee an immediate necessity for such a study
  – We need to take advantage of advancements in programming languages

• Prolongation of XGC1 simulation to the experimental edge evolution time scale via a multiscale time-advance technique
  – In-memory coupling between coarse- and fine-grained kernels
  – Strong collaboration among physics, applied mathematics, and data management scientists is a critical necessity
  – Must be supported by performance optimization experts
  – Uncertainty quantification is another challenge

V. Further Development of XGC1 and Challenges

C.S. Chang (1), S. Ku (1), L. Sugiyama (2), S. Parker (3), M. Greenwald (2), G. Tynan (4), A. Kritz (5), D. Stotler (1), and the EPSI Team

(1) Princeton Plasma Physics Laboratory, (2) MIT, (3) University of Colorado, (4) University of California San Diego, (5) Lehigh University

III. What Science Are We Studying?

Critical Edge Physics to Help ITER

•  The edge rotation source has been identified as a toroidicity effect, while the inward transport has been identified as a turbulence effect.

XGC1 has simulated, for the first time, the nonlinear coherent potential and density structures ("blobs") across the separatrix at the outside midplane.

A torus, not a straight cylinder: the plasma lives in an inhomogeneous physical space (magnetic field) → this complicates the physics (and the math) through magnetic mirroring, curvature drift, ballooning, toroidal mode coupling, etc.

XGC1 is studying all the critical edge physics

IV. The XGC1 code

Large-scale turbulence in a whole-volume simulation in DIII-D geometry

•  We normally simulate the whole volume, with a coarse-grained mesh in the core, to capture the large-scale turbulence interaction between core and edge.
  − When we confine the simulation to the edge by placing a core-edge boundary, the turbulence solution gets distorted.
•  Use a realistic boundary condition for the torus: Φ=a on the wall.

•  L-H transition is being studied.

• How does the pedestal grow?
• How does the core pressure grow together with the edge pressure?
• What determines the pedestal shape?
• How localized will the heat load be on the divertor plate?
• How is the plasma fueled into the core against the gradient?
• Why is there a rotation source at the edge, and how does it propagate inward?
• How are edge localized modes (ELMs) triggered, and how can they be suppressed?
• What is the physics of the bifurcation from L-mode Er to H-mode Er?
• What is the threshold power P(L→H)?

The Strategy

• Solve the non-equilibrium problem from the first-principles kinetic equation
• Simulate a realistic device, including the magnetic X-point, sources, and sinks
• Avoid neglecting important effects
  − Toroidal mode coupling
  − Coupling of radial drift motions to turbulent field variations
  − Variation of the background profile (the free-energy driver)
  (Neglect of these effects is called the "local" approximation.)
• We choose a scheme stable to the multiscale dynamics
  − Multiscale dynamics: orbital motions across a large-scale fluctuation of the background field could severely violate the CFL condition if treated as a 5D PDE (see the sketch below)
• The scheme should also easily handle the odd edge shape
→ Lagrangian ODE scheme (particle-in-cell)
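As a rough illustration (the notation here is assumed, not taken from the poster): an explicit Eulerian discretization of the 5D kinetic equation would have to satisfy a CFL-type bound on the time step,

  \Delta t \;\lesssim\; \frac{\Delta x_{\parallel}}{v_{\parallel,\max}}

where \Delta x_{\parallel} is the grid spacing along the fastest motion (e.g., electron streaming along the field line) and v_{\parallel,\max} is the corresponding speed. Because edge orbits sweep across large regions of the fluctuating background in a single characteristic time, this bound becomes prohibitively small; advancing marker particles along their characteristics (the Lagrangian ODE scheme) removes the grid-based restriction and leaves only accuracy requirements on the orbit integration.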

An approach that solves the PDE directly on a 5D grid has been difficult → particle-in-cell approach; extreme-scale computing is necessary.

• ODE-based particle-in-cell approach on a configuration-space grid
  − Uses an unstructured triangular grid
• Aided by PDE and v-space grid approaches when advantageous
• 5D gyrokinetic equations
  − ODE: time advancement of marker particles
  − Finite difference (PETSc): partial integro-differential Fokker-Planck collision operator, discretized on a rectangular v-space grid
  − PDE (PETSc): Maxwell's equations on the unstructured triangular x-space grid
• The usual interpolation issue exists (see the sketch below)
  – Marker particles to the unstructured triangular x-space grid
  – Marker particles to the structured rectangular v-space grid
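To make the scatter/solve/gather/push cycle concrete, here is a minimal, hedged sketch of one electrostatic particle-in-cell step in Python. It uses a 1D periodic grid and nearest-grid-point deposition purely for illustration; XGC1's actual unstructured-triangle interpolation, gyrokinetic equations of motion, and PETSc field solve are far more involved, and all names and parameters below are illustrative.

    import numpy as np

    def pic_step(x, v, dt, nx, length, charge_per_marker=1.0):
        """One illustrative electrostatic PIC cycle on a 1D periodic grid.
        Steps: scatter charge to grid -> solve for potential -> gather field
        -> push particles. Not XGC1's actual scheme."""
        dx = length / nx

        # 1) Scatter: nearest-grid-point deposition of marker charge.
        idx = np.floor(x / dx).astype(int) % nx
        rho = np.bincount(idx, minlength=nx) * charge_per_marker / dx

        # 2) Field solve: d2(phi)/dx2 = -rho via FFT on the periodic grid.
        k = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
        rho_hat = np.fft.fft(rho - rho.mean())     # remove mean for solvability
        phi_hat = np.zeros_like(rho_hat)
        phi_hat[1:] = rho_hat[1:] / k[1:]**2
        phi = np.real(np.fft.ifft(phi_hat))

        # 3) Gather: E = -d(phi)/dx evaluated at the particle cells.
        efield = -np.gradient(phi, dx)
        e_at_particles = efield[idx]

        # 4) Push: advance the marker-particle ODEs.
        v_new = v + e_at_particles * dt
        x_new = (x + v_new * dt) % length
        return x_new, v_new, phi

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.uniform(0.0, 1.0, 10000)
        v = rng.normal(0.0, 1.0, 10000)
        for _ in range(10):
            x, v, phi = pic_step(x, v, dt=1e-3, nx=64, length=1.0)
        print("potential rms:", float(np.sqrt((phi**2).mean())))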

•  Edge potential forms spontaneously and the edge pedestal grows, with ionization of wall-recycled neutrals.

•  Inward pinch of cold particles found

•  Agreement with experimental pedestal profile is excellent

XGC1 contains all the basic physics components

•  Gyrokinetic ions
•  Drift-kinetic electrons (small-gyroradius limit)
•  Monte Carlo neutral particles with wall-recycling coefficients (DEGAS2 is built into XGC1)
•  Multi-species impurity particles
•  Plasma heating in the core
•  Torque input in the core
•  Fully nonlinear Coulomb collisions (Fokker-Planck-Landau)
•  Logical Debye sheath: the code determines the wall sheath potential from an ambipolar loss constraint (see the sketch below)
•  Reads in experimental geometry and plasma data
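As a rough illustration of how an ambipolar (logical) sheath constraint can be enforced, here is a hedged sketch assuming a simple picture in which a wall sheath potential repels electrons until electron and ion wall losses balance. The Maxwellian loss model, normalized units, and bisection update below are all illustrative stand-ins, not XGC1's implementation.

    import numpy as np

    def electron_loss(phi_sheath, n_e=1.0, t_e=1.0, m_e=1.0):
        """Thermal electron flux to the wall, reduced by the repelling sheath
        potential phi_sheath (illustrative Maxwellian estimate)."""
        v_te = np.sqrt(t_e / m_e)
        return n_e * v_te / np.sqrt(2.0 * np.pi) * np.exp(-phi_sheath / t_e)

    def ion_loss(n_i=1.0, t_e=1.0, m_i=1836.0):
        """Ion flux to the wall at roughly the sound speed (illustrative)."""
        return n_i * np.sqrt(t_e / m_i)

    def ambipolar_sheath(lo=0.0, hi=20.0, iters=60):
        """Bisect for the sheath potential at which the losses balance."""
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if electron_loss(mid) > ion_loss():
                lo = mid      # electrons still leaving too fast: raise the barrier
            else:
                hi = mid
        return 0.5 * (lo + hi)

    if __name__ == "__main__":
        print("ambipolar sheath potential (units of T_e/e):",
              round(ambipolar_sheath(), 3))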

Plans and Challenges
•  Full understanding of ELMs requires coupling MHD to particles, which have different physics and computational structures
  –  Different time scales, velocities, electric fields, etc.
  –  Fluid-based MHD solutions are globally coupled and depend on boundary conditions; they do not parallelize well on present systems! Particles have more local dynamics and parallelize well.
•  Tight coupling (every few time steps) is needed to resolve the physics differences (see the sketch below)
  –  The MHD code computes the magnetic field B and passes it to the particle code
  –  The particle code computes the particle pressure tensor or current density for the MHD momentum equations
  –  Challenge: accurate calculation of small velocity moments from particles
  –  Main challenge: how to couple codes efficiently on leadership-class HPC, between non-scalable and scalable codes?
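The tight-coupling cycle described above can be summarized in a short, hedged sketch; the class names, the fake field and closure computations, and the exchange-every-N-steps policy are illustrative stand-ins for the real MHD and kinetic codes, which would exchange data through files or in-memory buffers.

    import numpy as np

    class MhdCode:
        """Stand-in for the fluid MHD solver: advances B, consumes a kinetic closure."""
        def __init__(self, n=64):
            self.b_field = np.ones(n)
            self.closure = np.zeros(n)

        def advance(self, dt):
            # Placeholder update: relax B toward the kinetic closure term.
            self.b_field += dt * (self.closure - 0.1 * self.b_field)

    class KineticCode:
        """Stand-in for the particle code: computes a pressure-tensor-like moment."""
        def __init__(self, n=64, n_particles=100000):
            rng = np.random.default_rng(1)
            self.weights = rng.normal(1.0, 0.1, n_particles)
            self.cells = rng.integers(0, n, n_particles)
            self.n = n

        def moment(self, b_field):
            # Velocity-moment accumulation per cell (noisy, as real PIC moments are).
            p = np.bincount(self.cells, weights=self.weights, minlength=self.n)
            return p / p.mean() * 0.1 * b_field

    def coupled_run(steps=100, exchange_every=5, dt=1e-2):
        mhd, kin = MhdCode(), KineticCode()
        for step in range(steps):
            if step % exchange_every == 0:
                # Tight coupling: MHD hands B to the kinetic code; the kinetic
                # code returns a closure moment for the MHD momentum equation.
                mhd.closure = kin.moment(mhd.b_field)
            mhd.advance(dt)
        return mhd.b_field

    if __name__ == "__main__":
        print("mean |B| after coupled run:", float(np.abs(coupled_run()).mean()))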

VI. Edge Localized Mode Simulation in M3D
•  A large-scale Edge Localized Mode (ELM) could make the pedestal crash and send the plasma energy to the material wall → premature damage to the wall, potentially severe under fusion-burning conditions
•  Presently, a gyrokinetic code cannot simulate a large-scale ELM
•  Many observed properties of the ELM crash can be explained with MHD
  – Adding kinetic information could be important. Presently, two-fluid information is used, but the edge plasma is kinetic.
•  The figure shows an ELM instability event from M3D in a DIII-D plasma
  – The cut-plane view of density contours (blue-green-purple) shows large "fingers" being expelled towards the wall at the top and bottom.
    o The top and bottom have magnetic X-points that form near-Hamiltonian "homoclinic" tangles for small perturbations
  – Two 3D density contours (blue and purple) show helical striations along the equilibrium magnetic field
  – Edge disturbances couple strongly to the plasma interior
  – Velocity streamlines (three starting values near the outer midplane) show the instability motion and the gradual development of coherent rotation


I.  Introduction to Fusion

II.  Introduction to Tokamak

III.  The Science in the SciDAC EPSI

IV.  The Fusion Edge Gyrokinetic code XGC1

V.  Further Development of XGC1

VI.  Edge Localized Mode Simulation in M3D

VII.  A Representative Achievement: Performance Engineering

VIII. Conclusion and Discussion

Outline

VII. An Example Achievement: Performance Engineering

Plans for Further Performance Enhancement

• Reduce MPI communication to the GPUs by transporting the scalar electrostatic potential rather than the 3D electric field vector (see the sketch below)
• Use a 2D domain decomposition to partition the grid and particles ("poloidal decomposition") instead of the current 1D domain decomposition

(Figure: ~10% performance loss when doubling the number of compute nodes on the ITER grid.)
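To illustrate the first item: a hedged sketch of the data-volume argument, assuming the field is needed at N grid vertices on the GPU. Shipping the scalar potential and differentiating locally moves one value per vertex instead of three; the array shapes and the structured-grid gradient below are illustrative only (XGC1 evaluates gradients on its unstructured mesh, not on the toy grid shown here).

    import numpy as np

    def bytes_to_ship(n_vertices, send_potential=True, dtype_size=8):
        """Data volume per exchange: 1 scalar per vertex vs. 3 E-field components."""
        per_vertex = 1 if send_potential else 3
        return n_vertices * per_vertex * dtype_size

    def efield_on_device(phi, dx):
        """Recompute E = -grad(phi) locally after the cheaper transfer
        (toy structured-grid stand-in for the unstructured-mesh gradient)."""
        ex, ey = np.gradient(phi, dx)
        return -ex, -ey

    if __name__ == "__main__":
        n = 512 * 512
        print("MB/exchange, potential only :", bytes_to_ship(n, True) / 1e6)
        print("MB/exchange, full E vector  :", bytes_to_ship(n, False) / 1e6)
        phi = np.random.default_rng(2).normal(size=(512, 512))
        ex, ey = efield_on_device(phi, dx=1.0)
        print("example |E| mean:", float(np.hypot(ex, ey).mean()))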

ITER Phases are well-aligned with the US Exascale Computing Phases

Burning plasma simulation

I. Introduction to Fusion

Environmentally safe energy scenario for world electricity up to the year 2100 (Source: the Research Institute of Innovative Technology for the Earth, Tokyo). (The chart marks the remaining gap with "?" next to the label "Fusion.")

With all the resources being considered optimistically, we are still significantly short by 2100! Fusion science is extremely high-payoff research.

• We utilize leadership-class computers (Titan and Hopper) to produce science that has not been possible before
•  Strong support by the SciDAC Institutes and individual ASCR scientists has been essential
  – The collaboration is anticipated to get even stronger as the science is elevated to the ITER level
• Developing an in-memory multiscale time advancement technique is a high-payoff challenge (see the other EPSI poster and an EPSI talk on Thursday; a schematic loop is sketched below)
  – Prolong the high-fidelity simulation to the experimental edge time scale (~50 ms)
  – The expensive turbulence simulation may not be needed at all time steps
  – Reset error accumulation at the same time
  – A centralized joint activity with Applied Math, Data Management, Performance Optimization, and UQ
• Verification and validation are challenging and emphasized.
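A hedged schematic of the in-memory coarse/fine multiscale time advance described above, assuming the simple pattern of bursting the expensive fine-scale (turbulence) kernel to estimate fluxes, then advancing a cheap coarse (transport) kernel over many steps before the next burst. Every name, the flux model, and the step counts are illustrative, not the EPSI design.

    import numpy as np

    def fine_kernel_burst(profile, rng=None):
        """Stand-in for a short turbulence burst: returns an effective flux profile."""
        rng = rng or np.random.default_rng(3)
        grad = np.gradient(profile)
        noise = rng.normal(0.0, 0.01, profile.size)
        return -0.5 * grad + noise          # gradient-driven flux plus turbulent scatter

    def coarse_advance(profile, flux, dt_coarse):
        """Stand-in for the cheap transport step driven by the frozen fine-scale flux."""
        return profile - dt_coarse * np.gradient(flux)

    def multiscale_run(n_cells=100, n_cycles=20, coarse_steps_per_cycle=200,
                       dt_coarse=1e-4):
        profile = np.linspace(2.0, 1.0, n_cells)      # initial pedestal-like profile
        for _ in range(n_cycles):
            flux = fine_kernel_burst(profile)          # expensive kernel, run briefly
            for _ in range(coarse_steps_per_cycle):    # cheap kernel, run long
                profile = coarse_advance(profile, flux, dt_coarse)
            # In-memory handoff: the updated profile seeds the next fine burst,
            # which also resets the accumulated coarse-model error.
        return profile

    if __name__ == "__main__":
        final = multiscale_run()
        print("final profile range:", float(final.min()), float(final.max()))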


Strong Scaling to Larger-Size Devices
• We would also like to achieve reasonable strong scaling to the maximal Titan capability for an efficient simulation of ITER physics
•  For our coarse-grained physics simulation in the core, we have chosen 4X more grid points in ITER than in DIII-D
• Reasonable strong scaling has been achieved, even without further optimization for the ITER grid → we expect improvement

•  The electron subcycling time advance takes ~85% of the computing time in XGC1 and requires no external communication (a schematic of the subcycling loop is sketched below)
  –  Ideal for occupying the GPUs while most other routines occupy the CPUs
  –  The solver spends <5% of the total computing time
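A hedged sketch of what electron subcycling means computationally, assuming the common pattern in which fast electrons take many small orbit steps inside one ion/field step with the fields held frozen, so the inner loop is communication-free and GPU-friendly. The push function and step counts are illustrative, not XGC1's actual kernels.

    import numpy as np

    def push_electrons(x, v, efield_at, dt):
        """One small electron orbit step in a frozen field (illustrative)."""
        v = v + efield_at(x) * dt
        x = x + v * dt
        return x, v

    def subcycled_step(x_e, v_e, efield_at, dt_ion, n_subcycles=60):
        """Advance electrons through one ion/field time step with many substeps.
        No communication is needed inside this loop, so it maps well to a GPU."""
        dt_e = dt_ion / n_subcycles
        for _ in range(n_subcycles):
            x_e, v_e = push_electrons(x_e, v_e, efield_at, dt_e)
        return x_e, v_e

    if __name__ == "__main__":
        rng = np.random.default_rng(4)
        x_e, v_e = rng.uniform(0, 1, 100000), rng.normal(0, 1, 100000)
        frozen_field = lambda x: np.sin(2 * np.pi * x)   # field frozen for this ion step
        x_e, v_e = subcycled_step(x_e, v_e, frozen_field, dt_ion=1e-3)
        print("mean electron speed after one ion step:", float(np.abs(v_e).mean()))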

Weak Scaling in Number of Particles
•  A bigger problem size (higher velocity-space resolution) in XGC1 requires more particles on a fixed configuration-space grid
  –  The number of particles per grid node is fixed; the number of grid nodes, and thus particles, per compute node is determined by memory, so more particles mean more grid nodes (see the sizing sketch below)
  –  XGC1's efficient scalability has already been proven on the 2PB homogeneous Jaguar
  –  Can XGC1 scale as efficiently on the ~10PB heterogeneous Titan?
  –  XGC1 has not yet reached the weak-scale-breaking MPI communication limit
•  We achieved efficient weak scaling of XGC1 to the maximal Titan capability on the DIII-D grid
•  4X performance enhancement in under one year, from CPU-only to GPU-CPU
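A hedged back-of-the-envelope sizing helper for the weak-scaling setup described above, assuming the rule stated on the poster (particles per grid node fixed, grid nodes per compute node limited by memory). The byte counts per particle and per node are made-up placeholders; only the arithmetic pattern is the point.

    def weak_scaling_size(n_compute_nodes,
                          mem_per_node_bytes=32 * 2**30,     # placeholder node memory
                          bytes_per_particle=80,             # placeholder marker size
                          particles_per_grid_node=5000):     # fixed ratio (poster rule)
        """Total grid nodes and particles for a weak-scaled XGC1-like run."""
        particles_per_compute_node = mem_per_node_bytes // bytes_per_particle
        grid_nodes_per_compute_node = particles_per_compute_node // particles_per_grid_node
        total_grid_nodes = grid_nodes_per_compute_node * n_compute_nodes
        total_particles = total_grid_nodes * particles_per_grid_node
        return total_grid_nodes, total_particles

    if __name__ == "__main__":
        for nodes in (1024, 4096, 16384):
            g, p = weak_scaling_size(nodes)
            print(f"{nodes:6d} compute nodes -> {g:,} grid nodes, {p:,} particles")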

Conclusion and Discussion

Challenge: running XGC1 on the full-size heterogeneous Titan
Collaborating SciDAC Institute: SUPER
SUPER liaison: Patrick H. Worley, Oak Ridge National Laboratory
EPSI Science Team Lead for Performance: Worley (20%)
Other EPSI team members: E. D'Azevedo, J. Lang, S. Ku, S. Ethier
