

A GPU Accelerated Continuous-based Discrete Element Method for Elastodynamics Analysis

Zhaosong Ma, Chun Feng, Tianping Liu, Shihai Li

Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China

Email: [email protected]

Introduction to CDEM

The Continuum-based Distinct Element Method (CDEM) simulates the progressive failure of geological masses. It is mainly used in landslide stability evaluation, coal and gas outburst analysis, and mining and tunnel design.

CDEM combines the Finite Element Method (FEM) and the Distinct Element Method (DEM). It contains two kinds of elements: blocks and interfaces. A discrete block consists of one or more FEM elements, all of which share the same nodes and faces. An interface contains several normal and tangential springs, each connecting nodes in different blocks. FEM is used inside the blocks, and DEM is used for the interfaces.

CDEM is an explicit time-history analysis FEM/DEM approach based on finite difference principles: a forward-difference approximation advances the progressive process through a time-marching scheme. During the calculation, the dynamic relaxation method is used to reach convergence in a reasonable time with small time steps, and convergence is reached when the total kinetic energy is minimized.

Serial Algorithm in CPU

Solution Algorithm in Parallel

Unlike the general CDEM algorithm, which uses a clone-node strategy to ensure node consistency among elements, the parallel algorithm achieves the same result with a “nodal force group” strategy. The reason for the difference is access conflicts. In general CDEM, a node’s force is cloned to the associated nodes as soon as that node’s force calculation is done. This works well in serial execution but may cause data access conflicts in parallel execution, because several threads may clone their forces to the same node at the same time. The parallel algorithm instead uses a data structure to store the calculated forces temporarily. The nodal forces are redistributed in a separate procedure after all nodal force calculations are done.

The element calculation procedure is designed around the characteristics of the GPU. A GPU consists of a set of SIMT (single-instruction, multiple-thread) multiprocessors, which are mapped to CUDA blocks. Each multiprocessor has an instruction unit and several scalar processor cores, which are mapped to CUDA threads, and each scalar thread executes independently with its own instruction address and register state. Accordingly, the element calculations are divided into parallel CUDA blocks, each of which contains a certain number of elements. In this work, a CUDA block contains 32 elements for best performance.

Parallel Algorithm in GPU
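The division of elements into CUDA blocks of 32 can be illustrated with the index arithmetic alone. The following is a minimal plain-C++ sketch, not the authors' kernel: the names `launchShape` and `workItem` are hypothetical, and the mapping of one thread to one element node is an assumption made for illustration. Only the figure of 32 elements per block comes from the poster.

```cpp
#include <cassert>
#include <cstddef>

// 32 elements per CUDA block is the value reported in the poster.
constexpr std::size_t kElemsPerBlock = 32;

// Hypothetical launch shape (assumption): one thread per element node.
struct LaunchShape {
    std::size_t blocks;
    std::size_t threadsPerBlock;
};

LaunchShape launchShape(std::size_t nElem, std::size_t nodesPerElem) {
    assert(nodesPerElem > 0);
    // Round up so a partially filled last block covers the remaining elements.
    return {(nElem + kElemsPerBlock - 1) / kElemsPerBlock,
            kElemsPerBlock * nodesPerElem};
}

// What a given (block, thread) pair would work on.
struct WorkItem {
    std::size_t elem;       // global element index
    std::size_t localNode;  // node index inside that element
    bool active;            // false for padding threads in the last block
};

WorkItem workItem(std::size_t blockIdx, std::size_t threadIdx,
                  std::size_t nElem, std::size_t nodesPerElem) {
    assert(threadIdx < kElemsPerBlock * nodesPerElem);
    const std::size_t elem =
        blockIdx * kElemsPerBlock + threadIdx / nodesPerElem;
    return {elem, threadIdx % nodesPerElem, elem < nElem};
}
```

Under these assumptions, a mesh of 41232 elements with 8 nodes each would need 1289 blocks of 256 threads, and the last block would hold only 16 active elements, so its remaining threads must be masked out.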

Figure: Thread Division Scheme for Element Calculation (elements 1 to n, each with its nodes, are divided into blocks 1 to m; threads 1 to nn each perform a nodal calculation)

Figure: Nodal Force Group (element forces are collected into groups; threads 1 to ng, divided into blocks 1 to nb, each process forces)
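The two-phase idea behind the nodal force group strategy can be sketched in plain C++, with loops standing in for GPU threads. This is an illustration under stated assumptions rather than the authors' data structure: the `ForceGroups` layout, the function names, and the rule that every slot in a group receives the group total are all hypothetical. The point it shows is that each write targets a location owned by exactly one loop iteration, so neither phase needs atomic operations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    double x, y, z;
};

// Assumed layout: slot index = element * nodesPerElem + localNode, so every
// element owns private storage. Groups list the slots that coincide at one
// mesh node, in CSR form: group g owns slot[offset[g] .. offset[g + 1]).
struct ForceGroups {
    std::vector<std::size_t> offset;  // size = number of groups + 1
    std::vector<std::size_t> slot;    // each slot appears in exactly one group
};

// Phase 1 (one thread per element node on a GPU): each element writes only
// into its own slots, so no two threads touch the same address.
template <typename ForceFn>
void elementPass(std::vector<Vec3>& slotForce, std::size_t nElem,
                 std::size_t nodesPerElem, ForceFn elementNodeForce) {
    assert(slotForce.size() == nElem * nodesPerElem);
    for (std::size_t e = 0; e < nElem; ++e)
        for (std::size_t a = 0; a < nodesPerElem; ++a)
            slotForce[e * nodesPerElem + a] = elementNodeForce(e, a);
}

// Phase 2 (one thread per group, after a global synchronization): sum the
// group's slots and write the total back to each of them.
void redistribute(std::vector<Vec3>& slotForce, const ForceGroups& groups) {
    for (std::size_t g = 0; g + 1 < groups.offset.size(); ++g) {
        Vec3 sum{0.0, 0.0, 0.0};
        for (std::size_t i = groups.offset[g]; i < groups.offset[g + 1]; ++i) {
            const Vec3& f = slotForce[groups.slot[i]];
            sum.x += f.x;
            sum.y += f.y;
            sum.z += f.z;
        }
        for (std::size_t i = groups.offset[g]; i < groups.offset[g + 1]; ++i)
            slotForce[groups.slot[i]] = sum;
    }
}
```

For two 2-node elements that share one node, the groups would be {0}, {1, 2}, and {3}: the shared node's two slots end up holding the sum of both element contributions, while the end nodes keep their own values.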

Case 1: Simple Model Comparison

A 1000-element brick, with the bottom fixed and gravity applied (results shown for the CPU version and the GPU version)

Speedup ratio: 252 / 0.554 = 455 (GeForce GTX285)

Case 2: Elastic Field Calculation of Great Wall Beacon

41232 hex elements, 4000 steps

Time cost for CDEM (GPU version): 6.1 s (GeForce GTX285)

Time cost for CDEM (CPU version): 95 min (Core(TM) 2, 1.83 GHz)

Time cost for FLAC3D: 520 s (Core Duo, 3.0 GHz)

Case 3: Ground Settlement Induced by Deep Excavation

0.86 million tet elements

Time cost for elastic calculation: 6 min (GeForce GTX480)

http://english.imech.cas.cn/

Case 4: Elastic Calculation of Landslide

85800 hex elements, 5000 steps

Time cost for CDEM (GPU version): 12.5 s (GeForce GTX480)

Time cost for FLAC3D: 1380 s (Core Duo, 3.0 GHz)

Flowchart (GPU solution procedure): Preprocessing; Copy data to GPU; Element calculation (many in parallel); Global synchronization; Nodal force redistribution (many in parallel); Meet iteration conditions? (No: continue iterating; Yes: Post processing, End)

Flowchart (calculation cycle), boxes: Start; Interface Force; Block Force; Constitutive Equation; Node Force; Damping Force; Damping Equation; Node Acceleration; Node Velocity; Node Displacement; Kinetic Energy. Decision: Is the convergence reached? (No: continue; Yes: Finish)
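The calculation cycle (node force, damping force, acceleration, velocity, displacement, kinetic-energy check) can be illustrated on a single degree of freedom. The sketch below is a hedged toy example, not the CDEM implementation: the function `relax`, the linear spring standing in for the constitutive equation, the viscous damping law, and the fixed kinetic-energy tolerance are all assumptions chosen to keep the example short.

```cpp
#include <cassert>

struct RelaxResult {
    double u;        // displacement at exit
    int steps;       // time steps taken
    bool converged;  // true if the kinetic energy fell below the tolerance
};

// One mass on one spring, marched explicitly with a forward difference.
// Assumptions: linear spring (stiffness k), viscous damping (coefficient c).
RelaxResult relax(double m, double k, double c, double fExt,
                  double dt, double tol, int maxSteps) {
    assert(m > 0.0 && dt > 0.0);
    double u = 0.0;  // node displacement
    double v = 0.0;  // node velocity
    for (int step = 1; step <= maxSteps; ++step) {
        const double fNode = fExt - k * u;  // external load plus spring force
        const double fDamp = -c * v;        // damping force
        const double a = (fNode + fDamp) / m;
        v += a * dt;                        // node velocity
        u += v * dt;                        // node displacement
        const double kinetic = 0.5 * m * v * v;
        if (kinetic < tol) return {u, step, true};
    }
    return {u, maxSteps, false};
}
```

With an overdamped choice of parameters the velocity never changes sign, so the kinetic energy decays monotonically after the first steps and the displacement settles at the static value fExt / k. With light damping the velocity passes through zero at every turning point, so a bare kinetic-energy threshold like this one could stop too early; a production criterion would need to guard against that.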