AIAA 93-0595: A Data Parallel Finite Element Explicit Method For Computational Heat Transfer. 31st Aerospace Sciences Meeting & Exhibit. Raju R. Namburu and Farzad Rostam-Abadi, U.S. Army TARDEC, Warren, MI.

American Institute of Aeronautics and Astronautics, 31st Aerospace Sciences Meeting, Reno, NV, U.S.A., January 11-14, 1993.



A DATA PARALLEL FINITE ELEMENT EXPLICIT METHOD FOR COMPUTATIONAL HEAT TRANSFER

Raju R. Namburu
Army High Performance Computing Research Center /
Computer Sciences Corporation

and

Farzad Rostam-Abadi
U.S. Army TARDEC
US Army Tank-Automotive Command
Attn: AMSTA-RYT, Bldg. 215
Warren, MI 48397-5000

ABSTRACT

An explicit flux-based finite element method has been implemented on the Connection Machine system CM-5 for evaluating the transient thermal response of structures. To minimize communications and to improve the efficiency of the data parallel algorithm, we implemented the algorithm based on nodal temperature arrays and elemental heat flux load arrays. In this study the data parallel language CM Fortran is used with virtual processor constructs, with :SERIAL and :PARALLEL LAYOUT directives for arrays. For the explicit method, the communications involve extraction of element nodal temperatures from the global temperature vector and dispersion of element thermal load vectors into the global thermal load vector at each time step. Fortran 90 constructs and the Connection Machine Scientific Software Library communication primitives, gather and scatter, are used for both regular and irregular finite element meshes. Performance comparisons for these communication routines are also carried out. Numerical computations for an unstructured mesh, namely a wing structure of an aircraft, are presented to demonstrate the applicability of the proposed developments for computational transient heat transfer.

1. Introduction

For the past three decades, serial computers have dominated computer architecture. Further improvements in the computational speed of these single processing units are constrained by technological limits. Hence, for large scale computations of future complex engineering systems and practical engineering applications, parallel computers are becoming inevitable. In addition to the hardware and communication network designs, these parallel computers require the development of new algorithms, operating systems and programming methodologies.

Copyright © American Institute of Aeronautics and Astronautics, Inc. All rights reserved.

Structural response induced by thermal effects is an important concern in many structural designs of advanced space transportation vehicles, large space structures, combat vehicles, automobile engines, and manufacturing. For example, extreme aerodynamic heating on advanced aerospace vehicles may produce severe thermal stresses that can reduce operational performance or even damage structures. Numerical simulation of these complex structures requires large order finite element models and imposes excessive demands in both computational speed and data management. Parallel computers have made such large scale computations practical. The implementation of the finite element method on a data parallel machine requires careful analysis of data structures and algorithms. For efficient algorithmic implementation on parallel computers, load balancing and communications between the processors are very important. Some of the other difficulties associated with the implementation of the finite element method on a data parallel machine include unstructured meshes, various types of elements and interpolation functions on the same domain, etc. Implementations of finite element methods and data structures for Single-Instruction-Multiple-Data (SIMD) architectures have been reported by Johnsson and Mathur [1-3], Farhat et al. [4] and Belytschko et al. [5-6], among others. Cline et al. [7] have discussed implementation of an iterative solver for 3D implicit thermal analysis on the Connection Machine CM-2. More recently, Namburu et al. [8] discussed a data parallel implementation of an explicit procedure for computational structural dynamics on the Connection Machine CM-5.

This paper discusses the data parallel implementation of an explicit finite element procedure on the Connection Machine system CM-5 for evaluating the thermal response of structures. The two most important parallel implementation criteria that influence the efficiency of the algorithm are communication and load balancing. To minimize communications, and to improve the efficiency of the data parallel algorithm, we implemented the algorithm based on


nodal temperatures, elemental temperatures and elemental heat flux loads. As opposed to the EXCHANGE approach discussed by Belytschko et al. [5-6] for an explicit procedure, this approach involves a GET or gather operation, that is, extraction of element nodal temperatures from the global temperature vector, and a SEND or scatter operation, that is, dispersion of element thermal load vectors into the global thermal load vector. Further, these GET and SEND operations can be implemented using general CM-5 communication utilities and Fortran 90 constructs for both regular and irregular finite element domains.

An outline of the paper is as follows. In section 2, the energy equation and an explicit flux-based finite element representation are discussed. Section 3 discusses the data parallel implementation and related issues. Section 4 discusses performance studies on the Connection Machine for a one-dimensional problem and a practical application problem, namely, thermal analysis of an advanced aerospace vehicle's wing subjected to aerodynamic heating. Finally, conclusions are drawn in section 5.


2. Governing Equations and Finite Element Representations

The governing classical heat conduction equation in a domain Ω can be written as

ρc ∂θ/∂t + q_i,i = Q   (1)

where ρ is the material density, c is the specific heat capacity, θ is the temperature, Q is the heat generated per unit volume and q_i is the heat flux. In the above, a comma indicates partial differentiation with respect to the Cartesian coordinate x_i. The heat flux is related to the gradient of the temperature by Fourier's law as

q_i = -k_ij θ,j   (2)

Generally we need to solve a mixed boundary value problem, with governing equation (1) subjected to the following boundary and initial conditions

θ = θ_s on Γ_A   (3a)

q_i n_i = q_s - q_h - q_r on Γ_B   (3b)

θ(t=0) = θ_0   (4)

where θ_s is the specified temperature condition on part Γ_A of the boundary, Eq. (3b) is the specified flux condition on part Γ_B of the boundary (obviously Γ_A + Γ_B = Γ), n_i represents the outward normal at any point on the boundary Γ_B, q_s is the surface heating rate per unit area, q_h is the rate of heat flow per unit area due to convection, q_r is the rate of heat flow per unit area due to radiation, and θ_0 is the initial temperature.

Performing spatial discretization by standard finite element techniques leads to the following semidiscrete equation system

C θ̇ + K θ = Q   (5)

where C is the capacitance matrix, K is the conductivity matrix, which consists of contributions from conduction, convection and radiation, and Q is the total heat load vector, which consists of contributions from heat generation, specified surface temperature, surface heating, surface convection, and surface radiation.

The set of ordinary differential equations (5) can be solved using one of the many recurrence schemes for time integration [9]. To avoid iterating and solving a system of equations within each time step, here we consider a simple explicit time integration procedure. A disadvantage of explicit integration is that it is only conditionally stable. Due to their inherent advantages [10-12], Lax-Wendroff type finite element explicit formulations are implemented in this paper. The finite element time discretized representations based on Lax-Wendroff type flux-based formulations are expressed as

C ΔT^(n+1) = F1^n + F2^n + F3^n   (6)

where

ΔT^(n+1) = T^(n+1) - T^n   (7)

and

F2 = -Δt ∫_Γe N^T (q_i n_i) dΓ   (10)

In the above Euler forward time integration representation (Eq. (6)), ΔT is the incremental temperature, Δt is the time increment, and the vector C is related to the lumped heat capacitance. The solution of Eq. (6) is marched in time, starting from the initial conditions at n=0, with time step Δt until the total duration of the transient response is calculated. The vector F1 represents heat transfer due to conduction. Material thermophysical properties are introduced through the vector q_i without disturbing the evaluation of element integrals. The quantity F2 physically represents the boundary term. Natural boundary conditions can be directly introduced through the quantity (q_i n_i) without disturbing the evaluation of the element integral. Thus, an imposed surface heat flux, convection and radiation effects are naturally introduced in F2 via (q_i n_i). The quantity F3 involves the contribution from internal heat sources.
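As an illustrative sketch of the Eq. (6)-(7) marching procedure (plain Python rather than the paper's CM Fortran; unit material properties and the Test Model 1 fin geometry are assumed for simplicity), a lumped-capacitance forward step might look like:

```python
import numpy as np

# Illustrative sketch only: unit rho, c, k assumed (hypothetical values),
# mirroring Test Model 1 (fin of length 10, T=0 at one end, T=10 at the other).
n_el, length = 1000, 10.0
h = length / n_el
dt = 0.4 * h * h                  # below the explicit stability limit

T = np.zeros(n_el + 1)            # initial condition, Eq. (4)
T[-1] = 10.0                      # sudden temperature applied at the far end

C = np.full(n_el + 1, h)          # lumped heat capacitance vector
C[0] = C[-1] = h / 2.0

for n in range(200):
    q = -(T[1:] - T[:-1]) / h     # element heat flux, Fourier's law (Eq. (2))
    F1 = np.zeros_like(T)         # conduction contribution F1, per node
    F1[:-1] -= dt * q             # flux contribution to each element's left node
    F1[1:] += dt * q              # flux contribution to each element's right node
    T = T + F1 / C                # Delta T = C^-1 * F1, Eqs. (6)-(7)
    T[0], T[-1] = 0.0, 10.0       # essential boundary conditions, Eq. (3a)
```

With a stable time step the update is a convex combination of neighboring temperatures, so the computed profile stays bounded between the two end temperatures, consistent with the sharp-gradient behavior of Fig. 1.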

3. Data Parallel Implementation

This section discusses the data parallel implementation of the explicit algorithm on the Connection Machine CM-5. The


data parallel algorithm implemented in this paper is shown in Chart 1. Similar to the Von Neumann approach, the data is partitioned into elemental and nodal arrays. Further, this approach distributes the load equally over the processors and also retains the generality of the finite element method. Note that one has to be careful with load balancing if one uses different types of elements, boundary conditions and different integration procedures on the same finite element domain. During the computations, the elemental arrays and nodal arrays are assigned to the processing nodes of the Connection Machine CM-5. The mapping of arrays to the memory of the processing nodes is controlled by the LAYOUT directives.

For large scale applications, the number of elements or nodes of the finite element mesh exceeds the number of processing nodes on the Connection Machine system. As a consequence, several elements or nodes are assigned to each processing node by the CMF compiler. Further, every processing node of the CM-5 system is equipped with four vector units. Assigning more than one element or node to a processing node leads to the notion of virtual processing, that is, each processing node performs operations on a certain number of elements or nodes of the finite element mesh in a serial fashion for each vector length.

The arrays of the explicit formulation are allocated into the memory of the processing nodes using CM Fortran. No communication between processing nodes is required when referencing array elements on the same processing node. However, the gather and assembly operations of the present explicit method involve two communication operations at each time step: first, to extract element nodal temperatures from the global temperature vector, and second, to disperse element thermal load vectors into the global thermal load vector. Taking advantage of the CM-5 architecture and communication networks, the gather and scatter operations are implemented employing CM Fortran and the CMSSL (Connection Machine Scientific Software Library) scatter and gather utilities [13-14] for arbitrary communications. These communication routines require an initial setup operation, which stores the trace of the routing activity. This setup operation is needed only once, and the trace can be reused as long as the connectivity and layout of the finite element mesh remain the same. The implementation and performance of these CMSSL utilities for finite element applications on the Connection Machine system CM-2 are given in [15].
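The GET/gather and SEND/scatter pattern can be sketched serially as follows (hypothetical connectivity and array names; the paper's actual CMSSL utilities perform these operations across processing nodes, with the element connectivity playing the role of the precomputed routing trace):

```python
import numpy as np

# Hypothetical connectivity: 3 two-noded elements on 4 nodes.
conn = np.array([[0, 1], [1, 2], [2, 3]])    # element -> global node numbers
T_global = np.array([0.0, 1.0, 2.0, 3.0])

# GATHER (GET): extract element nodal temperatures from the global array.
# The index array conn is fixed, like the routing trace stored at setup.
T_elem = T_global[conn]                      # shape (n_elements, 2)

# ... element computations would produce elemental load arrays here ...
F_elem = np.ones_like(T_elem)                # placeholder elemental loads

# SCATTER (SEND): disperse elemental loads into the global load vector,
# accumulating the contributions of elements that share a node.
F_global = np.zeros_like(T_global)
np.add.at(F_global, conn, F_elem)
# F_global is now [1., 2., 2., 1.]: shared nodes received two contributions
```

The unbuffered accumulation in the scatter step is what makes it safe for irregular meshes, where an unpredictable number of elements contribute to each node.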

Chart 1

1. Initial conditions: θ(t=0) = θ_0
2. Loop over time steps
3. GATHER operation
   (a) Extract element nodal temperatures from the global temperature array
4. Perform element computations in parallel
   (i) Lumped capacitance vector
       (a) Evaluate nonlinear material properties based on the element nodal temperatures.
       (b) Evaluate lumped elemental capacitance arrays. Note that for linear problems this lumped capacitance vector need not be evaluated at every time step.
   (ii) External thermal load vector
       (a) Evaluate nonlinear material properties based on the element nodal temperatures.
       (b) Evaluate and add element arrays for thermal load.
5. Perform nodal computations in parallel
   (a) Solve for temperatures employing Eq. (6)
6. End time steps
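The steps of Chart 1 can be sketched end to end as follows (a serial Python analogue with a hypothetical 1D mesh and unit properties; the actual implementation performs steps 3-5 in parallel across the CM-5 processing nodes):

```python
import numpy as np

# Hypothetical 1D mesh: 4 two-noded elements, unit rho, c, k.
conn = np.array([[i, i + 1] for i in range(4)])  # element -> global nodes
n_nodes, h, dt, k = 5, 1.0, 0.2, 1.0             # dt within stability limit

# 1. Initial conditions
T = np.zeros(n_nodes)
T[-1] = 10.0

# Lumped capacitance vector (linear problem: evaluated once, per Chart 1)
C = np.zeros(n_nodes)
np.add.at(C, conn, h / 2.0)

for step in range(50):                           # 2. loop over time steps
    T_e = T[conn]                                # 3. GATHER element nodal temps
    q = -k * (T_e[:, 1] - T_e[:, 0]) / h         # 4. element flux computations
    F_e = dt * np.column_stack((-q, q))          #    elemental thermal loads
    F = np.zeros(n_nodes)
    np.add.at(F, conn, F_e)                      #    scatter-add into global load
    T = T + F / C                                # 5. nodal update, Eq. (6)
    T[0], T[-1] = 0.0, 10.0                      #    re-impose boundary temps
# 6. end time steps; T approaches the linear steady profile from 0 to 10
```

Note that only the gather (`T[conn]`) and the scatter-add (`np.add.at`) touch globally shared data; the element and nodal computations are embarrassingly parallel, which is why communication dominates the timings reported later in the paper.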


4. Numerical Examples

In this section, we demonstrate the applicability of the proposed formulations on the data parallel computer, starting from a relatively simple test problem, primarily to illustrate the implementation, data structures and communications, and further to validate the proposed developments.

Test Model 1: Fin with Sharp Gradient*

A fin of uniform cross section of one and length of 10 has one of its ends maintained at T=0 while a sudden temperature T=10 is applied at the other end. It is desired to evaluate the temperature distribution along the length of the fin. The problem is modeled using 1000 linear two-noded conduction elements. Fig. 1 shows the temperature distributions along the length of the fin. This simple example is chosen primarily to study implementation and communications on the data parallel computer CM-5. Fig. 2 shows the total time spent on the Connection Machine system for the various routines in the explicit scheme for this one-dimensional model. The time spent in input/output routines is not shown here. The timings reported in Fig. 2 are for the example with 100,000 two-noded conduction elements and a 256 processing node partition size on the CM-5. The CMSSL utilities scatter and gather are used for communications. As expected, the time spent in these communication routines is substantially more than that spent on the actual computation, namely the thermal load vector, for this linear problem.

Test Model 2: Hypothetical Wing Structure with Aerodynamic Heating*

To establish confidence that the structural components are not overheated beyond the design temperature, analysis of the full wing of an advanced aerospace vehicle is desirable. As a preliminary step, a hypothetical wing without any ribs and cross members is considered in this study. Further, the wing is assumed to be exposed to severe aerodynamic heating. The top and bottom skins of the wing are subjected to aerodynamic heating rates as shown in Fig. 4. The wing section is modeled by a combination of two-dimensional quadrilateral and triangular elements. The finite element mesh for the wing considered here consists of 14236 elements and 14464 nodes. Triangular elements are implemented in the same array structures as quadrilateral elements. As the

* The results are based upon a test version of the software where the emphasis was on providing functionality and the tools necessary to begin testing the CM-5 vector units. This software release has not had the benefits of optimization or performance tuning and, consequently, is not necessarily representative of the performance of the full version of this software.


number of additional operations involved in evaluating quadrilateral elements over triangular elements is not substantial, load balancing is not an issue. Fig. 5 shows the finite element mesh for the wing structure considered in this example. The initial temperature of the wing section is assumed to be zero. Fig. 6 shows the CM time for 5000 time steps with a 256 processing node partition size for the various routines of the explicit scheme. Similar to the earlier example, the time spent in communications for the explicit scheme is substantially more than that spent on the actual computations. Table 1 shows the time spent in communications using CM Fortran statements and the CMSSL gather and scatter utilities operating in scalar mode. Also note that the setup operation for the CMSSL utilities is inexpensive. From Table 1, it is evident that specialized and optimized communication routines are required to improve the performance of the explicit algorithm.

Table 1. Communication time for a 256 processing node partition using CMF and CMSSL

                       Setup     CM Busy    CM Elapsed
CM Fortran (gather)    0.0 s     31.930 s   32.842 s
CMSSL (gather)         0.054 s   13.094 s   13.358 s
CM Fortran (scatter)   0.0 s     43.505 s   44.590 s
CMSSL (scatter)        0.042 s   19.907 s   20.305 s

5. Conclusions


This paper described the implementation of an explicit flux-based finite element method using a data parallel language for solving the transient thermal response of structures. In addition to their inherent advantages, flux-based representations with Fortran 90 constructs simplified the implementation process on the data parallel machine CM-5. It was observed that communication time dominates the overall efficiency of the explicit scheme discussed in this paper. For the examples considered here, the communication primitives gather and scatter provided by the Connection Machine Scientific Software Library improve the overall performance by a factor of 5 compared to the Fortran 90 constructs. Note that the communication primitives reported in this study are not vector operations, and one can expect a significant improvement in performance once vector versions of the CMSSL primitives are used.

Acknowledgments

The authors are pleased to acknowledge support of this research, in part, by the Army Research Office, contract number DAAL03-89-C-0028, with the University of Minnesota Army High Performance Computing Research Center, Minneapolis, Minnesota. Acknowledgments are also due to the U.S. Army Tank-Automotive Research, Development and Engineering Center, Warren, Michigan; Computer Sciences Corporation, Falls Church, Virginia; and the Minnesota Supercomputer Institute, Minneapolis, Minnesota.

6. References

1. Johnsson, S.L., and Mathur, K.K., "Experience with the Conjugate Gradient Method for Stress Analysis on a Data Parallel Supercomputer", Int. J. Num. Meths. Eng., Vol. 27, pp. 523-546 (1989).
2. Johnsson, S.L., and Mathur, K.K., "Data Structures and Algorithms for the Finite Element Method on a Data Parallel Supercomputer", Int. J. Num. Meths. Eng., Vol. 29, pp. 881-908 (1990).
3. Mathur, K.K., and Johnsson, S.L., "The Finite Element Method on a Data Parallel Computing System", Int. J. High Speed Computing, Vol. 1, pp. 29-44 (1989).
4. Farhat, C., Sobh, N., and Park, K.C., "Transient Finite Element Computations on 65,536 Processors: The Connection Machine", Int. J. Num. Meths. Eng., Vol. 30, pp. 27-55 (1990).
5. Belytschko, T., Plaskacz, E.J., Kennedy, J.M., and Greenwell, D.L., "Finite Element Analysis on the Connection Machine", Comp. Meths. Appl. Mech. Eng., Vol. 81, pp. 229-254 (1990).
6. Belytschko, T., and Plaskacz, E.J., "SIMD Implementation of a Non-Linear Transient Shell Program with Partially Structured Meshes", Int. J. Num. Meths. Eng., Vol. 33, pp. 997-1026 (1992).
7. Cline, R.E., Jr., Boghosian, B.M., and Nemnich, B., "Implementation of a 3D Thermal Analysis on the CM-2 Connection Machine Computer", in Proc. 1988 International Conference on Parallel Processing, pp. 257-263, IEEE Computer Society, Aug. 1988.
8. Namburu, R.R., Turner, D., and Tamma, K.K., "An Effective Data Parallel Implementation of an Explicit Velocity-Based Algorithm for Computational Structural Dynamics on the Connection Machine", presented at the 1992 ASME Winter Annual Meeting, Anaheim, CA (1992).
9. Hughes, T.J.R., The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Prentice-Hall, Englewood Cliffs, NJ (1987).
10. Tamma, K.K., and Namburu, R.R., "Explicit Lax-Wendroff/Taylor-Galerkin Second-Order Accurate Formulations Involving Flux Representations for Effective Finite Element Thermal Modeling/Analysis", AIAA-89-0518, Reno, Nevada, January (1989).
11. Thornton, E.A., and Balakrishnan, N., "A Finite Element Solution Algorithm for Nonlinear Thermal Problems with Severe Gradients", AIAA-89-0520, Reno, Nevada, January (1989).
12. Namburu, R.R., and Tamma, K.K., "Applicability/Evaluation of Flux-Based Representations for Linear/Higher-Order Elements for Heat Transfer in Structures: Generalized γT-family", AIAA-91-0159, 29th Aerospace Sciences Meeting, Reno, Nevada, January (1991).
13. Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 245 First Street, Cambridge, MA 02142, 1991.
14. CMSSL Release Notes for the CM-5, Version 3.0, TMC, 1992.
15. Mathur, K.K., "On the Use of Randomized Address Maps in Unstructured Three-Dimensional Finite Element Simulations", Thinking Machines Corporation Technical Report CS90-4, Thinking Machines Corporation, Cambridge, MA (1990).


Fig. 1. Temperature distribution over the length of the fin with sharp gradient (T=0 at one end, T=10 at the other)

Fig. 2. Relative timings among various routines: two-noded linear elements

Page 7: [American Institute of Aeronautics and Astronautics 31st Aerospace Sciences Meeting - Reno,NV,U.S.A. (11 January 1993 - 14 January 1993)] 31st Aerospace Sciences Meeting - A data parallel

Fig. 3. Wing of an airplane

[Plot of aerodynamic heating rate, peak 13,600 W/m², versus time, s]

Fig. 4. Finite element model of the wing

Page 8: [American Institute of Aeronautics and Astronautics 31st Aerospace Sciences Meeting - Reno,NV,U.S.A. (11 January 1993 - 14 January 1993)] 31st Aerospace Sciences Meeting - A data parallel

[Figs. 5 and 6: bar charts of CM busy time for the various routines (gather, scatter, time integration) using CMF and CMSSL communication primitives]