
Pergamon Computers & Geosciences Vol. 22, No. 5, pp. 569-578, 1996

Copyright © 1996 Elsevier Science Ltd

0098-3004(95)00133-6 Printed in Great Britain. All rights reserved

0098-3004/96 $15.00 + 0.00

AN OCEAN MODEL CODE FOR ARRAY PROCESSOR COMPUTERS

D. J. WEBB

James Rennell Division,* Southampton Oceanography Centre, Empress Dock, Southampton SO14 3ZH
(e-mail: [email protected])

*A laboratory of the Natural Environment Research Council.

(Received 26 July 1994; accepted 7 August 1995)

Abstract-This paper presents a new ocean general circulation model code designed for use with array processor computers. Present research on ocean circulation and climate prediction is limited by the available computer power, and it is expected that in the future, array processor codes will be used for most high-resolution studies of the ocean and climate. The new code is based on the GFDL modular ocean model, but it has been restructured so that the ocean can be partitioned among the different processors. Copyright © 1996 Elsevier Science Ltd

Key Words: Ocean models, Geophysical fluid dynamics, Array processors, Parallel processing, FORTRAN, Modular Ocean Model.

INTRODUCTION

The study of the ocean and its effect on climate has been restricted for many years now by the limited power of available computers. The problem arises from the large size of the ocean, the small scale of key physical features, such as the Gulf Stream, and the long integration times needed for both oceanographic and climate research.

In order to make progress, it has been necessary for oceanographers to make effective use of the most powerful computers available. In this tradition this paper reports on a new code developed to run efficiently on the new generation of array processor computers.

A large number of the models used for ocean circulation studies are of finite difference type, using fixed levels in the vertical and an Arakawa-B grid (defined later) in the horizontal. The first such model was developed by Bryan (1969) and most later models are related to this.

The first major revision was that of Semtner (1974). He introduced Takano's (1974) scheme for including islands and reorganized the code so that it ran efficiently on the early vector processing computers. The main memory of these early machines was limited, so that most of the ocean model variables were stored on disk. Semtner used a series of slabs, each of which contained all the main ocean variables for each line of latitude in the model. These were buffered in and out of main memory asynchronously with the main computation, so that the main processing unit was in constant use.

There have been two further major developments of the code. Cox (1984) reorganized the model to run efficiently on the Cyber 205 computer (whose vector processing units had a long start-up time). He did this by treating each slab of data as a single long vector. He also introduced a scheme for adding a number of model options. More recently, Pacanowski, Dixon and Rosati (1990) developed the GFDL (Geophysical Fluid Dynamics Laboratory, Princeton) modular ocean model (MOM) code. This is designed for Unix based computers and uses the Unix C compiler pre-processor to provide a wider range of model options.

In the late 1980s the improved size and speed of vector processing computers increased greatly the size of model that could be developed. This resulted in a dramatic increase in model realism (Bryan and Holland, 1989; The FRAM Group, 1991; Semtner and Chervin, 1992), and has led to a myriad of new developments in our understanding of ocean circulation.

Even greater increases in computer power and model realism are expected in the next decade, but this is expected to come less from increases in raw CPU speed and more from the use of large arrays of similar processors. Each processor will consist of a central processing unit connected to main storage by a fast bus, with slower links to neighboring processors. Such an array can be used in a number of ways, but with finite difference models the most efficient structure is likely to be one in which each processor takes responsibility for its own volume of ocean, and exchanges information on boundary points with a limited number of neighbors.

The slab organization scheme, introduced by Semtner, is not well-suited for such a scheme. The simplest scheme that can be used with the Semtner code is to make each processor responsible for a series of slabs. However, with a quarter-degree global model and 256 processors, each processor is responsible for only five slabs. Two of these lie on the boundary and so must be exchanged with neighboring processors. As a result, 40% of the model data has to be sent along the slow processor-to-processor communication paths.

A more efficient scheme is to split the ocean up in such a way that each volume contains fewer bound- ary points, so that a smaller percentage of the model data needs to be transferred between processors. Most ocean models have many more grid points in the horizontal than in the vertical, so a natural scheme is one in which each region is compact in the horizontal, and extends from the surface to the bottom of the ocean.

This paper reports on an ocean model code in which this conversion has been made. In developing the code, the individual processors are assumed to be either scalar processors or to have a limited vector capability. The code also has been kept as close as possible to MOM, so that it can make use of many of the MOM code routines and options without change. (Many of the latter are themselves derived from the routines of Semtner and Cox.)

The present generation of compilers is not able to compile a model like this for an array processor computer without additional directives on how the calculation should be split. For some systems it is also necessary to introduce specific subroutine calls to pass messages between processors.

The present code does not include any such machine-specific changes, but it is structured so that such directives are needed only in the main program and in a single subroutine (subroutine "step"). The code has been used for two such specific implementations using the Parallel Virtual Machine (PVM) message-passing software developed by Geist and others (1993). The first version, developed initially for an array of DEC Alpha workstations, uses a series of rectangular domains (Stevens and others, in preparation). The second, developed for a Cray T3D, allows domains of any shape (Webb and others, 1996).

The next main section of the paper gives a brief overview of the model as developed by Bryan and Semtner (Bryan, 1969; Semtner, 1974). The following two sections are concerned with the major changes introduced in the new code, and report on the validation of the code. The final section describes problems that arise when implementing message-passing with the present code, and how the message-passing might be organized to minimize its effect on CPU performance.

THE OCEAN MODEL

The state of the ocean is described normally by the velocity, temperature, and salinity at each point within it. In ocean models, temperature is represented usually in the form of potential temperature (relative to a pressure of one atmosphere), because this remains constant under adiabatic changes in pressure. Salinity is a composite variable defined by an international standard, which represents the combined effect of the different dissolved salts in the ocean.

The time evolution of such a system is defined using a momentum equation to give the change in velocity, and advection-diffusion equations for the changes of temperature and salinity. The system also needs a continuity equation, an equation of state, and boundary conditions to be specified.

In ocean models, three important approximations are made to reduce the computational load. The first is to assume, in the continuity equation, that the ocean is incompressible. The second is to assume, in the vertical momentum equation, that the vertical velocity is small and that the terms involving it can be neglected. The third is to assume, in the horizontal momentum equations, that small changes in density can be neglected except where they affect the horizontal pressure gradient.

The resulting equations often are called the "primitive equations" (Bryan, 1969). The horizontal momentum equation is,

∂u/∂t + (u·∇)u + w ∂u/∂z + f × u = -(1/ρ₀)∇p + D_u + F_u.    (1)

The three-dimensional advection-diffusion equations are,

∂S/∂t + (u·∇)S + w ∂S/∂z = D_S + F_S,    (2)

∂T/∂t + (u·∇)T + w ∂T/∂z = D_T + F_T,    (3)

and the pressure (or vertical momentum equation), incompressibility, and density equations are,

ρg = -∂p/∂z,    (4)

∇·u + ∂w/∂z = 0,    (5)

ρ = ρ(T, S, p).    (6)

The main prognostic variables are u the horizontal velocity, T the potential temperature, and S the salinity. The other variables, the pressure p, the density ρ, and the vertical velocity w, can be calculated from the prognostic variables.

In these equations t is time and f is the Coriolis term (equal to 2Ω sin(θ)), where Ω is the Earth's rotation rate and θ is the latitude. The terms D represent diffusion and F the forcing.

The horizontal velocity u is zero on solid sidewall boundaries. The gradients of potential temperature and salinity normal to all solid boundaries (including the bottom) are also zero. The upper surface boundary conditions are used to specify the exchange of heat and fresh water at the air-sea interface and the stress acting on the ocean due to the wind.


Bryan's (1969) scheme for solving these equations splits the ocean up into a large number of boxes along lines of constant latitude, longitude, and depth. In a coarse resolution ocean model the individual boxes will have a width typically of 2° in the horizontal, and a thickness ranging from 20 m for layers near the ocean surface to 500 m at depth in the ocean.
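As an illustration, the fixed vertical levels of such a model are usually specified simply as a list of layer thicknesses. The sketch below is not part of the original code; the names (dz, zdepth) and the particular 15-level grid are invented for the example only.

      program vgrid
c     illustrative 15-level vertical grid: layer thicknesses (dz, in
c     metres) increase from 20 m near the surface to 500 m at depth;
c     zdepth holds the depth of the centre of each layer.
      integer km, k
      parameter (km = 15)
      real dz(km), zdepth(km)
      data dz / 20., 30., 50., 75., 100., 150., 200., 250.,
     &          300., 350., 400., 450., 500., 500., 500. /
      zdepth(1) = 0.5*dz(1)
      do 10 k = 2, km
        zdepth(k) = zdepth(k-1) + 0.5*(dz(k-1) + dz(k))
   10 continue
      write(*,*) 'total depth = ', zdepth(km) + 0.5*dz(km), ' m'
      end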

The state of the ocean is represented by the values of potential temperature, salinity, and horizontal velocity at the intersections of a three-dimensional grid. Topography is represented by assuming that the box surrounding each tracer (temperature and salinity) grid point is either a sea box or a land box.

The previous equations are then integrated over each box to give an equation in which the advection and diffusive terms are replaced by the fluxes through the boundaries of the box, and the other terms written in terms of averages over the box. The equations are then cast in finite-difference form with gradients calculated from values at neighboring boxes.

The model uses an Arakawa-B grid in the horizontal (Mesinger and Arakawa, 1976). In this the velocity grid points lie at the corners of the tracer grid boxes. The advantage of the scheme is that it is more accurate than other schemes in representing the propagation of oceanic Rossby waves when the model grid-spacing is greater than the Rossby radius of the ocean. The Rossby radius is typically 25 km for the first internal mode of the ocean and becomes smaller for the higher modes. Even with high-resolution ocean models, which use grids of 20 km or less, the Arakawa-B grid is still preferred because of its better performance in representing the higher vertical modes.

The model is stepped forward in time in a series of discrete timesteps. It is most efficient computationally when the timestep is as large as possible. However, for stability reasons the timestep must be less than the time it takes a wave to move from one grid box to the next.

The fastest wave in the ocean, the external gravity wave, has a speed of about 250 m s-1. This is about 100 times faster than the speed of the next fastest wave, the first internal mode. As the external gravity wave has little effect on the large-scale circulation, it may be removed by filtering, enabling the timestep to be increased by a factor of 100 and speeding up the model greatly.
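As a rough illustration (the figures here are indicative only and are not taken from the paper), a 2° grid corresponds to a spacing of roughly 220 km near the equator, so an external gravity wave travelling at about 250 m s-1 crosses a grid box in roughly 220000/250 ≈ 900 s, limiting the timestep to about a quarter of an hour. With the external mode removed, the limit is set instead by the first internal mode at a few metres per second, and baroclinic timesteps of the order of a day become possible.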

Bryan (1969) filters out the external gravity wave by placing a rigid lid on the ocean. He also splits the horizontal velocity field into its vertical mean (the barotropic velocity) and the difference from the mean (the baroclinic velocity). The latter is affected only by the slowly moving internal modes of the ocean, and so the corresponding momentum equation can be solved using a long timestep.

The rigid lid allows the momentum equation for the barotropic velocity to be converted into a stream function equation. It also means that the barotropic velocity field is only affected by slowly moving Rossby waves and so, when solving the stream function equation, a long timestep can be again used.

Bryan uses the leapfrog scheme (Mesinger and Arakawa, 1976) to step the equations forward in time. This is efficient computationally and propagates waves correctly without any change in amplitude. Unfortunately the scheme is unstable when used with diffusive terms and so, for these, a Euler forward timestep is used. It can lead also to a splitting of the solution on even and odd timesteps. The latter error is controlled by introducing a single timestep using a different method, usually the Euler backward scheme, every 50 timesteps.
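A minimal sketch of this mixing of timestep schemes is given below. It is not taken from the model code: the test problem (a simple inertial oscillation), the variable names, and the step sizes are all invented for the illustration.

      program lfdemo
c     illustrative leapfrog timestepping of an inertial oscillation
c     (du/dt = f*v, dv/dt = -f*u), with an Euler backward (Matsuno)
c     mixing timestep every nmix steps to control the slow splitting
c     of the solution on odd and even timesteps.
      integer itt, nsteps, nmix
      real um, uc, up, vm, vc, vp, us, vs, dt, f
      parameter (nsteps = 1000, nmix = 50, dt = 100.0, f = 1.0e-4)
      um = 1.0
      vm = 0.0
c     start with a single forward step
      uc = um + dt*f*vm
      vc = vm - dt*f*um
      do 100 itt = 2, nsteps
        if (mod(itt,nmix) .eq. 0) then
c         mixing timestep: forward predictor, then backward corrector
          us = uc + dt*f*vc
          vs = vc - dt*f*uc
          up = uc + dt*f*vs
          vp = vc - dt*f*us
        else
c         normal leapfrog step over an interval of 2*dt
          up = um + 2.0*dt*f*vc
          vp = vm - 2.0*dt*f*uc
        endif
        um = uc
        vm = vc
        uc = up
        vc = vp
  100 continue
      write(*,*) 'u, v after ', nsteps, ' steps: ', uc, vc
      end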

Further details about the model and the numerical schemes used are given by Bryan (1969), Semtner (1974), and Cox (1984). Good reviews of the properties of the different finite-difference schemes are given by Mesinger and Arakawa (1976) and Roache (1976).

Organization of the Semtner code

In the code developed by Semtner, the main program initializes the model and then enters a loop which, on each pass, calls subroutine step and takes the model forward by one full timestep.

Subroutine step contains the main loop over the latitude slabs. It organizes the asynchronous transfer of the slab data between disk and main memory, and calls routines clinic and tracer to solve the momentum and advection-diffusion equations for each grid point in the slab. Within clinic and tracer the code is organized as a series of loops over the horizontal and vertical indices, the horizontal index forming the inner loop. Equation (4) is used within clinic to calculate the pressure field, and Equation (5) is used by both clinic and tracer to calculate the vertical advective velocities.

The other major routine in this section of code is the equation of state routine. This calculates the density, Equation (6), for a full slab of data at a time. It is called first by clinic, when calculating the pressure field, and it is called a second time, at the end of tracer, to check for the possibility that the updated temperature and salinity fields produce convective overturning of the ocean.

Once all slabs have been processed, step calls subroutine relax to solve the stream function equation, the barotropic forcing terms having previously been calculated by routine clinic.

THE NEW CODE

Major changes

The new code is organized so that the ocean can be split up readily into columns of fluid, each the responsibility of a single processor. This is done by ensuring that all the loops over latitude and longitude occur in a single subroutine, subroutine step. This calls further routines, like clinic and tracer as before, but these have been rewritten so that on each call these now work on a single vertical column of grid points.

The code assumes that all data resides in the main memory of one of the processors. The storage of three-dimensional arrays is reorganized also so that the index corresponding to the vertical coordinate is the innermost, fastest-varying index.

The assumption that all the data is held within main memory is reasonable given the design of array processors. It also means that the code is no longer constrained by the problems associated with disk input-output.

The result of this reorganization is that any changes which have to be made for a particular machine, message-passing scheme, or method of splitting the ocean, should only affect the initialization of the model in the main program and subroutine step.

An additional advantage of the scheme is that because the vertical coordinate is used as the inner-loop variable, calculations can stop at the sea floor. When slabs are used, exactly the same computations need to be repeated for land points and for grid points below the sea floor.
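The saving can be seen in a sketch of the inner loop for one column. The sketch is an illustration only, not the model's tracer routine; the name kmt, for the number of sea levels below each tracer point, follows common Bryan-Cox-Semtner usage but is assumed here rather than taken from the Appendix listing.

      subroutine colstp(ic, jc, km, imt, jmt, nt, kmt, t, nc, np)
c     illustrative column update: with the vertical index k innermost
c     and the number of sea levels kmt(ic,jc) stored for each column,
c     the loop body is skipped entirely for land columns and for
c     levels below the sea floor.
      integer ic, jc, km, imt, jmt, nt, nc, np
      integer kmt(imt,jmt), k, n
      real t(km,imt,jmt,nt,2)
      do 100 n = 1, nt
        do 100 k = 1, kmt(ic,jc)
c         ... compute fluxes and update tracer n at level k ...
          t(k,ic,jc,n,np) = t(k,ic,jc,n,nc)
  100 continue
      return
      end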

One other significant change is the way in which the velocity field is stored. In Semtner’s scheme, the velocity stored in the slabs is the baroclinic velocity, that is, the velocity with the vertical mean removed. This was written to disk before the barotropic (or vertical mean) velocity was calculated, and it was inefficient to read and write all the slabs again in order to add the two velocities together and store the full velocity on disk. Instead the addition was left until the slabs needed to be read in, either at the beginning of a later timestep or prior to analysis.

The new code makes use of the fact that all variables are stored now in memory, to add the two velocity fields at the end of each timestep. This prevents the repeated calculation of the full velocity at later timesteps and means that the archived fields used for analysis now contain the full velocity.

The barotropic momentum equation

As discussed earlier, the barotropic part of the momentum equation is transformed usually into a stream function equation. In the standard code, subroutine relax solves this equation using a relaxation scheme. The latter is used because it works equally well with regular and irregular coastlines and also it allows islands to be present.

The main loop of subroutine relax is over the latitude and longitude indices. In an array processor, this could be partitioned in the same way as the loops of subroutine step. However, when dealing with islands, an integration has to be made around the perimeter of each island. The latter may involve a number of processors, which would make the logic complex. It is also difficult to vectorize well, because many of the island boundary points are not adjacent in memory.

To overcome the island problem, two alternative schemes have been proposed. The first is to replace the stream function equation by a pressure equation (Smith, Dukowicz, and Malone, 1992). This has the advantage that the island boundary condition does not involve a contour integration.

A second method, which also does not involve contour integration, is to solve the full barotropic momentum equation (or free-surface equation, Killworth and others, 1991). This has the advantage also that for the very large problems for which the present code is designed, the number of barotropic timesteps per baroclinic timestep is fixed, whereas the number of iterations of the other schemes would increase as the number of grid points in the horizontal increases.

The improved free-surface code

The present model uses a free-surface scheme which is an improved version of that of Killworth and others (1991). The original scheme included barotropic viscosity in the free-surface code to give stability. The computational cost of this was high, and it was found that the damping introduced by the Euler backward timestepping scheme by itself was sufficient. If u₀ is the barotropic velocity and h₀ is the surface elevation, then from Equation (1),

∂u₀/∂t = -g∇h₀ + F,    (7)

where,

F = (1/h) ∫_{-h}^{0} [ -(1/ρ₀)∇p′ + D_u + F_u ] dz,

and

p′(z) = ∫_{z}^{0} g ρ(T, S, p) dz′.

F is calculated once each baroclinic timestep by subroutine clinic.

Integrating the continuity equation (5) vertically,

∂h₀/∂t = -∇·(h u₀),    (8)

where h is the ocean depth. Equations (7) and (8) are cast into finite-difference form on an Arakawa-B grid and timestepped using a Euler backward scheme. Because of the difference in the wave speeds in the ocean, about 100 barotropic timesteps are needed per baroclinic timestep. The code uses a Euler backward timestepping scheme for the barotropic fields. This damps out high-frequency waves and thus prevents aliasing problems caused by the use of two different timesteps.
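A minimal one-dimensional sketch of such a barotropic subcycle is given below. It is an illustration only, not the model's frees routine: the subroutine name, the argument list, the centred differences, and the closed-end boundary treatment are all assumptions made for the example.

      subroutine btstep(ni, ntbt, dtbt, dx, g, h, fu, u0, h0, us, hs)
c     illustrative 1-d barotropic subcycle: ntbt Euler backward
c     (predictor-corrector) steps of equations (7) and (8) per
c     baroclinic step.  h is the ocean depth, fu the baroclinic
c     forcing F, u0 the barotropic velocity, h0 the elevation,
c     and us, hs are workspace arrays for the predicted fields.
      integer ni, ntbt, i, lt
      real dtbt, dx, g
      real h(ni), fu(ni), u0(ni), h0(ni), us(ni), hs(ni)
      do 300 lt = 1, ntbt
c       hold the end values fixed (closed boundaries)
        us(1)  = u0(1)
        us(ni) = u0(ni)
        hs(1)  = h0(1)
        hs(ni) = h0(ni)
c       predictor: forward step from the current fields
        do 100 i = 2, ni-1
          us(i) = u0(i) + dtbt*(-g*(h0(i+1)-h0(i-1))/(2.0*dx) + fu(i))
          hs(i) = h0(i) - dtbt*(h(i+1)*u0(i+1)-h(i-1)*u0(i-1))/(2.0*dx)
  100   continue
c       corrector: step again using the predicted fields
        do 200 i = 2, ni-1
          u0(i) = u0(i) + dtbt*(-g*(hs(i+1)-hs(i-1))/(2.0*dx) + fu(i))
          h0(i) = h0(i) - dtbt*(h(i+1)*us(i+1)-h(i-1)*us(i-1))/(2.0*dx)
  200   continue
  300 continue
      return
      end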

Within the free-surface code, pointers are used to identify the sections of the storage arrays containing data for the previous, current, and next timesteps. The scheme is similar to that used by the MOM code for the main timestepping loop. For consistency, the pointers for the latter scheme are updated now within subroutine step.
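The idea behind the pointers can be seen in the toy program below (an illustration only; the appendix listing shows the form actually used). The arrays carry an extra time-level index of size three, and at the end of each step only three small integers are rotated rather than whole three-dimensional fields being copied.

      program ptrrot
c     illustration of timestep pointer rotation: a field with an
c     extra time-level index of size 3 is stepped forward without
c     ever copying the field itself, only three integer pointers.
      integer n, np, nc, nm, itemp, itt, i
      parameter (n = 4)
      real t(n,3)
      nm = 1
      nc = 2
      np = 3
      do 10 i = 1, n
        t(i,nm) = 0.0
        t(i,nc) = 1.0
   10 continue
      do 30 itt = 1, 5
c       leapfrog-like update into the "new" slot
        do 20 i = 1, n
          t(i,np) = t(i,nm) + 0.1*t(i,nc)
   20   continue
c       rotate the pointers instead of copying arrays
        itemp = nm
        nm    = nc
        nc    = np
        np    = itemp
   30 continue
      write(*,*) 'current field: ', (t(i,nc), i = 1, n)
      end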

Vertical advection of momentum

Webb (1995) has shown that at velocity points, the rounding errors associated with the finite-difference form of Equation (6) may be substantial and produce spurious vertical transfers of horizontal momentum. The effect is large for narrow ocean currents where the grid spacing and the scale of the current are similar.

The rounding error can be reduced by modifying the finite-difference scheme. Webb (1995) proposed one in which the vertical flux at velocity points is the average of the flux at the four surrounding tracer points. This scheme has been used in the present code.
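A sketch of the modified flux for an interior velocity point is given below. The array name fvst, for the vertical volume flux at the tracer points at the top of level k, is invented for the illustration and is not the name used in the code.

      subroutine vflux(ic, jc, k, km, imt, jmt, fvst, fvsu)
c     illustrative form of the Webb (1995) scheme: the vertical
c     volume flux fvsu used to advect momentum at interior velocity
c     point (ic,jc), at the top of level k, is the average of the
c     fluxes at the four surrounding tracer points.
      integer ic, jc, k, km, imt, jmt
      real fvst(km,imt,jmt), fvsu
      fvsu = 0.25*( fvst(k,ic  ,jc  ) + fvst(k,ic+1,jc  )
     &            + fvst(k,ic  ,jc+1) + fvst(k,ic+1,jc+1) )
      return
      end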

The calculation of pressure

Computational efficiency is important in ocean models, and for this reason the scheme developed by Semtner makes extensive use of temporary arrays.

These store derived quantities, such as the pressure and north-south fluxes, and ensure that they do not have to be recalculated when moving from one slab to the next. Because it is designed for use with array processors, the present code makes no assumptions about the order in which the vertical columns of grid points are processed. For this reason it is not possible to use small temporary arrays in the way that Semtner does. Instead, if a derived variable is to be stored, an extra full three-dimensional array must be allocated in the main memory.

The most critical of these fields is the pressure field. It is expensive computationally to calculate and if temporary storage is not used, the pressure field must be recalculated four times. Because the computational saving is considerable and the extra storage requirement is modest, an option to precalculate the pressure field once each timestep is included in the new code.
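A sketch of such a precalculation for one column, integrating the hydrostatic relation (4) downwards from the surface, is given below. The names (rho for the density, dzw for the vertical distance between level centres, p for the resulting pressure array) are illustrative only and are not those used by subroutine setp.

      subroutine colprs(ic, jc, km, imt, jmt, grav, rho, dzw, p)
c     illustrative hydrostatic pressure for one column: integrate
c     equation (4), dp/dz = -rho*g, downwards from the surface.
c     dzw(1) is the distance from the surface to the first level
c     centre; dzw(k) is the distance between level k-1 and level k.
      integer ic, jc, km, imt, jmt, k
      real grav
      real rho(km,imt,jmt), dzw(km), p(km,imt,jmt)
      p(1,ic,jc) = 0.5*grav*rho(1,ic,jc)*dzw(1)
      do 100 k = 2, km
        p(k,ic,jc) = p(k-1,ic,jc)
     &             + 0.5*grav*(rho(k-1,ic,jc) + rho(k,ic,jc))*dzw(k)
  100 continue
      return
      end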

In the case of the horizontal fluxes, the computational saving is modest and the extra storage requirement is large. Four variables are involved (temperature, salinity, and the two components of horizontal velocity) and two directions, so temporary storage of these variables requires eight additional three-dimensional arrays. A further two are needed for the horizontal volume fluxes used in the vertical advection calculations. The basic model uses twelve three-dimensional fields, so the flux temporary storage arrays would almost double the storage requirement of the model. For this reason the present model does not use temporary arrays, but recalculates the fluxes when required.

Compatibility with the MOM code

The present code uses the same Unix C pre-processor directives as the GFDL MOM code and uses similar naming conventions. At present it contains none of the standard MOM code options and includes only a limited diagnostic calculation. However, it is possible to add these without much difficulty.

One important difference between the two codes is that the MOM code makes widespread use of statement functions in the timestepping equations. This provides a convenient way of modifying the finite-difference scheme when required. Unfortunately, many compilers are inefficient at optimizing code which includes statement functions and so, as they come in the innermost loops of the model, they are not used in the present code.
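For readers unfamiliar with the feature, the contrast is roughly as follows. The names and the expression are invented for the illustration and are not those of the MOM code.

      program stfdemo
c     a Fortran statement function (defined below) versus the same
c     expression written out in-line.  some compilers fail to
c     optimize the statement function when it is used in an inner
c     loop.
      integer i, n
      parameter (n = 5)
      real a(0:n+1), adv(n), dx, advx
c     statement function definition (must precede executable code)
      advx(i) = (a(i+1) - a(i-1))/(2.0*dx)
      dx = 1.0
      do 10 i = 0, n+1
        a(i) = real(i*i)
   10 continue
      do 20 i = 1, n
c       using the statement function ...
        adv(i) = advx(i)
c       ... or the equivalent in-line expression
        adv(i) = (a(i+1) - a(i-1))/(2.0*dx)
   20 continue
      write(*,*) adv
      end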

There are also differences in the organization of some of the common blocks. These are associated mainly with replacing the stream function scheme by the free-surface model.

VALIDATION AND PERFORMANCE

An initial comparison was made between the results obtained from the GFDL MOM code and those from a stream function version of the present code, using the standard vertical momentum advection scheme. The only differences observed were at the rounding level. This validated all the new code except the new advection scheme and the free surface model.

Table 1. Code performance. Table shows CPU time required for each model timestep when using free surface and stream function versions of the new code. For comparison, results are shown also for standard GFDL MOM code with and without "skipland" option. Tests were carried out on SUN Sparcstation 10 workstation with 128 Mbytes of memory, using 3° x 4° global model grid extending from 72°N to 72°S with MOM test global coastlines and ocean topography. Free surface code result is for 100 barotropic timesteps per baroclinic timestep. Other tests show times for 40 and 100 iterations of stream function solution

                                100 iterations    40 iterations
  new code (stream fn)              2.8 sec           2.4 sec
  new code (free surface)           6.0 sec             -
  GFDL MOM code*                    9.3 sec           8.0 sec
  GFDL MOM (+ skipland)             7.6 sec           6.4 sec

*GFDL MOM code options were "rigidlid, oldrelax, diskless, cyclic, islands, constvmix, consthmix, restorst, timing."


The new advection scheme was then added. As expected this reduced the r.m.s vertical velocity at velocity points, but otherwise its effect was negligible. The free-surface code was then added, and again its effect on the final steady-state solution was negligible.

The comparative speeds of the different versions of the code, carried out on a single Unix workstation using the 4° by 3° global test model provided as part of the MOM package, are shown in Table 1. The stream function version of the present code runs much faster than the MOM code. This is partly because the present model does not use statement functions. Also, because it works with columns of points, instead of large latitudinal slabs, the percentage of cache hits during the calculation will be larger than with the MOM code. Finally the present code does not have the overhead of the many options available with MOM.

The free-surface scheme is slower, primarily because each barotropic timestep requires more floating point operations than a single iteration of the stream function code. The difference would be reduced for large models where more iterations of the stream function solution are required.

USE ON ARRAY PROCESSING COMPUTERS

The present code purposely does not include message-passing directives or MPP Fortran constructs, but it should be possible to adapt it for most software and hardware systems. The simplest system to implement would be one in which the compiler handles the passing of messages between processors, with domain decomposition either left to the compiler or implemented with high-level directives.

The problem with this approach is that, unless it is very sophisticated, the compiler does not know which variables need transferring between processors until just before they are required. As a result, much time can be spent waiting for data to arrive.

The alternative scheme is to use message-passing software which ensures that the data is in place before it is needed. The present code is structured so that the key calls to such routines would all occur in subroutine step. The structure of this routine is shown in Figure 1, and the code is given in the Appendix. The full model code may be downloaded by anonymous FTP from the IAMG server at IAMG.ORG.

The routine is called once each baroclinic timestep. After setting pointers for the current baroclinic timestep, it enters a loop over the vertical columns of ocean points. This loop calls routines to set the vertical boundary conditions, and to solve the baroclinic momentum equation and the advection-diffusion equations for temperature and salinity. When more than one processor is used, this loop will be over points allocated to the current processor and it will need to include a check that boundary data have arrived from other processors before they are used.

The next section of code is involved with timestepping the free-surface scheme. The main loop is one over the large number of short barotropic timesteps. Within this, the timestepping pointers are updated and the program enters a loop which updates the free-surface model variables for all the ocean points. When more than one processor is used, this loop must again include checks that data have arrived from neighboring processors. In addition, once the new boundary variables have been calculated, these will need to be sent to neighboring processors which require them.

When the full number of barotropic timesteps have been carried out, the routine enters a final loop which adds the new barotropic and baroclinic velocities. When more than one processor is used, this loop should also be used to send the new boundary values of temperature, salinity, and velocity to processors which require them.

Figure 1. Flow diagram for subroutine step. Dotted lines indicate where, in a message-passing implementation, boundary data need to be sent from one processor to another, and tests made to check that data have arrived. 2D refers to boundary data from the two-dimensional fields of the free-surface model, and 3D to boundary data from the main three-dimensional fields of temperature, salinity, and velocity.
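As an indication of what such message-passing calls look like, the sketch below exchanges one column of tracer boundary data with a neighbouring processor using the PVM 3 Fortran interface (Geist and others, 1993). The subroutine name, task identifiers, message tags, and array names are invented for the illustration; the real implementations (Beare and Stevens, in preparation; Webb and others, 1996) are considerably more involved.

      subroutine swapbd(nbrtid, mytag, nbrtag, km, nt, tsend, trecv)
c     illustrative PVM exchange of one column of tracer boundary
c     data with the neighbouring processor whose task id is nbrtid.
#include "fpvm3.h"
      integer nbrtid, mytag, nbrtag, km, nt
      real*8 tsend(km,nt), trecv(km,nt)
      integer info, bufid
c     pack and send the local boundary column
      call pvmfinitsend(PVMDEFAULT, bufid)
      call pvmfpack(REAL8, tsend, km*nt, 1, info)
      call pvmfsend(nbrtid, mytag, info)
c     receive the neighbour's boundary column and unpack it
      call pvmfrecv(nbrtid, nbrtag, bufid)
      call pvmfunpack(REAL8, trecv, km*nt, 1, info)
      return
      end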


Tests carried out with the present code using PVM message-passing software (Geist and others, 1993) have shown that the critical region of code is concerned with the free-surface timestepping loop. A large number of short messages need to be sent and, especially if these are not sent asynchronously, the set-up times and interference between messages can cause significant delays.

Two groups have developed message-passing versions of the present code. The first (Beare and Stevens, in preparation) use the free-surface version of the present code, PVM message-passing software, and split the ocean into rectangular regions, each the responsibility of one processor. The resulting code has been run on a cluster of DEC Alpha workstations.

The second group (Webb and others, 1996) use the present free-surface version of the code, also with PVM. The scheme used is very general in that it allows the ocean to be split into irregularly-shaped domains and, as far as possible, the message passing is handled asynchronously. This version of the code also allows meteorological data to be input, and archive and diagnostic information to be merged from the different processors. The scheme has been run on a cluster of SUN workstations and on a Cray T3D.

Acknowledgments--I wish to acknowledge the important contributions of K. Bryan, A. J. Semtner, M. D. Cox, and the GFDL MOM group to the development of this program. Thanks also to M. Ashworth at POL for discussions on array processor architectures and codes and to B. de Cuevas and A. C. Coward at IOS for their support.

REFERENCES

Beare, M. I., and Stevens, D. P., in preparation, A general purpose parallel modular ocean model.

Bryan, F. O., and Holland, W. R., 1989, A high resolution simulation of the wind- and thermohaline-driven circulation in the North Atlantic Ocean, in Muller, P., and Henderson, D., eds., Parameterization of Small-Scale Processes: Proceedings 'Aha Huliko'a 89, Hawaii Institute of Geophysics, Hawaii, 362 p.

Bryan, K., 1969, A numerical method for the study of the circulation of the world ocean: Jour. Computational Physics, v. 4, no. 3, p. 347-376.

Cox, M. D., 1984, A primitive equation, 3-dimensional model of the ocean: GFDL Ocean Group Technical Rept No. 1, Geophysical Fluid Dynamics Laboratory/NOAA, Princeton Univ., Princeton, NJ, variously paged.

Geist, A., Beguelin, A., Dongarra, J., Manchek, R., and Sunderam, V., 1993, PVM 3 user's guide and reference manual: Oak Ridge National Laboratory, Oak Ridge, Tennessee, 108 p.

Killworth, P. D., Stainforth, D., Webb, D. J., and Paterson, S. M., 1991, The development of a free-surface Bryan-Cox-Semtner ocean model: Jour. Physical Oceanography, v. 21, no. 9, p. 1333-1348.

Mesinger, F., and Arakawa, A., 1976, Numerical methods used in atmospheric models: GARP Publication Series, No. 17, World Meteorological Organisation, Geneva, 64 p.

Pacanowski, R. C., Dixon, K., and Rosati, A., 1990, The GFDL modular ocean model users guide, version 1.0: GFDL Ocean Group Technical Rept No. 2, Geophysical Fluid Dynamics Laboratory/NOAA, Princeton Univ., Princeton, NJ, variously paged.

Roache, P. J., 1976, Computational fluid dynamics: Hermosa Publishers, Albuquerque, NM, 446 p.

Semtner, A. J., 1974, A general circulation model for the world ocean: Technical Rept No. 9, Department of Meteorology, Univ. of California, Los Angeles, 99 p.

Semtner, A. J., and Chervin, R. M., 1992, Ocean general circulation from a global eddy-resolving model: Jour. Geophys. Res., v. 97, no. C4, p. 5493-5550.

Smith, R. D., Dukowicz, J. K., and Malone, R. C., 1992, Parallel ocean general circulation modelling: Physica D, v. 60, no. 1-4, p. 38-61.

Takano, K., 1974, A general circulation model of the world ocean: numerical simulation of weather and climate: Technical Rept No. 8, Department of Meteorology, Univ. of California, Los Angeles, 47 p.

The FRAM Group, 1991, An eddy-resolving model of the Southern Ocean: EOS, v. 72, no. 15, p. 169-174.

Webb, D. J., 1995, The vertical advection of momentum in ocean models: Jour. Physical Oceanography, v. 25, no. 12, p. 3186-3195.

Webb, D. J., Coward, A. C., de Cuevas, B. A., and Gwilliam, C. S., 1996, A multiprocessor ocean general circulation model using message passing (in preparation).

APPENDIX

Subroutine step

      subroutine step
c
c=======================================================================
c     step is called once per timestep. it includes all the main
c     loops over ic and jc and calls to the main routines.
c=======================================================================

C

C

#include "param.h"
#include "scalar.h"
#include "switch.h"
#include "timelv.h"
#include "slabs.h"
#include "frees.h"
#include "cdiag.h"

Page 8: An ocean model code for array processor computers

D. J. Webb

C

c-----------------------------------------------------------------------
c     update pointers for new value of itt.
c     nnp, nnc and nnm are not changed during a timestep.
c     np, nc, nm may be modified during a forward or Euler backward
c     timestep.
c-----------------------------------------------------------------------
c
      nnc = np
      nnm = nc
      nnp = nm
      np  = nnp
      nc  = nnc
      nm  = nnm

c
c-----------------------------------------------------------------------
c     adjust various quantities for normal/mixing timesteps
c-----------------------------------------------------------------------
c
      mxpas2 = .false.
      eots   = .true.
      if (mixts) then
        if (eb) eots = .false.
        nm = nnc
        c2dtts = dtts
        c2dtuv = dtuv
      else
        c2dtts = c2*dtts
        c2dtuv = c2*dtuv
      endif
c
c-----------------------------------------------------------------------
c     return here for second pass of Euler backward timestep
c-----------------------------------------------------------------------
c
  100 continue
# ifdef presetp

C

c-----------------------------------------------------------------------
c     precalculate the baroclinic part of the pressure field
c     for use by subroutine clinic
c-----------------------------------------------------------------------
c
      do 150 jc = 1,jmt
      do 150 ic = 1,imt
        call setp (ic,jc)
  150 continue

# endif
c
c-----------------------------------------------------------------------
c     main baroclinic timestep loop over grid cells, from south
c     to north and from west to east
c       1. set vertical boundary conditions (surface & bottom)
c       2. calculate internal mode velocities
c       3. calculate tracers
c-----------------------------------------------------------------------
c
      do 200 jc = 2,jmtm1
      do 200 ic = 2,imtm1
        call setvbc (ic,jc)
        call clinic (ic,jc)
        call tracer (ic,jc)

  200 continue
c
c-----------------------------------------------------------------------
c     run free surface model (except during the second part
c     of a baroclinic Euler backward timestep).
c     first initialize pointers
c-----------------------------------------------------------------------


C

      if (.not.mxpas2) then
        do 600 lt = 1,ntbt
          nnc0 = np0
          nnm0 = nc0
          nnp0 = nm0
          np0  = nnp0
          nc0  = nnc0
          nm0  = nnm0
          frpas1 = .true.

C

c-----------------------------------------------------------------------
c     use a Euler backward scheme. this requires two passes.
c-----------------------------------------------------------------------
c
          do 500 lb = 1,2

c-----------------------------------------------------------------------
c     main free surface model loop to carry out a
c     partial timestep for each model point
c-----------------------------------------------------------------------
c
            do 300 jc = 2,jmtm1
            do 300 ic = 2,imtm1
              call frees(ic,jc)
  300       continue

C

c-----------------------------------------------------------------------
c     set cyclic boundary conditions for the free surface model
c-----------------------------------------------------------------------
c
            do 400 jc = 2,jmt-1
              h0(1,  jc,np0) = h0(imtm1,jc,np0)
              h0(imt,jc,np0) = h0(2,    jc,np0)
              u0(1,  jc,np0) = u0(imum1,jc,np0)
              u0(imu,jc,np0) = u0(2,    jc,np0)
              v0(1,  jc,np0) = v0(imum1,jc,np0)
              v0(imu,jc,np0) = v0(2,    jc,np0)
  400       continue
c

c-----------------------------------------------------------------------
c     reset pointers at end of first pass of the free surface model
c-----------------------------------------------------------------------
c
            if (frpas1) then
              frpas1 = .false.
              nc0 = nnp0
              np0 = nnm0
            endif
  500     continue
  600   continue

      endif
c
c-----------------------------------------------------------------------
c     end of free surface model
c     now add barotropic velocities to baroclinic velocities
c-----------------------------------------------------------------------
c
      do 700 jc = 2,jmtm1
      do 700 ic = 2,imtm1
        call addv(ic,jc)
  700 continue

C

c-----------------------------------------------------------------------
c     set cyclic boundary conditions for the baroclinic model
c-----------------------------------------------------------------------


c
      do 800 jc = 2,jmt-1
      do 800 k  = 1,km
        u(k,1,  jc,np) = u(k,imum1,jc,np)
        u(k,imu,jc,np) = u(k,2,    jc,np)
        v(k,1,  jc,np) = v(k,imum1,jc,np)
        v(k,imu,jc,np) = v(k,2,    jc,np)
        do 800 n = 1,nt
          t(k,1,  jc,n,np) = t(k,imtm1,jc,n,np)
          t(k,imt,jc,n,np) = t(k,2,    jc,n,np)

  800 continue
c
c-----------------------------------------------------------------------
c     if this is the end of the first pass of a Euler backward
c     timestep then set the pointers for the second pass.
c-----------------------------------------------------------------------
c
      if (mixts.and.eb) then
        eots   = .true.
        nc     = nnp
        np     = nnm
        mixts  = .false.
        mxpas2 = .true.
        go to 100
      endif
c

c-----------------------------------------------------------------------
c     collect timestep statistics
c-----------------------------------------------------------------------
c
      if (prntsi.and.eots) then
        ektot = c0
        do 900 n = 1,nt
          tddt(n)  = c0
          dtabs(n) = c0
          tvar(n)  = c0
  900   continue
        do 910 jc = 2,jmtm1
        do 910 ic = 2,imtm1
          call diag(ic,jc)
  910   continue
        ektot = ektot/volume
        do 920 n = 1,nt
          tddt(n)  = tddt(n)/volume
          dtabs(n) = dtabs(n)/volume
          tvar(n)  = tvar(n)/volume
  920   continue
      endif
c

c-----------------------------------------------------------------------
c     if this is the end of either a forward or Euler backward
c     timestep then reset the pointers
c-----------------------------------------------------------------------
c
      if (mxpas2) then
        nc = nnc
        nm = nnp
      endif
      if (mixts) then
        nm = nnm
      endif

C

      return
      end