14
88 FUJITSU Sci. Tech. J.,33,1,pp.88-101(June 1997) UDC 551.509: 681.3.06 Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF vD. Dent vG-R. Hoffmann vP.A.E.M. Janssen vA.J. Simmons (Manuscript received May 19,1997) 1. Introduction Numerical weather prediction has been close to the forefront in its use of the most advanced available computing facilities for almost fifty years following the pioneering forecasting experiments .. reported by Charney, Fortoft and von Neumann. 1) Until recently, the computational requirement was determined principally by the need to model the complex behaviour of the atmosphere with as fine a resolution as possible, in order to produce a re- alistic simulation of the evolution of weather sys- tems within an elapsed time of at most a few hours. Use of finer resolution has been made possible both by substantial increases in computational power and by development of more efficient nu- merical techniques. Much improved and more com- prehensive representations of the dominant physi- cal processes have also been introduced. New types of observation have become available and better ways have been developed for processing obser- vations to define the starting state for the atmo- spheric model. The resulting increases in accuracy have been dramatic both for short-range forecasts (from a few hours to a few days ahead) and for medium-range forecasts (from a few days to a week or two ahead). For example, the level of accuracy reached on average by three-day forecasts in the first medium-range prediction experiments by Miyakoda et al. 2) in 1972 was reached typically by five-day forecasts in 1979/80 during the first win- ter of operational forecasting at ECMWF. A mea- sure of the improvement in ECMWF forecasts since 1980 is presented in Fig. 1. This indicates a further two-day increase in the forecast range at which a given level of accuracy is reached. The first operational ECMWF forecast model used finite-differences and covered the globe with a regular 1.875 o grid. In the vertical, atmospheric On 18 September 1996 the operational numerical weather forecasting activities of the European Centre for Medium-Range Weather Forecasts (ECMWF) were transferred to run on a Fujitsu VPP700/46 computer system, after many years of use of shared-memory Cray systems. The range of current forecasting activities is described, and further information is given on the computational design and performance of the atmospheric forecast model, and on the operation of the VPP700. Developments which will utilize planned enhancements of the Fujitsu computer system at ECMWF are summarized. Fig.1— Correlation between forecast and analyzed anomalies in the height of the 1000 hPa pres- sure surface for forecast ranges of one, three, five and seven days. Plots are based on monthly averages of values computed daily from 1 Janu- ary 1980 to 31 March 1997 for the extratropics. An annual running mean has been applied.

Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

88 FUJITSU Sci. Tech. J.,33,1,pp.88-101(June 1997)

UDC 551.509: 681.3.06

Use of the Fujitsu VPP700 for WeatherForecasting at ECMWF

vD. Dent vG-R. Hoffmann vP.A.E.M. Janssen vA.J. Simmons(Manuscript received May 19,1997)

1. IntroductionNumerical weather prediction has been close

to the forefront in its use of the most advancedavailable computing facilities for almost fifty yearsfollowing the pioneering forecasting experiments

..reported by Charney, Fortoft and von Neumann.1)

Until recently, the computational requirement wasdetermined principally by the need to model thecomplex behaviour of the atmosphere with as finea resolution as possible, in order to produce a re-alistic simulation of the evolution of weather sys-tems within an elapsed time of at most a few hours.Use of finer resolution has been made possibleboth by substantial increases in computationalpower and by development of more efficient nu-merical techniques. Much improved and more com-prehensive representations of the dominant physi-cal processes have also been introduced. New typesof observation have become available and betterways have been developed for processing obser-vations to define the starting state for the atmo-spheric model. The resulting increases in accuracyhave been dramatic both for short-range forecasts(from a few hours to a few days ahead) and formedium-range forecasts (from a few days to a weekor two ahead). For example, the level of accuracyreached on average by three-day forecasts in the

first medium-range prediction experiments byMiyakoda et al.2) in 1972 was reached typically byfive-day forecasts in 1979/80 during the first win-ter of operational forecasting at ECMWF. A mea-sure of the improvement in ECMWF forecasts

since 1980 is presented in Fig. 1. This indicates afurther two-day increase in the forecast range atwhich a given level of accuracy is reached.

The first operational ECMWF forecast modelused finite-differences and covered the globe witha regular 1.875o grid. In the vertical, atmospheric

On 18 September 1996 the operational numerical weather forecasting activities of theEuropean Centre for Medium-Range Weather Forecasts (ECMWF) were transferred torun on a Fujitsu VPP700/46 computer system, after many years of use of shared-memoryCray systems. The range of current forecasting activities is described, and furtherinformation is given on the computational design and performance of the atmosphericforecast model, and on the operation of the VPP700. Developments which will utilizeplanned enhancements of the Fujitsu computer system at ECMWF are summarized.

Fig.1— Correlation between forecast and analyzedanomalies in the height of the 1000 hPa pres-sure surface for forecast ranges of one, three,five and seven days. Plots are based on monthlyaverages of values computed daily from 1 Janu-ary 1980 to 31 March 1997 for the extratropics.An annual running mean has been applied.

Page 2: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

89FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

variables were represented at 15 levels and soilvariables at 3 levels. A ten-day forecast took be-tween four and five hours to produce on a single-processor Cray-l vector computer. The effectivecomputation rate was about 40 million floating-point operations each second (40 Mflops) includ-ing operational post-processing overheads, andabout 50 Mflops without overheads. A model us-ing the spectral transform technique was intro-duced with T63 truncation in 1983, and multi-pro-cessing was introduced when the horizontalresolution was increased to T106 in 1985 on a two-processor Cray X-MP machine. The computationrate of this model exceeded 300 Mflops on a four-processor X-MP in 1986, and the Gflops range wasreached when an eight-processor Cray Y-MP sys-tem was installed in 1990. The numerical formu-lations used operationally over this period havebeen reviewed in 3), and computational aspects ofthe multi-tasking spectral model discussed in 4).

Ritchie et al.5) have described the numericalformulation of the forecast model introduced op-erationally at ECMWF in 1991, and presentedsome computational details of its performance onthe Centre’s 16-processor Cray Y-MP/C90 com-puter. The horizontal resolution was increased toT213, vertical resolution was increased to 31 lev-els (L31), and a semi-Lagrangian treatment ofadvection was introduced with a time-step of 15minutes. A ten-day forecast could be produced inabout 75 minutes when run without overheadsusing 16 processors, at a computation rate ap-proaching 6 Gflops. The main operational forecastwas multi-tasked over 14 processors, and withpost-processing jobs and some other general workrunning alongside, was typically completed inaround two hours.

A stage has now been reached at which re-finement of the model resolution is no longer thepredominant direct reason for requiring highercomputing performance, at least for medium-range prediction. Better estimation of the initialstate offers an important path to more accurateindividual (deterministic) forecasts, although

there is certainly still scope for benefit frommodel improvement. A very promising, butcomputationally demanding, method for betterexploitation of observational data, particularlynewer types, is four-dimensional variational dataassimilation.6) Additionally, the use of forecast in-formation in the later medium range is hinderedby variations in forecast quality which reflect bothuncertainty in initial conditions and the varyingdegrees of predictability of different flow regimes.This has led to development of ensemble (proba-bilistic) prediction systems based upon multiplelow-resolution integrations from perturbed initialstates.7) It was principally to satisfy the computa-tional needs of advanced assimilation methodsand ensemble prediction that ECMWF installeda 46-processor Fujitsu VPP700 system in summer1996, providing about a five-fold increase in com-putational power over its C90 system. Enhance-ments of the Fujitsu system will have provided afurther increase by a factor of around five or moreby early 1999.

2. The operational ECMWF forecastingsystem on the VPP700

2.1 The T213L31 forecast modelECMWF has continued to use the T213L31

resolution for its primary forecast model sincetransferring its operational activities to theVPP700. A ten-day medium-range forecast is pro-duced daily from initial conditions valid for12UTC, and a three-day forecast is produced dailyfrom 00UTC data to provide boundary conditionsfor the limited-area short-range forecast modelsof a number of ECMWF Members States. Thesame T213L31 model is used in the process of as-similating observations described in section 2.2,and a lower-resolution version is used in the En-semble Prediction System (EPS) described in sec-tion 2.3. An account of the adaptation of this modelto run on the VPP700 and of its computationalperformance is given in section 3.

The distribution of the 31 levels at which at-mospheric variables are represented is illustrated

Page 3: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

90 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

in Fig. 2. Coordinate surfaces rise up over oro-graphic features at low levels, and flatten withincreasing height to become surfaces of constantpressure with 20 hPa spacing for the uppermoststratospheric levels. The lowest level is located ata height of about 30 m and the lowest five levelslie within about the lowest kilometre of the atmo-sphere, where higher resolution is needed to modelturbulent exchange. Vertical resolution is finerthan 1 km over most of the troposphere. The up-permost level is at a pressure of 10 hPa, whichoccurs at a height of around 30 km.

Two different representations are used todescribe the horizontal variation of model fields.Series expansions in terms of spherical harmon-ics truncated at total wavenumber 213 are usedto represent vorticity, divergence, virtual tempera-ture and the natural logarithm of surface pres-sure, all of which are prognostic variables. Thefixed model orography is also represented spec-trally. Legendre and Fourier transforms are usedto evaluate fields at model grid-points, includingthe horizontal winds derived from vorticity anddivergence and the required horizontal deriva-tives. From these values, the (diagnostic) verticalvelocity and the horizontal pressure gradients arecalculated, and basic dynamical and thermody-namical tendencies evaluated, using a semi-Lagrangian scheme for the advection process.Additional variables are represented by their val-ues at the model grid-points. The three-dimen-sional fields are specific humidity, cloud water/ice

content and fraction of grid volume occupied bycloud, and the semi-Lagrangian advection schemeis used for these fields also. Soil moisture and tem-perature are predicted for four soil layers, andsnow depth is a further prognostic variable. Otherfields held constant during the forecast, such assea-surface temperature, vegetation and land-sur-face roughness, are also defined at the grid-points.The basic tendencies are augmented by sets oftendencies representing “parametrized” physicalprocesses not explicitly represented in the govern-ing “primitive” meteorological equations. Theseinclude the effects of incoming and outgoing ra-diation, latent heat release and other changesassociated with stratiform precipitation and con-vection, turbulent mixing, unresolved orographi-cally-forced wave motion, land-surface processesand wind-speed dependent ocean roughness. Thevariables represented at grid-points are thenupdated, and transformations made back to spec-tral space to update the fields represented byspherical-harmonic expansions. Further detailsand references are given in 5).

Examples of the model grid, indicating thecategorization of points as land or water, are pre-sented in Fig. 3. The points lie along lines of lati-tude which are located to facilitate the Legendretransform and which at T213 resolution are al-most uniformly spaced with a separation of 0.56o

or 62km. Points are distributed in the east-westso as to maintain a similar spacing in terms ofdistance, although the numbers of points aroundlatitude circles are constrained to be suitable foran efficient Fast Fourier Transform, and pointsare spaced slightly closer together near the poles.It can be clearly seen in Fig. 3 that points do notline up along lines of longitude (except at theGreenwich Meridian), but this does not cause anyfundamental computational problem. Further dis-cussion of this “reduced” grid is given in 8) and 9).

The horizontal resolution with which vari-ables are represented spectrally is lower than thegrid resolution. The truncation limit at totalwavenumber 213 corresponds to a smallest-re-

Fig.2— Distribution of the 31 model levels, showing thedependence on a variation in surface pressure.

Page 4: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

91FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

solved half-wavelength of 94 km, and the effec-tive resolution is reduced a little more by applica-tion of a strongly scale-selective horizontal diffu-sion. The finer grid-point resolution is used inspectral models to give a largely alias-free resultwhen the non-linear terms in the governing equa-tions are projected back into the truncated spec-tral space, and is essential for computational sta-bility in models using the traditional Eulerianadvection scheme. An impression of the coarserspectral resolution can be gained from the mapsof the model orography presented in Fig.4.

The T213L31 model is run on 18 processorsof the VPP700 to produce the operational ten-dayforecast in an elapsed time of about one hour. Theremaining processors are active at the same timecarrying out product-generation tasks, computingperturbations to the analyzed initial conditionsfor use in the EPS and subsequently beginning toexecute the EPS forecasts.

2.2 Variational data assimilationThe processing of observations to define the

starting conditions (the initial “analysis”) for theforecast model is carried out in six-hourly cycles.It uses observations made at or close to the cur-rent analysis time and a six-hour forecast fromthe previous analysis as “background” informa-tion. The “three-dimensional” variational analy-sis method (3D-Var) involves determining themodel state x at time t1

that minimizes a scalar“cost” function J(x) , given the background (fore-cast) estimate x

b of x and the set of observations o

for time t1. J is of the form:J = J

o + J

b + J

c

where Jo is a measure of the difference betweenobserved values and corresponding mode1 esti-mates (appropriate functions of x), J

b is a mea-

sure of the deviation of x from xb, and J

c is a safety

term which penalizes any changes that would lead

Fig.4— Distribution of the T213 model orography for thesame regions as shown in Fig. 3. Shading startsat a height of 150m, and colour changes occurwith a height interval of 300m.

Fig.3— Distribution of model grid-points for T213 resolu-tion over northern Europe and eastern Asia. Theblue, open circles indicate points treated as wa-ter or ice, and the green, solid circles indicatepoints treated as land.

Page 5: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

92 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

to unrealistic inertia-gravity oscillations in thesubsequent forecast.

Jo is given generally by Jo =12 (H (x) - o)T 0-1 (H

(x) - o), where O denotes the covariance of obser-vation errors. H(x) maps model variables to ob-served variables, for example transforming modeltemperatures and humidities to radiance as mea-sured by satellite. J

b is of the form J

b =1

2 (x - xb) T B-1

(x - xb), where B denotes the covariance of errors

of the background forecast. Knowledge of B is farfrom complete, and the dynamical and statisticalbasis for the construction of J

b is an important el-

ement of variational data assimilation systems.The operational version of the ECMWF systemimplemented in January 1996 is described in 10);subsequent important developments for theVPP700 have been a new pre-analysis screeningand variational quality control of observations anda new method of estimating the background-er-ror variances.11) An improved formulation of J

b has

recently been introduced.The solution of the variational analysis prob-

lem requires knowledge of the gradient of J withrespect to x at each iteration of the algorithm usedfor the minimization. The gradient is computedefficiently using adjoint techniques, but the dimen-sion of x is nevertheless of the order of 6 × 106 fora model with T213L31 resolution for a combinedanalysis of wind, temperature and humidity. Theminimization problem is reduced to a more man-ageable size by formulating the problem in termsof the deviation of x from its background value,and solved only at lower horizontal resolution (cur-rently T63). Figure 5 presents an example of howthe cost function is reduced over the iterations ofthe minimization. The reduction comes from theJ

o term; J

b , and to a much lesser extent J

c , inevi-

tably increase as x is modified away from the back-ground value x

b. The sharp decrease of J

o after

about 30 iterations arises because of the introduc-tion of quality control of observations at this stageof the minimization.

Isaksen and Barros12) have discussed theparallelization of variational data assimilation. A

wide range of types of observation are used op-erationally, including in-situ measurements fromland-stations, balloons, buoys, ships and aircraft,and several types of remotely-sensed data fromsatellites. Their processing poses particular prob-lems for parallelization as they are distributed farfrom uniformly in space, and the amount of calcu-lation required, for example in computing the “ob-servation operator” H (x) , varies considerably fromone type to another. However, the current opera-tional configuration of 3D-Var is not especiallydemanding computationally, and the main six-hourly analysis task is parallelized over six pro-cessors and completes in an elapsed time of 30-40minutes. Other tasks (executed on either one orsix processors) require a further 15 minutes or so.

2.3 Ensemble predictionAn ensemble prediction system (EPS) has

been run daily at ECMWF since May 1994, fol-lowing successful results in thrice-weekly trialsstarted in December 19927). The principal aims areto provide an estimate of the skill of the deter-ministic T213L31 forecast, to provide possible al-ternative evolutions of the atmospheric circula-tion pattern, and to provide probabilities of theoccurrence of specific weather events. The systementails execution of a set of numerical forecastsusing a lower-resolution model, starting from suit-ably perturbed initial conditions. T63 horizontalresolution and 19-level vertical resolution were

Fig.5— Change in the cost function and its componentsover the iterations of a 3D-Var minimization.

Page 6: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

93FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

used in the original implementation of the sys-tem. A control forecast was run starting from atruncated and interpolated form of the high-reso-lution operational analysis, and a further 32 fore-casts were run from the perturbed initial condi-tions.

Acquisition of the VPP700, and algorithmicimprovements in model efficiency, enabled a sub-stantial improvement to be made in the configu-ration of the EPS in late 1996. The 31-level verti-cal resolution of the main T213 forecast wasadopted for the EPS forecast model, and the gridof this model was refined to be that appropriatefor a conventional Eulerian T106 model, with aspacing of about 125 km. The so-called “linear-grid”option was introduced, exploiting the fact that fora given grid-resolution, semi-Lagrangian modelscan run stably with a higher spectral resolutionthan can be used in Eulerian models. The spec-tral truncation was increased to 159, this linear-grid resolution being referred to as TL159. In ad-dition, the number of forecasts from perturbedinitial conditions was increased to 50.

Execution of the set of forecasts is the mostcomputationally demanding part of the EPS andis ideal for effective use of parallel processing. Thememory requirement is such that each perturbedforecast can be run in just two processors of theVPP700. The control forecast and 50 forecasts fromperturbed initial conditions are allocated 34 pro-cessors, and these processors can be kept well-oc-cupied working on the forecasts in batches of 17two-processor forecasts run in parallel. A net timeof around 2.5 hours is needed to complete the fore-casts.

The initial perturbations for the EPS are con-structed from the “singular vectors” that grow mostrapidly as deviations from the control forecast dur-ing the first two days of the forecast range.13) Thesingular vectors are combined linearly to produceinitial perturbations for the EPS that have a wide-spread geographical distribution and magnitudesconsistent with estimates of analysis error. ALanczos algorithm is used to compute the singu-

lar vectors, and it requires repeated evaluationsof an operator obtained by a forward two-day in-tegration of a model version which is linearizedabout the control forecast (and has simplifiedphysical parametrizations) followed by backwardintegration of the adjoint of this model. Some 70or so iterations are needed to obtain a suitableset of singular vectors, and the cost is thus equiva-lent to that of several hundred days of forwardintegration of the forecast model. At T42L31 reso-lution (as implemented operationally on eight pro-cessors) the calculation requires about 30% of thecomputation needed to execute the ten-dayT213L31 forecast.

2.4 Ocean-wave forecastingTen-day global forecasts of ocean-wave con-

ditions are produced daily using a wave modeldriven by the surface winds from the atmospheric-model forecast. The wave model is the third-gen-eration WAM model.14) It predicts the two dimen-sional wave spectrum and is based on an explicitdescription of the physical processes governingwave evolution, which include wind input, dissi-pation by white-capping and bottom friction, andnonlinear wave-wave interactions. The initialwave analysis is based on observations from thealtimeter on the European Space Agency’s ERS-2satellite and on background wave informationproduced by the wave model using winds from thebackground forecast of the atmospheric data as-similation. The method of optimum interpolationis used to adjust the background state towardsobserved values.15)

The move to the VPP700 enabled a signifi-cant increase in the spatial resolution of the wavemodel. Currently, the predicted wave spectrum has12 directions and 25 frequencies so there are 300degrees of freedom per grid-point. The grid is anirregular latitude-longitude grid, which has a con-stant latitudinal increment but adjusts the sizeof the longitudinal increment in such a way thatthe distance between grid-points is almost con-stant, much as in the atmospheric model. The reso-

Page 7: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

94 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

lution is about 55 km and involves about 120,000grid-points. Hence, the WAM model solves about35,000,000 equations per time-step of 15 minutes.

The WAM model is coded so that the inner doloop is over the grid-points, to take full advantageof vectorisation. This is achieved by mapping thetwo dimensional grid into a one dimensional ar-ray, ignoring land points. On one processor theinner do-loop would have a length of 120,000, andthe elapsed time of a ten-day forecast would bethree hours or more. Speed-up has been achievedby distributing the grid-points over several pro-cessors and introducing message passing. Sincethe advection of wave energy is done by a simplefirst-order up-winding scheme, the overlap withneighbouring sub-domains may be precalculatedand used to determine the information needed inthe messages sent to the other processors. Fourprocessors are used operationally. The elapsedtime is about 50 minutes and the efficiency ofspeed-up is between 92% and 95%.

3. Implementation and computationalperformance of the ECMWFatmospheric model on the VPP700Migration of the atmospheric model from the

Cray shared-memory platforms required sourcecode developments to remove non-standard fea-tures from the Fortran language, and to imple-ment parallelism in a distributed memory envi-ronment. Fortran 90 features were utilized toreplace Cray-specific code features, and the MPImessage-passing technique was adopted. The codeis now fully portable to any platform with a For-tran 90 compiler. Although the message-passingapproach is rather manual and requires non-trivial effort from experienced programmers, therewards are a portable parallel code with poten-tially excellent parallel performance. The alter-native technique of using a compiler-specific setof directives to achieve a data parallel executionis much easier to implement but is unlikely to beportable or efficient on a range of parallel plat-forms.

The key to creating an efficient message-pass-ing version lies in the identification of the inher-ent parallelism. Calculations in the different al-gorithmic steps of the spectral model involve datadependencies in different directions. For example,the Fourier transforms involve dependent zonal-wavenumber coefficients and grid-point values,but can be carried out independently for differentmodel variables, levels and lines of latitude. Evalu-ation of a Legendre transform requires data forone particular zonal-wavenumber coefficient forall latitudes. In grid-point space, the physical com-putations involve vertical data coupling only. How-ever, the semi-Lagrangian advection scheme re-quires access to data from nearby grid-columns,since for a particular grid-column, departurepoints at the beginning of a time-step (and corre-sponding field values) have to be computed for theair-parcels which arrive at that column at the endof the time-step. A degree of local communicationis thereby introduced.

A transposition strategy is used to handle theproblem of accessing distributed data. With thisapproach, the relevant data is redistributed amongthe processors at various stages of the algorithmso that the arithmetic computations between anytwo consecutive transpositions can be performedwithout any interprocessor communication. Suchan approach is feasible because data dependen-cies exist only within one coordinate direction foreach algorithmic component. An overwhelmingpractical advantage of this technique is that themessage-passing logic is localized in a few rou-tines. These are executed prior to the appropriatealgorithmic stage, so that developers working withthe computational routines (which constitute thevast bulk of the model code) need have no knowl-edge of this activity.

The distribution of work has to be done care-fully to achieve good load balancing16). For ex-ample, in grid-space the total array of grid-col-umns (138346 for T213 resolution) can be evenlydistributed among any number of available pro-cessors such that each receives an almost equal

Page 8: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

95FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

amount of work. For a partition of 100 processors,54 will receive 1383 of the T213 data points, and46 will receive 1384 data points. Each processorworks on a sub-domain of the sphere whose shape(latitudinal and longitudinal extent) maybe ad-justed for optimal performance.

Such load balancing is “static” and is inde-pendent of the particular weather situation. Adynamic imbalance also exists due to variationsin the amount of computation for particular grid-columns arising from the meteorological condi-tions. For example, computations for a grid-col-umn near to the equator may have to account forsubstantial amounts of convective activity, whilethose for a polar column may have to deal withnone. This dynamic imbalance is much more dif-ficult to handle since, in principle, its reductionrequires a redistribution of grid-points to proces-sors at each model time step.

Special treatment of the reduced grid isneeded to achieve good vector performance. Thesimplest approach of vectorising over a completeline of latitude (a row) would result in a vectorlength varying from 640 for equatorial rows to only16 at the poles for the case in which each proces-sor is allocated one or more complete rows. To pre-vent inefficiencies due to short vectors, a buffer isprovided for each processor into which can bepacked data from any of the complete or partialrows assigned to that processor. The resultant “su-per-row” is presented to the grid-point computa-tional routines for processing. The buffer size isdefined at run time and this allows the user tooptimise vector length at the expense of thememory needed for temporary work-space. Thesuccess of this technique on the VPP700 may bejudged by utilizing the profiling tool. This indi-cates that the application achieves a total sus-tained floating point computational rate of about770 Mflops on each processor (35% of peak).

Figure 6 quantifies message-passing and load-imbalance inefficiencies when benchmark versionsof the model utilize 46 processors of a VPP700. Itcan easily be seen that these inefficiencies are

small compared to the computational time. Use ofa message-trace facility makes it possible to ob-tain detailed information about any message traf-fic. Most of the interprocessor traffic within themodel proceeds at close to the hardware speed ofmore than 500 Mbytes/s.

Performance of the T213L31 model using dif-fering numbers of processors of a VPP700 is shownin Fig. 7. The parallel speed-up is 15.4 with 16processors and 35.5 with 40 processors. This isbased on measurements of total elapsed time, andconfirms that the message-passing overheads arenot excessive.

4. Operation of the VPP700The Fujitsu VPP configuration was selected

by ECMWF in August 1995 to replace its CrayC90 and T3D systems, after careful preparationsand a thorough evaluation of responses to an In-vitation to Tender. Details of this process are givenin 17). The VPP700/46 was delivered and installedtowards the end of June 1996 and passed its Pre-liminary Acceptance Test on 2 July 1996. Duringits subsequent Performance Test it proved that itcould run a benchmark version of the ECMWFmodel at least five times faster than the Cray C90,both for a single T213L31 forecast and for a set of

Fig.6— Breakdown of the time spent in computation andin the message-passing associated with the trans-positions and semi-Lagrangian communication,and the loss due to load imbalance, for three ver-sions of the model using 46 VPP700 processors.

Page 9: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

96 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

70% level, with usage by the operating systemholding more-or-less steady at well under the 10%level.

5. Future developments5.1 Data assimilation

The main current development in data assimi-lation is that of four-dimensional variational as-similation (4D-Var), the principal application forwhich the VPP700 was acquired. The formulationof 4D-Var may be set out in a way similar to thatof 3D-Var. The variational problem is to determinethe model state x at time t0 that minimizes thecost function J given observations over the time-range t0 ≤ t ≤ t1. The observation operator H(x) nowmaps model variables at time t0 to variables ob-served at time t , and involves a model integra-tion from t0 to t , followed by a transformation frommodel to observed variable as in 3D-Var. The back-ground state x

b for 4D-Var is the optimal forecast

valid at the end of the previous assimilation pe-riod.

Le Dimet and Talagrand18) showed how ad-joint methods can be used to calculate the requiredgradient of J with respect to x . The J

o term is the

most expensive computationally, and its gradientis computed in the following way. First, the fullmodel is integrated forward in time from t0 to t1,starting from the latest estimate of x , and storingthe forecast state at each time-step. Then, theadjoint of the model equations linearized aboutthe forecast states stored during the forward in-tegration is integrated backwards from t1 to t0. Theadjoint model includes forcing by a term whichdepends on the differences between the observa-tions and model estimates based on the storedforecast states. The backward integration startswith all fields set to zero, and the result at time t0gives the required components of the gradient ofJ

o with respect to x.

In practice, the “incremental” variationalmethod6) has been adopted, as in 3D-Var. Alower-resolution model (with simpler physicalparametrizations) is used to compute increments

T63L19 forecasts as used in the first version ofthe EPS. The VPP700 passed Final Acceptance on8 August 1996 after having successfully completeda 30-day Reliability Test. It became the main op-erational machine of ECMWF on 18 September1996. Its integration with the other componentsof the ECMWF computing environment is shownin Fig. 8.

The VPP700 system has proven to be verystable and no major outages have occurred. Theweekly mean availability of the system to usershas been 164.3 hours over the period since Au-gust 1996. Of the other 3.7 hours of the week, 2.3have been used on average for software mainte-nance and system sessions, and 1.2 have been lostdue to software crashes. Time used for hardwaremaintenance or lost due to hardware crashes ac-counts for the very small remainder.

The growth in utilization of the Fujitsu sys-tem has been similarly encouraging. Figure 9

shows the CPU usage of the complete system (in-cluding control and I/O processors) on a week-by-week basis since the beginning of October 1996.The VPP700 is now generally kept well suppliedwith streams of work based on operational andresearch versions of the forecasting system, andthese streams occupy the processors quite effec-tively. Net CPU usage has increased to above the

Fig.7— Number of days that can be simulated per day ofwall-clock time by the T213L31 model as a func-tion of the number of VPP700 processors allo-cated to the model.

Page 10: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

97FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

to the high-resolution initial state that reduce thedeviation from observations in the subsequenthigh-resolution forecast. A 4D-Var system using

a 6-hour assimilation period and T63 for the lower-resolution calculations is currently undergoingextensive testing, and is giving encouraging re-sults. Figure10 presents an example of a signifi-cantly improved three-day forecast of the positionand intensity of a storm over the northern Atlan-tic Ocean when the T213L31 forecast model is runfrom a 4D-Var rather than a 3D-Var analysis(Rabier, personal communication).

Future development of the system will includeextension of the assimilation period, enhancementof the resolution and parametrizations of thelower-resolution model, and extension of the typesof observation (particulaly satellite data) that areassimilated. An important and substantial addi-tional development will be that of a simplifiedKalman filter to provide dynamical estimates of

Fig.8— Computer configuration of ECMWF in October 1996.

Fig.9— Weekly CPU utilization of the VPP700/46.

Page 11: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

98 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

the analysis and forecast errors within the assimi-lation period.11) Better knowledge of these is vitalnot only for improving the error covariances usedin the variational assimilation, but also for pro-viding more realistic initial perturbations for usein the EPS.

5.2 ModellingModel development is currently directed prin-

cipally towards using the linear-grid option for themain forecast model to increase spectral resolu-tion from T213 to TL319. This change is expectedto be accompanied by use of an improved basicdataset for the generation of the model orographyand related fixed fields. Later resolution changesare expected to entail a significant increase in thevertical extent of the model, with levels up to 0.05hPa, and a finer resolution of the planetary bound-ary layer. Enhancement of the Fujitsu computersystem in 1999 will permit a substantial increasein horizontal resolution, which may be taken toas high as TL639, with a 31 km computational grid.There will be ongoing refinements and revisionsof the treatments of the existing parametrizedprocesses, which may involve additional prognos-tic variables, for example separating cloud-waterand cloud-ice contents and introducing turbulentand convective sub-grid-scale energies. Ozone willalso be added as a prognostic variable, requiringits own parametrization of generative and destruc-tive processes.

5.3 Ensemble prediction systemImprovements to the EPS are expected from

a number of sources. Particular attention is beingconcentrated on the construction of more repre-sentative initial perturbations, especially throughtaking into account the estimated analysiserror and introducing more complete physicalparametrizations in the singular-vector calcula-tions. Accounting for uncertainty in the model for-mulation as well as in the initial conditions isanother issue for examination. Further increasesin model resolution and ensemble size will be pos-sible when the Fujitsu system is enhanced, andextension of the range of the EPS beyond ten daysis likely. As a relatively new type of forecastingactivity, continuing emphasis has to be placed onexploring ways in which the wealth of data pro-duced by the EPS can be processed to form sets ofuseful end-products.

Fig.10— Maps of mean-sea-level pressure for 12UTC 20February 1997. The upper plot shows the 4D-Var analysis. The corresponding 3D-Var analy-sis is similar. Forecasts started from 4D- and 3D-Var analyses for 12UTC 17 February 1997 areshown in the lower two plots. The contour inter-val is 5hPa.

Page 12: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

99FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

5.4 Wave predictionIt is planned to introduce a direct coupling of

the atmospheric and ocean-wave models in thenear future. At each time-step the surface windsfrom the atmospheric model will be used to up-date the wave model, which in turn will provide anew estimate of the surface roughness over sea tothe atmospheric model. In developing the softwarefor this, two different approaches to distributingthe amount of work per processor have been mar-ried. This was possible because the interactionbetween the two models, although occurring ev-ery time step, involves only a few fields. Makingthe global version of these fields available on ev-ery processor allows the coupling of wind andwaves to be accomplished in an efficient way. Useof the coupled models for the EPS forecasts willenable production of probabilistic forecasts ofocean waves. Other work in wave forecasting willbe directed towards improving the representationof the physics of wave generation and feedbackon the atmosphere, and increasing the use of wavedata from satellites. Such data will improve di-rectly the definition of the initial ocean-wave stateand improve indirectly the definition of the atmo-spheric initial state through use of the coupledsystem in data assimilation.

5.5 Experimental seasonal forecastingWork is also being undertaken on prediction

on the time-scale of one or more seasons. Such fore-casts will inevitably relate to the statistics of theweather within a season, rather than specificweather events, and will be probabilistic in na-ture, at least for the extratropics. As much of thepotential for seasonal weather prediction dependson an ability to predict relatively slowly-develop-ing anomalies in sea-surface temperature,19) de-velopment of a seasonal forecasting system re-quires coupling of an atmospheric forecast modelwith an oceanic circulation model, and implemen-tation of a data assimilation system to determinethe oceanic initial conditions. Such a system hasbeen assembled at ECMWF, and experimental

runs are being made in real time. It is expectedthat this activity will continue, with the systembeing run regularly, monitored and improved. Thisdevelopment may also be beneficial for the exten-sion of the EPS beyond ten days, since some rep-resentation of the evolution of the sea-surface tem-perature is thought likely to be needed to realizethe full potential for skilful prediction at the ex-tended medium range.

6. ConclusionThe Fujitsu VPP700 was delivered, installed

and accepted according to plan. After some initialdifficulties with getting to know the new archi-tecture, in particular its I/O behaviour, the sys-tem performance has reached acceptable levels ofCPU utilization, and has been well received bythe ECMWF user community. The VPP700 hasproven to be a reliable system which has enabledearly enhancement of ECMWF’s operational ac-tivities in ensemble prediction and ocean-waveforecasting. It is providing a good throughput forthe research experiments needed to test futureenhancements, in particular for the developmentof the four-dimensional variational assimilationsystem which is planned for operational imple-mentation later in 1997.

AcknowledgementAn efficient and effective range of operational

forecasting activities on the Fujitsu VPP700 hasbeen achieved through the dedicated efforts of verymany members of the Research and OperationsDepartments of ECMWF, and of the staff of Fujitsuin Europe and Japan, to whom we express ourgratitude.

References1)

..Charney, J.G., Fortoft, R.and von Neumann,J.: Numerical integration of the barotropicvorticity equation. Tellus, 2, pp. 237-254, 1950.

2) Miyakoda, K., Hembree, G.D., Strickler, R.F.,and Shulman, I.: Cumulative results of ex-tended forecast experiments. I: Model perfor-

Page 13: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

100 FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

mance for winter cases. Mon. Wea. Rev., 100,pp. 836-855 (1972).

3) Simmons, A.J., Burridge, D.M., Jarraud, M.,Girard, C., and Wergen, W.: The ECMWF me-dium-range prediction models: Developmentof the numerical formulations and the impactof increased resolution. Meteorol. Atmos.Phys., 40, pp. 28- 60 (1989).

4) Simmons, A.J. and Dent, D.: The ECMWFmulti-tasking weather prediction model.Comp. Phys. Rep., 11, pp. 165-194 (1989).

5) Ritchie, H., Temperton, C., Simmons, A.,Hortal, M., Davies, T., Dent, D., and Hamrud,M.: Implementation of the semi-Lagrangianmethod in a high resolution version of theECMWF forecast model. Mon. Wea. Rev., 123,pp. 489-514 (1995).

6) /Courtier, P., Thepaut, J-N., and Hollingsworth,A.: A strategy for operational implementationof 4D-Var, using an incremental approach.Quart. J.R. Meteorol. Soc., 120, pp. 1367-1387(1994).

7) Molteni, F., Buizza, R., Palmer, T.N., andPetroliagis, T.: The ECMWF ensemble predic-tion system: Methodology and validation.Quart. J. R. Meteorol. Soc., 122, pp. 73-119(1996).

8) Hortal, M., and Simmons, A.J.: Use of reducedGaussian grids in spectral models. Mon. Wea.Rev., 119, pp. 1057-1074 (1991).

9) Courtier, P. and Naughton, M.: A pole prob-lem in the reduced Gaussian grid. Quart. J.R. Meteorol. Soc., 120, pp. 1389-1407 (1994).

10) Courtier, P., Andersson, E., Heckley, W.,Pailleux, J., Vasiljevic, D., Hamrud, M.,Hollingsworth, A., Rabier, F., and Fisher, M.:The ECMWF implementation of three dimen-sional variational assimilation (3D-Var). PartI: Formulation. Submitted for publication.

11) Fisher, M., and Courtier, P.: Estimating thecovariance matrices of analysis and forecasterror in variational data assimilation.ECMWF Tech. Memo., No. 220, 26pp.

12) Isaksen, L., and Barros, S.R.M.: IFS 4D-varia-

tional analysis - Overview and parallel strat-egy. In: Coming of Age, Proceedings of the SixthECMWF Workshop on the Use of Parallel Pro-cessors in Meteorology, Eds. Hoffmann, G-R.,and Kreitz, N., World Scientific, Singapore, pp.337-351 (1995).

13) Buizza, R. and Palmer, T.N.: The singular-vec-tor structure of the atmospheric general cir-culation. J. Atmos. Sci., 52, pp. 1434-1456(1995).

14) Komen, G.J., Cavaleri, L., Donelan, M.,Hasselmann, K., Hasselmann, S., andJanssen, P.A.E.M.: Dynamics and Modellingof Ocean Waves, Cambridge UniversityPress,Cambridge, 532pp., 1994.

15)..

Lionello, P., Gunther, H., and Janssen,P.A.E.M.: Assimilation of altimeter data in aglobal third-generation wave model. J .Geophys. Res., C97, pp. 14453-14474 (1992).

16) Barros, S.R.M., Dent, D., Isaksen, L., andRobinson, G.: The IFS model -Overview andparallel strategies. In: Coming of Age, Proceed-ings of the Sixth ECMWF Workshop on the Useof Parallel Processors in Meteorology, Eds.Hoffmann, G-R., and Kreitz, N., World Scien-tific, Singapore, pp. 303-318, 1995.

17) Hoffmann, G-R.: Distributed Memory Sys-tems For ECMWF. In: Supercomputer 1996.

..H-W Meuer (Hrsg.). Munchen, ISBN 3-598-22413-3, pp. 69-77, 1996.

18) F-X. Le Dimet, F.-X. and Talagrand, O.: Varia-tional algorithms for analysis and assimila-tion of meteorological observations. Tellus,38A, pp. 97-110 (1986).

19) Palmer, T.N., and Anderson, D.L.T.: The pros-pects for seasonal forecasting- A review.Quart. J. R. Meteorol. Soc., 120, pp. 755-793(1994).

Page 14: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF · D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF solved half-wavelength of 94 km, and the effec-tive

101FUJITSU Sci. Tech. J.,33,1,(June 1997)

D. Dent et al.: Use of the Fujitsu VPP700 for Weather Forecasting at ECMWF

David Dent received the B.Sc degreein Mathematics from Sussex University,England, in 1967. He joined the UK Me-teorological Office in 1967, and movedto the Rutherford Laboratory in 1971.He joined ECMWF in 1976, and nowholds the position of Principal Analystin the Research Department. Hespecialises in the optimisation of codefor vector and parallel architectures andin computer benchmarking.

Geerd-R. Hoffmann graduated asDiplom-Mathematiker at the University

..of Tubingen, Germany, in 1969. Fol-lowing research work in artificial intelli-gence, he became Senior Lecturer for

..Computer Science at Tubingen in 1972and Head of the Operating SystemsGroup of the university computing cen-tre. After two years as Fellow at theData Handling Division of CERN inGeneva, he became Head of the Com-

puter Division of ECMWF in 1980. His main scientific interestsare now supercomputers, parallel systems, networks and datahandling. In October 1992 he was appointed Visiting Professorin the Department of Electronics and Computer Science atSouthampton. He has participated in the management of vari-ous Esprit projects since 1989. He is the Chairman of the bien-nial international workshops on “Use of Parallel Processors inMeteorology” and Chairman of the European RAPS (Real Appli-cations on Parallel Systems) consortium. From 1 July 1997, hewill be Head of Department at Deutscher Wetterdienst inOffenbach.

Peter Janssen received the M. Sc. de-gree in Theoretical Physics fromEindhoven University of Technology,The Netherlands, in 1975. In 1979 hereceived a Ph.D. on a subject in PlasmaPhysics from the same University. Hethen joined the Royal Netherlands Me-teorological Institute to work on thephysics of ocean waves until 1992. Dur-ing this period he enjoyed severalleaves of absence at CalTech(California

Institute of Technology), at the University of Rochester and atECMWF. He has worked at ECMWF since 1992, first as a se-nior scientist and later as Head of the Optional Wave Project.He is a member of the European Physical Society, the New YorkAcademy of Sciences and a fellow of the Royal MeteorologicalSociety.

Adrian Simmons pursued undergradu-ate and postgraduate studies in math-ematics at the University of Cambridge,England. He received a Ph. D. degreefrom there in 1972 for work on the dy-namical meteorology of the strato-sphere. He then carried out post-doc-toral research for six years with the UKUniversities’ Atmospheric ModellingGroup based at the University of Read-ing. He joined the Research Depart-

ment of ECMWF in 1978. As Head of the Numerical AspectsSection and later as Head of the Model Division he was closelyinvolved in the development of the Centre’s operational modelsuntil 1995, when he was appointed Head of the Data Division.His current areas of responsibility include development of thedata assimilation system, the use of satellite data, and ensembleand seasonal prediction. He has been active on scientific com-mittees of the World Meteorological Organization, and is a fel-low of the Royal Meteorological Society.