


Increasing the efficiency of GEOS-Chem Adjoint model runs using a Python ensemble manager.

Andre Perkins

Academic Affiliation, Fall 2012: University of Wisconsin - Madison

SOARS® Summer 2012

Science Research Mentor: Daven Henze
Writing and Communication Mentor: Jana Milford

ABSTRACT

Methane is a powerful greenhouse gas with uncertainty regarding the strengths and trends of its various individual sources. These uncertainties make it difficult for researchers to determine the exact reasons behind methane’s variable annual growth rate and the stabilization of its atmospheric concentration over the past three decades. It is possible to estimate individual methane source emission values using satellite measurements and inverse modeling techniques, although data quality limits how well individual sources are resolved. GEOS-Chem Adjoint is the combination of an atmospheric chemical transport model (GEOS-Chem) with an adjoint model, and it can be used to test the emission source resolving power of actual and theoretical satellite retrievals of methane. Testing the resolving power requires a large number of individual simulations, but in the standard version of GEOS-Chem Adjoint each simulation must be manually set up and initiated. To overcome the need for manual setup and execution of model runs, a manager script was created using the Python programming language. The ensemble manager script, PyEnsemble, automates the process of creating multiple unique simulations and can run variable numbers of simulations on many different types of clustered computer systems. PyEnsemble marks the first step in a larger project of testing the current accuracy of methane surface emission estimates and developing ways to further constrain them. This work was performed under the auspices of the Significant Opportunities in Atmospheric Research and Science Program.
SOARS is managed by the University Corporation for Atmospheric Research and is funded by the National Science Foundation, the National Oceanic and Atmospheric Administration, the Cooperative Institute for Research in Environmental Science, the University of Colorado at Boulder, and by the Center for Multiscale Modeling of Atmospheric Processes.


SOARS® 2012, Andre Perkins, 2

1. Introduction

Atmospheric methane (CH4) is an important long-lived greenhouse gas (LLGHG) under investigation because of uncertainties regarding its atmospheric sources. In the 2007 Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4), Forster et al. (2007) reported the global atmospheric concentration of CH4 at 1774.62 ± 1.22 ppb, with a radiative forcing of approximately 0.48 ± 0.05 W m-2. This account placed CH4 second in the ranking of LLGHGs in the atmosphere, behind only CO2. Methane concentrations have increased by a factor of approximately 2.5 since the pre-industrial era (Prather et al., 2001), and by 30% in the last 25 years (Forster et al., 2007). These levels are believed not to have occurred at any point in the last 650,000 years (Spahni et al., 2005). This increase in the atmospheric concentration of CH4, and the consequent radiative forcing, have spurred investigation of anthropogenic impacts on CH4 concentrations and the potential long-term climate impact. In order to estimate the impact of CH4 concentrations and emissions on the future climate, a current atmospheric budget with well-constrained source and sink terms is necessary. Sources of CH4 in the atmosphere are predominantly biogenic, including wetlands, ruminant animals, rice agriculture, and biomass burning, but they are also tied to other anthropogenic activities such as fossil fuel mining and waste handling (Forster et al., 2007). The loss of CH4 is dominated by its reaction with hydroxyl (OH) free radicals in the atmosphere, which accounts for over 80% of the losses (Prather et al., 2001). In total, the magnitude of the atmospheric CH4 source term is relatively well constrained, but the trend and strength of most of the distinct source categories remain a more difficult problem (Forster et al., 2007).
Uncertainty in the source terms of CH4 obscures the factors leading to the stabilization of methane concentrations from 1999 to 2006, and the subsequent growth in 2007 and 2008 (Figure 1a) (Dlugokencky et al., 2009). Fluctuations in the growth rate of CH4 concentrations (Figure 1b) show that, over the past few decades, rates have peaked at 14 ppb yr-1 but have also dipped below zero in 2001, 2004, and 2005 (Forster et al., 2007). This variation clearly represents an imbalance between the sources and sinks of CH4 in the atmosphere, but the exact cause of this annual and inter-annual variability is still uncertain (Prather et al., 2001; Forster et al., 2007; Dlugokencky et al., 2009). To further constrain the CH4 sources and gain better general knowledge of the CH4 budget, measurements with increased accuracy and spatial coverage are being performed across the globe. Estimates of CH4 emissions from natural sources and human activity are extrapolated to a global scale, forming the basis of ‘bottom-up’ inventories of atmospheric CH4. However, Bergamaschi et al. (2005) and Villani et al. (2010) have both stated that the overall sparsity of the surface networks, combined with the highly variable nature of surface methane emissions, reduces the accuracy of CH4 budgets constructed using only surface data. With satellite data, source parameters can be further constrained by performing a ‘top-down’ inventory utilizing the retrieved atmospheric concentrations of CH4 in inverse models (Bergamaschi et al., 2007, 2009; Frankenberg et al., 2005; Wecht et al., 2012). By combining methods utilizing satellite and surface observational data, inverse modeling techniques are making strides in enhancing the accuracy of CH4 source attribution for different regions.

Figure 1. (a) Globally averaged CH4 concentrations (solid line) with a deseasonalized trend line fitted to the curve (dashed line). (b) Instantaneous CH4 growth rate derived from the globally averaged CH4 concentrations (solid line) with ±1σ (dashed line). (Dlugokencky et al., 2009; graphic adapted from Heimann, 2011)

Inverse models used to determine CH4 emissions employ mathematical optimization techniques to estimate emission values for model grid cells. However, the nature of the inverse equations and of the data used limits the accuracy of the estimated emission values. One measure of accuracy for the CH4 emissions determined by inverse modeling is the independence of the emission values, which measures the resolving power of data from different types of satellite retrievals. Low independence would indicate that calculated emission values in a model grid cell are tied to the emission values in one or more other cells. This diminishes the ability to distinguish between natural and anthropogenic sources of CH4, an important factor in investigating the human influence on the climate. Thus, determining independence is one step toward further constraining methane emissions and increasing the accuracy of CH4 impact projections for the future climate.

The end goal of this project is to investigate the potential resolving power of individual CH4 emissions for a newly proposed satellite, the Geostationary Coastal and Air Pollution Events satellite (GEO-CAPE). Using the GEOS-Chem chemical transport model combined with an adjoint model (hereafter GEOS-Chem Adjoint), we can simulate GEO-CAPE data and test the resolving power. However, the first hurdle in this process is running ensembles of GEOS-Chem Adjoint simulations, because the resolving power calculations require unique input files and individual runs for different regions of the earth. As originally developed, GEOS-Chem Adjoint is only capable of running a single simulation at a time. With each model run requiring a user to set up the directory structure and input files manually, conducting large numbers of model runs is infeasible. The systems that will run the GEOS-Chem Adjoint simulations have far more resources available than are currently utilized. To use these resources, a higher-level GEOS-Chem Adjoint simulation manager for running ensembles was created using the Python programming language. The ensemble manager (PyEnsemble) will allow the calculation of model resolution matrices determining emission parameter independence in grid cells, which will help to further constrain estimates of CH4 emissions.

The remainder of this paper gives an overview of the GEOS-Chem Adjoint model, along with the methods for constructing PyEnsemble and the methods for the resolving power analysis. This includes a discussion of current tests performed using PyEnsemble and potential future work on improving the program.

2. GEOS-Chem Adjoint

Testing the resolving power of CH4 surface emissions estimated with inverse models and satellite retrievals requires many GEOS-Chem Adjoint simulations. To run ensembles of simulations easily, a manager was created using the Python programming language. The following sections describe the GEOS-Chem Adjoint model, the reasons for constructing PyEnsemble, and the methods of resolution testing.

a. GEOS-Chem Adjoint

The GEOS-Chem Adjoint model comprises two separate models: the GEOS-Chem chemical transport model (CTM) and the adjoint sensitivity model. Both are community-driven models with development contributions from a large base of researchers. The GEOS-Chem CTM assimilates meteorological data from the Goddard Earth Observing System (GEOS), run by the NASA Global Modeling and Assimilation Office, to drive chemical transport simulations (Bey et al., 2001). In these simulations, the atmospheric distribution and behavior of 87 different chemical species (aerosols and gases) are modeled with over 300 different reactions (Henze et al., 2007). GEOS-Chem allows runs to be varied based on a user’s needs: global simulations can be performed at horizontal resolutions of either 4°x5° or 2°x2.5° with up to 72 vertical levels, and finer-scale simulations can be run over smaller domains at a horizontal resolution of 0.5°x0.67°. (More information on GEOS-Chem and its updates can be found on the model group’s webpage at http://acmg.seas.harvard.edu/geos/index.html)


The adjoint model developed by Henze et al. (2007) works in tandem with the GEOS-Chem CTM by using the output of GEOS-Chem as input for the adjoint equations. Adjoint models are used to find the gradient of a cost function with respect to forward model input parameters; a cost function quantifies the results of a simulation as a single value based on the output of the forward model (Henze et al., 2007; Errico, 1997). The gradient of the cost function found through inverse modeling can be used for two purposes. First, it can serve directly as a model sensitivity for sensitivity testing. Second, for data assimilation, it can be used to find the minimum of the cost function and thereby optimize the forward model parameters. The adjoint model is based on the GEOS-Chem code and equations, and performs inverse sensitivity calculations, which iterate backwards in time, for dynamics, tropospheric chemistry, heterogeneous chemistry, and aerosol thermodynamics (Henze et al., 2007). In doing so, the GEOS-Chem adjoint model provides a mechanism for investigating sensitivity with respect to control parameters, reaction rates, emissions, and more. (More detailed information on versions and updates can be found at http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_Adjoint)

b. GEOS-Chem Adjoint: OpenMP to MPI

GEOS-Chem Adjoint is a useful tool for running single simulations, but running ensembles of similar yet unique simulations is more difficult. The difficulty stems from the OpenMP framework used in GEOS-Chem Adjoint’s development. OpenMP parallelizes a program’s functions across the multiple cores of a shared memory system; however, current high performance and supercomputing systems are distributed memory systems that link together many individual shared memory systems, or nodes, to act as a single computing system. OpenMP provides no framework for distributing work across the nodes of a distributed memory cluster, so GEOS-Chem Adjoint users are restricted in their ability to use the increased resources of such computer architectures. Multiple simulations can be run on separate nodes, but this makes ensembles of runs a tedious prospect because each individual simulation must be set up and initialized independently.

To run ensembles of GEOS-Chem Adjoint simulations, an automated mechanism is needed for copying run directories, altering input files, and notifying computer nodes. Python fits this role well, as it natively supports robust string manipulation, file I/O, and operating system commands for directory copying (http://www.python.org). In addition, a large and growing scientific user base in the Python community provides a host of free tools and support for various scientific computing needs. A library central to this project is MPI4py, which provides an application programming interface (API) allowing Python programs to communicate between separate nodes of a system using the message passing interface (MPI) standard. (See more at http://mpi4py.scipy.org/docs/usrman/index.html) The ability to use MPI allows multiple simulation ensembles to be orchestrated by a single Python script. This is an essential piece in the overall project of investigating the resolving power of GEO-CAPE.

c. GEO-CAPE Resolution Testing

With the ability to run ensembles of simulations, all that remains is leveraging this ability to test the resolving power of CH4 emissions with GEO-CAPE data. This is no trivial task, as a satellite that exists only in concept has no usable data to test. Thus, in order to investigate the resolving power, the GEO-CAPE data must be simulated. GEOS-Chem Adjoint can be used to simulate retrievals from GEO-CAPE, which can then be used in a sensitivity analysis. The entire resolving power analysis cannot be done within GEOS-Chem, so PyEnsemble is set up to compile part of this analysis, and the final calculations are performed in Matlab.

The first portion of CH4 resolution testing for GEO-CAPE is performed with a GEOS-Chem Adjoint model simulation (Figure 2). This simulation gives the sensitivity (λ) in each model grid cell (currently a 4°x5° latitude-longitude grid) with respect to the emissions (σ) in that grid cell. To start, two forward CTM simulations focusing on CH4 and its associated chemistry are performed: one with normal emission parameters (1σ), and a second with all input emissions doubled (2σ). After both forward model runs complete, a GEO-CAPE observational operator is applied to the resultant atmospheric concentrations of CH4, transforming the model output to look like GEO-CAPE data. Using both simulations, the cost function (J) is then calculated with a least squares method in which the square of the residual (the difference between the 1σ and 2σ simulations), normalized by the observational error, is summed over all model cells.
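The least-squares cost function just described can be sketched in a few lines. The concentrations and observational errors below are made-up numbers for illustration, not model output.

```python
# Minimal sketch of the least-squares cost function J described above:
# the squared residual between the doubled-emissions (2-sigma) and normal
# (1-sigma) simulated CH4 concentrations, normalized by the observational
# error and summed over all model cells. All values are illustrative.

def cost_function(conc_1sigma, conc_2sigma, obs_error):
    """J = sum over cells of ((c2 - c1) / err)**2."""
    return sum(((c2 - c1) / err) ** 2
               for c1, c2, err in zip(conc_1sigma, conc_2sigma, obs_error))

# Two toy cells: residuals 10/5 = 2 and 6/3 = 2, so J = 4 + 4 = 8
J = cost_function([1800.0, 1750.0], [1810.0, 1756.0], [5.0, 3.0])
```

In the real workflow each concentration would come from a full forward CTM run with the observational operator already applied.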
Adjoint inverse sensitivity calculations are then performed, finding how much the emissions in a grid cell affect the associated cost function. The end result of this adjoint model calculation is a global grid of sensitivities. However, these values are not output in an easily manipulated format in standard GEOS-Chem Adjoint simulations. Therefore, the GEOS-Chem Adjoint code in the file geos_chem_adj_mod.f was altered to output the grid of sensitivities to a text file that can be read into PyEnsemble.

Page 7: Increasing the efficiency of GEOS-Chem Adjoint model runs

SOARS® 2012, Andre Perkins, 7

Figure 2. Flow chart of a GEOS-Chem Adjoint simulation calculating the sensitivity of the cost function J with respect to input emission estimates at all model grid cell locations.

The next step in testing the resolving power of the GEO-CAPE satellite involves using these GEOS-Chem Adjoint sensitivity grids to compile a Hessian matrix. This method, used by Tziperman and Thacker (1989), provides a straightforward way to obtain second-order sensitivities at selected locations from the adjoint model output. In general terms, the second-order sensitivity measures how much the calculated sensitivity at one location is affected by perturbations to emissions in other grid cells. For Hessian matrix compilation, a list of grid indices must be provided to focus the resolving power investigation on certain locations. For each location in the list of indices, one GEOS-Chem Adjoint simulation is performed with a perturbation to σ at only that location. One simulation is also performed as the base case, with no perturbations applied to the emissions. From each simulation, the sensitivity at every specified location is placed in a row vector. The second-order sensitivity is approximated using finite differencing between the perturbed and base-case sensitivities.

∂²J / (∂σ_xy ∂σ_ij) = ∂λ_xy / ∂σ_ij ≈ [λ_xy(σ_ij + δσ) − λ_xy(σ_ij)] / δσ    (1)
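The finite-difference Hessian compilation of Eq. (1) can be illustrated with a toy stand-in for the adjoint model. Here the sensitivity λ(σ) is the gradient of a known quadratic cost function, so the exact Hessian is the matrix A below and the finite difference recovers it; in the real workflow each call to the sensitivity function is a full GEOS-Chem Adjoint simulation.

```python
import numpy as np

# Toy stand-in for the adjoint model: for the quadratic cost
# J = 0.5 * sigma^T A sigma, the sensitivity is lambda = A @ sigma and
# the true Hessian is A. The matrix A is illustrative.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

def adjoint_sensitivity(sigma):
    """Stand-in for one adjoint run: the gradient of the toy cost."""
    return A @ sigma

def compile_hessian(sigma0, locations, delta=1e-3):
    """Eq. (1): one base-case run plus one run per perturbed location."""
    lam_base = adjoint_sensitivity(sigma0)           # base-case run
    rows = []
    for j in locations:                               # one run per location
        sigma = sigma0.copy()
        sigma[j] += delta                             # perturb one cell only
        rows.append((adjoint_sensitivity(sigma) - lam_base) / delta)
    return np.array(rows)

H = compile_hessian(np.ones(3), [0, 1, 2])
# For a quadratic cost the finite difference recovers A exactly.
```

In PyEnsemble the `locations` list comes from the locations file, and each `adjoint_sensitivity` call corresponds to one full simulation in the ensemble.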

With the Hessian matrix compiled, the resolution matrix of the data can be found from the eigenvectors of the Hessian whose eigenvalues meet a significance threshold. The diagonal of the resulting resolution matrix describes how well the corresponding parameter, in this case an emissions parameter, is resolved from the data (Tziperman and Thacker, 1989).
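In this project the eigenvalue-thresholding step is done in Matlab, but the same operation can be sketched in NumPy. The Hessian and threshold below are made-up values for illustration.

```python
import numpy as np

# Sketch of the resolution-matrix step described above: keep eigenvectors
# of the Hessian whose eigenvalues pass a significance threshold and form
# R = Vk @ Vk.T. Diagonal entries near 1 mean the corresponding emission
# parameter is well resolved by the data; near 0, poorly resolved.

def resolution_matrix(hessian, threshold):
    vals, vecs = np.linalg.eigh(hessian)    # symmetric eigendecomposition
    keep = vals > threshold                 # significant directions only
    Vk = vecs[:, keep]
    return Vk @ Vk.T

H = np.diag([10.0, 5.0, 0.01])              # one poorly constrained cell
R = resolution_matrix(H, threshold=1.0)
# diagonal close to [1, 1, 0]: the third parameter is unresolved here
```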

Page 8: Increasing the efficiency of GEOS-Chem Adjoint model runs

SOARS® 2012, Andre Perkins, 8

3. PyEnsemble

PyEnsemble was constructed using the Python programming language, version 2.4.3, and its standard libraries sys, os, and shutil. Two external libraries, MPI4py and NumPy, were also installed and used to construct the tool. PyEnsemble consists of three Python files, a batch script, and a locations text file. Each file serves a specific purpose, and they are divided this way so they can serve as templates for incorporation into the PyEnsemble workflow. The design is intended to be modular, so users can add, remove, or alter files to accomplish different tasks that require many GEOS-Chem Adjoint simulations. This specific version of PyEnsemble is built to run the GEO-CAPE resolution testing simulations, but the principles used in this version can be applied to other cases. PyEnsemble’s current usage and behavior are detailed in this section.

a. PyEnsemble Execution

Figure 3. A flow diagram showing the basic operations of PyEnsemble.

PyEnsemble is initiated by submitting a batch script to the job queue of the computing system (Figure 3, step 1). The cluster environment used in this project was the TORQUE Resource Manager, so the syntax may differ in other environments. The batch script specifies important information about the ensemble to be run, such as important directory locations and the number of simulations being performed. Specifically, a set of flags is defined in the batch script, some of which are passed into the Python script RunEnsemble.py: $simDir, $codeDir, $inputFiles, $rName, and $ppn. The flags $simDir, $codeDir, and $inputFiles specify important directory locations for PyEnsemble, $rName is the simulation name, and $ppn gives the computer system’s processors-per-node count. Placing these flags in the batch script allows the user to change simulations and set up PyEnsemble on different systems without altering any of the actual Python scripts. The batch script also compiles the GEOS-Chem Adjoint code before the mpirun command is executed. Compiling the code in the batch script saves processing time and ensures that nodes are not competing with each other trying to compile in the same directory.

After these batch script tasks complete, the mpirun command executes, and the script RunEnsemble.py is distributed across the computing nodes. RunEnsemble.py is initially run on each processor of each node. However, using modular arithmetic, only a single processor on each node ends up running the communication aspects of RunEnsemble.py; the rest of the RunEnsemble.py processes end and release their processors. Of the processes that serve as communication agents, one is designated as the ensemble manager and performs the setup and notification during the PyEnsemble execution. During the setup phase (Figure 3, step 2), the manager node copies the simulation directory to a directory for each node. During this process the manager also generates a unique input file for each directory, using the InputGen.py module and the locations file.
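The rank selection and setup phase just described can be sketched with the standard library alone. The directory names, the input-file name, and the PERTURB_CELL placeholder below are illustrative rather than PyEnsemble's actual conventions, and the rank-per-node layout is an assumption about how MPI numbers processes.

```python
import os
import shutil

def process_role(rank, ppn):
    """Modular-arithmetic rank selection: keep one process per node.

    Assumes MPI numbers ranks consecutively within each node, so ranks
    that are multiples of ppn sit on distinct nodes. Rank 0 becomes the
    ensemble manager; other surviving ranks are node agents.
    """
    if rank % ppn != 0:
        return "exit"                       # release the processor
    return "manager" if rank == 0 else "worker"

def setup_run_directories(template_dir, work_dir, locations):
    """Manager-node setup: one directory per simulation, base case first."""
    run_dirs = []
    for i, cell in enumerate([None] + list(locations)):
        dst = os.path.join(work_dir, "run_%03d" % i)
        shutil.copytree(template_dir, dst)  # copy the simulation directory
        with open(os.path.join(template_dir, "input.geos")) as f:
            text = f.read()
        # The base case keeps the template untouched; each other run
        # perturbs emissions in exactly one grid cell.
        if cell is not None:
            text = text.replace("PERTURB_CELL: none",
                                "PERTURB_CELL: %d" % cell)
        with open(os.path.join(dst, "input.geos"), "w") as f:
            f.write(text)
        run_dirs.append(dst)
    return run_dirs
```

For example, with 8 processors per node, ranks 0, 8, 16, … survive as communication agents, and a locations file with 15 entries yields 16 run directories: run_000 (base case) through run_015.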
The first simulation in this version is always the base case, with no perturbations. The rest of the input files correspond to distinct locations specified in the locations file and run the perturbation sensitivity simulation discussed in section 2.c. If too many or too few locations are specified in the locations file compared to the total number of simulations, the post-processing Hessian compilation will not take place. After directory setup completes, the manager node uses MPI communication to notify every other node of its respective run directory (Figure 3, step 3). Each node receives its directory and proceeds to execute the GEOS-Chem Adjoint executable file (Figure 3, step 4). At this point the RunEnsemble.py program is halted, waiting for a return value from the simulation execution; thus, GEOS-Chem Adjoint has all of the node’s processors available as resources for the simulation itself. After the simulations finish, the Hessian matrix is compiled using MPI communications to send an array of location sensitivities back to the manager node (Figure 3, step 5). The manager node performs the finite differencing, compiles the Hessian matrix, and writes the matrix out to a tab-delimited text file. The Hessian matrix compilation code is located in the PostProcess.py module. After the Hessian matrix is compiled, RunEnsemble.py is finished and the PyEnsemble job ends.

Information about the RunEnsemble.py execution and the simulations is output to log files. The RunEnsemble.py logged information is piped out through stderr, the standard error output stream, and information about each simulation is located in a log file in its simulation directory. The remaining steps to compile the resolution matrix from the Hessian matrix can be done using Matlab: reading the Hessian matrix, finding the eigenvalues of that matrix that meet a significance threshold, and then using the resulting eigenvectors to compile the resolution matrix. Matlab is used instead of Python during this step due to the robustness of Matlab’s linear algebra functions, along with easy visualization of the resolution matrix data.

b. PyEnsemble Behavior

This version of PyEnsemble was tested on two distributed memory systems at the University of Colorado – Boulder. The first was a high performance computer, named Prospero, in use by the lab group of Daven Henze, and the second was the campus supercomputer, Janus. Most of the PyEnsemble development took place on Prospero, so there were minor difficulties when moving PyEnsemble to the Janus system. The Janus system did not allow the RunEnsemble.py process to create new process threads, so the code had to be altered accordingly. However, most of the difficulties in running PyEnsemble on Janus stemmed from ensuring that the correct MPI and Fortran libraries were linked at runtime. On Prospero, a maximum of 16 simulations were performed in a PyEnsemble run, which compiled a 15x15 Hessian matrix from simulated GEO-CAPE CH4 data. The average run times for a simple Hessian matrix compilation with different numbers of simulations are detailed in Figure 4. These timings were compiled using a simulation length of only a single day, which is not representative of realistic resolving power tests, which will have simulation lengths on the order of a few months.
For each ensemble size, the timings represent the average of five PyEnsemble executions with that number of simulations. The relative speedup of parallelizing the simulations compared to running them sequentially is also given in Figure 4. The speedup is given by s = T1 / Tp, where T1 represents the time to run the simulations sequentially, one after the other, while Tp represents the parallelized time when all simulations are run together.

The timings graph reveals that the current bottleneck of PyEnsemble is the directory setup. Directory setup involves sequential copying of folders, which explains why it takes longer with more simulations. The theoretical maximum speedup is equal to the number of processes running in parallel; therefore, with 16 simulations, the maximum speedup value is 16. The file transfer time for a 16-simulation ensemble run already degrades the speedup to below half of the theoretical maximum in this case. Note, however, that these speedups represent a worst-case situation: file transfer takes up a relatively large share of the time only because the simulation length is so short. In reality, the actual simulation length will be approximately 100 times longer, making the file transfer time about 0.5% of the total time instead of 50%, which would enhance the speedup values significantly. The bottleneck in this case could be alleviated if PyEnsemble were altered to parallelize the directory creation. However, not all systems support parallel I/O functions, so this would have to be implemented as a conditional feature.
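The effect of the serial setup on the speedup s = T1 / Tp can be worked through with made-up timings that mimic the short test runs; these are illustrative numbers, not the measured Figure 4 values.

```python
# Worked example of the speedup s = T1 / Tp defined above, using made-up
# timings: a sequential per-directory setup cost plus a fixed run time per
# short test simulation. These are not the measured Figure 4 values.

def speedup(t_sequential, t_parallel):
    return t_sequential / t_parallel

n = 16                  # simulations in the ensemble
setup_per_run = 4.0     # seconds of sequential directory copying per run
run_time = 60.0         # seconds for one short test simulation

t1 = n * (setup_per_run + run_time)   # everything done one after another
tp = n * setup_per_run + run_time     # setup stays serial, runs overlap
s = speedup(t1, tp)                   # 1024 / 124, roughly 8.3, below n = 16
```

With a realistic months-long simulation the fixed setup cost becomes negligible and s approaches n, which is the point made above.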

Figure 4. Average times to complete a PyEnsemble job with varying numbers of simulations (1, 4, 8, and 16), comparing the total simulation time, the simulation time, and the file transfer (directory setup) time. Error bars represent one standard deviation from the mean. Simulation speedups: 4 simulations, 3.303; 8 simulations, 5.860; 16 simulations, 7.881.

Another factor to be aware of when running PyEnsemble is the space requirement of the simulations. Depending on the complexity of the simulations, GEOS-Chem Adjoint saves a variable number of temporary files to be used in the adjoint model calculations. These files are deleted after they are no longer useful, but a simulation with full chemistry can generate hundreds of gigabytes, or even terabytes, of data. If a PyEnsemble job creates even only 10 simulations, the storage requirements for the job quickly add up. The CH4 simulations have relatively simple chemistry, so the temporary files required should not pose a problem for the number of simulations that will be performed. However, for other users this may be an important factor to consider when setting up PyEnsemble runs.

4. Conclusion

In order to better understand the behavior of the atmospheric concentration of CH4, the strengths and trends of individual sources need to be better constrained. A proposed new satellite, GEO-CAPE, has the potential to greatly improve the ability to constrain individual sources by using its data with inverse modeling. This project seeks to support the construction of this satellite by investigating its potential resolving power for individual CH4 emission sources. However, performing these tests requires a way to easily run many GEOS-Chem Adjoint sensitivity simulations. In light of this, PyEnsemble, a GEOS-Chem Adjoint model ensemble manager, was created using the Python programming language. PyEnsemble allows many simulations to be run while requiring only the submission of a single job script. The Python library that made this tool possible is MPI4py, which allows Python scripts to use the message passing interface protocol for communication between separate nodes of a distributed memory computing system. With this, PyEnsemble can pass simulation directory locations and simulation data back and forth to orchestrate the ensemble of runs. PyEnsemble was developed and tested on two distributed memory systems at the University of Colorado – Boulder. A 16-simulation ensemble test successfully compiled a 15x15 Hessian matrix for testing GEO-CAPE CH4 emission resolving power, and a 100-simulation test ensemble on a supercomputer also completed successfully. The average timings of the simulations showed that sequential copying of simulation directories was responsible for degrading the parallel simulation speedup values. This copying will account for a relatively small amount of time compared with realistic GEOS-Chem Adjoint simulation times, but the process could be sped up through the use of parallel I/O. PyEnsemble is an important step in the project of investigating GEO-CAPE resolving power and, ultimately, further constraining individual sources of CH4. Though PyEnsemble is specifically built for finding and compiling a Hessian matrix, its use is by no means limited to this situation.
Through minimal alterations to the main RunEnsemble.py script, a user will be able to alter modules the ensemble manager uses to fit specific needs. With a modular design like this, PyEnsemble will hopefully have application for many other GEOS-Chem users, and become a valuable tool for the overall community.
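The manager/worker structure described above can be sketched with the Python standard library alone. PyEnsemble itself uses MPI4py to coordinate runs across the nodes of a distributed-memory cluster; in this sketch a thread pool stands in for the MPI communication, and the function names (`run_simulation`, `run_ensemble`) are hypothetical illustrations, not PyEnsemble's actual API:

```python
# A minimal stdlib sketch of the manager/worker pattern an ensemble manager
# implements. PyEnsemble itself uses MPI4py across cluster nodes; here a
# thread pool stands in for MPI, and run_simulation is a hypothetical
# placeholder for setting up and launching one GEOS-Chem Adjoint run.
from concurrent.futures import ThreadPoolExecutor

def run_simulation(run_dir):
    # A real worker would copy a template run directory to run_dir, perturb
    # its input files, launch the model executable, and return output data.
    return run_dir, f"finished {run_dir}"

def run_ensemble(run_dirs, n_workers=4):
    # The manager hands each worker a simulation directory and gathers the
    # results, analogous to comm.send/comm.recv exchanges under MPI4py.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(run_simulation, run_dirs))

if __name__ == "__main__":
    results = run_ensemble([f"run_{i:02d}" for i in range(16)])
    print(f"{len(results)} simulations completed")
```

Because the manager only exchanges directory paths and result data with its workers, the same pattern scales from a thread pool on one machine to MPI ranks spread across many nodes.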


REFERENCES

Bergamaschi, P., Krol, M., Dentener, F., Vermeulen, A., Meinhardt, F., Graul, R., Ramonet, M., Peters, W., and Dlugokencky, E., 2005: Inverse modelling of national and European CH4 emissions using the atmospheric zoom model TM5. Atmos. Chem. Phys., 5, 2431-2460.

Bergamaschi, P., Frankenberg, C., Meirink, J. F., Krol, M., Dentener, F., Wagner, T., Platt, U., Kaplan, J. O., Körner, S., Heimann, M., Dlugokencky, E. J., and Goede, A., 2007: Satellite chartography of atmospheric methane from SCIAMACHY on board ENVISAT: 2. Evaluation based on inverse model simulations. J. Geophys. Res.-Atmos., 112.

Bergamaschi, P., Frankenberg, C., Meirink, J. F., Krol, M., Villani, M. G., Houweling, S., Dentener, F., Dlugokencky, E. J., Miller, J. B., Gatti, L. V., Engel, A., and Levin, I., 2009: Inverse modeling of global and regional CH4 emissions using SCIAMACHY satellite retrievals. J. Geophys. Res.-Atmos., 114.

Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res.-Atmos., 106, 23073-23095.

Dlugokencky, E. J., and Coauthors, 2009: Observational constraints on recent increases in the atmospheric CH4 burden. Geophys. Res. Lett., 36, 5.

Errico, R. M., 1997: What is an adjoint model? B. Am. Meteorol. Soc., 78, 2577-2591.

Forster, P., V. Ramaswamy, P. Artaxo, T. Berntsen, R. Betts, D. W. Fahey, J. Haywood, J. Lean, D. C. Lowe, G. Myhre, J. Nganga, R. Prinn, G. Raga, M. Schulz, and R. Van Dorland, 2007: Changes in Atmospheric Constituents and in Radiative Forcing. In: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Solomon, S., D. Qin, M. Manning, Z. Chen, M. Marquis, K. B. Averyt, M. Tignor, and H. L. Miller (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.

Frankenberg, C., J. F. Meirink, M. van Weele, U. Platt, and T. Wagner, 2005: Assessing methane emissions from global space-borne observations. Science, 308, 1010-1014.

Heimann, M., 2011: Atmospheric science: Enigma of the recent methane budget. Nature, 476, 157-158.

Henze, D. K., A. Hakami, and J. H. Seinfeld, 2007: Development of the adjoint of GEOS-Chem. Atmos. Chem. Phys., 7, 2413-2433.

Prather, M. J., et al., 2001: Atmospheric chemistry and greenhouse gases. In: Climate Change 2001: The Scientific Basis. Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change [Houghton, J. T., et al. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 239-287.

Spahni, R., Chappellaz, J., Stocker, T., Loulergue, L., Hausammann, G., Kawamura, K., Flückiger, J., Schwander, J., Raynaud, D., Masson-Delmotte, V., and Jouzel, J., 2005: Atmospheric methane and nitrous oxide of the late Pleistocene from Antarctic ice cores. Science, 310, 1317-1321.

Tziperman, E., and W. C. Thacker, 1989: An optimal-control/adjoint-equations approach to studying the oceanic general circulation. J. Phys. Oceanogr., 19, 1471-1485.

Villani, M. G., P. Bergamaschi, M. Krol, J. F. Meirink, and F. Dentener, 2010: Inverse modeling of European CH4 emissions: sensitivity to the observational network. Atmos. Chem. Phys., 10, 1249-1267.

Wecht, K. J., Jacob, D., Wofsy, S., Kort, E., Worden, J., Kulawik, S., Henze, D., Kopacz, M., and Payne, V., 2012: Validation of TES methane with HIPPO aircraft observations: implications for inverse modeling of methane sources. Atmos. Chem. Phys., 12, 1823-1832.