

High-performance computing tools for advancing the integrated assessment and modelling of global environmental challenges

Brett A. Bryan

CSIRO Ecosystem Sciences

Abstract

Integrated assessment and modelling of complex social-ecological systems is required to address global environmental challenges such as climate change, food and energy security, natural resource management, and biodiversity conservation. Assessments need to capture high spatial and temporal resolution, cover large geographic extents, and quantify uncertainty. This places high computational demands that are unlikely to be met by traditional Geographic Information System (GIS) software tools in the near term. A range of high-performance computing (HPC) hardware has become increasingly affordable and accessible, and open source software tools exist which can exploit this computing power. Here, I evaluate a range of hardware and software tools available for the next generation of integrated spatio-temporal analysis on HPC resources. I developed a simple spatio-temporal model which simulated the net present value from agricultural land use over time given stochastic price, yield, and cost parameters. I implemented the model in a GIS scripting language (ESRI’s AML) using a single central processing unit (CPU) core. I implemented the same model in Python using the Numpy library on a single CPU core and on a single Graphics Processing Unit (GPU) using the PyCUDA GPUArray and ElementwiseKernel modules. I then parallelised the model using IPython and ran it on 1-256 CPU cores, and 1-64 GPUs, on a hybrid computer cluster. The GIS implementation effectively took 15.5 weeks to run. Migrating to an efficient in-memory array processing environment such as Numpy on a single CPU core led to a speed-up of 59x compared to the GIS. On a single GPU, speed-ups of 1,473x were achieved using GPUArray and 4,881x using ElementwiseKernel. Parallel processing over multiple CPU cores and GPUs led to further performance enhancements. At the fastest, the model took just under 2.5 minutes to complete using the ElementwiseKernel module in parallel over 64 GPUs – a speed-up of 63,643x. HPC in combination with open source tools such as Python offers truly transformational performance improvements for integrated assessment and modelling. Experience since suggests that, whilst other types of modelling tasks can still benefit greatly, performance improvements may not be quite so breathtaking. In addition, migration to new hardware and software environments has costs, including a lag time to build capacity, tools, and workflows. Parallel programming is also inherently more difficult, as analysts need to start thinking in parallel. I provide code tools within to help other analysts start moving to HPC in the hope that our ability to address these global challenges can be similarly transformed.

Keywords: Graphics processing unit, GPU, parallel, multi-core, cluster, grid, GIS, environmental

Acknowledgements: The author acknowledges the support of CSIRO’s Integrated Carbon Pathways initiative, Sustainable Agriculture Flagship, and HPC Transformational Capability Platform and is grateful for help and advice from Sam Moskwa and others in CSIRO’s Advanced Scientific Computing group.

The final definitive version of this manuscript was published in Environmental Modelling & Software. http://dx.doi.org/10.1016/j.envsoft.2012.02.006


1 Introduction

Addressing global environmental challenges such as climate change, food and energy security, natural resource management, and biodiversity conservation operating within complex social-ecological systems demands the integrated assessment and modelling of biophysical, ecological, economic, and social information (Costanza, 1996; Parker et al., 2002; Kumar et al., 2006; Bryan et al., 2010; Reid et al., 2010). Typically, this information is spatial (Bryan, 2003). Many of these processes are heterogeneous over the landscape and may display spatial interconnections (e.g. topographic processes, species dispersal, supply chain analysis). Time, or more specifically temporal dynamics, may also be an important component of these processes (e.g. climate change, tree growth, cash flow) (Venevsky and Maksyutov, 2007). Data sets are often very large due to the increasing need to model processes over large extents (e.g. continental and global scale) whilst maintaining high enough spatial and temporal resolution (e.g. landscape scale, daily time-steps) to adequately capture the relevant dynamics (Harris, 2002). Further, uncertainty and sensitivity are an inherent part of models of social-ecological systems which policy-makers need to understand in order to make robust decisions given the associated risk (Rotmans and van Asselt, 2001; Kooistra et al., 2005; Bryan, 2010; Cheviron et al., 2010; Lilburne and North, 2010). A common way to quantify sensitivity and uncertainty is by undertaking multiple model simulations using varying input parameter values (Lilburne and Tarantola, 2009). In concert, these emerging characteristics of the integrated assessment and modelling of social-ecological systems have placed unprecedented demands upon both computer hardware and software. High-performance computing tools are required that enable us to tackle research problems of the scale and level of technological sophistication required to address global challenges (Openshaw, 1995; Openshaw and Turton, 2000; Armstrong et al., 2005; Yang and Raskin, 2009; Wang, 2010).

Following Moore’s law, technological development has continued to improve exponentially, with the number of transistors per central processing unit (CPU) chip doubling roughly every 2 years (Sutter, 2005). However, CPU clock speeds have plateaued at around 3 GHz due to physical limitations such as heat dissipation. To get around this barrier, chip designers have increased the number of cores on each chip (Herlihy and Shavit, 2008). The number of cores is now increasing exponentially (Sutter, 2005). This increase in parallelism is also reflected in the development of the graphics processing unit (GPU). GPUs, with hundreds of cores on a single chip, were developed for the parallel rendering of computer graphics. However, the development of interfaces such as NVIDIA’s Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) has made general purpose GPU processing possible for scientific applications. In addition, the accessibility of massively parallel high-performance computing (HPC) resources has increased (Wang et al., 2005). High-speed networks have enabled the connectivity of multiple compute nodes, each containing multi-core CPUs and GPUs, in server, cluster, and grid facilities (Herlihy and Shavit, 2008).

However, in order to utilise the performance advantages offered by HPC, models must be specifically written such that processes are executed simultaneously across multiple cores and nodes (Rauber and Rünger, 2010). Two of the most common ways of achieving this in scientific programming are the data parallel approach (where each processor executes the full program on a share of the data) and the task parallel approach (where each processor works on the full data set but performs a share of the program) (Rauber and Rünger, 2010). Whilst some integrated models may require communication between processes (e.g. agent-based models), many are embarrassingly parallel as they require little if any communication between processes. Integrated spatio-temporal modelling and analysis is often ideally suited to parallelization either through data parallel methods such as tiling, distributing, and processing geographic data, or through task parallel methods such as processing a subset of simulations (Hawick et al., 2003; Armstrong et al., 2005).
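To make the distinction concrete, the following sketch contrasts the two paradigms for a gridded Monte Carlo model. It is illustrative only: the worker identifiers, grid, and simulate function are hypothetical stand-ins, not code from this study.

import numpy

# Hypothetical set-up: one of nWorkers processes, identified by workerId
nWorkers, workerId, nSims = 4, 0, 1000
grid = numpy.random.rand(100, 100).astype(numpy.float32)

def simulate(data, seed):
    # Stand-in for a single Monte Carlo simulation of the model
    numpy.random.seed(seed)
    return (numpy.random.normal(200, 30) * data).sum()

# Data parallel: this worker runs ALL simulations on its share of the grid
mySlice = grid[workerId::nWorkers]
dataParallel = [simulate(mySlice, s) for s in range(nSims)]

# Task parallel: this worker runs a SHARE of the simulations on the full grid
taskParallel = [simulate(grid, s) for s in range(workerId, nSims, nWorkers)]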

In the past, spatio-temporal analysis has been typically undertaken in raster-based GIS and image processing software packages such as ArcGIS, GRASS, Imagine, ENVI, and others (e.g. Crossman and Bryan, 2009). These packages have been very effective for manipulating, managing, and analysing spatio-temporal data, and summarising and presenting results, particularly through high-quality cartographic outputs. They are also able to handle large data sets. However, the ability to implement integrated models of complex social-ecological systems in these packages is limited by the reliance on serial processing on a single CPU core and heavy disk input-output (IO) transactions (Vokorokos et al. 2004). In the main, traditional GIS and image processing software has lagged behind computer hardware trends towards increased parallelization and larger memory, thereby curtailing the progress of integrated assessment and modelling of global environmental challenges. Whilst proprietary packages such as MathWorks’ Matlab and Wolfram’s Mathematica have recently included the ability to use multiple CPUs and GPUs in parallel, these packages are not widely used by the spatial modelling community. Further, large scale server or cluster implementations of these packages can add a significant cost to research projects. The capacity of open source packages such as R (R Development Core Team, 2011) and Python (van Rossum et al., 2010) for performing spatio-temporal modelling and analysis has also progressed substantially during recent years. In addition, both of these tools have packages that can also enable the use of multiple CPUs and GPUs in parallel across a spectrum of hardware from multi-core workstations to clusters (e.g. Schmidberger et al., 2009).

Parallel CPU and GPU processing on a range of HPC resources has been found to substantially enhance the performance of computationally- and data-intensive tasks in the medical sciences (e.g. automated tomographic image processing; van der Jeught et al., 2010) and physical sciences (e.g. fluid and particle simulation, seismic analysis; Wang et al., 2010). The potential of HPC resources in spatio-temporal modelling of social-ecological systems was recognised early on (Costanza and Maxwell, 1991; Openshaw, 1995; Clematis et al., 1996; Turton and Openshaw, 1998; Openshaw and Abrahart, 2000). Parallel computing on HPC resources has been used to analyse several aspects of social-ecological systems including remote sensing image processing (Lilburne and North, 2010), route optimisation (Lanthier et al., 2003), spatial statistics (Armstrong et al., 2005), natural hazard simulation (Xie et al., 2010), atmospheric simulation (Wang et al., 2005), land surface modelling (Kumar et al., 2006), species population models (Wang et al., 2006), vegetation dynamics (von Bloh et al., 2010), water resources management (Sulis, 2009), topographic analysis (Chen et al., 2010), urban simulation (Li et al., 2010), and groundwater modelling (Mirghani et al., 2010). Several of these studies report performance improvements of orders of magnitude from applying models on a range of HPC resources. However, despite the substantial performance gains on offer, the use of HPC in the integrated assessment and modelling of social-ecological systems has not been widespread.


With the increased availability of both software and hardware for HPC, I renew the call to rethink how we approach the integrated assessment and modelling of complex social-ecological systems. I present a critical and strategic evaluation of HPC software and hardware options for integrated assessment and modelling. I present this in a way that I hope is useful for practitioners thinking of migrating to these systems, including presenting actual code tools. I evaluated the ability of open source Python-based tools to undertake spatio-temporal modelling and analysis on HPC resources (a 128-node hybrid CPU/GPU supercomputer cluster). I developed an illustrative spatial model of economic returns to agriculture across a hypothetical heterogeneous landscape of 100 million grid cells (roughly equivalent to modelling the whole of Australia at a spatial resolution of 250 m). Yields, costs, and prices were varied annually with climatic and market forces, with parameters drawn from probability density functions. Net economic returns were calculated annually and discounted to net present value terms. The net present value of agricultural returns was modelled over a time period of 70 years and this was repeated for 1,000 Monte Carlo iterations. The model was implemented using a range of software and hardware platforms: a GIS on a single CPU core; Python with Numpy arrays using 1-256 CPU cores; Python with PyCUDA’s GPUArray function using 1-64 GPUs; and Python with PyCUDA’s ElementwiseKernel function using 1-64 GPUs. Model performance was benchmarked and compared. The advantages and disadvantages of this approach to integrated assessment and modelling are discussed. My hope is that this technology can provide a much-needed boost in our capability to address global environmental challenges.

2 Methods

2.1 Strategic evaluation

There are a range of tools available for parallel processing on CPUs and GPUs. I undertook a strategic technology assessment where I evaluated a range of potential software and hardware options for integrated spatio-temporal modelling and analysis. Concentrating on open source tools, excellent array processing capability exists in both Python (van Rossum et al., 2010), through the Numpy package (Jones et al., 2001), and R (R Development Core Team, 2011). Within R, many packages offer some ability for parallel CPU processing (Schmidberger et al., 2009). I had limited success with several of these. The Rmpi package (Yu, 2010) was an exception and enabled parallel processing on large scale implementations via the Message Passing Interface (MPI). Within Python there are also several options for parallel processing (e.g. multiprocessing, subprocess) which can work well in smaller implementations (e.g. on multi-core machines). However, IPython (Perez and Granger, 2007) was found to enable scaling of parallel processing right up to highly parallel cluster implementations involving CPUs and GPUs. R has a few packages (e.g. gputools; Buckner et al., 2010) which provide specific routines for GPUs. For Python, the PyCUDA library provided full access to the CUDA API, plus abstractions making the programming of common tasks easier. Other libraries (e.g. PyOpenCL (Klöckner et al., 2011), Theano (Bergstra et al., 2010)) can also be used for GPU processing through Python. Excellent graphical and mapping ability is available within both Python (Matplotlib; Hunter, 2007) and R (e.g. ggplot2; Wickham, 2009). Whilst R has a more comprehensive suite of analytical tools, Python tended to be faster and handle large data sets much better. This ability, as well as having more options for parallel processing on CPUs and GPUs, suggested that Python was the better choice for integrated modelling and assessment on high-performance computing.

2.2 The model

Economic returns to land use are a critical data layer underpinning many elements of the analysis of global challenges (Bryan et al., 2011). Profitability and comparative advantage are key elements determining the adoption of alternative land uses under different policy scenarios. As an example, economic returns to agriculture can provide an indication of the opportunity cost of changing land use to environmental plantings to sequester carbon and help mitigate climate change (Bryan et al., 2008, 2010, 2011, in review). Typically, as revenue and costs occur unevenly over time in agriculture and other land uses, economic returns are measured in net present value (NPV) terms and are discounted over time to include the cost of capital. This calculation is computationally intensive as the NPV is the sum of the net annual returns and as such must be calculated for each cell, for each year over the time horizon. Here I develop a simple example calculating the net present value of agriculture with a stochastic price, cost, and yield component, in addition to spatially varying yield in a hypothetical landscape.

Net present value was calculated as:

\mathit{NPV}(x) = \sum_{t=0}^{69} \frac{p(t)\, y(x,t) - c(t)}{(1 + r)^{t}}    (Equation 1)

where p(t) is the price per tonne of agricultural production, modelled after the price of wheat and including a volatility component. Price was a random variable drawn each year t in T = 0, 1, ..., 69 from a normal distribution with a mean of 200 $/t and standard deviation of 30 $/t, with p(t) ≥ 0. Similarly, the cost of production c(t) is also a random variable drawn each year from a normal distribution with a mean of 100 $/ha and standard deviation of 20 $/ha, with c(t) ≥ 0. Annual yield y(x,t) = yf(t) × Y(x) is the product of a random annual yield factor yf(t), also drawn from a normal distribution with a mean of 2 t/ha and standard deviation of 0.4 t/ha with yf(t) ≥ 0, and a spatially heterogeneous land capability factor Y that has both systematic and random components of variability, and r is the discount rate. The Python script for creating the Y layer is:

import numpy

nRows = 10000
nCols = 10000
# Systematic component: normalised outer product of row and column indices
x = numpy.arange(1.0, nCols + 1)
y = numpy.arange(1.0, nRows + 1)
colNumArray, rowNumArray = numpy.meshgrid(x, y)
mesh = numpy.float32(rowNumArray * colNumArray)
# Rescale to [0, 1], shift by 0.5, and add random noise in [0, 0.25),
# giving land capability values in the range [0.5, 1.75)
spYield = (mesh - mesh.min()) / (mesh.max() - mesh.min()) + 0.5 \
    + numpy.random.rand(nRows, nCols).astype(numpy.float32) / 4

The land capability factor layer Y had a minimum of 0.5 and a maximum of 1.75 (Figure 1). Hence, NPV is also a spatial layer of 100 million grid cells calculated over the 70-year time horizon. This was repeated in multiple simulations to enable the quantification of central tendency (mean) and uncertainty (standard deviation).


Figure 1: The 10,000 x 10,000 cell spatial land capability factor layer spYield used in the model.

2.3 Implementation

I implemented the model in a GIS on a workstation and in Python (van Rossum et al., 2010) on a range of HPC resources. Three Python implementations were developed: one on CPU, one using a higher level abstraction on GPU (PyCUDA’s GPUArray module), and one using a lower level of abstraction on GPU (PyCUDA’s ElementwiseKernel module; Table 1). These were run over 1-256 CPU cores (32 nodes x 8 cores) and 1-64 GPUs (32 nodes x 2 GPUs). Each of the Python implementations was repeated 10 times. Processing time was recorded using Python’s time module and the mean calculated. The GIS implementation was run on a personal computer with an Intel Xeon 3 GHz CPU, 4 GB memory, running 32-bit Windows XP. The HPC implementations were run on CSIRO’s Linux GPU cluster which comprises 128 compute nodes, each with dual Xeon E5462 2.8 GHz CPUs (8 CPU cores), 32 GB of memory, dual Tesla S2050 GPUs (each with 448 GPU cores and 3 GB onboard memory), and DDR InfiniBand interconnect.

Name | Hardware | Number of simulations | Software
GISCPU | Single CPU core | 10 | ESRI ArcGIS Workstation Grid module, AML
PyCPU | 1, 2, ..., 256 CPU cores (1-32 nodes, each using 1-8 CPU cores) | 100 | IPython, Numpy
PyGPU | 1, 2, ..., 64 GPUs (1-32 nodes, each using 1-2 GPUs) | 1,000 | IPython, Numpy, PyCUDA GPUArray
PyGPUe | 1, 2, ..., 64 GPUs (1-32 nodes, each using 1-2 GPUs) | 1,000 | IPython, Numpy, PyCUDA ElementwiseKernel

Table 1: Summary of software and hardware implementations.

2.3.1 GISCPU – AML, single CPU core

The model was implemented in ESRI’s Grid raster GIS module and automated using Arc Macro Language (AML, Figure 2, Online Supporting Material). Random price, yield, and cost variables were calculated each year. The mean was calculated by summing the results of two separate sum functions, as these have a maximum limit of 50 grids. A running mean and standard deviation of NPV were calculated each iteration, as there are no Grid functions that can calculate these statistics over 1,000 grids.

Figure 2: AML code for the NPV simulation.

2.3.2 PyCPU – IPython Numpy, 1-256 CPU cores

Implementation in serial on a single CPU core was done using Python (van Rossum et al., 2010) and Numpy (Jones et al., 2001) – a package for mathematical processing of numeric arrays. Random price, yield, and cost variables were pre-calculated for each year, and the Numpy array spYield was pre-calculated, before the main analytical loops (Figure 3). A running mean and standard deviation were calculated each iteration in the same way as for GISCPU.

Figure 3: Python code for the temporal and Monte Carlo iteration loop.
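Figure 3 in the original manuscript is an image and is not reproduced here. A minimal sketch of such a loop follows, assuming the spYield array from Section 2.2, hypothetical parameter array names (price, cost, yieldFactor), and an illustrative discount rate r whose value the paper does not state.

import numpy

nSims, nYears, r = 1000, 70, 0.05  # discount rate chosen for illustration only
# Pre-draw the stochastic parameters for every simulation and year
price = numpy.maximum(numpy.random.normal(200.0, 30.0, (nSims, nYears)), 0)
cost = numpy.maximum(numpy.random.normal(100.0, 20.0, (nSims, nYears)), 0)
yieldFactor = numpy.maximum(numpy.random.normal(2.0, 0.4, (nSims, nYears)), 0)
discount = (1.0 + r) ** -numpy.arange(nYears)

runningSum = numpy.zeros_like(spYield)
runningSumSq = numpy.zeros_like(spYield)
for s in range(nSims):
    npv = numpy.zeros_like(spYield)
    for t in range(nYears):
        # Net annual return per cell, discounted to present value terms
        npv += (price[s, t] * yieldFactor[s, t] * spYield - cost[s, t]) * discount[t]
    # Accumulate running statistics rather than storing all 1,000 NPV grids
    runningSum += npv
    runningSumSq += npv * npv
npvMean = runningSum / nSims
npvStd = numpy.sqrt(runningSumSq / nSims - npvMean ** 2)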


2.3.3 PyGPU – IPython Numpy PyCUDA GPUArray, 1-64 GPUs

I used the PyCUDA (Klöckner et al., 2011) package to execute the NPV simulation on a GPU. PyCUDA provides a Python wrapper for NVIDIA’s CUDA API. PyCUDA integrates tightly with the Numpy package and provides tools ranging from low level access to CUDA functionality to high level abstractions. For the PyGPU implementation, PyCUDA’s GPUArray functionality was used to compute the model on a GPU (Figure 4).

Figure 4: Python code for execution on a GPU using PyCUDA GPUArray.
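Figure 4 is likewise an image. A minimal sketch of the simulation on a GPU with the gpuarray module, reusing the hypothetical arrays (price, cost, yieldFactor, discount, nSims, nYears) from the CPU sketch above, might be:

import pycuda.autoinit                  # initialises a CUDA context
import pycuda.gpuarray as gpuarray

spYieldGpu = gpuarray.to_gpu(spYield)   # transfer the land capability layer once
runSum = gpuarray.zeros_like(spYieldGpu)
runSumSq = gpuarray.zeros_like(spYieldGpu)
for s in range(nSims):
    npvGpu = gpuarray.zeros_like(spYieldGpu)
    for t in range(nYears):
        # Scalars broadcast over the array; each arithmetic operation launches
        # its own kernel and allocates a temporary array on the device
        npvGpu += (float(price[s, t] * yieldFactor[s, t]) * spYieldGpu
                   - float(cost[s, t])) * float(discount[t])
    runSum += npvGpu
    runSumSq += npvGpu * npvGpu
npvMean = (runSum / nSims).get()        # copy results back to host memory
npvStd = ((runSumSq / nSims).get() - npvMean ** 2) ** 0.5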

2.3.4 PyGPUe – IPython Numpy PyCUDA ElementwiseKernel, 1-64 GPUs

The ElementwiseKernel module in PyCUDA provides an interface to the development of lower level CUDA C custom kernels. Performance improvements can be achieved over GPUArray calculations through the ability to evaluate statically-typed, multi-stage expressions without the need to create a new temporary for each intermediate result (Klöckner et al. 2011, Figure 5).


Figure 5: Python code for execution on a GPU using PyCUDA ElementwiseKernels.
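Figure 5 is an image; a minimal sketch of an equivalent fused update kernel, again reusing the hypothetical arrays from the earlier sketches, might be:

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

# A single fused kernel evaluates the whole annual update elementwise,
# avoiding a device temporary for each intermediate result
npvUpdate = ElementwiseKernel(
    "float *npv, float *spYield, float price, float yf, float cost, float disc",
    "npv[i] += (price * yf * spYield[i] - cost) * disc",
    "npv_update")

spYieldGpu = gpuarray.to_gpu(spYield)
for s in range(nSims):
    npvGpu = gpuarray.zeros_like(spYieldGpu)
    for t in range(nYears):
        npvUpdate(npvGpu, spYieldGpu,
                  numpy.float32(price[s, t]), numpy.float32(yieldFactor[s, t]),
                  numpy.float32(cost[s, t]), numpy.float32(discount[t]))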

2.3.5 Parallel processing over multiple CPUs and GPUs

IPython (Perez and Granger, 2007) was used to parallelize the NPV simulation model over multiple CPU cores and GPUs on the cluster. A Portable Batch Scheduler (PBS) script was developed to request the necessary resources, load the software, and start the IPython ipcontroller and ipengines to enable parallel processing. The PBS script then executes the Python script. In the Python script, the MultiEngineClient was started and the magic commands activated. The MultiEngineClient module provides the functionality for working in a master-worker parallel processing paradigm. IPython magic commands were used to import the required Numpy and PyCUDA modules on each worker. The scatter function was used to split up the spatial data array spYield and farm it out to workers. For the PyCPU and PyGPU implementations, the process (Figure 3) and processCUDA (Figure 4) functions were defined on the master and then pushed out to workers along with other model variables. Using the data parallel model, each worker then runs all 1,000 simulations over 70 years but on only a share of the data (Figure 6). In PyGPUe, the ElementwiseKernel functions were defined on each worker.

Figure 6: Python code for parallel execution over multiple CPUs and GPUs using IPython.
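Figure 6 is also an image. A minimal, hedged sketch of the master-side set-up, assuming the MultiEngineClient API of the IPython releases of the time (around 0.10) and a process function that wraps the loop from the Figure 3 sketch, might look like:

from IPython.kernel import client

mec = client.MultiEngineClient()   # connect to the running ipcontroller
mec.activate()                     # enable the %px-style magic commands
mec.execute('import numpy')        # import required modules on every worker
mec.scatter('spYield', spYield)    # split the spatial array (by rows) across workers
mec.push(dict(price=price, cost=cost, yieldFactor=yieldFactor,
              discount=discount, nSims=nSims, nYears=nYears))
mec.push_function(dict(process=process))  # send the model function to the workers
mec.execute('npvMean, npvStd = process(spYield)')
npvMean = mec.gather('npvMean')    # reassemble the result grid on the master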


3 Results

The GIS implementation GISCPU was only run for 10 simulations, taking over 26 hours which, by extrapolation, is approximately 15.5 weeks for 1,000 simulations (Table 2). Using Python with Numpy arrays on a single CPU core in PyCPU achieved a speed-up of 59x compared to the GISCPU implementation, largely because of the in-memory processing. However, this still took the equivalent of over 44.5 hours for 1,000 simulations. The single GPU implementation with PyCUDA GPUArray in PyGPU took just over 106 minutes – a speed-up of 25x compared to single core CPU processing in PyCPU and 1,473x compared to GISCPU. Taking just over 32 minutes for 1,000 simulations, processing on a single GPU with the PyCUDA ElementwiseKernel implementation PyGPUe achieved a speed-up of 4,881x compared to GISCPU, 83x compared to single core CPU processing in PyCPU, and 3.3x compared to the GPUArray implementation PyGPU.

Processing in parallel over multiple CPU cores and GPUs led to further performance improvements. Processing the PyCPU implementation over 256 cores took the equivalent of 31 minutes for 1,000 simulations – comparable to the single GPU ElementwiseKernel implementation PyGPUe. Processing over 64 GPUs using PyCUDA GPUArray in PyGPU took under 5 minutes, achieving speed-ups of 30,553x over GISCPU and 521x over the single CPU implementation of PyCPU. Processing over 64 GPUs using PyCUDA ElementwiseKernel in PyGPUe took under 2.5 minutes, achieving speed-ups of 63,643x over GISCPU and 1,085x over the single CPU implementation of PyCPU. Diminishing marginal returns were evident in highly parallel implementations, as parallelizing over 256 CPU cores in PyCPU resulted in a speed-up of only 86x compared with a single core (Table 2, Figure 7), and parallelizing over 64 GPUs resulted in speed-ups of only 21.8x and 13.0x compared with a single GPU for PyGPU and PyGPUe, respectively (Table 2, Figure 8).

Hardware | Implementation | Actual number of simulations | Actual processing time (s) | Effective time for 1,000 simulations (s) | Speed-up from GISCPU | Speed-up from PyCPU (1 CPU core) | Speed-up from PyGPU (1 GPU) | Speed-up from PyGPUe (1 GPU)
1 CPU core | GISCPU | 10 | 94,103 | 9,410,300 | – | – | – | –
1 CPU core | PyCPU | 100 | 16,042 | 160,425 | 59 | – | – | –
1 GPU | PyGPU | 1,000 | 6,388 | 6,388 | 1,473 | 25 | – | –
1 GPU | PyGPUe | 1,000 | 1,928 | 1,928 | 4,881 | 83 | 3.3 | –
256 CPU cores | PyCPU | 100 | 187 | 1,865 | 5,046 | 86 | 3.4 | 1.0
64 GPUs | PyGPU | 1,000 | 293 | 293 | 30,553 | 521 | 21.8 | 6.6
64 GPUs | PyGPUe | 1,000 | 148 | 148 | 63,643 | 1,085 | 43.2 | 13.0

Table 2: Summary of processing times and speed-ups achieved.


Figure 7: Processing times (seconds) for PyCPU implementations, by number of nodes (1-32) and CPU cores used per node.

Figure 8: Processing times (seconds) for PyGPU and PyGPUe implementations, by number of nodes (1-32). 1-G and 2-G refer to 1 and 2 GPUs per node for PyGPU, and 1-E and 2-E refer to 1 and 2 GPUs per node for PyGPUe.

4 Discussion

The GIS implementation of the NPV simulation took a long time. Given the amount of data processing required, this is likely to be no surprise to seasoned analysts working with high-resolution spatio-temporal data. Like me, these poor souls have probably become accustomed to running models over periods extending to days and even weeks. However, the results of this study show that there are both software and hardware options available which offer up to four orders of magnitude greater performance for the types of common raster processing problems assessed in this study. I appreciate that these numbers are large and that not all of the variables could be controlled in this benchmarking comparison (e.g. GISCPU was run on Windows, the others on Linux; there was a slight difference in processor speeds; and different numbers of simulations were performed between implementations). However, the fact remains that, in practical terms, the difference in performance between our traditional tools and these new alternative open source tools on high-performance computing offers analysts a truly transformational change.

An important finding in this study is that performance increases are not contingent upon access to multi-million-dollar computer clusters and proprietary software installations. A substantial performance enhancement was achieved simply by migrating from the disk-IO-reliant GIS language to the in-memory array processing environment of Python and Numpy. Tools like the Geospatial Data Abstraction Library (GDAL, 2010) can enable the painless transition of data between Numpy and the GIS. GPU processing, especially using lower level kernels, can offer another quantum leap in performance. Scaling up, parallelisation offers further improvements across multiple CPU cores and GPUs. Even more is possible. Further efficiencies in GPU processing may be achieved through lower level CUDA kernel implementation via PyCUDA’s SourceModule functionality. Initial experimentation suggests that this could perhaps achieve a further order of magnitude speed-up over the ElementwiseKernel implementation PyGPUe. CUDA compute kernels also offer greater flexibility in GPU programming beyond the functions provided in the GPUArray library. However, greater programming skills are demanded of the analyst to realise these additional capabilities and performance gains.
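As an indication only, a raw CUDA C kernel written via SourceModule and equivalent in effect to the elementwise update sketched earlier might look as follows; this is a hedged sketch reusing the hypothetical npvGpu and spYieldGpu arrays from Section 2.3, with illustrative parameter values, not the further-optimised kernel referred to above.

import numpy
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void npv_update(float *npv, const float *spYield, float price,
                           float yf, float cost, float disc, int n)
{
    // Grid-stride loop keeps the launch configuration within device limits
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        npv[i] += (price * yf * spYield[i] - cost) * disc;
}
""")
npvUpdate = mod.get_function("npv_update")
n = numpy.int32(spYieldGpu.size)
npvUpdate(npvGpu.gpudata, spYieldGpu.gpudata,
          numpy.float32(200.0), numpy.float32(2.0),   # illustrative price, yield factor
          numpy.float32(100.0), numpy.float32(1.0),   # illustrative cost, discount
          n, block=(256, 1, 1), grid=(1024, 1))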

The existence of open source tools and a spectrum of inexpensive hardware options (e.g. multi-core processors, GPUs, grid computing, and cloud computing) make model performance enhancements widely available. This wide availability of inexpensive HPC tools signals the removal of what was probably the biggest barrier to the adoption of these technologies following earlier calls (Openshaw, 1995; Turton and Openshaw, 1998; Openshaw and Abrahart, 2000; Openshaw and Turton, 2000). New initiatives such as the cyberinfrastructure for GIS and spatial analysis (Yang and Raskin, 2009; Wang, 2010), and established communities (e.g. the GeoComputation community, http://www.geocomputation.org/), are essential to further increase availability and adoption.

Whilst the performance gains reported here are spectacular, they don’t come without cost. Without programmer support, the greatest cost is in the lag times associated not only with analysts having to learn new computer languages, but also new operating systems, job schedulers, and entirely new workflows. These low-productivity times can be reduced by training, peer support (i.e. the whole research team moving to the new platform simultaneously), and discipline and commitment (i.e. not going back to the old tools no matter how tempting). However, payback times are likely to be short, as analyst productivity with flexible and powerful new tools like Python will quickly surpass their former capacity.

Many costs are also related to parallel processing. Parallel computing requires non-trivial thinking about how to parallelize integrated assessment and modelling tasks. Whilst the data parallel model is preferred, especially with large data sets, due to its more efficient data transfer and memory use, there may also be cases (e.g. where spatial dependencies exist) where other models are more appropriate. In addition, Amdahl’s and Gustafson’s laws are ever-present, working to constrain parallel performance improvements. Amdahl’s law states that improvements can only be achieved for that part of the program amenable to parallelization. For example, if half of the program (by run-time) can be reduced to a negligible processing time, we have still only achieved a 2x speed-up overall. Gustafson’s law suggests that efficient use of more processors depends on scaling up the problem size; in practice, costs associated with data transfer and process coordination mean that doubling the number of cores does not usually double the processing speed (see Figure 7, Figure 8). Long times can be spent waiting in job queues for over-subscribed shared resources, thereby eroding the benefits of HPC. Parallel code is also much more difficult to develop and debug. There is significant scope for a new cohort of errors such as mutual exclusion violations, race conditions, deadlock, and starvation (Herlihy and Shavit, 2008).
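Amdahl’s bound can be stated compactly: for a program with parallelizable run-time fraction p executed on n processors, the overall speed-up is

S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

so with p = 0.5 the speed-up can never exceed 2x, however many processors are used.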

I should note that only one specific type of problem is analysed here – that involving multiple, fairly simple mathematical operations on a large raster data set. However, in practice, analysts deal with a diverse range of modelling and analysis tasks. Whilst I have had repeated success with parallel processing over multiple CPUs across a range of analyses since this initial benchmarking exercise, some types of analysis have proved unsuitable for GPU processing. One reason is that for programs to realise GPU performance gains, there must be a high ratio of processing time to data transfer time to make the cost of transferring data to and from device memory worthwhile. In general, GPU processing is more suited to modelling and simulation than to routine data processing. Data sets can also become too large for the GPU device memory, especially if intermediate processing steps need to be kept in memory. In addition, libraries such as PyCUDA and others are yet to incorporate all the functionality (e.g. n-dimensional and masked arrays) of their CPU cousins (e.g. Numpy, SciPy) at high levels of abstraction. This may be overcome by programming custom CUDA C kernels or by more effort to work around existing limitations. It seems just a matter of time until that functionality is realised in open source GPU programming (Klöckner pers. comm., 2011). Also, depending on requirements, other high-performance computing models may be more appropriate. For example, we have found that a computer grid may provide higher throughput than a cluster when running Windows models.
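A quick way to gauge whether a task clears this bar is to time the transfers against the device computation. A rough sketch, illustrative only and dependent on hardware, might be:

import time
import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = numpy.random.rand(10000000).astype(numpy.float32)

start = time.time()
aGpu = gpuarray.to_gpu(a)                 # host-to-device transfer
transferTime = time.time() - start

start = time.time()
for _ in range(70):
    aGpu = aGpu * numpy.float32(1.01) - numpy.float32(0.5)  # stand-in arithmetic
result = aGpu.get()                       # .get() synchronises and transfers back
computeTime = time.time() - start

# GPU gains require computeTime to dominate transferTime
print("transfer %.3f s, compute-and-return %.3f s" % (transferTime, computeTime))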

The speed-ups achieved through HPC have implications for the specific approaches we can take within the broad spectrum of integrated assessment and modelling. For example, in this paper I used the extra processing power to analyse uncertainty, providing an estimate of the range of possible outcomes and their likelihood. Typically, studies have used scenario analysis to cover a range of possible outcomes in complex social-ecological systems (e.g. Bryan et al., 2011) rather than the probabilistic approaches enabled through simulation (e.g. Benedetti et al., 2008). The potential of HPC identified in this paper opens the door to the quantification of probabilities, rather than just possibilities. This can better inform robust policy and investment decision-making for managing global environmental challenges that are naturally characterised by uncertainty and risk. Model parameter calibration and sensitivity analyses are another computationally-demanding application that may benefit from HPC (Lilburne and Tarantola, 2009). These analyses provide essential information on the key parameters that most influence model outcomes and require multiple model runs over the parameter space, which can be high-dimensional. Existing integrated assessment and modelling paradigms could also benefit greatly from HPC. For example, multi-agent simulations could be greatly expanded. Agents could be assigned to individual processors and the communication models within the parallel programming environment (e.g. MPI) used to communicate between agents. HPC technology could also generate a wave of innovation in integrated assessment and modelling (Openshaw and Turton, 2000). Analysts, no longer constrained by processing and data concerns, can think about new approaches to addressing global environmental challenges.

5 References

Armstrong, M.P., Cowles, M.K., Wang, S.W., 2005. Using a computational Grid for geographic information analysis: A reconnaissance. Professional Geographer 57(3) 365-375.

Benedetti, L., Bixio, D., Claeys, F., Vanrolleghem, P.A., 2008. Tools to support a model-based methodology for emission/immission and benefit/cost/risk analysis of wastewater systems that considers uncertainty. Environmental Modelling & Software 23(8) 1082-1091.

Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y., 2010. Theano: A CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy), June 30 - July 3, Austin, TX.

Bryan, B.A., 2003. Physical environmental modeling, visualization and query for supporting landscape planning decisions. Landscape and Urban Planning 65(4) 237-259.

Bryan, B.A., 2010. Development and application of a model for robust, cost-effective investment in natural capital and ecosystem services. Biological Conservation 143(7) 1737-1750.

Bryan, B.A., Crossman, N.D., King, D., Meyer, W.S., 2011a. Landscape futures analysis: Assessing the impacts of environmental targets under alternative spatial policy options and future scenarios. Environmental Modelling & Software 26(1) 83-91.

Bryan, B.A., King, D., Wang, E.L., 2010a. Biofuels agriculture: landscape-scale trade-offs between fuel, economics, carbon, energy, food, and fiber. Global Change Biology Bioenergy 2(6) 330-345.

Bryan, B.A., King, D., Wang, E.L., 2010b. Potential of woody biomass production for motivating widespread natural resource management under climate change. Land Use Policy 27(3) 713-725.

Bryan, B.A., King, D., Ward, J.R., 2011b. Modelling and mapping agricultural opportunity costs to guide landscape planning for natural resource management. Ecological Indicators 11(1) 199-208.

Bryan, B.A., Ward, J., Hobbs, T., 2008. An assessment of the economic and environmental potential of biomass production in an agricultural region. Land Use Policy 25(4) 533-549.

Buckner, J., Wilson, J., Seligman, M., Athey, B., Watson, S., Meng, F., 2010. The gputools package enables GPU computing in R. Bioinformatics 26(1) 134-135.

Chen, Z., Shen, L., Zhao, Y.Q., Yang, C.J., 2010. Parallel algorithm for real-time contouring from grid DEM on modern GPUs. Science China-Technological Sciences 53 33-37.


Cheviron, B., Gumiere, S.J., Le Bissonnais, Y., Moussa, R., Raclot, D., 2010. Sensitivity analysis of distributed erosion models: Framework. Water Resources Research 46.

Clematis, A., Falcidieno, B., Spagnuolo, M., 1996. Parallel processing on heterogeneous networks for GIS applications. International Journal of Geographical Information Systems 10(6) 747-767.

Costanza, R., 1996. Ecological economics: Reintegrating the study of humans and nature. Ecological Applications 6(4) 978-990.

Costanza, R., Maxwell, T., 1991. Spatial Ecosystem Modeling Using Parallel Processors. Ecological Modelling 58(1-4) 159-183.

Crossman, N.D., Bryan, B.A., 2009. Identifying cost-effective hotspots for restoring natural capital and enhancing landscape multifunctionality. Ecological Economics 68(3) 654-668.

GDAL, 2010. GDAL - Geospatial Data Abstraction Library. Open Source Geospatial Foundation. http://gdal.osgeo.org

Harris, G., 2002. Integrated assessment and modelling: an essential way of doing science. Environmental Modelling & Software 17(3) 201-207.

Hawick, K.A., Coddington, P.D., James, H.A., 2003. Distributed frameworks and parallel algorithms for processing large-scale geographic data. Parallel Computing 29(10) 1297-1333.

Herlihy, M., Shavit, N., 2008. The Art of Multiprocessor Programming. Morgan Kaufmann, Burlington, USA.

Hunter, J.D., 2007. Matplotlib: a 2D graphics environment. Computing in Science & Engineering 9(3) 90-95.

Jones, E., Oliphant, T., Peterson, P., SciPy community, 2001. SciPy: Open Source Scientific Tools for Python. http://www.scipy.org

Klöckner, A., Pinto, N., Lee, Y., Catanzaro, B., Ivanov, P., Fasih, A., 2011. PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation. http://arxiv.org/abs/0911.3456

Kooistra, L., Huijbregts, M.A.J., Ragas, A.M.J., Wehrens, R., Leuven, R., 2005. Spatial variability and uncertainty in ecological risk assessment: A case study on the potential risk of cadmium for the little owl in a Dutch river flood plain. Environmental Science & Technology 39(7) 2177-2187.

Kumar, S.V., Peters-Lidard, C.D., Tian, Y., Houser, P.R., Geiger, J., Olden, S., Lighty, L., Eastman, J.L., Doty, B., Dirmeyer, P., Adams, J., Mitchell, K., Wood, E.F., Sheffield, J., 2006. Land information system: An interoperable framework for high resolution land surface modeling. Environmental Modelling & Software 21(10) 1402-1415.

Lanthier, M., Nussbaum, D., Sack, J.R., 2003. Parallel implementation of geometric shortest path algorithms. Parallel Computing 29(10) 1445-1479.

Li, X., Zhang, X.H., Yeh, A., Liu, X.P., 2010. Parallel cellular automata for large-scale urban simulation using load-balancing techniques. International Journal of Geographical Information Science 24(6) 803-820.


Lilburne, L., Tarantola, S., 2009. Sensitivity analysis of spatial models. International Journal of Geographical Information Science 23(2) 151-168.

Lilburne, L.R., North, H.C., 2010. Modelling uncertainty of a land management map derived from a time series of satellite images. International Journal of Remote Sensing 31(3) 597-616.

Mirghani, B.Y., Tryby, M.E., Ranjithan, R.S., Karonis, N.T., Mahinthakumar, K.G., 2010. Grid-Enabled Simulation-Optimization Framework for Environmental Characterization. Journal of Computing in Civil Engineering 24(6) 488-498.

Openshaw, S., 1995. Human Systems Modeling as a New Grand Challenge Area in Science - What Has Happened to the Science in Social-Science. Environment and Planning A 27(2) 159-164.

Openshaw, S., Abrahart, R.J., 2000. GeoComputation. Taylor & Francis, London, UK.

Openshaw, S., Turton, I., 2000. High Performance Computing and the Art of Parallel Programming: An Introduction for Geographers, Social Scientists and Engineers. Taylor & Francis, London, UK.

Parker, P., Letcher, R., Jakeman, A., Beck, M.B., Harris, G., Argent, R.M., Hare, M., Pahl-Wostl, C., Voinov, A., Janssen, M., Sullivan, P., Scoccimarro, M., Friend, A., Sonnenshein, M., Baker, D., Matejicek, L., Odulaja, D., Deadman, P., Lim, K., Larocque, G., Tarikhi, P., Fletcher, C., Put, A., Maxwell, T., Charles, A., Breeze, H., Nakatani, N., Mudgal, S., Naito, W., Osidele, O., Eriksson, I., Kautsky, U., Kautsky, E., Naeslund, B., Kumblad, L., Park, R., Maltagliati, S., Girardin, P., Rizzoli, A., Mauriello, D., Hoch, R., Pelletier, D., Reilly, J., Olafsdottir, R., Bin, S., 2002. Progress in integrated assessment and modelling. Environmental Modelling & Software 17(3) 209-217.

Perez, F., Granger, B.E., 2007. IPython: a system for interactive scientific computing. Computing in Science and Engineering 9(3) 21-29.

R Development Core Team, 2011. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

Rauber, T., Rünger, G., 2010. Parallel Programming for Multicore and Cluster Systems. Springer, Heidelberg, Germany.

Reid, W.V., Chen, D., Goldfarb, L., Hackmann, H., Lee, Y.T., Mokhele, K., Ostrom, E., Raivio, K., Rockstrom, J., Schellnhuber, H.J., Whyte, A., 2010. Earth System Science for Global Sustainability: Grand Challenges. Science 330(6006) 916-917.

Rotmans, J., van Asselt, M.B.A., 2001. Uncertainty management in integrated assessment modeling: Towards a pluralistic approach. Environmental Monitoring and Assessment 69(2) 101-130.

Schmidberger, M., Morgan, M., Eddelbuettel, D., Yu, H., Tierney, L., Mansmann, U., 2009. State of the Art in Parallel Computing with R. Journal of Statistical Software 31(1) 1-27.

Sulis, A., 2009. GRID computing approach for multireservoir operating rules with uncertainty. Environmental Modelling & Software 24(7) 859-864.


Sutter, H., 2005. The free lunch is over: a fundamental turn toward concurrency in software. Dr. Dobb's Journal 30(3), http://www.gotw.ca/publications/concurrency-ddj.htm

Turton, I., Openshaw, S., 1998. High-performance computing and geography: developments, issues, and case studies. Environment and Planning A 30(10) 1839-1856.

Van der Jeught, S., Bradu, A., Podoleanu, A.G., 2010. Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit. Journal of Biomedical Optics 15(3).

van Rossum, G., Python community, 2010. The Python Programming Language: Version 2.7.1. The Python Software Foundation. http://www.python.org.

Venevsky, S., Maksyutov, S., 2007. SEVER: A modification of the LPJ global dynamic vegetation model for daily time step and parallel computation. Environmental Modelling & Software 22(1) 104-109.

Vokorokos, L., Blistan, P., Petrik, S., Adam, N., 2004. Utilization of parallel computer system for modeling of geological phenomena in GIS. Metalurgija 43(4) 287-291.

von Bloh, W., Rost, S., Gerten, D., Lucht, W., 2010. Efficient parallelization of a dynamic global vegetation model with river routing. Environmental Modelling & Software 25(6) 685-690.

Wang, D.L., Berry, M.W., Gross, L.J., 2006. On parallelization of a spatially-explicit structured ecological model for integrated ecosystem simulation. International Journal of High Performance Computing Applications 20(4) 571-581.

Wang, K.Y., Shallcross, D.E., Hall, S.M., Lo, Y.H., Chou, C., Chen, D., 2005. DOBSON: a Pentium-based SMP Linux PC Beowulf for distributed-memory high resolution environment modelling. Environmental Modelling & Software 20(10) 1299-1306.

Wang, S.Q., Gao, X., Yao, Z.X., 2010. Accelerating POCS interpolation of 3D irregular seismic data with Graphics Processing Units. Computers & Geosciences 36(10) 1292-1300.

Wang, S.W., 2010. A CyberGIS Framework for the Synthesis of Cyberinfrastructure, GIS, and Spatial Analysis. Annals of the Association of American Geographers 100(3) 535-557.

Wickham, H., 2009. ggplot2: Elegant Graphics for Data Analysis. Springer, Dordrecht.

Xie, J.B., Yang, C.W., Zhou, B., Huang, Q.Y., 2010. High-performance computing for the simulation of dust storms. Computers Environment and Urban Systems 34(4) 278-290.

Yang, C., Raskin, R., 2009. Introduction to distributed geographic information processing research. International Journal of Geographical Information Science 23(5) 553-560.

Yu, H., 2010. ‘Rmpi’ package. http://cran.r-project.org/web/packages/Rmpi/Rmpi.pdf
