

A Modern Blast Solver Strategy and Its Urban Application

R. C. Ripley (1), F. Zhang (2), C. T. Cloney (1), S. McClennan (1), N. McCormick (1)

(1) Martec Limited, 400-1888 Brunswick St., Halifax, Nova Scotia, B3J3J8, Canada
(2) Defence R&D Canada – Suffield, PO Box 4000, Station Main, Medicine Hat, Alberta, T1A8K6, Canada

Abstract

Simulating blast effects and loads from explosives in real urban environments requires a CFD-based approach solved on highly discretized 3D domains. First-principles hydrocodes and CFD codes have traditionally been adapted for high-performance computing in distributed environments. The current trend in modern hardware involves many-core architectures, such as multi-processor multi-core CPU and multiple graphics processing units (GPU), often configured together on individual compute nodes. Thus, a next-generation CFD blast code has been developed to employ hybrid GPU/CPU shared-memory computing, which makes optimal use of heterogeneous compute cores in either a workstation or server, and is also scalable to modern parallel clusters. The goal of this modern blast solver is a 50 times speedup using GPU acceleration as compared with its predecessor (e.g., the Chinook code) running on one core of a standard CPU. This development necessitated novel solution strategies, including: hybrid shared/distributed memory structure, CFD code optimization for many-core general-purpose GPU, and concurrent CPU/GPU task parallelization. A compact storage technique based on Cartesian meshes combines a zonal meshing strategy, adaptive mesh refinement (AMR), and dynamic Eulerian remapping to ensure maximum resolution of blast effects. The fundamental solver calculation is 8 – 14 times faster than Chinook's unstructured mesh CFD solver, and the novel mesh strategies provide a further speedup factor of 6 – 9. Hardware acceleration using scientific NVIDIA GPU cards was 15 – 35 times faster than a single CPU core for solver tasks alone; however, the GPU speedup of the overall CFD code was only a factor of 1.5 – 4. Further speedup is achieved using the zonal AMR as compared with uniform meshes. Therefore, the new fast CFD solver offers a potential combined speedup close to two orders of magnitude.

In this paper, the combined speedup and the optimal use of features and computing capability are assessed for practical urban situations. This modernized CFD blast solver has been integrated into a tool called the Rapid City Planner for prediction of urban explosion events using real city geometry. Building structures and urban terrain are automatically meshed and are embedded in the new solver using the immersed boundary method (IBM). Validation with benchmark numerical solutions and urban blast experimental data has been conducted. Due to the high level of automation in the Rapid City Planner and the novel use of GPU computing in the new solver, uncertainty quantification is required in addition to accuracy assessment. Sensitivity of the urban blast results to mesh orientation, explosive placement, and environment geometry is evaluated using perturbation methods, while uncertainties in real city geometry are discussed. More rigorous uncertainty quantification in the field of CFD is particularly challenging for shock wave applications, and is left for future work in connection with the GPU numerical solutions.

Key words: Urban explosion modeling – Blast load prediction – Numerical simulation – GPU computing

© Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2016
© Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2016
DRDC-RDDC-2016-P044

Introduction

Terrorist threat frequency and magnitude do not follow natural distribution theory, and absolute blast load probability cannot be accurately predicted from previous isolated events. Therefore scenario-based and relative-risk analyses are most appropriate for blast load assessment, in addition to forensics applications. In general, user-specified explosion scenarios are required for real-world assessments. Many numerical blast modeling codes have been developed for these purposes; however, few tools incorporate the requisite combination of physics, ease of use, and speed of execution. One example is the Chinook Computational Fluid Dynamics (CFD) code, which has been focused on physical and chemical models for defence R&D of non-ideal explosives and public safety issues with improvised explosive devices. Other urban blast related applications of Chinook include forensics investigations (e.g., [1]). Such calculations usually require manual geometry manipulation and mesh generation, which is labour intensive and time consuming. Typical calculation times for 3D blast calculations are on the order of several days using up to 128 CPUs. Figure 1 presents results from an example high-fidelity Chinook calculation.

Figure 1: Example high-fidelity Chinook calculation. Colour contours denote damage level.

Over the past decade, interest by public safety and incident response communities has further pushed the requirement for fast, yet accurate, solutions. This community necessitates a move away from expert analysis using large-scale computational clusters. Instead prompt response and in situ assessment using laptop computers are envisioned as applications. Modern general-purpose Graphics Processing Units (GPU), which were originally developed for the gaming industry, can provide thousands of compute cores on a single device and may be adapted to provide a viable alternative to the former computational clusters.

This paper presents a new fast CFD solver for urban blast assessment, which leverages GPU computing and modern software technologies. The solver is embedded in a tool called the Rapid City Planner which provides a framework for urban explosion assessment using physics-based models especially designed for the near-field regime.

First Principles Modeling of Blast Effects

Studies of blast effects in urban environments originally focused on shock wave propagation in air, both in small-scale experiments and using gas dynamics codes. From simple point sources, the effects of blast channeling, focusing, diffraction, clearing, and sheltering are readily assessed for various geometric configurations. However, for large explosive threats at real-world scale, the building structures are often within close proximity of the charge and the resulting fireball may be fuel rich. Details of the explosion source, complex blast, and the effects of the environment on blast enhancement following detonation necessitate first principles modeling using three-dimensional CFD. Even so, the near-field and confinement effects are particularly challenging for shock physics codes. Figure 2 illustrates the distribution of blast loads at ground level in an urban terrain showing the complex interaction of multiple building structures. Beyond the important physical models, the numerical resolution and mesh generation are common issues and routine challenges.


Figure 2: Distribution of urban blast loads. Overhead view of records of peak overpressure (left) and maximum impulse (right).

The confinement of urban settings has been extensively studied both experimentally [2] and numerically [3,4], where enhanced mixing and heating due to multiple blast reflection promotes afterburning of unburnt and shock-dispersed fuels. Reactive particles added to explosives may fragment and react on structures [5]. Loading was shown to be significantly enhanced due to non-ideal and heterogeneous blast explosives [6]. An extended near-field regime has been defined whereby blast scaling laws are not applicable. Under urban confinement, the blast is enhanced significantly within a scaled distance of up to 4 m/kg^(1/3) [3]. Thus, conventional scaling laws are not applicable and many tools become inaccurate when dealing with enhanced blast explosives, especially in the near-field and confinement of structures.
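The scaled distance quoted above follows the usual cube-root (Hopkinson-Cranz) convention, Z = R / W^(1/3), with standoff R in metres and charge mass W in kilograms. The sketch below is illustrative only: the function names are hypothetical, and treating 4 m/kg^(1/3) as a hard threshold is a simplifying assumption rather than anything taken from Chinook or the new solver.

```python
# Illustrative sketch; names and the hard threshold are assumptions.

EXTENDED_NEAR_FIELD_LIMIT = 4.0  # m/kg^(1/3), value quoted in the text for urban confinement


def scaled_distance(standoff_m: float, charge_mass_kg: float) -> float:
    """Cube-root scaled distance Z = R / W^(1/3), in m/kg^(1/3)."""
    if standoff_m < 0.0 or charge_mass_kg <= 0.0:
        raise ValueError("standoff must be >= 0 and charge mass must be > 0")
    return standoff_m / charge_mass_kg ** (1.0 / 3.0)


def in_extended_near_field(standoff_m: float, charge_mass_kg: float,
                           limit: float = EXTENDED_NEAR_FIELD_LIMIT) -> bool:
    """True when the gauge lies inside the assumed extended near-field limit."""
    return scaled_distance(standoff_m, charge_mass_kg) <= limit
```

For example, a gauge 10 m from a 1000 kg charge sits at roughly 1 m/kg^(1/3), well inside the regime where conventional scaling-law tools are said to become inaccurate.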

The first-generation Chinook CFD code has been developed by Martec and Defence R&D Canada since 2002 to address the above-mentioned complex blast phenomena and enhanced blast with multiphase flow using first-principles methods. Physical models for detonation, afterburning, and target interaction have been validated using experimental data from numerous fundamental and scaled urban tests [2,3,5,7,8].

Applications of shock physics codes include the half real-world scale, high-fidelity TSWG Urban Canyon experimental facility at EMRTC (see [9-11]). Figure 3 shows a comparison of Chinook numerical results with experimental data for a C4 explosive located at the centre of the Enhanced Open Square configuration. In this case, the 75 gauges have been concentrated in the extended near-field regime: 51% of gauges are within 1 – 2 m/kg^(1/3) and 37% of gauges are between 2 – 3 m/kg^(1/3). The close proximity makes this a particularly challenging test case both for numerical prediction and experimental gauge measurement. Figure 3 shows the results of a blind Chinook calculation, which on average had a +14% relative difference on impulse and -27% relative difference on peak overpressure. Since the shock time of arrival differed by only 1.9% between numerical results and experimental measurements, the large pressure difference may be attributed to a combination of mesh resolution and gauge overshoot. The average absolute gauge error was +7.3% on impulse and -10% on peak pressure.


This class of validation computation requires hours to days of CPU time on computational clusters. In contrast, a predictive capability for real-world explosive blast assessment is required to be fast, easy to use, and deployable for rapid assessment.

Figure 3: Comparison of first principles numerical results to high-fidelity urban experimental data in the extended near-field regime.

A Modern Blast Solver Strategy

Whereas the Chinook CFD code was designed for unstructured meshes tailored to each application, the major shift now is towards Cartesian meshes with embedded structures and adaptive mesh refinement (AMR). These are not new concepts and were commonplace for blast codes in the 1990s (e.g., IFSAS, MAZ, and SHAMRC) and more recently the Air3D, GDT and APOLLO codes, to name but a few. The present modernization lies in optimization and automation using the immersed boundary method (IBM), zonal based mesh adaption for refinement, dynamic rezoning, and hybrid parallelization strategies including GPU computing.

1. Mesh Strategies

The new fast CFD solver is purpose built for three-dimensional applications. The Cartesian discretization does not require explicit storage of the mesh. Relative to an unstructured code, a significant speedup is expected due to uniform cell and face sizes, logical neighbour lookup, and no longer computing normal vectors or volumes. Speedup also results from vector memory structure, compact gradients, and in-memory remapping. For the free-field based on the classical Sedov problem (see [12]), a speedup of 8 – 14 times was achieved over the Chinook code when using uniform meshes.
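The idea that a Cartesian mesh needs no explicit storage can be sketched as follows: cell centres, volumes, and neighbours are all computed from integer indices rather than read from connectivity tables. The class and method names are hypothetical and the layout (x-fastest flat indexing) is an assumption; the actual solver's memory structure is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CartesianGrid:
    """Uniform grid described by a handful of numbers; no per-cell geometry is stored."""
    nx: int
    ny: int
    nz: int
    dx: float
    origin: tuple = (0.0, 0.0, 0.0)

    def index(self, i: int, j: int, k: int) -> int:
        """Flat cell index with i varying fastest (assumed layout)."""
        return i + self.nx * (j + self.ny * k)

    def ijk(self, idx: int) -> tuple:
        """Inverse of index()."""
        return idx % self.nx, (idx // self.nx) % self.ny, idx // (self.nx * self.ny)

    def centre(self, i: int, j: int, k: int) -> tuple:
        """Cell-centre coordinates computed on demand."""
        return tuple(o + (n + 0.5) * self.dx for o, n in zip(self.origin, (i, j, k)))

    @property
    def cell_volume(self) -> float:
        """Identical for every cell, so it is never stored per cell."""
        return self.dx ** 3

    def neighbours(self, idx: int) -> list:
        """Face neighbours found by index arithmetic (logical lookup)."""
        i, j, k = self.ijk(idx)
        found = []
        for di, dj, dk in ((-1, 0, 0), (1, 0, 0), (0, -1, 0),
                           (0, 1, 0), (0, 0, -1), (0, 0, 1)):
            a, b, c = i + di, j + dj, k + dk
            if 0 <= a < self.nx and 0 <= b < self.ny and 0 <= c < self.nz:
                found.append(self.index(a, b, c))
        return found
```

Face normals are likewise implicit (they are the coordinate axes), which is why the text lists normal vectors and volumes among the quantities that no longer need to be computed.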

1.1. Adaptive Mesh Refinement

There are many AMR techniques based on parent-child analogies using tree data structures. The performance of any AMR technique, especially for transient simulations, is governed by the memory structure and numbering system for fast neighbour searching. Modern approaches include fully-threaded tree [13], cell-based structured AMR [14], patch-based AMR [15], and zonal AMR [16]. Tree structures that require 2:1 refinement between neighbours perform well when the number of refinement levels is relatively small. In order to resolve disparate length scales while ensuring potential for GPU computing in the new fast CFD solver, the present AMR strategy is instead zonal based, which is a generalization of patch-based AMR. The zone-based method [17] allows many-to-one differences in refinement level between adjacent cells. Block-structured meshes are used within each zone for compact storage. Similar to [14], a hash table is used for fast neighbour access.
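A minimal sketch of the zonal idea, under stated assumptions: each zone is a block-structured cube keyed by its integer position in the base zone grid, a dictionary plays the role of the hash table, and each refinement level doubles the cells per side. The class names, the base block size of 4 cells, and the factor of two per level are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Zone:
    level: int            # refinement level of this zone
    base_cells: int = 4   # cells per side at level 0 (hypothetical value)

    @property
    def cells_per_side(self) -> int:
        return self.base_cells * 2 ** self.level  # assumed factor of 2 per level

    @property
    def cell_count(self) -> int:
        return self.cells_per_side ** 3


class ZonalMesh:
    def __init__(self):
        self.zones = {}  # hash table: (i, j, k) zone coordinates -> Zone

    def add_zone(self, key: tuple, level: int) -> None:
        self.zones[key] = Zone(level)

    def neighbour(self, key: tuple, axis: int, direction: int):
        """Constant-time neighbour lookup; returns None outside the mesh."""
        nkey = list(key)
        nkey[axis] += direction
        return self.zones.get(tuple(nkey))

    def face_ratio(self, key: tuple, axis: int, direction: int):
        """Neighbour cells per own cell along one edge of the shared face."""
        nb = self.neighbour(key, axis, direction)
        if nb is None:
            return None
        return Fraction(nb.cells_per_side, self.zones[key].cells_per_side)

    def total_cells(self) -> int:
        return sum(z.cell_count for z in self.zones.values())
```

With a level-0 zone next to a level-3 zone, `face_ratio` returns 8, an 8:1 jump that a tree restricted to 2:1 refinement between neighbours would not permit.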

1.2. Immersed Boundary Method

For representation of structures, the immersed boundary method of Mittal [18] is used. Wall boundaries are embedded in the mesh, and each computational cell is classified as inside or outside the structure. Ghost cell solutions inside the structure are derived from image points in the fluid domain. Figure 4 illustrates the application of IBM to a scaled urban test case. This configuration from Skjold [19] involves four square buildings, each 2.3 m in dimension and separated by 2.3 m to form a simple intersection. The example calculation results in Figure 4 also show the zone-based AMR, highlighting the many-to-one refinement strategy. Three levels of refinement are shown for clarity.

Figure 4: Zone-based AMR with IBM applied to scaled urban environment of Christensen (see [19]): (left) overhead view; and (right) side view. Colour contours denote pressure.

An important aspect is that the IBM does not require the structures to be grid aligned. The advantages of IBM in the present application include: avoiding small timesteps caused by small cells or cut cells; precise resolution of the position and slope of the boundary without an unstructured mesh; and no numerical noise problems from stair-cased boundaries.

The power of the IBM is that the structure can be arbitrarily oriented and intersected with the CFD mesh. The generality of this approach can support complex geometric structures defined by shell geometries. Such target structures can be surface-meshed using triangular elements, which is much more efficient than volumetrically meshing the CFD domain with tetrahedral cells. CFD cells falling within the structure volumes can be deactivated to save computational effort.
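The ghost-cell and image-point construction described in this section can be sketched for the simplest case of a locally planar wall. The ghost cell centre is mirrored across the wall to obtain the image point, and the ghost state is built from the fluid state there. Everything below is a simplified illustration: the function names are hypothetical, a slip (reflective) wall is assumed, and the interpolation of the fluid solution to the image point is omitted.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    mag = math.sqrt(_dot(v, v))
    return tuple(x / mag for x in v)


def image_point(ghost, wall_point, normal):
    """Mirror a ghost cell centre across a planar wall.

    `normal` points from the structure into the fluid, so the signed
    distance of a ghost cell (inside the structure) is negative.
    """
    n = _unit(normal)
    signed_dist = _dot(tuple(g - w for g, w in zip(ghost, wall_point)), n)
    return tuple(g - 2.0 * signed_dist * c for g, c in zip(ghost, n))


def ghost_state(image_state, normal):
    """Assumed slip wall: copy density and pressure, reflect the normal velocity."""
    n = _unit(normal)
    rho, vel, p = image_state
    vn = _dot(vel, n)
    return rho, tuple(v - 2.0 * vn * c for v, c in zip(vel, n)), p
```

Because the wall is described only by a point and a normal, nothing in this construction requires the wall to be aligned with the grid, which is the property emphasized above.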


1.3. Automatic Zoning

A priori zoning of the CFD mesh becomes impractical for multi-scale urban applications with complex geometric structures. Therefore automatic zoning methods were developed to resolve both near-field and far-field scales. These include dynamic mesh addition and rezoning (coarsening-type remapping) to transcend scales. The novel use of this concept is illustrated in Figure 5. This 3D example employs coarse unstructured building surface meshes and coarse uniform CFD meshes to aid in visualization (with AMR disabled for clarity). Conservation is maintained during successive remappings by enforcing perfectly nested meshes. The approach balances resolution with scale and operates within a memory quota established at the beginning of a run. The automatic management of the number of cells in the mesh during the calculation is plotted in Figure 6. The advantage of the present mesh strategy is its intentional design for deployment on Graphics Processing Units.

Figure 5: Dynamic mesh strategy for transcending scales using a combination of automatic mesh addition and rezoning. Overhead view of blast evolution amongst three-dimensional city buildings with increasing domain size with time.
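Why perfectly nested meshes preserve conservation during a coarsening remap can be shown with a small sketch. When every coarse cell is exactly the union of eight fine cells, a volume-weighted average of a conserved density reproduces the total exactly. The 2:1 coarsening ratio, the cubic nested-list storage, and the function names are assumptions made for illustration, not the solver's actual remapping routine.

```python
def coarsen_2to1(rho, dx):
    """Remap a cubic density field rho[k][j][i] onto a 2x coarser nested mesh.

    Each coarse cell is the volume average of the 8 fine cells it contains.
    Returns the coarse field and the coarse cell size.
    """
    n = len(rho)
    if n % 2 != 0:
        raise ValueError("fine mesh must have an even number of cells per side")
    m = n // 2
    coarse = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for k in range(n):
        for j in range(n):
            for i in range(n):
                # Equal fine-cell volumes, so each contributes a weight of 1/8.
                coarse[k // 2][j // 2][i // 2] += rho[k][j][i] / 8.0
    return coarse, 2.0 * dx


def total_mass(rho, dx):
    """Sum of density times cell volume over the whole mesh."""
    return sum(v for plane in rho for row in plane for v in row) * dx ** 3
```

Applying `total_mass` before and after `coarsen_2to1` gives the same value to round-off, which is the conservation property the nesting is meant to guarantee; the same averaging would apply to momentum and energy densities.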


Figure 6: Mesh statistics during calculation using automatic zoning.

2. Hardware Acceleration

Much promising data has been published demonstrating the potential of GPU computing for CFD. The raw floating point calculation power of GPU is currently in excess of 1 TFLOP, which may allow significant speed enhancement for CFD applications. Using single precision, the throughput can be even higher (typically 2 – 4 times, depending on GPU card). Maximum throughput can only be achieved when the GPU is constantly overloaded with streams of calculation tasks. For explicit time-stepping codes for blast, the main calculation tasks are the second-order gradients, approximate flux solver, and source terms.

Figure 8 shows the computational performance of a 2D shock wave code executed entirely on a GPU card, using single precision floating point data. In this deployment configuration there are no program control, input/output, or reduction (e.g., timestep calculation) tasks, and therefore the results are indicative of the upper limit of performance. The results show that the CFD throughput increases with problem size, until reaching an upper limit when the GPU becomes saturated. For this application, the GPU performance was calculation bound above 1 M cells.
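The timestep calculation is named above as an example of a reduction task: every cell proposes a stable timestep and the global minimum is taken, which forces a synchronization across the whole mesh. A minimal sketch of such a reduction for an explicit scheme follows. The ideal-gas sound speed, the CFL number of 0.5, and the function name are assumptions for illustration; the solver's own equations of state and timestep control are not specified at this level.

```python
import math


def stable_timestep(cells, dx, gamma=1.4, cfl=0.5):
    """Global CFL-limited timestep: a min-reduction over all cells.

    `cells` is an iterable of (density, flow speed, pressure) tuples.
    Sound speed uses the ideal-gas relation c = sqrt(gamma * p / rho),
    which is an assumption of this sketch.
    """
    dt = math.inf
    for rho, speed, p in cells:
        c = math.sqrt(gamma * p / rho)
        dt = min(dt, cfl * dx / (abs(speed) + c))
    return dt
```

On a GPU the loop body parallelizes trivially, but the final minimum does not, which is one reason such tasks are excluded from the idealized benchmark.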

Speedup factors were estimated from the throughput at the large problem size limit of each GPU card. For reference, a single CPU core processed 6 Mcell/second on a workstation and 1-5 Mcell/second on a laptop. Therefore speedup ratios of 25-35 for high-end scientific and gaming GPU cards, and 4-5 on an entry-level laptop GPU, are possible for this idealized compressible flow code. Results for the same test case using double precision showed only 55-60% of the above described GPU speedup. The increased precision also limited the maximum number of cells to 3.6 – 4.8 M on the high-end GPU cards. The solution impact of using single precision for fast CFD solutions requires further investigation.


Figure 8: Computational performance of various market-available GPU cards from NVIDIA running 2D CFD in single precision.
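The speedup factors quoted for Figure 8 are ratios of saturated GPU throughput to single-core CPU throughput. The short sketch below works through that arithmetic using only numbers stated in the text (6 Mcell/second for a workstation core, speedups of 25-35, and double precision retaining 55-60% of the single precision speedup). The GPU throughputs it produces are inferred values, not measurements reported by the authors, and pairing the lower fraction with the lower speedup is an assumption.

```python
def speedup(gpu_mcells_per_s: float, cpu_mcells_per_s: float) -> float:
    """Speedup of a GPU relative to one CPU core, from cell throughputs."""
    return gpu_mcells_per_s / cpu_mcells_per_s


def implied_gpu_throughput(speedup_factor: float, cpu_mcells_per_s: float) -> float:
    """GPU throughput (Mcell/s) implied by a quoted speedup factor."""
    return speedup_factor * cpu_mcells_per_s


def double_precision_speedup(single_precision_speedup: float, retained_fraction: float) -> float:
    """Speedup remaining when only a fraction of the single precision gain is kept."""
    return single_precision_speedup * retained_fraction


CPU_WORKSTATION = 6.0  # Mcell/s, single core, as quoted in the text

low = implied_gpu_throughput(25.0, CPU_WORKSTATION)   # 150 Mcell/s (inferred)
high = implied_gpu_throughput(35.0, CPU_WORKSTATION)  # 210 Mcell/s (inferred)
dp_low = double_precision_speedup(25.0, 0.55)         # about 13.75
dp_high = double_precision_speedup(35.0, 0.60)        # about 21
```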

3. Hybrid Parallelization

One limiting factor is the GPU memory footprint. At the present time, GPU cards typically have 2 – 4 GB of on-board memory in which the entire CFD solution must fit to avoid expensive communication with the CPU memory. This puts a limit of approximately 5 M cells on the problem (even less with advanced physics and complex chemistry). For larger CFD simulations, the problem becomes bandwidth limited. For efficiency, data streaming to and from the GPU card needs to occur concurrently with the GPU computation tasks. Shared memory parallelization of the CPU tasks may also lead to more efficient utilization of the GPU. Alternatively, distributed CPU calculation using MPI allows for domain decomposition of the CFD problem with GPU used as a co-processor. Design of CFD codes for GPU must include consideration of these factors. The approach used for the new fast CFD code involved design for both single and distributed multiple GPU using multiple distributed CPU as the host controller. The resulting hybrid parallel code is therefore deployable on systems ranging from GPU laptops to supercomputers. The Oak Ridge Lab Supercomputer is an example platform with distributed GPU.
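The link between on-board memory and the cell limit can be made concrete with a rough estimate. The sketch below divides the available memory by an assumed per-cell storage cost. The figure of 100 stored values per cell is hypothetical, chosen only because it reproduces the roughly 5 M cell limit quoted for a 4 GB card in double precision; the solver's real per-cell footprint is not given in the text.

```python
def max_cells(memory_bytes: int, values_per_cell: int, bytes_per_value: int) -> int:
    """Largest cell count whose solution data fits in the given memory."""
    return memory_bytes // (values_per_cell * bytes_per_value)


GPU_MEMORY = 4 * 10 ** 9   # 4 GB card, upper end of the range quoted in the text
VALUES_PER_CELL = 100      # hypothetical: state, gradients, fluxes, work arrays

cells_double = max_cells(GPU_MEMORY, VALUES_PER_CELL, 8)  # 5,000,000
cells_single = max_cells(GPU_MEMORY, VALUES_PER_CELL, 4)  # 10,000,000
```

Adding species or multiphase variables raises `values_per_cell` and lowers the limit proportionally, consistent with the remark that advanced physics and complex chemistry reduce the maximum problem size.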

4. Physical and Chemical Models

A fast gas dynamics capability alone is not sufficient for the assessment of real-world explosive threats in urban and military situations. Physical models are essential for addressing the near-field and enhanced blast effects, which include post-detonation combustion. Casing fragmentation is often involved but is not the primary damage mechanism for blast loads on structures. In dealing with home-made explosives and improvised explosive devices, solid particulates, condensed species, and high vapour pressure fuels may be encountered. To predict full explosive output, the thermophysical models need to be connected to pressure and temperature.

Equations of state (EOS) for detonation products and shock-air are critical in the near-field. In connection with afterburning, the EOS parameters change with gas species composition. An afterburning JWL EOS with variable gamma has been developed to handle this wide range in gas conditions during urban explosion events [7]. For reactive metal particles, phase change and aerodynamic breakup of molten particles play an important role in dispersal [5]. A reaction mechanism for aluminium particles has been developed for combustion in air [8]. Many of these advanced physical and chemical processes are prominent in the near-field. To speed up the blast calculation, particles may be treated as equilibrated dilute species later in time.

The fast CFD solver provides the next-generation platform for the abovementioned physical and chemical models. Mixing and afterburning are conducive to being solved on GPU. Tasks such as searching for Lagrangian particles are better suited for CPU to avoid unbalancing the GPU.

5. Blast Load Reconstruction

Higher-order schemes can provide resolution of shock waves only within the mesh discretization. Even with AMR, the most refined mesh can become coarse at building and city scales. To optimally preserve pressure peaks, a reconstruction scheme is employed. The accuracy of blast front pressure is plotted in Figure 7 on a typical grid convergence plot. The error and convergence rates for first and second order schemes are shown. Error is defined with respect to the asymptotically grid-converged solution. When using peak pressure reconstruction, the error is reduced by up to two orders of magnitude. Impulse is relatively independent of mesh resolution and no correction is required.

Figure 7: Grid convergence of numerical schemes and blast reconstruction technique.
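The convergence rate read from a plot such as Figure 7 is the slope of error against mesh size on log-log axes. A minimal sketch of that standard calculation is given below; the function name and the sample numbers in the usage note are illustrative and are not data from the paper.

```python
import math


def observed_order(error_coarse: float, error_fine: float,
                   h_coarse: float, h_fine: float) -> float:
    """Observed order of convergence p, assuming error ~ C * h**p.

    Errors are measured against the grid-converged solution on two
    meshes with cell sizes h_coarse > h_fine.
    """
    return math.log(error_coarse / error_fine) / math.log(h_coarse / h_fine)
```

For instance, if halving the cell size reduces the peak-pressure error by a factor of four, `observed_order` returns 2; a factor of two gives 1. Schemes applied to flows with shocks commonly show observed orders below their formal order near the discontinuity.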

Code Performance Evaluation

The performance of the new fast CFD code is evaluated in terms of accuracy and speedup.

6. Accuracy Assessment

6.1. Benchmark Test Case

A basic test case is blast on a single building structure. A small scale experiment was conducted by Ohrt [20] and has been used as a numerical validation exercise for several codes. This 3D case involves near-field blast interaction with a structure at a scaled distance of 0.35 m/kg^(1/3) and open-source data is available for the side wall, back wall and roof surface.


Figure 9 illustrates the blast interaction with the single building structure. Figure 10 compares the peak pressure and impulse recorded on the front face of the building with the experimental results.

Figure 9: Second-order accurate solution along symmetry plane. Results using a 5 mm mesh shown at a time of 0.8 ms.

Figure 10: Pressure and impulse comparison with experimental data in [21]. The numerical mesh resolution was 2 mm.

6.2. Mesh Sensitivity

The automatic meshing of city structures and the natural orientation of city streets with respect to the computational mesh may lead to additional error from the boundary discretization. The non-Cartesian grid alignment is obvious in Figure 5. To assess these effects, parametric testing of the blast load variation has been conducted by rotating the structure mesh around the blast threat.

The scaled urban test of Christensen (see Skjold [19]) was used where the charge has been placed in the centre of the intersection between four square buildings. The solution is symmetric in eight planes; however, the full three-dimensional environment was simulated and redundant gauges were placed in quadruplicate in symmetric locations. The structure mesh and gauges were rotated from 0 to 45 degrees in 5 degree intervals, allowing the arithmetic mean and maximum deviation from the mean to be plotted using error bars. Figure 11 shows the maximum overpressure and peak impulse measured at 60 gauges using a coarse mesh with a cell size of 10 cm. The average deviation was +12.9/-12.1% for pressure and +4.7/-4.2% for impulse. This represents both the IBM discretization error and gauge recording interpolation error. The quality of symmetry can be seen where each group of four gauges displays similar mean and deviation results. The errors will be reduced as the mesh is refined, but the result is useful for assessing the deviation expected from automatic mesh embedding.

Figure 11: Error bars illustrating sensitivity of results to mesh orientation. Gauges sorted from highest to lowest load.
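The error bars in Figure 11 are described as the arithmetic mean over rotation angles with the maximum deviation above and below that mean. A small sketch of this statistic for a single gauge is given below; the function name and the sample values in the usage note are hypothetical, and averaging the per-gauge deviations over all 60 gauges is left out.

```python
def spread_about_mean(values):
    """Return (mean, plus_percent, minus_percent) for one gauge.

    `values` holds the same gauge quantity (e.g. peak overpressure)
    recorded at each mesh rotation angle. The percentages are the
    maximum deviations above and below the arithmetic mean.
    """
    if not values:
        raise ValueError("at least one value is required")
    mean = sum(values) / len(values)
    plus = (max(values) - mean) / mean * 100.0
    minus = (mean - min(values)) / mean * 100.0
    return mean, plus, minus
```

For example, readings of 90, 100, and 110 kPa across three rotations give a mean of 100 kPa with +10/-10% error bars.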

7. GPU Performance

7.1. Accuracy Assessment

Figure 12 provides a plot of the difference between solutions to the Sedov spherical blast [12] test case computed on CPU and GPU, both using double precision. By design, the GPU performs excess identical calculations, some of which are not used. Differences are mainly a result of the order of operations in the solver code and barriers to race conditions. The test simulation involved 64^3 cells and the results are plotted in a histogram at t = 50 ms. The error between CPU and GPU is defined as E = (ρ_CPU – ρ_GPU)/ρ_CPU, and up to 6% of the solution space may have a difference above 0.0001. Over 78% of the cells have a difference less than 10^-10 and are not plotted in the figure.

Figure 12: Distribution of GPU deviation from the CPU solution with error greater than 10^-10 using double precision floating point data.
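The error measure E defined above, and the attribution of differences to the order of operations, can both be illustrated briefly. The sketch computes the per-cell relative difference and the fraction of cells beyond a threshold, then shows that floating-point addition is not associative, so a GPU that sums terms in a different order than the CPU can produce slightly different double precision results. Function names are hypothetical, and magnitudes of E are compared against the thresholds as an assumption.

```python
def relative_difference(rho_cpu, rho_gpu):
    """Per-cell E = (rho_CPU - rho_GPU) / rho_CPU."""
    return [(c - g) / c for c, g in zip(rho_cpu, rho_gpu)]


def fraction_above(errors, threshold):
    """Fraction of cells whose |E| exceeds the threshold."""
    return sum(1 for e in errors if abs(e) > threshold) / len(errors)


def fraction_below(errors, threshold):
    """Fraction of cells whose |E| is below the threshold."""
    return sum(1 for e in errors if abs(e) < threshold) / len(errors)


# Same three terms, two summation orders, two different double precision results.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
orders_differ = left_to_right != right_to_left  # True
```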


7.2. Speedup Assessment

Basic parallelization concepts dictate that speedup is limited by serial processes and communication losses. Figure 13 shows the real speedup of the prototype solver. The flux solver alone shows very good speedup when using GPU. Based on relative card performance (e.g., Figure 8), high-performance scientific and gaming GPU cards are expected to show further speedup by a factor of two over the Quadro 4000. More importantly, the total system speedup is quite modest, being limited by CPU tasks that include dynamic mesh techniques, structure boundary detection, and timestep control. These tasks can themselves be parallelized; however, a bottleneck remains for the data copy to and from the GPU card. Thus, at present, the practical GPU speedup is on par with a multi-core CPU workstation. One advantage is that the CFD problem size in the current hybrid GPU/CPU deployment is not limited by GPU memory. Additional physical and chemical models, and numerical techniques such as multiphase and fluid-structure interaction, that are non-linear or involve searching are not conducive to GPU calculation.

Figure 13: Practical GPU speedup of the prototype next-generation solver: (left) speedup of the solver tasks alone; and, (right) speedup of the total system. Results for workstation-class GPU card and mobile laptop GPU card in double precision.

Some cautionary notes regarding GPU speedup should be made. In nearly all practical deployments, the CPU is still used as a host controller, and this is often overlooked in GPU-to-CPU performance comparisons. Furthermore, GPU speedup is conventionally reported with respect to one CPU core. Thus, the total system CPU (often dual or quad CPUs, each with 6 to 10 cores) is likely to outperform a single GPU card. Despite these facts, GPU development continues to be pursued in anticipation of future advancement in many-core compute environments.

7.3. AMR Performance

Table 1 demonstrates the AMR speedup relative to a uniform mesh at the equivalent highest resolution. The test case is the free-field blast problem of Sedov [12]. The base mesh contains 32³ zones. This result provides an upper limit to the expected AMR speedup in urban simulations. The level 5 refinement simulation required a peak memory usage of 12 GB. The results also indicate that models with the equivalent resolution of one billion cells are possible using AMR on a laptop.


Table 1: Adaptive mesh refinement speedup for free-field blast.

| AMR Level | Equivalent Resolution | Equivalent Uniform Cells | Maximum AMR Cells | Zonal AMR Speedup |
|---|---|---|---|---|
| 2 | 128³ | 2.1 M | 126 k | 27.3 |
| 3 | 256³ | 16.8 M | 698 k | 52.6 |
| 4 | 512³ | 134 M | 5.4 M | 75.0 |
| 5 | 1024³ | 1.07 B | 40 M | 91.3 |
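The equivalent resolutions in Table 1 are consistent with a 32³ base mesh whose cell edge length is halved at each refinement level. The following sketch reproduces the equivalent uniform cell counts under that assumption (the refinement ratio of 2 is inferred from the table rather than stated in the text) and compares the cell-count ratio with the reported speedup. The function name is hypothetical.

```python
BASE_CELLS_PER_SIDE = 32  # base mesh of 32^3 zones, as stated in the text


def equivalent_uniform_cells(level, base=BASE_CELLS_PER_SIDE):
    """Cells in a uniform mesh matching the finest AMR resolution.

    Assumes each refinement level halves the cell edge length.
    """
    per_side = base * 2 ** level
    return per_side ** 3


# Maximum AMR cell counts and speedups copied from Table 1.
max_amr_cells = {2: 126e3, 3: 698e3, 4: 5.4e6, 5: 40e6}
amr_speedup = {2: 27.3, 3: 52.6, 4: 75.0, 5: 91.3}

for level in sorted(max_amr_cells):
    uniform = equivalent_uniform_cells(level)
    cell_ratio = uniform / max_amr_cells[level]
    print(f"level {level}: {uniform:>13,d} uniform cells, "
          f"cell ratio {cell_ratio:5.1f}, reported speedup {amr_speedup[level]}")
```

The cell ratio here uses the maximum AMR cell count over the run, so it is a conservative indicator of the work saved.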

7.4. Demonstration Case

An urban performance comparison was carried out using the same configuration used to generate the numerical results shown in Figure 3. The model statistics and timing results are summarized in Table 2. The Chinook meshes feature grid expansion in the far field and were previously run in parallel using MPI on up to 64 CPUs (Xeon E5420). The Chinook code has been shown to scale nearly linearly for this size of problem. The prototype fast solver was run on a single CPU (Xeon E5-1660). For a 7.62 cm uniform mesh, the results show that the new solver is nine times faster than Chinook with a hexahedral mesh, despite having nearly four times the number of cells. When using AMR (four levels with a finest cell size of 7.62 cm), the prototype solver is 71 times faster than Chinook running on a tetrahedral mesh. The problem sizes shown in Table 2 are in the mesh size range that is too large for GPU alone, further justifying the development of the hybrid parallelization scheme. Testing on GPU is underway for uniform and AMR meshes above 10 M cells.

Table 2: Model statistics and timing results for C4 explosive in EMRTC environment. Simulation end time of 40 ms. Second-order accurate models unless otherwise indicated.

| Mesh Resolution (edge length) | Chinook (Unstructured): mesh size and topology | Chinook (Unstructured): wall-clock time (CPU-hours) | Prototype Code (Cartesian): mesh size and topology | Prototype Code (Cartesian): wall-clock time (CPU-hours) |
|---|---|---|---|---|
| 7.62 cm | 10.4 M hexahedral | 61.5 (first order) | - | - |
| 7.62 cm | 12.7 M hexahedral | 228.5 | 46 M uniform | 26.1 |
| 7.62 cm | 25.1 M tetrahedral | 1882 | 67 M AMR | 26.5 |
| 3.81 cm | 92 M tetrahedral | 5196 (first order) | - | - |
| 3.81 cm | 92 M tetrahedral | 47,867 | - | - |
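The speedup factors quoted in the text for the demonstration case follow directly from the wall-clock times in Table 2. A short check, using only the tabulated values (the variable and function names are illustrative):

```python
# Wall-clock times in CPU-hours, copied from Table 2 (7.62 cm resolution,
# second-order accurate runs).
chinook_hex_hours = 228.5       # 12.7 M hexahedral cells
chinook_tet_hours = 1882.0      # 25.1 M tetrahedral cells
prototype_uniform_hours = 26.1  # 46 M uniform Cartesian cells
prototype_amr_hours = 26.5      # 67 M cells with AMR


def speedup(reference_hours, new_hours):
    """Ratio of reference cost to new cost; values above 1 mean faster."""
    return reference_hours / new_hours


uniform_vs_hex = speedup(chinook_hex_hours, prototype_uniform_hours)
amr_vs_tet = speedup(chinook_tet_hours, prototype_amr_hours)
cell_count_ratio = 46.0 / 12.7  # prototype uniform cells per Chinook hex cell

print(f"uniform vs hexahedral: {uniform_vs_hex:.1f}x")
print(f"AMR vs tetrahedral:    {amr_vs_tet:.1f}x")
print(f"cell-count ratio:      {cell_count_ratio:.1f}")
```

The ratios are roughly 8.8 and 71.0, with a cell-count ratio of about 3.6, matching the "nine times", "71 times", and "nearly four times" statements.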


The Rapid City Planner Tool

This modernized CFD blast solver has been integrated into a tool called the Rapid City Planner for prediction of urban explosion events using real city geometries. The Rapid City Planner uses GIS for geographically-based threat location and realistic in situ scenario visualization. City streets and building structures are automatically meshed and are embedded in the new solver using the immersed boundary method (IBM). The fast CFD-based blast solver is set up and executed automatically. A graphical user interface guides this process through a series of steps in which the Google Earth environment is used to interact with the model. Results are stored in a database for fast access and easy modification.

8. Geo-Located Models and Automatic Mesh Generation

Real city environments feature terrain, three-dimensional buildings, and other obstacles (e.g., vehicles, construction equipment, and architectural details). Terrain is constructed from digital elevation data; an example is shown in Figure 14. City buildings are created from various geo-located file types. An example structure mesh of city buildings is shown in Figure 15.

Figure 14: Elevation view of a model city to demonstrate the terrain.

Figure 15: Automatic surface mesh generation of city buildings from Trimble SketchUp 3D Warehouse models.


9. Fast CFD Solution using IBM

The immersed boundary method can detect very detailed building structures and measure reflected blast loads within the discretization of the underlying CFD mesh. Higher-order interpolation of the reflected blast pressure is applied at these interfaces. In the IBM implementation, the numerical timestep is not affected by the level of detail in the structural model. Figure 16 presents an example of a detailed structural model covering several downtown city blocks. Additional levels of detail are easily handled using IBM for assessment of individual buildings, which is typically done for protection of critical infrastructure, national historical monuments, and iconic structures.

Figure 16: Fast CFD Solution using the immersed boundary method. Colour contours denote maximum overpressure.
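The basic idea of embedding structures in a fixed Cartesian mesh can be sketched as a cell-classification step. The example below is a simplified two-dimensional illustration and not the algorithm used in the solver, which is not described here: it assumes axis-aligned rectangular building footprints, tests cell centres only, and uses hypothetical function names. Because the cell size `dx` is fixed by the mesh, adding geometric detail changes which cells are flagged but not the cell size that governs the timestep.

```python
def flag_solid_cells(nx, ny, dx, footprints):
    """Return a ny-by-nx grid of booleans, True where a cell centre is solid.

    footprints: list of (xmin, ymin, xmax, ymax) rectangles in metres.
    """
    solid = [[False] * nx for _ in range(ny)]
    for j in range(ny):
        yc = (j + 0.5) * dx
        for i in range(nx):
            xc = (i + 0.5) * dx
            solid[j][i] = any(
                xmin <= xc <= xmax and ymin <= yc <= ymax
                for xmin, ymin, xmax, ymax in footprints
            )
    return solid


def boundary_cells(solid):
    """Fluid cells with at least one solid face neighbour."""
    ny, nx = len(solid), len(solid[0])
    cells = []
    for j in range(ny):
        for i in range(nx):
            if solid[j][i]:
                continue
            neighbours = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if any(0 <= a < nx and 0 <= b < ny and solid[b][a]
                   for a, b in neighbours):
                cells.append((i, j))
    return cells


# Hypothetical 8 x 8 grid of 1 m cells with one 2 m x 2 m building.
solid = flag_solid_cells(8, 8, 1.0, [(3.0, 3.0, 5.0, 5.0)])
near_wall = boundary_cells(solid)
```

The `near_wall` cells are where a reflected-pressure interpolation of the kind mentioned above would be applied.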

10. Mapping-Based Graphical User Interface

To facilitate use of the fast CFD solver, a graphical user interface based on Google Earth was developed. This provides a familiar and easy-to-use environment, with ease of orientation through satellite photography and overlaid street names. Google Earth is embedded in the Rapid City Planner user interface, as shown in Figure 17. The environment allows user-specification of the threat location and visualization of the scenario.

Figure 18 illustrates sample blast results displayed in the Google Earth environment. Damage models based on pressure-impulse criteria employ these loads for estimating structural damage. Figure 19 demonstrates injury contours at street level in Google Earth.


Figure 17: Rapid City Planner user interface with embedded Google Earth environment.

Figure 18: Blast loads on structures displayed in the Google Earth environment.


Figure 19: Damage/injury contours in city streets displayed in the Google Earth environment.

Discussion of Modeling Uncertainty

With any new code, extensive verification and validation are required to provide confidence in the results. Modern numerical tool assessment often includes formal uncertainty quantification (UQ). The method of manufactured solutions [22] is one way of assessing numerical uncertainty. For CFD solvers, it is directly applicable to basic Euler and Navier-Stokes solvers; however, the extension to multiphase chemically reacting flow is not straightforward and has not yet been undertaken.

In urban explosion modeling, there are several uncertain parameters, including: threat make-up (explosive mass, composition), threat placement (location, orientation, and stand-off distance), and structural resistance. Modeling error of the blast loads can come from the numerical schemes, physical/chemical models, and particularly the city geometry. The completeness, level of detail, resolution, and accuracy are among the primary uncertainties in the city geometry. Beyond the load prediction, other uncertainties include user errors and applicability of damage models. The foregoing represents the epistemic uncertainties. Sources of inherent variability, or aleatoric uncertainty, include building structure condition (built as designed, retrofits, age) and atmospheric conditions.

As briefly listed above, the number of uncertainties in urban assessment can be quite large. A screening strategy may be employed in which parameters with negligible influence on model predictions are first identified, the remaining significant parameters are then ranked in terms of importance, and finally UQ is performed on a smaller set of important parameters. Some important parameters are the structure boundary representation and the threat placement.

Forward propagation of variations in uncertain parameters allows quantification of uncertainty; however, in blast applications, extreme non-linearity and bifurcation are expected. Parameter variation methods such as polynomial chaos have been applied to fluid flows with discontinuities; unfortunately, these methods are very slow and require extensive modifications to the source code itself. Alternatively, UQ can be probabilistic using stochastic variations, Monte Carlo methods, or perturbation methods that systematically use the limits of confidence intervals. The fast CFD methodology affords the opportunity to rapidly vary these parameters, thereby reducing the uncertainty associated with parameters, inherent variability, and user precision.
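A minimal sketch of Monte Carlo forward propagation is given below. It is illustrative only: in practice each sample would be a CFD run, whereas here the Hopkinson-Cranz scaled distance Z = R / W^(1/3) stands in for the model output so that the example is self-contained. The input distributions (charge mass uniform between 80 and 120 kg, stand-off normally distributed about 20 m) and the function names are assumptions, not values from this work.

```python
import random
import statistics


def scaled_distance(standoff_m, charge_kg):
    """Hopkinson-Cranz scaled distance Z = R / W**(1/3) in m/kg^(1/3)."""
    return standoff_m / charge_kg ** (1.0 / 3.0)


def monte_carlo_scaled_distance(n_samples, seed=0):
    """Sample uncertain inputs and return the resulting Z values.

    The input distributions below are illustrative assumptions only.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        charge = rng.uniform(80.0, 120.0)   # explosive mass, kg
        standoff = rng.gauss(20.0, 1.0)     # stand-off distance, m
        samples.append(scaled_distance(standoff, charge))
    return samples


z = sorted(monte_carlo_scaled_distance(10_000))
z_mean = statistics.mean(z)
# Empirical 2.5% and 97.5% quantiles of the sampled output.
z_low, z_high = z[int(0.025 * len(z))], z[int(0.975 * len(z))]
print(f"mean Z = {z_mean:.2f}, 95% interval = [{z_low:.2f}, {z_high:.2f}]")
```

The same loop structure applies to a perturbation study: replace the random draws with the lower and upper limits of each parameter's confidence interval and evaluate the model at each combination.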

Conclusions

The next-generation Chinook shock physics code has been developed and embedded in a user-friendly environment for modeling explosions in city settings. The Rapid City Planner tool automates the geometric modeling and mesh discretization for real city buildings and terrain, and provides visualization of CFD results in the Google Earth mapping environment. The complete tool provides a system for rapid assessment of user-specified scenarios.

The modern blast solver strategy employs high-resolution schemes on Cartesian meshes with both immersed boundary method and adaptive mesh refinement. IBM was shown to handle very detailed and complex geometries applicable to real-world city environments. The zonal-based AMR method allowed maximum resolution to be applied efficiently to resolve blast fronts and other discontinuities. The AMR is very efficient and provides significant speedup factors over uniform meshes.

Solver development for graphics processing units showed potential speedup factors of up to 35 over a single CPU core using scientific and gaming cards on a workstation. However, practical speedup using a mobile GPU was on par with CPU performance. In general, the GPU is only efficient for simple CFD calculation and is currently limited by the graphics memory. For large-scale blast computations, the role of the GPU is mainly as a math co-processor.

The new solver is also directly applicable to internal and fully confined blast. Continuous validation is underway using high quality urban experimental data, particularly for the near field and for enhanced blast explosives. Walls are assumed to be rigid; however, the new blast solver was designed to be directly extensible to fluid-structure interaction for frangible structures and rigid body obstacles.

The fast CFD methods, hardware acceleration and modern software technologies have shown the potential to exceed a two-orders-of-magnitude speedup over the predecessor Chinook code. This next-generation code will be deployable on workstations and parallel computing clusters for detailed assessments. For fast CFD solutions on a laptop, the overall solution time has been drastically reduced to the point where user input time is now on the same order as the solution time. Calculations suitable for decision making can be completed in the timeframe of minutes.

Acknowledgements

Support is acknowledged from the Canadian Safety and Security Program (CSSP) project CSSP-2013-CP-1001 and Natural Sciences and Engineering Research Council (NSERC) of Canada Grant EGP#433737-12 (University of New Brunswick). The authors appreciate helpful discussion of fast CFD methods with Dr. Fue-Sang Lien (University of Waterloo) and Dr. Hua Ji (Waterloo CFD Consulting Inc.). The contributions of Yibo Li (Martec Limited) to the development of the graphical user interface are acknowledged.


REFERENCES

[1] Christensen, S. O. and Hjort, Ø. J. S., 2012, Back Calculation of the Oslo Bombing on July 22nd, Using Window Breakage, Proc. 22nd Military Aspects of Blast and Shock Symposium, Bourges, France.

[2] Zhang, F., 2006, Confined Heterogeneous Blast, Proceedings of the 19th International Symposium on Military Aspects of Blast and Shock, Calgary, Canada.

[3] Ripley, R. C., Cloney, C. T., Donahue, L. and Zhang, F., 2012, Extended Near-Field Regime in Urban Confinement, Proc. 22nd Military Aspects of Blast and Shock Symposium, Bourges, France.

[4] Ripley, R. C., Zhang, F., Donahue, L. and Dunbar, T. E., 2005, Modeling of Complex Blast Loading in Streets, Proceedings of the 6th Asia-Pacific Conference on Shock & Impact Loads on Structures (H. Hao, T. S. Lok and G. X. Lu, eds.), pp. 487-498.

[5] Ripley, R. C., Donahue, L. and Zhang, F., 2014, Fragmentation of metal particles during heterogeneous explosion, Shock Waves (Accepted).

[6] Ripley, R. C., Cloney, C., Donahue, L., Frost, D. L. and Zhang, F., 2010, Enhanced Loading due to Reflected Heterogeneous Blast, 21st Military Aspects of Blast and Shock Symposium, Jerusalem, Israel.

[7] Donahue, L., Zhang, F. and Ripley, R. C., 2013, Numerical Models for Afterburning of TNT Detonation Products in Air, Shock Waves, Vol. 23, Issue 6, pp 559-573.

[8] Zhang, F., Gerrard, K. and Ripley, R.C., 2009, Reaction Mechanism of Aluminum-Particle–Air Detonation, Journal of Propulsion and Power, Vol. 25, No. 4, pp 845-858.

[9] Bowles, T., Stevens, D. and Stanley, M., 2012, Assessment of Blast Load Prediction in an Urban Environment, Proc. 22nd Military Aspects of Blast and Shock Symposium, Bourges, France.

[10] Lee, C. K. B., 2012, Application of Scaling to Explosions in a Model City, Proc. 22nd Military Aspects of Blast and Shock Symposium, Bourges, France.

[11] Department of Homeland Security, 2011, Preventing Structures from Collapsing - to Limit Damage to Adjacent Structures and Additional Loss of Life when Explosives Devices Impact Highly Populated Urban Centers, Buildings and Infrastructure Protection Series, BIPS05, DHS S&T.

[12] Sedov L.I., 1959, Similarity and Dimensional Methods in Mechanics, New York, Academic Press.

[13] Khokhlov, A.M., 1998, Fully Threaded Tree Algorithms for Adaptive Mesh Fluid Dynamics Simulations, J. Computational Physics 143, pp 519-543.

[14] Ji, H., Lien, F.-S. and Yee, E., 2010, A new adaptive mesh refinement data structure with an application to detonation, Journal of Computational Physics 229, pp. 8981–8993.

[15] Borrel, M., Ryan, J. and Billet G., 2006, A Generalized Patch AMR Platform that uses Cell Centered or Cell Vertex Solvers, European Conference on Computational Fluid Dynamics, ECCOMAS CFD 2006 (P. Wesseling, E. Onate and J. Periaux, eds), Technical University of Delft, The Netherlands.

[16] Klomfass, A., Zweigle, T., Fischer, K., 2012, A New AMR/ALE-Solver for the Efficient Simulation of Detonations and Blast Waves, Proc. 22nd Military Aspects of Blast and Shock Symposium, Bourges, France.

[17] Herzog, O and Klomfass, A., 2010, A Specialized, Automated Solver for Simulation of Blast Events in Urban Scenarios: New AMR Approach, 21st Military Aspects of Blast and Shock Symposium, Jerusalem, Israel.

[18] Mittal, R. and Iaccarino, G., 2005, Immersed boundary methods. Annu. Rev. Fluid Mech. 37, 239.

[19] Skjold, T., Christensen, S.O., Bernard, L., Pedersen, H.H. and Narasimhamurthy, V.D., 2012, Urban canyon blast load calculations. Proc. 22nd International Symposium on Military Aspects of Blast and Shock, Bourges, France.

[20] Ohrt, A. P., Lunderman, C. V. and Rickman, D. D., 1998, Miniature-scale experiments of airblast diffraction over barrier walls, Proc. 69th Shock and Vibration Symposium, Minneapolis, MN.


[21] Joachim, C. E., Armstrong, B. J. and Rickman, D. D., 2002, Free-Field Airblast on Structures: Computational Model, Proc. 17th Military Applications of Blast Simulations, Las Vegas, NV.

[22] Roy C. J., Nelson, C. C., Smith, T. M. and Ober, C. C., 2004, Verification of Euler/Navier-Stokes codes using the method of manufactured solutions. International Journal for Numerical Methods in Fluids 44, pp. 599-620.