Conclusions and Future Considerations: Parallel processing of raster functions was 3-22 times faster than ArcGIS, depending on file size. Also, processing multiple functions on a single dataset achieved even greater performance gains (Figure 4). A final test processed slope for a 95GB, 25-billion-pixel raster file in under 3,600 seconds, indicating continued linear performance even with enormous file sizes.

Future work should include using parallel CPU threads to read data into RAM while the GPU processes data, which we believe will provide the most dramatic performance increase. Additionally, because the greatest efficiencies were demonstrated when performing multiple computations on data already resident on the GPU, future work should also explore similar models in climatology, ecology, remote sensing, and natural resource assessment. Many GIS models that require multiple iterations could see enormous speed improvements if redesigned within a GPU context.
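A minimal CUDA C sketch of the proposed overlap, assuming row-partitioned chunks, a hypothetical read_chunk() file reader, and a hypothetical slope_kernel() (illustrative only, not the project's actual code); double-buffered pinned host memory lets the CPU read chunk i+1 while the GPU copies and processes chunk i:

    /* Hypothetical illustration of overlapping CPU reads with GPU work. */
    #include <cuda_runtime.h>
    #include <stddef.h>

    extern size_t read_chunk(float *dst, size_t chunk_index);   /* assumed reader   */
    __global__ void slope_kernel(const float *in, float *out, int rows, int cols);

    void process_overlapped(size_t n_chunks, size_t chunk_elems, int rows, int cols)
    {
        float *h_in[2], *d_in, *d_out;
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        /* Pinned (page-locked) host buffers allow asynchronous copies. */
        cudaMallocHost((void **)&h_in[0], chunk_elems * sizeof(float));
        cudaMallocHost((void **)&h_in[1], chunk_elems * sizeof(float));
        cudaMalloc((void **)&d_in,  chunk_elems * sizeof(float));
        cudaMalloc((void **)&d_out, chunk_elems * sizeof(float));

        read_chunk(h_in[0], 0);                       /* prime the first buffer    */
        for (size_t i = 0; i < n_chunks; i++) {
            int cur = (int)(i & 1);

            /* Queue the copy and kernel for chunk i on the stream ...             */
            cudaMemcpyAsync(d_in, h_in[cur], chunk_elems * sizeof(float),
                            cudaMemcpyHostToDevice, stream);
            slope_kernel<<<(rows * cols + 255) / 256, 256, 0, stream>>>(d_in, d_out,
                                                                        rows, cols);
            /* ... and let the CPU read chunk i+1 into the other buffer meanwhile. */
            if (i + 1 < n_chunks)
                read_chunk(h_in[cur ^ 1], i + 1);

            cudaStreamSynchronize(stream);            /* chunk i finished          */
            /* copy d_out back and write it to the output file here                */
        }

        cudaStreamDestroy(stream);
        cudaFreeHost(h_in[0]);
        cudaFreeHost(h_in[1]);
        cudaFree(d_in);
        cudaFree(d_out);
    }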

Results and Discussion: On average, the introduction of CUDA provided a 7x improvement over the same function used in ArcGIS.

Execution time for the CUDA functions exhibited a stable, predictable linear trend across raster files of all sizes, allowing us to better predict response times for larger data sets. To test the linear fit, we forecast execution time out to a 100GB (25 billion pixel) raster and predicted an execution time of XXX. Our actual processing of the file completed in under XXX seconds. Unfortunately, the file was too large to process in ArcGIS, so a comparison between the two products is not possible.
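For illustration, such a forecast follows directly from the linear fit shown in the CUDA panel of Figure 3 (execution time in seconds as a function of file size in MB). A minimal sketch, assuming 100GB ≈ 102,400 MB:

    /* Evaluate the CUDA linear fit from Figure 3, t(x) = 0.0329x - 13.97,
       at x = 102,400 MB (100GB) to forecast execution time in seconds.   */
    #include <stdio.h>

    int main(void)
    {
        double slope_s_per_mb = 0.0328742296089803;   /* fitted slope (s per MB) */
        double intercept_s    = -13.9670035270744;    /* fitted intercept (s)    */
        double file_size_mb   = 100.0 * 1024.0;       /* 100GB expressed in MB   */

        double predicted_s = slope_s_per_mb * file_size_mb + intercept_s;
        printf("Predicted execution time: %.0f s\n", predicted_s);   /* ~3352 s  */
        return 0;
    }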

In contrast, while most of the ArcGIS execution times appear linear with respect to file size, both slope and aspect deviate substantially from a linear fit when the input raster is between 1.4GB and 4.0GB. We suspect the deviation is related either to the ESRI ADF format’s separation of data into files of between 1.2GB and 2GB, or to a separate algorithm being used for datasets over some threshold size.

The greatest bottleneck was obtaining data from the disk and moving it to RAM (Figure 1, step 1). Once the data was on the GPU, parallelization took over and made short work of the processing. Numerous strategies were used to improve the data acquisition step, with noticeable results. Testing under other hardware configurations confirmed that having more GPU memory to load larger chunks of data played a bigger role in execution time than having more processing streams.
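A minimal sketch of one such strategy, tying the chunk size to the amount of free GPU memory so that cards with more memory load larger chunks per transfer (an illustrative approach using cudaMemGetInfo, not necessarily the exact strategy used here):

    /* Size row chunks from the GPU's free memory. */
    #include <cuda_runtime.h>
    #include <stddef.h>

    /* Returns how many raster rows fit on the device at once, assuming one
       input buffer and one output buffer of floats plus a safety margin.  */
    size_t rows_per_chunk(size_t cols)
    {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);

        size_t usable  = (size_t)(free_bytes * 0.8);  /* leave some headroom      */
        size_t per_row = 2 * cols * sizeof(float);    /* input + output per row   */
        size_t rows    = usable / per_row;
        return rows < 3 ? 3 : rows;                   /* 3x3 window needs 3 rows  */
    }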

Parallelizing Raster-Based Functions in GIS with CUDA C
Sean Kirby, University of Maryland, Eastern Shore
William Kostan, University of Virginia
Dr. Arthur J. Lembo, Jr., Salisbury University

[Figure 3 chart panels: Execution Time (s) vs. File Size (MB).
CUDA panel (TRI, Slope, Aspect, Min, Max, Mean, Range): linear fit f(x) = 0.0329x − 13.97, R² = 0.989.
ArcGIS panel (Slope, Aspect, Min, Max, Mean, Range; linear fits shown for Slope, Aspect, and Max): f(x) = 0.188x − 99.40, R² = 0.967; f(x) = 0.0912x + 167.74, R² = 0.686; f(x) = 0.0666x + 189.90, R² = 0.595. A "Model Instability" annotation marks the region where the ArcGIS trend breaks down.]

[Figure 4 chart panel, "All Functions": Execution Time (s) vs. File Size (MB). ArcGIS linear fit f(x) = 0.908x − 39.08, R² = 0.969; CUDA linear fit f(x) = 0.139x − 14.05, R² = 0.991.]

Introduction: A continuing challenge in GIScience and many other fields is the efficient and economical processing of massive data sets. We describe how to leverage massively parallel processing for raster GIS operations through general-purpose computing on the graphics processing unit (GPGPU). Using video-gaming cards to create parallel processing solutions on the GPU is relatively new and almost nonexistent in GIScience. By designing a software add-on that runs select raster analysis operations on a CUDA-enabled GPU, we achieved a 7x speed improvement compared to traditional (ArcGIS) raster processes. In addition, our approach allows us to process 95GB rasters with over 25 billion pixels in under an hour. Finally, we identify key areas of future work to further improve execution time.

Average Speedups

File Size (MB)     Speedup
24.719             22.27539
521.048             6.596666
976.563             3.784819
1202.559            5.237141
1285.400            8.960199
2321.548            9.899794
2595.520            3.531955
2682.037            8.564936
4154.205            4.903046
5802.155            4.802306
7724.762            5.624901
9536.743            6.120141
11539.459           5.392021
12470.003           4.390388
Global Average      7.148836

Figure 3: A comparison of ArcGIS and CUDA-C performance on several raster functions, as measured by execution time, shows that CUDA processes the data 3-22 times faster than the corresponding ArcGIS process. In addition, the CUDA process was highly linear (R² > 0.96), while the ArcGIS process exhibited serious instability between 1.25GB and 4GB for Slope and Aspect.

Figure 4: CUDA vs. ArcGIS performance in executing six spatial analysis transformations per dataset as measured by execution time

Figure 1: CUDA execution model

* Additional speed improvements may be achieved by running this step as separate, parallel threads on the CPU, thereby allowing the CPU to prepare the next data block while the GPU is processing data

Data and Methods: Fourteen ESRI FLT raster files (ranging from 24MB to 12GB) were processed and analyzed using 3x3 kernel functions for slope, aspect, terrain ruggedness, min, max, range, and mean. These functions were chosen because they are “embarrassingly” parallel and can be run independently of one another, making them ideal for GPU processing. Each file was processed 20 times to determine the average speed using both ArcGIS and CUDA-C. The CUDA-C process (Figure 1) included:

1. Partition the data by rows and read it into an array in memory.*
2. Copy the data array to the GPU.
3. Launch a GPU kernel for raster processing.
4. Process the data in parallel on the GPU through a combination of blocks and threads.
5. Upon completion of each kernel, copy the data from the GPU back to the CPU.
6. Write the output to the appropriate file (one file per raster function).
7. Launch additional kernels on the data already on the GPU.
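A minimal CUDA C sketch of steps 1-7 for a single chunk, using a simple 3x3 mean kernel to stand in for the full set of functions and hypothetical read_rows()/write_rows() helpers for file I/O (illustrative only, not the project's exact code):

    /* Illustrative 3x3 mean kernel; slope, aspect, TRI, min, max, and range
       follow the same neighborhood pattern with a different per-cell formula. */
    #include <cuda_runtime.h>
    #include <stdlib.h>

    __global__ void mean3x3_kernel(const float *in, float *out, int rows, int cols)
    {
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        int r = blockIdx.y * blockDim.y + threadIdx.y;
        if (r <= 0 || r >= rows - 1 || c <= 0 || c >= cols - 1)
            return;                               /* skip the chunk's edge cells */

        float sum = 0.0f;
        for (int dr = -1; dr <= 1; dr++)          /* 3x3 neighborhood            */
            for (int dc = -1; dc <= 1; dc++)
                sum += in[(r + dr) * cols + (c + dc)];
        out[r * cols + c] = sum / 9.0f;
    }

    /* Hypothetical file I/O helpers (assumed, not part of the original code). */
    extern void read_rows(float *dst, size_t first_row, size_t n_rows, size_t cols);
    extern void write_rows(const float *src, size_t first_row, size_t n_rows, size_t cols);

    void process_chunk(size_t first_row, int rows, int cols)
    {
        size_t bytes = (size_t)rows * cols * sizeof(float);
        float *h_buf = (float *)malloc(bytes);
        float *d_in, *d_out;
        cudaMalloc((void **)&d_in,  bytes);
        cudaMalloc((void **)&d_out, bytes);

        read_rows(h_buf, first_row, rows, cols);                     /* step 1    */
        cudaMemcpy(d_in, h_buf, bytes, cudaMemcpyHostToDevice);      /* step 2    */

        dim3 block(16, 16);
        dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
        mean3x3_kernel<<<grid, block>>>(d_in, d_out, rows, cols);    /* steps 3-4 */
        /* Step 7: additional kernels (slope, aspect, TRI, ...) can be launched
           here against d_in without copying the data to the GPU again.          */

        cudaMemcpy(h_buf, d_out, bytes, cudaMemcpyDeviceToHost);     /* step 5    */
        write_rows(h_buf, first_row, rows, cols);                    /* step 6    */

        cudaFree(d_in);
        cudaFree(d_out);
        free(h_buf);
    }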

Figure 2: LiDAR imagery of Oxford County, MD, before (black and white) and after (hues of green and red) slope transformation.

Figure 5: CUDA program integrated into ArcMap 10.0 as a toolbox ("CUDA Toolbox") with a full graphical user interface (GUI)

Final Software Interface: One of the primary goals of the project was to create a simple add-on to ArcGIS that delivers a dramatic performance improvement. This was accomplished by incorporating the CUDA executable into a model in ArcToolbox for use on any computer with ArcGIS and a CUDA-enabled GPU.