NVIDIA® CUDA™ 5.0 Sample evaluation result
PART Ⅱ
GPU: GTX 560 Ti
CPU: i5-3450S (TDP65W)
RAM: 16GB
OS: Windows 7 x64 Ultimate
Yukio Saitoh | FXFROG.com
24/Apr/2013
INDEX
Sample binary:
19. concurrentKernels
20. conjugateGradient
21. concurrentKernels
22. conjugateGradient
23. conjugateGradientPrecond
24. convolutionFFT2D
25. convolutionSeparable
26. convolutionTexture
27. cppIntegration
28. cudaDecodeD3D9 (runaway)
29. cudaDecodeGL
30. cudaEncode (runaway)
31. dct8x8
32. deviceQuery
33. deviceQueryDrv
34. dwtHaar1D
35. dxtc
Sample target path and files
• C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release
concurrentKernels.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\concurrentKernels.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> Detected Compute SM 2.1 hardware with 8 multi-processors
Expected time for serial execution of 8 kernels = 0.080s
Expected time for concurrent execution of 8 kernels = 0.010s
Measured time for sample = 0.010s
Test passed
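The two expected figures in the concurrentKernels log follow from simple arithmetic. Below is a minimal sketch in Python, assuming each kernel runs for about 0.010 s (inferred from 0.080 s / 8; the per-kernel duration is not printed in the log) and that all 8 kernels can overlap fully. The function names are illustrative only.

```python
# Assumed per-kernel duration, inferred from the log (0.080 s / 8 kernels).
KERNEL_TIME_S = 0.010
NUM_KERNELS = 8

def expected_serial(n_kernels, kernel_time):
    """Kernels run back to back: total time is the sum of all kernel times."""
    return n_kernels * kernel_time

def expected_concurrent(n_kernels, kernel_time):
    """Idealized full overlap: total time is that of a single kernel."""
    return kernel_time if n_kernels > 0 else 0.0

serial = expected_serial(NUM_KERNELS, KERNEL_TIME_S)
concurrent = expected_concurrent(NUM_KERNELS, KERNEL_TIME_S)
print(f"serial     = {serial:.3f}s")
print(f"concurrent = {concurrent:.3f}s")
```

The measured 0.010 s matches the idealized concurrent figure, which suggests the 8 kernels did overlap on this device.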
conjugateGradient.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
iteration = 1, residual = 4.451374e+001
iteration = 2, residual = 3.248658e+000
iteration = 3, residual = 2.695777e-001
iteration = 4, residual = 2.314586e-002
iteration = 5, residual = 1.997625e-003
iteration = 6, residual = 1.852079e-004
iteration = 7, residual = 1.705767e-005
iteration = 8, residual = 1.618583e-006
Test Summary: Error amount = 0.000000
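The residual column above shrinks by roughly a factor of ten per iteration, which is typical of the conjugate gradient method on a well-conditioned system. The following is a minimal pure-Python sketch of the textbook algorithm, not the sample's GPU code; the 6 x 6 tridiagonal matrix and the tolerance are hypothetical choices for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-5, max_iter=100):
    """Solve A x = b for a symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    history = []             # residual norm after each iteration
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        history.append(rs_new ** 0.5)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, history

# Hypothetical test system: small tridiagonal SPD matrix (4 on the diagonal, -1 beside it).
n = 6
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
b = [1.0] * n

x, history = conjugate_gradient(A, b)
for k, res in enumerate(history, start=1):
    print(f"iteration = {k:3d}, residual = {res:e}")
```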
conjugateGradientPrecond.exe
conjugateGradientPrecond starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
GPU selected Device ID = 0
> GPU device has 8 Multi-Processors, SM 2.1 compute capabilities
laplace dimension = 128
Convergence of conjugate gradient without preconditioning:
iteration = 542, residual = 8.660636e-013
Convergence Test: OK
Convergence of conjugate gradient using incomplete LU preconditioning:
iteration = 188, residual = 9.056491e-013
Convergence Test: OK
Test Summary:
Counted total of 0 errors
qaerr1 = 0.000004 qaerr2 = 0.000003
convolutionFFT2D.exe 1/2
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionFFT2D.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Testing built-in R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating R2C & C2R FFT plans for 2048 x 2048
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1267.922657 MPix/s (3.154767 ms)
...reading back GPU convolution results
...running reference CPU convolution
...comparing the results: rel L2 = 7.179421E-008 (max delta = 4.808732E-007)
L2norm Error OK
...shutting down
Testing custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1261.058719 MPix/s (3.171938 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.505000E-008 (max delta = 4.873593E-007)
L2norm Error OK
...shutting down
convolutionFFT2D.exe 2/2
Testing updated custom R2C / C2R FFT-based convolution
...allocating memory
...generating random input data
...creating C2C FFT plan for 2048 x 1024
...uploading to GPU and padding convolution kernel and input data
...transforming convolution kernel
...running GPU FFT convolution: 1588.813202 MPix/s (2.517602 ms)
...reading back GPU FFT results
...running reference CPU convolution
...comparing the results: rel L2 = 7.470519E-008 (max delta = 5.276085E-007)
L2norm Error OK
...shutting down
Test Summary: 0 errors
Test passed
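Each convolutionFFT2D run reports both a throughput and a time, so the number of pixels behind the throughput figure can be backed out as a consistency check. A small Python sketch using only the values printed in the log (the dictionary keys are shorthand labels, not names from the sample):

```python
# (throughput in MPix/s, time in ms) copied from the three runs in the log.
runs = {
    "built-in R2C/C2R": (1267.922657, 3.154767),
    "custom R2C/C2R": (1261.058719, 3.171938),
    "updated custom R2C/C2R": (1588.813202, 2.517602),
}

def implied_pixels(mpix_per_s, time_ms):
    """Pixels processed = throughput [pixels/s] * time [s]."""
    return mpix_per_s * 1e6 * time_ms * 1e-3

pixel_counts = [implied_pixels(rate, ms) for rate, ms in runs.values()]
for name, count in zip(runs, pixel_counts):
    print(f"{name}: {count:,.0f} pixels")
```

All three runs work out to about 4.0 million pixels, which is slightly below the 2048 x 2048 (about 4.19 million) FFT plan size. That is consistent with the throughput being computed over an unpadded input image, but the log does not state the input dimensions, so this remains an inference.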
convolutionSeparable.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionSeparable.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Width x Height = 3072 x 3072
Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...
convolutionSeparable, Throughput = 3179.0263 MPixels/sec, Time = 0.00297 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0
Reading back GPU results...
Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
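The log shows the CPU reference running a row pass followed by a column pass. The idea behind this is that a 2D kernel that factors into a column vector times a row vector can be applied as two 1D passes, costing about 2(2r+1) multiplies per pixel instead of (2r+1)^2 for radius r. The Python sketch below checks that equivalence on a tiny image; the image values, the kernel, and the zero-padding border policy are assumptions for illustration and are not taken from the sample.

```python
def conv_rows(img, k):
    """1D convolution along each row; pixels outside the image count as zero."""
    r = len(k) // 2
    h, w = len(img), len(img[0])
    return [[sum(k[r - d] * img[y][x + d]
                 for d in range(-r, r + 1) if 0 <= x + d < w)
             for x in range(w)] for y in range(h)]

def conv_cols(img, k):
    """1D convolution along each column; pixels outside the image count as zero."""
    r = len(k) // 2
    h, w = len(img), len(img[0])
    return [[sum(k[r - d] * img[y + d][x]
                 for d in range(-r, r + 1) if 0 <= y + d < h)
             for x in range(w)] for y in range(h)]

def conv_2d(img, k_row, k_col):
    """Direct 2D convolution with the outer-product kernel k_col * k_row^T.
    Both 1D kernels must have the same odd length."""
    r = len(k_row) // 2
    h, w = len(img), len(img[0])
    return [[sum(k_col[r - dy] * k_row[r - dx] * img[y + dy][x + dx]
                 for dy in range(-r, r + 1) if 0 <= y + dy < h
                 for dx in range(-r, r + 1) if 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]

# Hypothetical 5 x 6 test image and a deliberately asymmetric kernel.
img = [[float((3 * y + 5 * x) % 7) for x in range(6)] for y in range(5)]
k = [1.0, 2.0, 3.0]

separable = conv_cols(conv_rows(img, k), k)
direct = conv_2d(img, k, k)
max_diff = max(abs(a - b)
               for row_s, row_d in zip(separable, direct)
               for a, b in zip(row_s, row_d))
print("max difference:", max_diff)
```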
convolutionTexture.exe
[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\convolutionTexture.exe] - Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Initializing data...
Running GPU rows convolution (10 identical iterations)...
Average convolutionRowsGPU() time: 1.427774 msecs; //3304.859282 Mpix/s
Copying convolutionRowGPU() output back to the texture...
cudaMemcpyToArray() time: 0.481161 msecs; //9806.674660 Mpix/s
Running GPU columns convolution (10 iterations)
Average convolutionColumnsGPU() time: 1.429637 msecs; //3300.552071 Mpix/s
Reading back GPU results...
Checking the results...
...running convolutionRowsCPU()
...running convolutionColumnsCPU()
Relative L2 norm: 0.000000E+000
Shutting down...
Test passed
cppIntegration.exe
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Hello World.
Hello World.
cudaDecodeD3D9.exe (runaway)
Command Line Arguments:
argv[0] = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\cudaDecodeD3D9.exe
cudaDecodeGL.exe 1/2
[CUDA/OpenGL Video Decode]
Command Line Arguments:
argv[0] = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\cudaDecodeGL.exe
[cudaDecodeGL]: input file: <../../../3_Imaging/cudaDecodeGL/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
argv[0] = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\cudaDecodeGL.exe
> Device 0: <GeForce GTX 560 Ti >, Compute SM 2.1 detected
-> GPU 0: < GeForce GTX 560 Ti > driver mode is: WDDM
>> initGL() creating window [1280 x 720]
> Using CUDA/GL Device [0]: GeForce GTX 560 Ti
> Using GPU Device: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx > initialized OK
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6660) = < NV12ToARGB_drvapi >
>> modGetCudaFunction< CUDA file: NV12ToARGB_drvapi_x64.ptx >
CUDA Kernel Function (0x0a4c6210) = < Passthru_drvapi >
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeGL.exe 2/2
setTextureFilterMode(GL_NEAREST,GL_NEAREST)
ImageGL::CUcontext = 02047fd0
ImageGL::CUdevice = 00000000
reshape() glViewport(0, 0, 1280, 720)
[cudaDecodeGL] - [Frame: 0016, 00.0 fps, frame time: 98854.47 (ms) ]
[cudaDecodeGL] - [Frame: 0032, 736.9 fps, frame time: 1.36 (ms) ]
[cudaDecodeGL] - [Frame: 0048, 687.3 fps, frame time: 1.45 (ms) ]
[cudaDecodeGL] - [Frame: 0064, 788.9 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0080, 748.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0096, 724.5 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0112, 747.5 fps, frame time: 1.34 (ms) ]
[cudaDecodeGL] - [Frame: 0128, 738.9 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0144, 749.4 fps, frame time: 1.33 (ms) ]
[cudaDecodeGL] - [Frame: 0160, 764.7 fps, frame time: 1.31 (ms) ]
[cudaDecodeGL] - [Frame: 0176, 802.6 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0192, 766.6 fps, frame time: 1.30 (ms) ]
[cudaDecodeGL] - [Frame: 0208, 827.8 fps, frame time: 1.21 (ms) ]
[cudaDecodeGL] - [Frame: 0224, 774.1 fps, frame time: 1.29 (ms) ]
[cudaDecodeGL] - [Frame: 0240, 793.3 fps, frame time: 1.26 (ms) ]
[cudaDecodeGL] - [Frame: 0256, 742.5 fps, frame time: 1.35 (ms) ]
[cudaDecodeGL] - [Frame: 0272, 789.0 fps, frame time: 1.27 (ms) ]
[cudaDecodeGL] - [Frame: 0288, 803.1 fps, frame time: 1.25 (ms) ]
[cudaDecodeGL] - [Frame: 0304, 723.6 fps, frame time: 1.38 (ms) ]
[cudaDecodeGL] - [Frame: 0320, 728.5 fps, frame time: 1.37 (ms) ]
[cudaDecodeGL] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.440
Frames Presented (inc repeats) = 326
Average Present Rate (fps) = 739.44
Frames Decoded (hardware) = 327
Average Rate of Decoding (fps) = 741.71
cudaDecodeD3D9.exe 1/2
Command Line Arguments:
argv[0] = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\cudaDecodeD3D9.exe
[cudaDecodeD3D9]: input file: <../../../3_Imaging/cudaDecodeD3D9/data/plush1_720p_10s.m2v>
VideoCodec : MPEG-2
Frame rate : 30000/1001fps ~ 29.97fps
Sequence format : Progressive
Coded frame size: [1280, 720]
Display area : [0, 0, 1280, 720]
Chroma format : 4:2:0
Bitrate : 14116kBit/s
Aspect ratio : 16:9
> Using GPU Device 0: GeForce GTX 560 Ti has SM 2.1 compute capability
Total amount of global memory: 1024.0000 MB
>> modInitCTX<NV12ToARGB_drvapi_x64.ptx> initialized SUCCESS!
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <NV12ToARGB_drvapi, 0x04439d20>
>> modGetCudaFunction<NV12ToARGB_drvapi_x64.ptx>
CUDA Kernel Function = <Passthru_drvapi, 0x044398d0>
> VideoDecoder::cudaVideoCreateFlags = <1>Use CUDA decoder
cudaDecodeD3D9.exe 2/2
[cudaDecodeD3D9] - [Frame: 0016, 833.6 fps, time: 1.20 (ms) ]
[cudaDecodeD3D9] - [Frame: 0032, 1031.0 fps, time: 0.97 (ms) ]
[cudaDecodeD3D9] - [Frame: 0048, 843.8 fps, time: 1.19 (ms) ]
[cudaDecodeD3D9] - [Frame: 0064, 864.4 fps, time: 1.16 (ms) ]
[cudaDecodeD3D9] - [Frame: 0080, 850.9 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0096, 819.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0112, 844.0 fps, time: 1.18 (ms) ]
[cudaDecodeD3D9] - [Frame: 0128, 815.6 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] - [Frame: 0144, 821.0 fps, time: 1.22 (ms) ]
[cudaDecodeD3D9] - [Frame: 0160, 874.7 fps, time: 1.14 (ms) ]
[cudaDecodeD3D9] - [Frame: 0176, 960.4 fps, time: 1.04 (ms) ]
[cudaDecodeD3D9] - [Frame: 0192, 947.7 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0208, 896.7 fps, time: 1.12 (ms) ]
[cudaDecodeD3D9] - [Frame: 0224, 872.5 fps, time: 1.15 (ms) ]
[cudaDecodeD3D9] - [Frame: 0240, 922.7 fps, time: 1.08 (ms) ]
[cudaDecodeD3D9] - [Frame: 0256, 943.2 fps, time: 1.06 (ms) ]
[cudaDecodeD3D9] - [Frame: 0272, 936.6 fps, time: 1.07 (ms) ]
[cudaDecodeD3D9] - [Frame: 0288, 899.8 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0304, 901.0 fps, time: 1.11 (ms) ]
[cudaDecodeD3D9] - [Frame: 0320, 813.1 fps, time: 1.23 (ms) ]
[cudaDecodeD3D9] statistics
Video Length (hh:mm:ss.msec) = 00:00:00.375
Frames Presented (inc repeats) = 326
Average Present FPS = 868.73
Frames Decoded (hardware) = 327
Average Decoder FPS = 871.40
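The per-frame lines and the statistics blocks of both decoder logs can be cross-checked with two small formulas. This Python sketch assumes that "Video Length" is the elapsed wall-clock time of the run and that the average rates are frames divided by that time; neither is stated explicitly in the logs, and the function names are illustrative.

```python
def fps_from_frame_time(frame_time_ms):
    """Instantaneous rate implied by one per-frame time in milliseconds."""
    return 1000.0 / frame_time_ms

def average_fps(frames, elapsed_s):
    """Average rate over a whole run."""
    return frames / elapsed_s

# Values copied from the statistics blocks of the two logs.
gl_present = average_fps(326, 0.440)      # cudaDecodeGL log reports 739.44
d3d9_present = average_fps(326, 0.375)    # cudaDecodeD3D9 log reports 868.73

print(f"cudaDecodeGL   average present rate: {gl_present:.2f} fps")
print(f"cudaDecodeD3D9 average present rate: {d3d9_present:.2f} fps")
print(f"1.36 ms per frame: {fps_from_frame_time(1.36):.1f} fps")
```

The recomputed values agree with the logged ones to within a fraction of a percent; the small gaps are consistent with the printed times being rounded to a few digits.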
cudaEncode.exe (runaway)
Starting cudaEncode...
[ CUDA H.264 Encoder ]
argv[0] = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\cudaEncode.exe
dct8x8.exe
dct8x8.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
CUDA sample DCT/IDCT implementation
===================================
Loading test image: barbara.bmp... [512 x 512]... Success
Running Gold 1 (CPU) version... Success
Running Gold 2 (CPU) version... Success
Running CUDA 1 (GPU) version... Success
Running CUDA 2 (GPU) version... 10459.499992 MPix/s //0.025063 ms
Success
Running CUDA short (GPU) version... Success
Dumping result to barbara_gold1.bmp... Success
Dumping result to barbara_gold2.bmp... Success
Dumping result to barbara_cuda1.bmp... Success
Dumping result to barbara_cuda2.bmp... Success
Dumping result to barbara_cuda_short.bmp... Success
Processing time (CUDA 1) : 0.209782 ms
Processing time (CUDA 2) : 0.025063 ms
Processing time (CUDA short): 0.170617 ms
PSNR Original <---> CPU(Gold 1) : 32.777073
PSNR Original <---> CPU(Gold 2) : 32.777046
PSNR Original <---> GPU(CUDA 1) : 32.777092
PSNR Original <---> GPU(CUDA 2) : 32.777077
PSNR Original <---> GPU(CUDA short): 32.749447
PSNR CPU(Gold 1) <---> GPU(CUDA 1) : 64.019310
PSNR CPU(Gold 2) <---> GPU(CUDA 2) : 71.777740
PSNR CPU(Gold 2) <---> GPU(CUDA short): 42.258053
Test Summary...
Test passed
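The PSNR lines above compare image pairs in decibels; higher means closer. Below is a minimal Python sketch of the usual definition for 8-bit images (peak value 255). The sample's exact formula is not shown in the log, so treat this as an assumption, and the pixel values in the example are made up.

```python
import math

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)

# Hypothetical pixel data, for illustration only.
original = [52, 55, 61, 66]
decoded = [53, 55, 60, 66]
print(f"PSNR = {psnr(original, decoded):.3f} dB")
```

Under this definition, the roughly 32.78 dB between the original and each full-precision result reflects the loss introduced by the DCT/IDCT round trip itself, while the 64 to 72 dB between CPU and GPU outputs indicates that the two implementations differ only marginally.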
dct8x8.exe / result
[Output images: barbara_cuda_short.bmp, barbara_cuda1.bmp, barbara_cuda2.bmp, barbara_gold1.bmp, barbara_gold2.bmp]
deviceQuery.exe 1/2
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQuery.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 560 Ti
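Several of the deviceQuery figures are derived from one another, and a few lines of Python make the relationships explicit. The bandwidth estimate assumes double-data-rate memory (factor 2), which is a common rule of thumb and is not printed by deviceQuery.

```python
# Values copied from the deviceQuery log.
multiprocessors = 8
cores_per_mp = 48
max_threads_per_mp = 1536
warp_size = 32
memory_clock_hz = 2050e6
bus_width_bits = 256

total_cores = multiprocessors * cores_per_mp         # log: 384 CUDA Cores
global_mem_bytes = 1024 * 1024 * 1024                # log: 1024 MBytes
max_warps_per_mp = max_threads_per_mp // warp_size   # resident warps per multiprocessor

# Assumption: double data rate, hence the factor 2.
peak_bandwidth_gb_s = 2 * memory_clock_hz * (bus_width_bits / 8) / 1e9

print("CUDA cores:", total_cores)
print("global memory bytes:", global_mem_bytes)
print("max resident warps per multiprocessor:", max_warps_per_mp)
print(f"theoretical peak memory bandwidth: {peak_bandwidth_gb_s:.1f} GB/s")
```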
deviceQueryDrv.exe 1/2
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\deviceQueryDrv.exe Starting...
CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 560 Ti"
CUDA Driver Version: 5.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 48) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1800 MHz (1.80 GHz)
Memory Clock rate: 2050 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536,65535) 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
deviceQueryDrv.exe 2/2
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
dwtHaar1D.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\dwtHaar1D.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
source file = "../../../3_Imaging/dwtHaar1D/data/signal.dat"
reference file = "result.dat"
gold file = "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Reading signal from "../../../3_Imaging/dwtHaar1D/data/signal.dat"
Writing result to "result.dat"
Reading reference result from "../../../3_Imaging/dwtHaar1D/data/regression.gold.dat"
Test success!
Signal.dat
9.5012929e-001
2.3113851e-001
6.0684258e-001
4.8598247e-001
8.9129897e-001
...
Regression.gold.dat
Result.dat
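For readers unfamiliar with the transform that dwtHaar1D tests, here is a minimal Python sketch of a 1D Haar wavelet decomposition and its inverse. It assumes the orthonormal 1/sqrt(2) scaling and a particular coefficient ordering (final approximation first, then details from coarsest to finest); the sample's own layout may differ. The input is the first four values listed for Signal.dat.

```python
import math

S = 1.0 / math.sqrt(2.0)   # orthonormal Haar scaling (assumed)

def haar_step(signal):
    """One level: pairwise scaled sums (approximation) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal):
    """Full decomposition of a signal whose length is a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, d = haar_step(approx)
        details = d + details        # coarser levels go in front
    return approx + details

def haar_idwt(coeffs):
    """Inverse of haar_dwt, rebuilding the signal level by level."""
    approx, pos = coeffs[:1], 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) * S, (a - d) * S])
        approx = rebuilt
    return approx

signal = [9.5012929e-001, 2.3113851e-001, 6.0684258e-001, 4.8598247e-001]
coeffs = haar_dwt(signal)
restored = haar_idwt(coeffs)
print("coefficients:", coeffs)
print("max round-trip error:", max(abs(a - b) for a, b in zip(signal, restored)))
```

With this scaling the transform preserves the signal's energy (sum of squares), which is a convenient sanity check on any implementation.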
dxtc.exe
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win64\Release\dxtc.exe Starting...
GPU Device 0: "GeForce GTX 560 Ti" with compute capability 2.1
Image Loaded '../../../3_Imaging/dxtc/data/lena_std.ppm', 512 x 512 pixels
Running DXT Compression on 512 x 512 image...
16384 Blocks, 64 Threads per Block, 1048576 Threads in Grid...
dxtc, Throughput = 17.7004 MPixels/s, Time = 0.01481 s, Size = 262144 Pixels, NumDevsUsed = 1, Workgroup = 64
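The block, thread, and throughput figures in the dxtc log can be reproduced from the image size. This Python sketch assumes that each CUDA block handles one 4 x 4 pixel tile, which is the standard DXT block size but is not stated in the log.

```python
# Values copied from the dxtc log.
width, height = 512, 512
threads_per_block = 64
time_s = 0.01481

# Assumption: one CUDA block per 4 x 4 pixel DXT tile.
tile_edge = 4

blocks = (width // tile_edge) * (height // tile_edge)   # log: 16384 Blocks
threads = blocks * threads_per_block                    # log: 1048576 Threads in Grid
pixels = width * height                                 # log: Size = 262144 Pixels
throughput_mpix_s = pixels / time_s / 1e6               # log: 17.7004 MPixels/s

print("blocks:", blocks)
print("threads:", threads)
print("pixels:", pixels)
print(f"throughput: {throughput_mpix_s:.4f} MPixels/s")
```

A 128 x 128 grid of tiles also fits the deviation coordinates reported by the accuracy check, which stay within 0 to 127 on both axes.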
dxtc.exe 1/4
Checking accuracy...
Deviation at ( 9, 1): 0.791667 rms
Deviation at ( 99, 1): 1.041667 rms
Deviation at ( 12, 2): 0.937500 rms
Deviation at ( 90, 3): 0.166667 rms
Deviation at ( 38, 4): 1.916667 rms
Deviation at ( 34, 7): 1.687500 rms
Deviation at ( 57, 7): 0.458333 rms
Deviation at ( 100, 8): 2.416667 rms
Deviation at ( 30, 9): 2.375000 rms
Deviation at ( 31, 9): 0.770833 rms
Deviation at ( 58, 9): 0.791667 rms
Deviation at ( 29, 10): 0.020833 rms
Deviation at ( 79, 10): 1.833333 rms
Deviation at ( 13, 11): 1.041667 rms
Deviation at ( 4, 13): 8.562500 rms
Deviation at ( 28, 13): 0.562500 rms
Deviation at ( 90, 13): 0.708333 rms
Deviation at ( 25, 14): 0.520833 rms
Deviation at ( 69, 14): 0.770833 rms
Deviation at ( 87, 16): 0.708333 rms
Deviation at ( 90, 17): 1.041667 rms
Deviation at ( 24, 19): 0.916667 rms
Deviation at ( 25, 19): 0.625000 rms
Deviation at ( 26, 19): 1.041667 rms
Deviation at ( 55, 20): 4.791667 rms
Deviation at ( 20, 23): 1.541667 rms
Deviation at ( 99, 23): 3.312500 rms
Deviation at ( 45, 24): 18.104166 rms
Deviation at ( 8, 28): 0.895833 rms
dxtc.exe 2/4
Deviation at ( 21, 30): 1.562500 rms
Deviation at ( 115, 32): 24.104166 rms
Deviation at ( 2, 33): 0.854167 rms
Deviation at ( 102, 33): 2.250000 rms
Deviation at ( 50, 35): 26.958334 rms
Deviation at ( 68, 35): 11.937500 rms
Deviation at ( 115, 36): 0.458333 rms
Deviation at ( 12, 38): 2.166667 rms
Deviation at ( 40, 40): 0.270833 rms
Deviation at ( 86, 43): 0.604167 rms
Deviation at ( 116, 43): 0.125000 rms
Deviation at ( 43, 44): 2.250000 rms
Deviation at ( 54, 44): 4.791667 rms
Deviation at ( 46, 46): 2.875000 rms
Deviation at ( 116, 46): 0.604167 rms
Deviation at ( 4, 47): 0.708333 rms
Deviation at ( 117, 48): 0.937500 rms
Deviation at ( 23, 51): 3.520833 rms
Deviation at ( 11, 52): 0.041667 rms
Deviation at ( 67, 54): 5.687500 rms
Deviation at ( 26, 55): 0.854167 rms
Deviation at ( 21, 56): 5.000000 rms
Deviation at ( 24, 56): 0.562500 rms
Deviation at ( 30, 57): 0.937500 rms
Deviation at ( 21, 59): 2.541667 rms
Deviation at ( 120, 59): 0.104167 rms
Deviation at ( 112, 60): 1.125000 rms
Deviation at ( 77, 61): 1.083333 rms
dxtc.exe 3/4
Deviation at ( 114, 62): 4.958333 rms
Deviation at ( 78, 66): 0.541667 rms
Deviation at ( 106, 68): 0.375000 rms
Deviation at ( 16, 70): 3.104167 rms
Deviation at ( 10, 71): 0.937500 rms
Deviation at ( 108, 71): 0.354167 rms
Deviation at ( 0, 72): 0.854167 rms
Deviation at ( 118, 72): 5.562500 rms
Deviation at ( 11, 73): 0.541667 rms
Deviation at ( 68, 74): 1.937500 rms
Deviation at ( 70, 76): 1.791667 rms
Deviation at ( 124, 76): 3.354167 rms
Deviation at ( 103, 78): 0.375000 rms
Deviation at ( 127, 78): 0.541667 rms
Deviation at ( 108, 79): 0.083333 rms
Deviation at ( 120, 81): 0.541667 rms
Deviation at ( 43, 82): 24.979166 rms
Deviation at ( 67, 82): 3.125000 rms
Deviation at ( 78, 82): 2.437500 rms
Deviation at ( 123, 84): 0.541667 rms
Deviation at ( 127, 85): 0.187500 rms
Deviation at ( 122, 87): 0.083333 rms
Deviation at ( 124, 87): 0.541667 rms
Deviation at ( 127, 88): 0.229167 rms
Deviation at ( 93, 91): 0.666667 rms
Deviation at ( 115, 93): 0.083333 rms
Deviation at ( 69, 95): 1.875000 rms
Deviation at ( 106, 95): 1.125000 rms
dxtc.exe 4/4
Deviation at ( 107, 95): 3.708333 rms
Deviation at ( 13, 96): 1.354167 rms
Deviation at ( 115, 98): 0.187500 rms
Deviation at ( 118, 98): 0.187500 rms
Deviation at ( 116, 101): 0.187500 rms
Deviation at ( 78, 105): 0.541667 rms
Deviation at ( 67, 107): 0.708333 rms
Deviation at ( 74, 107): 0.375000 rms
Deviation at ( 65, 109): 0.770833 rms
Deviation at ( 89, 109): 0.708333 rms
Deviation at ( 118, 109): 3.854167 rms
Deviation at ( 67, 110): 1.083333 rms
Deviation at ( 88, 111): 0.208333 rms
Deviation at ( 64, 113): 0.708333 rms
Deviation at ( 84, 113): 0.333333 rms
Deviation at ( 88, 113): 0.187500 rms
Deviation at ( 84, 114): 1.666667 rms
Deviation at ( 66, 115): 0.770833 rms
Deviation at ( 19, 118): 5.270833 rms
Deviation at ( 76, 121): 0.104167 rms
Deviation at ( 70, 122): 0.708333 rms
Deviation at ( 91, 122): 0.208333 rms
Deviation at ( 71, 123): 0.854167 rms
Deviation at ( 75, 123): 0.854167 rms
Deviation at ( 61, 124): 0.937500 rms
Deviation at ( 91, 124): 0.270833 rms
RMS(reference, result) = 0.015488
Test passed
Summary
On the GTX 560 Ti (compute capability 2.1), some samples do not run correctly.
→ Must support CUDA compute capability 3.0.
→ Requires a GPU device with compute capability (SM) 3.5 or higher.
This evaluation is to be continued; the results are recorded here for future reference.