PyCUDA: How to exploit the power of graphics cards in Python applications

PyCUDA: Harnessing the power of GPUs with Python

PyCon 4 – Florence 2010 – Fabrizio Milo

Talk Structure

1. Why a GPU?
2. How does it work?
3. How do I program it?
4. Can I use Python?

WHY A GPU?

APPLICATIONS & DEMOS

Why a GPU?


How does it work?

[Figure: CPU vs. GPU architecture (labels: Control, ALU)]

Compute Unified Device Architecture

A Parallel Computing Architecture for NVIDIA GPUs

DirectX Compute

Execution Model

CUDA Device Model

EXECUTION MODEL

Thread

Smallest unit of execution

A Block

A Group of Threads

A Grid

A Group of Blocks

One Block can have many threads

One Grid can have many blocks
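The grid/block/thread hierarchy can be pictured as nested loops. The sketch below is a plain-Python emulation that assumes a 1-D grid of 1-D blocks; `launch_1d` and `body` are hypothetical names, not CUDA or PyCUDA API. It runs the body once per (block, thread) pair, the way a launch of `grid_dim` blocks of `block_dim` threads would. On a real GPU these calls run in parallel and in no guaranteed order.

```python
from typing import Callable, List, Tuple


def launch_1d(grid_dim: int, block_dim: int,
              body: Callable[[int, int, int], None]) -> None:
    """Hypothetical CPU emulation of a 1-D kernel launch.

    Calls body(block_idx, thread_idx, block_dim) once for every thread
    in the grid. A GPU would run these concurrently; running them one
    after another is only valid if threads do not depend on each other.
    """
    for block_idx in range(grid_dim):        # one grid, many blocks
        for thread_idx in range(block_dim):  # one block, many threads
            body(block_idx, thread_idx, block_dim)


# Record which (block, thread) pairs were visited.
visited: List[Tuple[int, int]] = []
launch_1d(grid_dim=3, block_dim=4,
          body=lambda b, t, dim: visited.append((b, t)))
print(len(visited))  # 3 blocks x 4 threads = 12 threads
```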

DEVICE MODEL: The Hardware

Scalar Processor

Many Scalar Processors

+ Register File

+ Shared Memory

Multiprocessor

Device

Real Example: 10-Series Architecture

- 240 Scalar Processor (SP) cores execute kernel threads
- 30 Streaming Multiprocessors (SMs), each containing:
  - 8 scalar processors
  - 1 double-precision unit
  - Shared memory

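As a quick consistency check, the core counts listed for the 10-series follow from the per-SM figures:

```latex
30\ \text{SMs} \times 8\ \tfrac{\text{SPs}}{\text{SM}} = 240\ \text{SPs},
\qquad
30\ \text{SMs} \times 1\ \tfrac{\text{DP unit}}{\text{SM}} = 30\ \text{DP units}
```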
Software        Hardware
Thread          Scalar Processor
Thread Block    Multiprocessor
Grid            Device

Global Memory

Host - Device

[Figure: the host CPU and the device's Global Memory]

Host - Multi Device


__global__ void multiply_them(float *dest, float *a, float *b)
{
    // Each thread handles one element; i is its index within the block.
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}

Kernel

[Figure: the same kernel is executed by every thread]
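To make the per-thread view concrete, here is a CPU emulation of `multiply_them` in plain Python. It is a sketch, not anything PyCUDA does internally: the loop variable plays the role of `threadIdx.x` and the loop body is the kernel body. Like the kernel, it assumes a single block with one thread per element; `multiply_them_emulated` is a hypothetical name.

```python
from typing import List


def multiply_them_emulated(dest: List[float], a: List[float],
                           b: List[float], block_dim_x: int) -> None:
    """Sequential stand-in for multiply_them<<<1, block_dim_x>>>.

    Each iteration corresponds to one GPU thread; i takes the value
    threadIdx.x would have in that thread.
    """
    for thread_idx_x in range(block_dim_x):
        i = thread_idx_x          # const int i = threadIdx.x;
        dest[i] = a[i] * b[i]     # dest[i] = a[i] * b[i];


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
dest = [0.0] * len(a)
multiply_them_emulated(dest, a, b, block_dim_x=len(a))
print(dest)  # [10.0, 40.0, 90.0, 160.0]
```

Because `i` comes only from `threadIdx.x`, this kernel can cover at most one block's worth of elements; an index built from `blockIdx.x * blockDim.x + threadIdx.x` lifts that limit.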

__global__ void kernel( … )
{
    // Global index: offset of this block plus position within the block.
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    …
}

Kernel
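With many blocks, each thread derives a global index from `blockIdx.x * blockDim.x + threadIdx.x`. A common companion pattern is to round the block count up and let surplus threads in the last block skip work with a bounds check. The Python emulation below sketches both; `num_blocks_for` and `scale_emulated` are illustrative names, and the "kernel" here simply scales an array.

```python
from typing import List


def num_blocks_for(n: int, block_dim: int) -> int:
    """Ceiling division: enough blocks of block_dim threads to cover n."""
    return (n + block_dim - 1) // block_dim


def scale_emulated(out: List[float], x: List[float], factor: float,
                   block_dim: int) -> int:
    """Sequential stand-in for a 1-D kernel launched over len(x) items.

    Returns the total number of threads launched, including idle ones.
    """
    n = len(x)
    grid_dim = num_blocks_for(n, block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            idx = block_idx * block_dim + thread_idx  # global index
            if idx < n:  # guard: the last block may be partly empty
                out[idx] = factor * x[idx]
    return grid_dim * block_dim


x = [float(v) for v in range(10)]
out = [0.0] * len(x)
threads = scale_emulated(out, x, 2.0, block_dim=4)
print(threads)  # 3 blocks x 4 threads = 12
print(out[9])   # 18.0
```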

How do I program it?

[Figure: the kernel is compiled to a .cubin for the GPU; the main logic is compiled to a host binary (.bin) for the CPU]


Allocate Memory

cudaMalloc(&pointer, size)

Copy to Device

cudaMemcpy(dest, src, size, cudaMemcpyHostToDevice)

Kernel Launch

kernel<<<num_blocks, threads_per_block>>>(params)

Get Back the Results

cudaMemcpy(dest, src, size, cudaMemcpyDeviceToHost)

Error Handling

if (cudaMalloc(&pointer, size) != cudaSuccess) { handle_error(); }

And soon it becomes …

if (cudaMemcpy(dest, src, size, cudaMemcpyHostToDevice) != cudaSuccess) { handle_error(); }
kernel<<<num_blocks, threads_per_block>>>(params);  // a launch returns no status
if (cudaGetLastError() != cudaSuccess) { handle_error(); }
if (cudaMemcpy(dest, src, size, cudaMemcpyDeviceToHost) != cudaSuccess) { handle_error(); }
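Those repeated checks are the boilerplate PyCUDA aims to remove by reporting failures as Python exceptions. The sketch below shows the general idea in plain Python; the status codes, `CudaError`, `check`, and `fake_malloc` are hypothetical stand-ins, not PyCUDA's actual classes or the CUDA runtime's error values.

```python
class CudaError(RuntimeError):
    """Hypothetical exception type for a failed (emulated) CUDA call."""


SUCCESS = 0  # stand-in for cudaSuccess


def check(status: int, call: str) -> None:
    """Raise instead of returning a status code the caller might ignore."""
    if status != SUCCESS:
        raise CudaError(f"{call} failed with status {status}")


def fake_malloc(size: int) -> int:
    """Hypothetical allocator: returns status 2 for negative sizes."""
    return SUCCESS if size >= 0 else 2


check(fake_malloc(1024), "malloc")  # succeeds silently
try:
    check(fake_malloc(-1), "malloc")
except CudaError as err:
    print(err)  # malloc failed with status 2
```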

= PYCUDA

& ANDREAS KLÖCKNER

PyCUDA Philosophy

- Provide complete access
- Automatically manage resources
- Check and report errors
- Cross-platform
- Allow interactive use
- NumPy integration
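"Automatically manage resources" means device memory is tied to the lifetime of the Python object that owns it. A minimal sketch of that idea with the standard library's `weakref.finalize`, assuming a hypothetical `FakeAllocator` in place of the CUDA driver; `DeviceBuffer` is also illustrative, and this is not how PyCUDA is implemented internally.

```python
import weakref
from typing import List


class FakeAllocator:
    """Hypothetical stand-in for the driver: hands out integer handles."""

    def __init__(self) -> None:
        self.next_handle = 1
        self.freed: List[int] = []

    def alloc(self) -> int:
        handle = self.next_handle
        self.next_handle += 1
        return handle

    def free(self, handle: int) -> None:
        self.freed.append(handle)


class DeviceBuffer:
    """Owns one allocation and frees it when the object goes away."""

    def __init__(self, allocator: FakeAllocator) -> None:
        self.handle = allocator.alloc()
        # The callback runs at most once: on garbage collection or at exit.
        self._finalizer = weakref.finalize(self, allocator.free, self.handle)

    def free(self) -> None:
        """Explicit early release; the finalizer will not run again."""
        self._finalizer()


alloc = FakeAllocator()
buf = DeviceBuffer(alloc)
del buf              # in CPython the refcount drops to zero here
print(alloc.freed)   # [1]
```

The finalizer holds only the allocator's bound method and the integer handle, never `self`, so it does not keep the buffer alive.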

NUMPY - ARRAY

import numpy

my_array = numpy.array([1] * 100)   # 1 1 1 1 1 1 ...

my_array[3] = 0                     # 1 1 1 0 1 1 ...

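PyCUDA: Workflow

Memory Allocation

import numpy
import pycuda.autoinit          # creates a CUDA context on the first GPU
import pycuda.driver as cuda

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)      # size in bytes
b_gpu = cuda.mem_alloc(b.nbytes)
dest_gpu = cuda.mem_alloc(a.nbytes)

Memory Copy

cuda.memcpy_htod(a_gpu, a)            # host -> device
cuda.memcpy_htod(b_gpu, b)

Kernel

from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

Kernel Launch

multiply_them = mod.get_function("multiply_them")
# i comes only from threadIdx.x, so use one 1-D block with one thread per
# element; block=(30, 64, 1) would need 1920 threads in a single block,
# above the per-block limit (512 threads on 10-series GPUs)
multiply_them(dest_gpu, a_gpu, b_gpu, block=(400, 1, 1), grid=(1, 1))

dest = numpy.empty_like(a)
cuda.memcpy_dtoh(dest, dest_gpu)      # device -> host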

DEMO: Hello GPU

GPUARRAY

PyCUDA: GPUArray

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

numpy_array = numpy.arange(10, dtype=numpy.float32)
a_gpu = gpuarray.to_gpu(numpy_array)   # host -> device
numpy_array = a_gpu.get()              # device -> host

+, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product …

PyCUDA: GPUArray: Elementwise

import numpy.linalg as la
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.curandom import rand as curand
from pycuda.elementwise import ElementwiseKernel

# z = a*x + b*y, evaluated element by element; PyCUDA supplies the index i
lincomb = ElementwiseKernel(
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a*x[i] + b*y[i]")

a_gpu = curand((50,))
b_gpu = curand((50,))
c_gpu = gpuarray.empty_like(a_gpu)
lincomb(5, a_gpu, 6, b_gpu, c_gpu)

assert la.norm((c_gpu - (5*a_gpu + 6*b_gpu)).get()) < 1e-5

Meta-Programming

__kernel_template__ = """
__global__ void kernel( args ){
    for (int i = 0; i < {{ iterations }}; i++){
        {{ operations }}
    }
}"""

See, for example, Jinja2.

Generate source!
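The template above uses Jinja2's `{{ }}` syntax. To stay dependency-free, the sketch below swaps in the standard library's `string.Template` (`$name` placeholders) to generate CUDA source text from Python; `KERNEL_TEMPLATE`, `render_kernel`, and the kernel's `data`/`n` parameters are illustrative. The resulting string is the kind of text you would hand to `SourceModule`.

```python
from string import Template

# Same shape as the slide's template, in string.Template syntax.
KERNEL_TEMPLATE = Template("""
__global__ void kernel(float *data, int n)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;  // skip threads past the end of the data
    for (int i = 0; i < $iterations; i++) {
        $operations
    }
}
""")


def render_kernel(iterations: int, operations: str) -> str:
    """Fill in the loop count and loop body; returns CUDA C source text."""
    return KERNEL_TEMPLATE.substitute(iterations=iterations,
                                      operations=operations)


source = render_kernel(10, "data[idx] = data[idx] * 2.0f;")
print(source)
```

Because the iteration count is baked into the source as a literal, the CUDA compiler sees a constant trip count, which can make the loop easier for it to specialize.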

Performance?

DEMO: Mandelbrot
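A GPU Mandelbrot renderer typically assigns one pixel to each thread. The per-pixel work is the escape-time iteration sketched below in plain Python; in a kernel, `c` would be derived from the thread's global index. `escape_time` and `max_iter` are illustrative names, not taken from the demo's source.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2 (capped at max_iter).

    Points that never escape within max_iter are treated as inside the
    Mandelbrot set.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


# A tiny CPU "image": one call per pixel, as one thread per pixel would do.
width, height = 8, 4
rows = []
for y in range(height):
    row = ""
    for x in range(width):
        c = complex(-2.0 + 2.5 * x / width, -1.0 + 2.0 * y / height)
        row += "#" if escape_time(c) == 100 else "."
    rows.append(row)
print("\n".join(rows))
```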

PyCUDA: Documentation

PyCUDA

Website: http://mathema.tician.de/software/pycuda

License: X Consortium License

(no warranty, free for all use)

Dependencies: Python 2.4+, NumPy, Boost

In the Future …

OPENCL

THANK YOU & HAVE FUN!
