
__global__ void proc(float *arr, float *brr) { float v; __shared__ float shared[L]; shared[threadIdx.x] = brr[threadIdx.x]; __syncthreads();

if (threadIdx.x != 0) { v = arr[threadIdx.x]; v += shared[threadIdx.x]; … } else { … } … }

Modularity for HPC -WootinJ-

GPGPU in HPC
Many supercomputers support GPGPU: TSUBAME2, Dawning Nebulae, …

Many non-functional concerns:
Optimization
Hardware-aware
Fail-safe

Masayuki Ioki, Shumpei Hozumi, Shigeru Chiba

Tokyo Institute of Technology

WootinJ
A runtime converter from Java to CUDA: it generates CUDA code using runtime context.

WootinJ removes some OOP overheads:
Devirtualization
Flattening the structure of an object to remove field-access chains

Motivating example
A GPU has several types of memory:
Global memory: large but slow
Shared memory: fast but small

__global__ void proc(float *arr, float *brr) { float v; if (threadIdx.x != 0) { v = arr[threadIdx.x]; v += brr[threadIdx.x]; … } else { … } … }
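As a reference point, a compilable version of this non-shared baseline; the branch bodies and the hypothetical out array stand in for the work the slide elides:

// Baseline: every access goes through large-but-slow global memory.
__global__ void proc(const float *arr, const float *brr, float *out) {
    float v;
    if (threadIdx.x != 0) {
        v = arr[threadIdx.x];     // global-memory load
        v += brr[threadIdx.x];    // second global-memory load
        out[threadIdx.x] = v;     // stand-in for the elided work
    } else {
        out[0] = arr[0] + brr[0]; // stand-in for the elided thread-0 path
    }
}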

(Figure: GPU memory hierarchy. A GPU consists of streaming multiprocessors (SMs); each SM contains streaming processors (SPs) and its own shared memory, and all SMs share the global memory.)

Non-shared-memory version

__global__ void proc(float *arr, float *brr) { float v; __shared__ float shared[L]; shared[threadIdx.x] = arr[threadIdx.x]; __syncthreads();

if (threadIdx.x != 0) { v = shared[threadIdx.x]; v += brr[threadIdx.x]; … } else { … } … }

__global__ void proc(float *arr, float *brr) { float v; __shared__ float shared_a[L]; __shared__ float shared_b[L]; shared_a[threadIdx.x] = arr[threadIdx.x]; shared_b[threadIdx.x] = brr[threadIdx.x]; __syncthreads();

if (threadIdx.x != 0) { v = shared_a[threadIdx.x]; v += shared_b[threadIdx.x]; } else { … } … }

Variants: brr -> shared memory; arr -> shared memory; both -> shared memory
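For concreteness, a minimal self-contained sketch of the "both -> shared memory" variant, assuming L = 256, a single thread block, and a hypothetical output array out standing in for the work the slides elide:

#include <cstdio>

#define L 256  // block size; assumed for this sketch

// "both -> shared memory": stage both inputs into shared memory once,
// then let every thread read the fast on-chip copies.
__global__ void proc(const float *arr, const float *brr, float *out) {
    __shared__ float shared_a[L];
    __shared__ float shared_b[L];
    shared_a[threadIdx.x] = arr[threadIdx.x];
    shared_b[threadIdx.x] = brr[threadIdx.x];
    __syncthreads();  // all staging stores must finish before any thread reads

    float v = shared_a[threadIdx.x] + shared_b[threadIdx.x];
    out[threadIdx.x] = v;  // stand-in for the elided per-thread work
}

int main() {
    float h_a[L], h_b[L], h_out[L];
    for (int i = 0; i < L; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    float *d_a, *d_b, *d_out;
    cudaMalloc(&d_a, L * sizeof(float));
    cudaMalloc(&d_b, L * sizeof(float));
    cudaMalloc(&d_out, L * sizeof(float));
    cudaMemcpy(d_a, h_a, L * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, L * sizeof(float), cudaMemcpyHostToDevice);

    proc<<<1, L>>>(d_a, d_b, d_out);  // one block of L threads
    cudaMemcpy(h_out, d_out, L * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[1] = %f\n", h_out[1]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_out);
    return 0;
}

Note that staging pays off only when each element is reused by several threads; a single read per element, as in this toy kernel, gains nothing over the non-shared version.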

HPC programmers hate OOP.
OOP has rich modularity, but it also has many overheads:
Dynamic method dispatch
Field-access chains

class Calc {
    Memory memA, memB, …;
    @global void proc(float[] arr, float[] brr) { … }
}

Calc calc = new Calc();
float[] arr = …, brr = …;  // 256-element input arrays
Dim3s dim3s = new Dim3s();
dim3s.threadDim = new Dim3(256);
CUDAKicker.run(dim3s, calc, "proc", arr, brr);

Pipeline: Java bytecode -> Java AST -> CUDA code -> run on GPUs

WootinJ Sample Code

Memory memA = new SimpleSharedMem(256);
Memory memB = new Memory();

memA.set(arr, threadIdx.x);
memB.set(brr, threadIdx.x);

void set_memA(float[] arr, int i) { /* SimpleSharedMem method body */ }
void set_memB(float[] arr, int i) { /* Memory method body */ }
…
set_memA(arr, threadIdx.x);
set_memB(brr, threadIdx.x);

Devirtualization
Dynamic method dispatch becomes static dispatch: WootinJ finds the actual types of all the given objects.
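A compilable sketch of what the devirtualized calls might look like on the CUDA side. The set_memA/set_memB names follow the slide above; the bodies, the mem output array, and the fixed block size are assumptions:

// One concrete device function per resolved receiver type; because the calls
// below are direct, the CUDA compiler can inline them.
__device__ void set_memA(float *shared, const float *arr, int i) {
    shared[i] = arr[i];   // assumed SimpleSharedMem.set: stage into shared memory
}
__device__ void set_memB(float *mem, const float *brr, int i) {
    mem[i] = brr[i];      // assumed Memory.set: plain global-memory store
}

__global__ void proc(const float *arr, const float *brr, float *mem) {
    __shared__ float shared[256];  // assumed block size
    // Devirtualized call sites: the receiver types were resolved at
    // conversion time, so no dispatch table is needed on the GPU.
    set_memA(shared, arr, (int)threadIdx.x);
    set_memB(mem, brr, (int)threadIdx.x);
    __syncthreads();
    mem[threadIdx.x] += shared[threadIdx.x];  // stand-in for the elided work
}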

Micro benchmark: matrix product

WootinJ's overhead is about 2 seconds: JVM start-up plus CUDA code generation and compilation.

TSUBAME2 supercomputer
CPUs: 2 × Intel Xeon 2.93 GHz
GPUs: 3 × NVIDIA Tesla M2050
Memory: 54 GB

(Chart: performance in GFLOPS vs. number of GPUs.)

Compile-time check for devirtualization
Inside a @global method:
Assignment expression (a = b): both types must be strict-final.
Method call (obj.m();): the return type of the method must be strict-final.

Strict-final:
1. Primitive types are strict-final.
2. A final class whose fields are all strict-final is strict-final.
3. An array whose element type is strict-final is strict-final.
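A strict-final type has a layout that is fixed at compile time, which is what lets WootinJ flatten an object into scalars instead of keeping references on the GPU. A hypothetical sketch of the effect in generated-CUDA style (Vec3, scale, and all names are invented for illustration):

// Suppose a strict-final Java class: final class Vec3 { final float x, y, z; }
// Without flattening, reading obj.pos.x on the GPU would mean chasing
// references; after flattening, the fields arrive as plain scalars.
__global__ void scale(float pos_x, float pos_y, float pos_z,
                      float factor, float *out) {
    if (threadIdx.x == 0) {
        out[0] = pos_x * factor;  // was obj.pos.x
        out[1] = pos_y * factor;  // was obj.pos.y
        out[2] = pos_z * factor;  // was obj.pos.z
    }
}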

@global is an annotation for CUDA functions.

My name is Wootin!
